Lead Site Reliability Engineer, Datastores (Hybrid)

  • Full Time

Who We Are

ThousandEyes is a company born out of two principal ideas: the ability to see what's typically impossible to see, and the capability to gather information from many diverse, global vantage points, just like the Internet. As organizations rely more heavily on cloud services, the Internet has become the default network that connects cloud applications to users. Our platform for Internet and cloud intelligence works like a "Google Maps of the Internet," offering collective insight into digital experiences from start to finish. We allow our clients, who include the world's biggest and fastest-growing brands, to identify problems before they impact revenue, brand reputation, or employee productivity.

In August 2020, Cisco Systems successfully acquired ThousandEyes. The company now operates as the ThousandEyes Business Unit within Cisco's Network Services Business Group and is a crucial part of Cisco's expanding Observability business.

About The Role

The Datastores team is responsible for our platform's key datastores, such as Elasticsearch, Kafka, MongoDB, and MySQL. The team handles every aspect of these datastores, including availability, performance, change management, capacity planning, monitoring, and emergency response. As a Site Reliability Engineer on the team, you will help manage the company's core datastore services and maintain a steadily growing infrastructure that handles a high volume of data daily.

We're seeking talented engineers with software or operations experience, who are skilled at designing, analyzing, and troubleshooting large-scale high-availability datastore systems. You must be committed to collaborating with our application development teams to ensure the reliability and performance of our infrastructure.

What You'll Do

  • Work closely with software engineers to ensure that the ThousandEyes platform's datastore infrastructure and services are designed and optimized for availability, latency, and performance.
  • Build and support critical services, with a focus on automation, availability, and performance.
  • Design, implement, and maintain elastic and resilient datastores that can support our platform as we expand to a multi-region scale.
  • Drive and build automation where possible, facilitating effortless scaling of our datastores. Think self-service.
  • Participate in and contribute to improving our 24x7 incident response and on-call rotation.

About You

  • Ability to design and implement scalable and well-tested solutions, focusing on datastores.
  • Ability to write high-quality code in Python, Go, or equivalent languages.
  • Strong Infrastructure as Code skills, ideally with Terraform and Kubernetes.
  • Good knowledge of cloud provider managed services (ideally AWS), and how they can be leveraged in our context.
  • Sound understanding of Unix/Linux systems, the kernel, system libraries, file systems, and client-server protocols.
  • Strong communication and documentation skills.
  • Experience operating high-performance and high-availability databases such as MySQL, Aurora, MongoDB, DynamoDB, and/or Apache Druid.

Cisco is an Affirmative Action and Equal Opportunity Employer, and all qualified applicants will be considered for employment without regard to their race, color, religion, gender, sexual orientation, national origin, genetic information, age, disability, veteran status, or any other protected legal basis. Cisco will consider for employment, on a case-by-case basis, qualified applicants with arrest and conviction records.

Why Cisco

We are Cisco, where everyone is unique but we work together as a team to make a difference, driving an inclusive future for all.

We understand and embrace the digital era, aiding our customers in implementing changes within their digital businesses. Some might think of us as an 'old' company (36 years in the business) and solely focused on hardware, but we are also a software and security firm. We have even created an intuitive network that adapts, forecasts, learns, and protects. No other company can do what we do – you can't put us in a box! However, the term "Digital Transformation" remains an empty buzzword without a culture that encourages innovation, creativity, and even failure (as long as we learn from it).

On a day-to-day basis, we promote a give-and-take approach. We give our best, set our egos aside, and offer ourselves freely (as giving back is part of our makeup). We take responsibility, take bold steps, and embrace differences. We value diversity of thought and a commitment to equality because without these, there is no progress.

So, do you have colorful hair? We don’t mind. Tattoos? Feel free to show them off. Fond of polka dots? That’s OK. Are you a pop culture geek? Many of us are. Have a passion for technology and making a global impact? Be you, with us.

We acknowledge that diverse teams create the strongest teams, and we invite people from all backgrounds to apply.