Tech Lead Manager (TLM) - Supercomputing Scalability


Join Our Trailblazing Supercomputing Team at OpenAI

About the Team

Supercomputers are the powerhouses behind major breakthroughs in technology, and at OpenAI, we push the boundary of what these giants can do. Our Supercomputing Scalability Pillar is dedicated to redefining the infrastructure of cluster-scale supercomputers. We focus on scaling beyond existing node counts, deploying cluster-wide updates swiftly, integrating comprehensive telemetry, and incorporating cutting-edge hardware to support world-class scaling endeavors.

About the Role

As a Tech Lead Manager (TLM) within the Scalability Pillar at OpenAI, based in vibrant San Francisco, CA, you will spearhead efforts to streamline and expand the capabilities of DC-scale supercomputers. With your leadership, our team will push widely available tools to and beyond their limits, tackle novel hardware challenges, and manage growing supercomputer clusters and user bases. This is a hybrid role requiring three days of on-site collaboration weekly, with relocation assistance available for successful candidates.

Key Responsibilities

  • Directly manage the individual contributors (ICs) crafting the foundations of our supercomputing scalability.
  • Foster a high-performing, agile team environment to advance our supercomputing technology globally while prioritizing safety and reliability.
  • Enhance orchestration and monitoring frameworks for top-tier supercomputer performance.
  • Champion automation to gain unprecedented control over our technology stack.
  • Deepen our understanding of supercomputer health and utility for groundbreaking research and model training.

Who Thrives in this Role?

You are ideal for this position if you:

  • Possess a robust grasp of Kubernetes (k8s) and cluster orchestration systems.
  • Excel in software development and have a history of working with public clouds, particularly Azure.
  • Are not only adept at technical leadership but are also passionate about mentoring and leading teams to excellence.
  • Value and foster a diverse, inclusive workplace that promotes open dialogue and challenges conventional thinking.
  • Show initiative, willingness to learn new skills, and a commitment to team success.
  • Have outstanding communication skills, with an ability to communicate transparently and effectively.
  • Understand AI/ML workloads (a plus!).

About OpenAI

At OpenAI, we are committed to advancing artificial intelligence technology that can profoundly impact the world in positive ways. Our work focuses on developing AI that encompasses global perspectives and addresses human needs. We work to ensure that AI is developed safely and that its benefits are universal.

OpenAI is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We encourage applicants of different backgrounds, cultures, genders, experiences, abilities and perspectives to apply.

Join Us

Ready to be at the forefront of supercomputing technology? Apply now to become part of a team committed to technological innovation and making a significant impact. Your journey in shaping the future of AI starts here at OpenAI.

For more information on applying, including accommodations for disabilities, and to read our global privacy policy, please visit our career page.