HPC Engineer, Machine Learning Infrastructure - EMEA Remote

Job expired!

Join Hugging Face as an HPC Engineer - Machine Learning Infrastructure - EMEA Remote

Welcome to Hugging Face! We are on a mission to advance Machine Learning and make it more accessible to everyone. Our journey involves contributing to the growth of technology for a better future.

We have created the world's fastest-growing, open-source library of pre-trained models. With over 1 Million+ models and 320K+ stars on GitHub, Hugging Face technology is being used in production by over 15,000 companies, including leading AI organizations like Google, Elastic, Salesforce, Grammarly, and NASA.

About the Role

We are seeking an experienced HPC Engineer to develop and scale our distributed large cluster. The ideal candidate will have a solid background in provisioning large compute clusters for AI workflows and guiding teams to create best practices for reliability and scalability.

Key Responsibilities

  • Design, develop, deploy, and maintain reliable and scalable infrastructure for efficient training workloads.
  • Manage large compute clusters for AI training and development.
  • Create tooling and infrastructure to abstract compute and storage in ML workflows.
  • Measure and optimize system performance.
  • Monitor and troubleshoot infrastructure issues to ensure high availability and performance of AI workloads.
  • Stay updated with the latest AI infrastructure technologies and recommend improvements.
  • Collaborate with AI software engineering teams to ensure infrastructure supports system requirements.
  • Provide primary operational support and engineering across multiple teams.

Qualifications

  • 7+ years of experience in a DevOps or Infrastructure Engineer role, focusing on machine learning infrastructure and large GPU clusters.
  • Expertise with cloud providers such as AWS, GCP, infra-as-code frameworks, and observability tools.
  • Proficiency with the Python Scientific stack and Pytorch.
  • Experience with data structures, data modeling, and database management, including object and file storage systems.
  • Strong communication, collaboration, and documentation skills.
  • Familiarity with Linux, Git, containers, networking, and command line tools.
  • Strong programming skills in Python, Golang, and/or Rust.

About You

If you are a passionate HPC Engineer with a deep interest in AI and thrive in a challenging, innovative environment, we want to hear from you. Join our team to help advance AI technologies while working alongside talented professionals in a collaborative and stimulating setting.

Why Join Hugging Face

Diversity, Equity, and Inclusivity

We are committed to building a diverse and inclusive workplace. Hugging Face is an equal opportunity employer and celebrates diversity. We do not discriminate based on race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

Professional Development

At Hugging Face, you will collaborate with some of the brightest minds in the industry. We foster a culture of continuous growth and provide reimbursement for relevant conferences, training, and education.

Employee Well-being

We care about your well-being. Our benefits include flexible working hours, remote options, health, dental, and vision insurance for employees and their dependents, 12 weeks of parental leave (20 weeks for birthing mothers), and unlimited paid time off.

Workplace Support

While we have offices in NYC and Paris, we are a distributed team. Remote employees are welcome to visit our offices, and we will ensure your workstation is equipped for success, no matter where you are.

Equity and Ownership

We believe in sharing our success. All employees receive company equity as part of their compensation package. If Hugging Face becomes a category-defining platform in machine learning and AI, everyone benefits from our success.

Community Support

We support the ML/AI community because we believe major scientific advancements result from collaboration. Join us to contribute to this vibrant community.