Machine Learning Engineer - Inference (Accelerated AI)

Join Together AI as a Machine Learning Engineer - Inference (Accelerated AI)

Together AI is actively seeking a talented Machine Learning Engineer to join our dynamic Inference Engine team. This critical role focuses on optimizing and enhancing the performance of our AI inference systems, leveraging state-of-the-art large language models to ensure efficiency and scalability.

If you are passionate about AI inference, proficient with PyTorch, and skilled in developing high-performance systems, we want to hear from you. This position offers an exceptional opportunity to collaborate with leading AI researchers and engineers in creating cutting-edge AI solutions. Come shape the future of AI with Together AI!

Key Responsibilities

  • Design and develop production systems that power the Together AI inference engine, ensuring reliability and performance at scale.
  • Optimize runtime inference services for large-scale AI applications.
  • Collaborate with researchers, engineers, product managers, and designers to introduce new features and research capabilities.
  • Conduct thorough design and code reviews to uphold the highest quality standards.
  • Create services, tools, and comprehensive developer documentation to support the inference engine.
  • Implement robust and fault-tolerant systems for data ingestion and processing.

Job Requirements

  • A minimum of 3 years of experience in writing high-performance, well-tested, production-quality code.
  • Proficiency in Python and PyTorch.
  • Demonstrated experience in building high-performance libraries and tooling.
  • Exceptional understanding of low-level operating system concepts, including multi-threading, memory management, networking, storage, performance, and scalability.

Preferred Qualifications

  • Knowledge of existing AI inference systems such as TGI, vLLM, TensorRT-LLM, and Optimum.
  • Familiarity with AI inference techniques, such as speculative decoding.
  • Experience with CUDA/Triton programming.
  • Nice to have: Familiarity with Rust, Cython, and compilers.

About Together AI

Together AI is a pioneering artificial intelligence company driven by research. We are committed to open and transparent AI systems that drive innovation and generate the best outcomes for society. Our mission is to significantly reduce the cost of modern AI systems through the co-design of software, hardware, algorithms, and models.

We have contributed to leading open-source research, models, and datasets to advance the frontier of AI. Our team has been instrumental in technological breakthroughs such as FlashAttention, Hyena, FlexGen, and RedPajama. Join our passionate group of researchers and engineers in our journey to build the next-generation AI infrastructure.

Compensation and Benefits

We offer a competitive compensation package, including startup equity, health insurance, and other benefits. The US base salary range for this full-time position is $160,000 - $220,000. Our salary ranges are determined by location, level, and role, with individual compensation based on experience, skills, and job-related knowledge.

Equal Opportunity Employer

Together AI is proud to be an Equal Opportunity Employer, offering equal employment opportunities to all, irrespective of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.