Lead Software Engineer, MLOps Infrastructure

Full Time


We are seeking a Principal Software Engineer with significant deep learning leadership experience to join our team! Be part of NVIDIA's AI/AV Infrastructure group and help build state-of-the-art MLOps infrastructure for our sophisticated autonomous driving platform. Together, we will further enhance NVIDIA's ability to create and launch leading solutions for a wide spectrum of AI-based applications.

Our fleet of autonomous vehicles needs to record data efficiently, and our state-of-the-art machine learning models need to be evaluated and trained efficiently. To make that possible, we need to anticipate the needs of our engineering partners. We need someone who has built systems that handle petabyte-scale datasets, or who can apply past expertise to develop systems of that magnitude. We take part in every data lifecycle activity, either directly or in cooperation with our partner teams. To meet our customers' expectations, our solutions must be scalable and performant, and our approach must incorporate engineering best practices and operational excellence across the stack. We need someone who can ensure the reliability of the foundational elements of our platform. You will be part of one of our high-profile engineering teams and work on both product-focused and infrastructure projects.

What You'll Be Doing:
Participate in the creation of deep learning software infrastructure for large-scale image and video processing tasks, and lead major technical projects for the team. The applications you'll work on range from automotive driver assistance to autonomous navigation and robotics.
Join a dynamic, product- and customer-centric team. Your expertise, creativity, and leadership will bring the future of self-driving cars to our doorsteps.
Handle data modeling, schema design, dataset curation, search, and discovery.

What We Need To See:
Over 10 years of relevant experience in high-performance/distributed computing, including technical leadership responsibilities.
At least 3 years of relevant experience productizing deep learning systems.
A BS or MS degree in Computer Science, Electrical Engineering, or equivalent experience.
Experience with MLOps platforms such as Flyte, MLflow, or similar.
Experience with any deep learning framework, such as PyTorch, TensorFlow, Keras, or similar.
A proven ability to design and develop scalable, data-focused production machine learning pipelines for very large datasets.
A track record of building team capability and accelerating the team's velocity by supporting others.
Experience mentoring junior developers.
A solid understanding of highly parallel compute, storage, and software architectures.
Experience with multi-node/distributed training, data loading, and image/video processing.
Excellent communication and organizational skills.
Self-motivation, excellent cooperation with peers and users, and a customer-focused mindset.

Ways to Stand Out from the Crowd:
Prior experience scaling up and optimizing HPC, computer vision, or deep learning training pipelines for terabyte-scale datasets.
Experience with Go, C++, and CUDA.
Contributions to open-source DNN frameworks.
Prior experience applying machine learning or neural networks to image or video analysis problems.

NVIDIA is regarded as one of the most desirable employers in the tech industry. We employ some of the most forward-thinking and hardworking people in the world. If you're creative, passionate, and self-driven, we want to hear from you!

NVIDIA is a pioneer in AI, high-performance computing, and visualization. Our invention, the GPU, serves as the visual cortex of modern computers and is at the heart of our products and services.

The base salary range is 268,000 USD to 414,000 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. In addition to base salary, you will also be eligible for equity and benefits.
NVIDIA accepts applications on an ongoing basis.