Join the team creating software to be used across the global AI landscape. You'll collaborate with top-tier software engineers to develop a large-scale toolset for testing deep learning models and frameworks on the most powerful computers. The role requires adaptability in a dynamic, fast-paced environment and strong interpersonal skills.
In this role, you will interact with internal partners, users, and members of the open source community to create solutions for building, testing, integrating, and releasing NVIDIA AI Services and Deep Learning Frameworks on the highest-performing, enterprise-grade GPU clusters, capable of hundreds of petaFLOPS. The role involves multiple products, such as PyTorch, TensorFlow, JAX, and PaddlePaddle. You'll collaborate with internal engineering teams to implement and operationalize AI models and services at large scale, facilitating the adoption of end-to-end Machine Learning and Deep Learning solutions both in the cloud and on-premises.
We are seeking passionate and hardworking Python developers to help us expand our AI and deep learning services, platforms, models, and internal tools. Your responsibilities will include developing and maintaining the tools and infrastructure that enable productization of NVIDIA's software stack, from Deep Learning Frameworks (e.g., PyTorch, TensorFlow, JAX, PaddlePaddle) and Deep Learning models to AI services.
Are you ready to take on this challenge?
What you’ll be doing:
- Automating and optimizing testing for Deep Learning models and AI Services from various data domains, particularly focusing on inference.
- Developing shared utilities for setting up systems, running tests, recording results, and visualizing them on dashboards.
- Configuring, maintaining, and enhancing deployments of industry-standard tools (e.g., GitLab, Docker, Bash).
- Enforcing best practices for building, testing, and releasing software, including AI Services and DL models.
- Identifying infrastructure requirements and acting on them.
- Building tools for automatic content generation that save dozens of engineering hours.
What we need to see:
- A BSc or MSc degree in Computer Science, Computer Architecture, or a related technical field.
- 3+ years of experience in software development.
- Excellent Python programming skills and a deep understanding of object-oriented programming concepts.
- Familiarity with DevOps concepts and tools such as CI/CD, Docker, Jenkins, and automation tooling.
- Experience building front-end applications (e.g., JS, React, Vue, Dash, Streamlit) and back-end services (e.g., Flask, FastAPI, Django).
- Sufficient understanding of Deep Learning to benchmark Deep Learning models.
- A proactive attitude and strong analytical abilities.
- Excellent time-management and organization skills required to coordinate multiple initiatives, priorities, and the integration of new technology and products into complex projects.
- Good communication and documentation habits.
Ways to stand out from the crowd:
- Solid knowledge of Linux environments.
- Experience with containerization technologies such as Docker.
- Experience in building monitoring or dashboard solutions to support CI/CD pipelines.
- Hands-on experience configuring complex CI pipelines.
- Experience with HPC-based compute clusters and scheduling solutions like Slurm.
NVIDIA is widely recognized as one of the most attractive employers in the tech world. Our team includes some of the brightest and most innovative minds globally. If you're creative and autonomous, we want to hear from you!
We're an equal opportunity employer and value diversity at our company. We do not discriminate based on race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.