We are looking for an experienced Query Engine Architect to enhance Apache Spark and related frameworks on GPUs. Nvidia, a leader in accelerated computing, is developing the next-generation data processing ecosystem. Apache Spark is a widely used distributed data processing engine for various workloads including data preparation, feature generation, reporting, analytics, and more. Such workloads often demand significant time and computing power, hindering efficient business operations.
At NVIDIA, we take on challenging problems that have impact. You will collaborate with the open-source community to facilitate Apache Spark data processing with GPUs. Accelerating data workflows can greatly benefit data scientists by allowing them to investigate larger data sets more quickly and effectively to achieve their business objectives.
What you'll be doing:
- Leading the query optimization efforts within the RAPIDS Spark team.
- Assessing each stage of query processing and identifying areas for logical plan and physical plan optimization. Build optimized plans considering CPU and GPU hardware resources.
- Identifying opportunities for resource-aware adaptive query execution, such as adapting based on CPU or GPU characteristics.
- Identifying areas where operator fusion can improve performance.
- Examining practices within columnar processing engines to see how they could apply to GPU-based columnar processing.
- Engaging with open-source communities, including Apache Spark and RAPIDS, for technical discussions and contributions.
- Collaborating with Nvidia strategic partners on implementing accelerated data processing solutions in public cloud or on-premise clusters.
- Presenting technical solutions at industry conferences and meetups.
What we need to see:
- A BS, MS, or PhD in Computer Science, Computer Engineering, or equivalent experience.
- 15+ years of work or research experience in software development.
- At least 5 years of experience working with key open-source big-data projects as a contributor or committer, including Apache Spark, Apache Hadoop, Apache Hive, Apache Flink, Apache Impala, Apache Drill, Apache Calcite, and Substrait.
- Exceptional technical skills in building and implementing high-quality distributed systems.
- Deep knowledge of database query engines and query optimization.
- Excellent programming skills in C++, Java, and/or Scala.
- Understanding of distributed system schedulers: Kubernetes, Hadoop YARN, Spark standalone, and/or Mesos.
- Ability to collaborate with multi-functional teams across different locations.
- Strong motivation and interpersonal skills.
Ways to stand out from the crowd:
- Contributions to major open-source projects such as Apache Spark, Apache Hive, Apache Impala, Apache Drill, Substrait, Apache Calcite.
- Professional experience with acceleration libraries (CUDA, RAPIDS, UCX).
- Basic machine-learning/deep-learning experience with Spark ML and XGBoost.
We are recognized as one of the most attractive employers in the tech world and therefore boast some of the most innovative and hardworking individuals in our teams. If you are passionate, creative, and driven, we would love for you to join our team. In addition to competitive salaries and a comprehensive benefits package, we offer a creative and autonomous engineering environment. Owing to our unprecedented growth, we are rapidly expanding our exclusive engineering teams. Interested candidates with a genuine love for technology are encouraged to apply.
The base salary range is 268,000 USD - 414,000 USD. Your base salary will be determined based on your location, expertise, and the compensation of employees in similar roles. You will be eligible for equity and benefits. NVIDIA accepts applications on a continuing basis.