Big Data Engineer (PySpark)

Join Capgemini as a Big Data Engineer (PySpark)

Roles and Responsibilities:

  • Design and develop data processing projects using Spark, with both the Scala API and PySpark.
  • Implement robust unit tests for data transformations and auxiliary methods (see the first sketch after this list).
  • Employ Spark and Spark SQL for efficient data management and analytics in Hive via the Scala API (see the second sketch after this list).
  • Work alongside Business Analysts to validate test outcomes and obtain project sign-offs.
  • Document design and operational procedures to guide future processes and implementations.
  • Ensure high-quality deliveries through meticulous peer code reviews and adherence to quality standards.
  • Participate in collaborative coding environments, including pair programming.
  • Contribute to team efforts by writing high-quality, maintainable code.
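
To make the first two responsibilities concrete, here is a minimal sketch of a PySpark transformation paired with a unit test. Everything in it is illustrative: the function, column names, and sample data are hypothetical, not part of any actual project.

```python
# Hypothetical sketch: a small PySpark transformation and its unit test.
# Function and column names are illustrative only.
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F


def clean_transactions(df: DataFrame) -> DataFrame:
    """Drop rows with null amounts and normalize the currency code."""
    return (
        df.filter(F.col("amount").isNotNull())
          .withColumn("currency", F.upper(F.col("currency")))
    )


def test_clean_transactions():
    spark = SparkSession.builder.master("local[1]").getOrCreate()
    source = spark.createDataFrame(
        [(1, 100.0, "usd"), (2, None, "eur")],
        ["id", "amount", "currency"],
    )
    result = clean_transactions(source).collect()
    assert len(result) == 1          # the null-amount row is dropped
    assert result[0]["currency"] == "USD"
    spark.stop()
```

A test like this can run under pytest against a local Spark session, which keeps transformation logic verifiable without a cluster.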
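
The third responsibility, managing Hive-backed data through Spark SQL, might look like the following sketch. The bullet names the Scala API, but to keep all examples in one language this uses the equivalent PySpark calls; the database and table names are assumptions, and enableHiveSupport() presumes a configured Hive metastore.

```python
# Hypothetical sketch: querying and writing a Hive table via Spark SQL.
# Database/table names are placeholders; a Hive metastore is assumed.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-analytics")
    .enableHiveSupport()   # requires an available Hive metastore
    .getOrCreate()
)

# Aggregate daily totals from a (hypothetical) Hive table.
daily_totals = spark.sql("""
    SELECT txn_date, SUM(amount) AS total_amount
    FROM finance_db.transactions
    GROUP BY txn_date
""")

# Persist the result back to Hive as a managed table.
daily_totals.write.mode("overwrite").saveAsTable("finance_db.daily_totals")
```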

Qualifications and Requirements:

  • 4 to 10 years of proven experience as a Hadoop Data Engineer.
  • Demonstrable expertise in Hadoop ecosystem technologies such as Spark, Scala, PySpark, Python, Hive, and Impala.
  • Familiarity with complementary tools such as Oracle, Spark Streaming, and Kafka, along with foundational machine learning concepts (a streaming sketch follows this list).
  • Proficiency in Agile practices, continuous integration and deployment (CI/CD), and tools and platforms such as Git, Jenkins, and the Cloudera Distribution.
  • Thorough understanding of data structure manipulation, distributed systems, application development, and automation.
  • Prior experience in the Core Banking or Finance sectors is highly advantageous.
  • Cloud computing experience, particularly with AWS, will be considered a strong asset.
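
For the streaming side of the stack, a minimal Spark Structured Streaming consumer for Kafka might look like this sketch. The broker address, topic name, and checkpoint path are placeholder assumptions, and the job requires the spark-sql-kafka connector package on the classpath.

```python
# Hypothetical sketch: consuming a Kafka topic with Spark Structured Streaming.
# Broker, topic, and checkpoint path are placeholder assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

# Read the raw Kafka stream (requires the spark-sql-kafka connector).
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "transactions")
    .load()
)

# Kafka delivers key/value as binary; cast the payload to a string.
payloads = events.select(F.col("value").cast("string").alias("payload"))

# Stream to the console for local inspection only.
query = (
    payloads.writeStream
    .format("console")
    .option("checkpointLocation", "/tmp/checkpoints/transactions")
    .start()
)
query.awaitTermination()
```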

Embark on an exciting career trajectory with Capgemini by leveraging your expertise in Big Data and ecosystem technologies to drive transformative projects and innovative solutions. Apply today to join a globally esteemed team as a Big Data Engineer specializing in PySpark!