Data Engineer PySpark

Join Our Team: Data Engineer - PySpark at Sopra Steria

Company Description

About Sopra Steria: Sopra Steria is a leading European tech company, recognized for its consulting, digital services, and software development. We help clients drive their digital transformation with bespoke, sustainable solutions that combine deep industry knowledge with cutting-edge technologies. Committed to innovation and excellence, we take a client-centric approach and believe in building a brighter digital future through teamwork. With 50,000 employees across nearly 30 countries, we generated €5.1 billion in revenue in 2022.

Job Description

We are seeking a highly skilled and passionate Data Engineer to join our dynamic team. The role involves close collaboration with Data Scientists: you will build and optimize the data pipelines that support efficient machine learning models. Your expertise in the areas listed below will be pivotal in maintaining and improving our data operations.

Key Responsibilities:

  • Partner with Data Scientists to craft, refine, and implement machine learning pipelines.
  • Employ PySpark for robust data processing, transformation, and preparation for model training.
  • Use AWS EMR and S3 to ensure scalable and efficient data handling and storage solutions.
  • Create and manage ETL workflows using StreamSets for efficient data ingestion and transformation.
  • Design and maintain pipelines that deliver high-quality training and inference datasets.
  • Work with cross-functional teams to ensure seamless deployment and real-time or near real-time inference capabilities.
  • Optimize pipelines for enhanced performance, scalability, and reliability.
  • Administer IAM policies and controls to secure data access and management effectively.
  • Optimize Spark jobs and architecture for high-performance data processing.

Requirements:

Mandatory:
  • Advanced SQL skills, including window functions, and deep knowledge of Spark architecture, PySpark or Scala with Spark, and Hadoop.
  • Demonstrated ability to design and implement robust data pipelines.
  • Strong problem-solving abilities and effective collaboration in team settings.
  • Excellent communication skills, including the ability to explain technical concepts to non-technical stakeholders.
Desirable:
  • Prior experience with Airflow, S3, and StreamSets, or similar ETL tools.
  • Knowledge of real-time or near real-time inferencing architectures.
  • Basic understanding of Kafka, AWS IAM, AWS EMR, and Snowflake.

Qualifications:

BE or a similar qualification, with 6 to 8 years of total experience in relevant fields.

Additional Information:

At Sopra Steria, we stand against all forms of discrimination. We pride ourselves on fostering an inclusive and respectful workspace, welcoming diversity in all its forms. We are dedicated to accessibility and encourage individuals with disabilities to apply for this position.

Ready to shape the future with us at Sopra Steria? Apply now for the role of Data Engineer - PySpark and join us in leading the digital transformation!