Design and Build Data Pipelines: Create efficient, reliable, streamable, and scalable data pipelines using industry-standard tools and techniques such as TorchData, WebDataset, Apache Parquet, Python, and SQL.
Data Ingestion: Develop strategies for ingesting data from providers, ensuring data quality and consistency.
Data Pre-processing: Implement parallel pre-processing to clean, transform, de-duplicate, combine, and normalize data.
Data Curation and Enrichment: Curate, augment, and enrich datasets to improve data quality and provide valuable insights to stakeholders.
Synthetic Data Generation: Collaborate with synthetic data teams to generate data and incorporate it into existing pipelines.
Collaboration with Client Teams: Work closely with client scientists, engineers, and product teams to understand data requirements and collaborate on data delivery.
Monitoring, Maintenance & Updating: Monitor data pipelines for performance, errors, and bottlenecks, and carry out regular maintenance and updates. Stay current with the latest trends and best practices in data engineering.
Technical Documentation: Document data pipelines, settings, and procedures for easy maintenance and knowledge sharing.
Bachelor’s degree in Computer Science, Information Technology, or a related field.
At least 3 years of experience as a Software Engineer or Data Engineer.
Strong software engineering skills and proficiency in Python.
Experience with data processing tools and formats such as Apache Parquet, WebDataset, TorchData, Pandas, shell scripting, Protobuf, and TFRecord.
Knowledge of data warehouse architectures and cloud-based systems (e.g., AWS S3).
Strong problem-solving and analytical skills.
Excellent communication and collaboration skills.
Master’s degree in Data Science or a related field.
Experience with data curation and enrichment techniques, particularly for large-scale text, image, and video data.
Familiarity with natural language processing (NLP), machine learning concepts, and frameworks such as PyTorch.
As an equal opportunity employer, ICONMA provides an employment environment that supports and encourages the abilities of all persons without regard to race, color, religion, gender, sexual orientation, gender identity or expression, ethnicity, national origin, age, disability status, political affiliation, genetics, marital status, protected veteran status, or any other characteristic protected by federal, state, or local laws.