Senior Data Engineer - Vector Databases | GCP
Kaizen Analytix LLC, an analytics products and services company that provides clients unparalleled speed to value through analytics solutions and actionable business insights, is seeking qualified candidates for the position of Senior Data Engineer. The selected candidate will be responsible for designing, developing, and maintaining data pipelines and data warehouses on Google Cloud Platform (GCP). The ideal candidate has a solid understanding of data engineering principles and best practices, and is comfortable working with large, unstructured datasets (100 GB+) such as video, audio, images, and text. We are looking for engineers who can support AI projects by building the data engineering needed to store deep learning outputs, such as vector embeddings, on distributed computing technologies.
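To give candidates a concrete flavor of the vector-database work, here is a minimal sketch of querying a Vertex AI Vector Search index for the nearest neighbors of a precomputed embedding. It is illustrative only: the project, endpoint resource name, deployed index ID, and embedding dimensionality are placeholders, not details of any Kaizen system.

```python
# Minimal sketch: nearest-neighbor lookup against an existing Vertex AI
# Vector Search index. All resource names below are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Attach to an already-deployed index endpoint (placeholder resource name).
endpoint = aiplatform.MatchingEngineIndexEndpoint(
    index_endpoint_name=(
        "projects/my-project/locations/us-central1/indexEndpoints/1234567890"
    )
)

# Query with a precomputed embedding; dimensionality must match the index.
query_embedding = [0.0] * 768  # e.g. the output of a deep learning encoder

matches = endpoint.find_neighbors(
    deployed_index_id="my_deployed_index",
    queries=[query_embedding],
    num_neighbors=5,
)

for neighbor in matches[0]:
    print(neighbor.id, neighbor.distance)
```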
Responsibilities:
Analysis and Design
- Conduct fact-gathering sessions with users.
- Consult with Technical Managers and Business Owners to identify and analyze technology needs and problems.
- Carry out data flow diagramming and/or process modeling (code architecture).
- Design, develop, and maintain data pipelines and data warehouses on Google Cloud Platform.
- Work with stakeholders to collect requirements and define data models.
- Develop and deploy data pipelines using Google Cloud Platform tools and services.
- Implement data quality checks and monitoring (see the pipeline sketch after this list).
- Troubleshoot data issues and performance problems.
- Collaborate with other engineers to develop and maintain the company's data infrastructure.
- Stay informed about the latest data engineering technologies and trends.
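As an illustration of the pipeline and data-quality responsibilities above, the following Apache Beam sketch reads JSON events from Cloud Storage, drops records that fail a basic quality check, and appends the rest to BigQuery. Bucket, project, and table names are placeholders, and it assumes the destination table already exists.

```python
# Minimal Beam pipeline sketch: read, validate, load. Placeholders throughout.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_and_validate(line):
    """Parse one JSON line and drop records missing the key field."""
    record = json.loads(line)
    if record.get("id"):  # simple data quality gate
        yield record


options = PipelineOptions(
    runner="DataflowRunner",  # DirectRunner also works for local testing
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "ReadRaw" >> beam.io.ReadFromText("gs://my-bucket/raw/events-*.json")
        | "ParseAndValidate" >> beam.FlatMap(parse_and_validate)
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "my-project:analytics.events",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )
    )
```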
Strategy Alignment
- Work with other technical team members to consistently improve implementation strategies, development standards, and other departmental processes and documentation.
- Provide technical assistance and mentoring to junior Data Engineers.
- Regularly communicate plans, status, and issues to management.
- Adhere to department standards, policies, procedures, and industry best practices.
Job Requirements:
- A Bachelor's or Master's degree in Computer Science, Information Systems, or a related field.
- 5+ years of experience in data engineering.
- GCP Professional Data Engineer certification (mandatory).
- Experience with data warehouse migration projects, including migrating databases from one technology to another.
- Strong Python programming skills for developing ETL scripts.
- A solid understanding of data engineering principles and best practices.
- A strong understanding of Vertex AI and Vector Search.
- Experience with other Google Cloud Platform technologies, including BigQuery, Cloud Dataproc, Cloud Dataflow, and Cloud Data Fusion.
- Experience with data modeling and data warehousing.
- Experience with data quality checks and monitoring.
- Experience with infrastructure-as-code (IaC) tools such as Terraform or the GCP Python client libraries (see the BigQuery client sketch after this list).
- Knowledge of CI/CD pipelines and proficiency with tools such as Jenkins, Cloud Build, and TeamCity for building them.
- Self-starter who proposes solutions and workarounds; able to work under strict deadlines.
- Able to troubleshoot key customer implementation issues and take steps to resolve them efficiently.
- Able to own end-to-end solution design, architecture, and best practices for integrating with the GCP ecosystem and infrastructure.
- Able to partner with domain architects to develop the end-to-end solution architecture, including application, infrastructure, data, integration, and security domains.
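As an illustration of the IaC and GCP Python client point above, the sketch below provisions a BigQuery dataset and an embeddings table idempotently with the google-cloud-bigquery client. All resource names and the schema are hypothetical, chosen only to illustrate the technique.

```python
# Minimal sketch: idempotent BigQuery provisioning via the Python client.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project

# Create (or reuse) a dataset; exists_ok makes the call idempotent.
dataset = bigquery.Dataset("my-project.analytics")
dataset.location = "US"
client.create_dataset(dataset, exists_ok=True)

# A table for deep-learning outputs: one repeated FLOAT64 column holds an
# embedding vector per row. The schema is illustrative, not from the posting.
schema = [
    bigquery.SchemaField("id", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("embedding", "FLOAT64", mode="REPEATED"),
]
client.create_table(
    bigquery.Table("my-project.analytics.embeddings", schema=schema),
    exists_ok=True,
)
```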
Good to have:
- Prior experience with social media data analytics dealing with high-volume, high-frequency data.
- Experience working on application development projects focused on data engineering, using programming languages such as Python, SQL, or Java.
- Prior experience with big data tools and concepts such as Hadoop, MapReduce, Spark, Hive, HBase, and Apache Airflow for orchestration (see the DAG sketch below).
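For the orchestration item above, here is a minimal Apache Airflow DAG sketch: a daily schedule with one BigQuery load step followed by one Python quality-check step. The operator choice, SQL, table names, and DAG ID are assumptions made purely for illustration.

```python
# Minimal Airflow DAG sketch: daily BigQuery load, then a quality check.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator


def check_row_count(**context):
    """Placeholder quality gate; a real check would query the loaded table."""
    pass


with DAG(
    dag_id="daily_events_load",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    load = BigQueryInsertJobOperator(
        task_id="load_events",
        configuration={
            "query": {
                # Illustrative transform; table names are placeholders.
                "query": "SELECT * FROM `my-project.staging.events`",
                "destinationTable": {
                    "projectId": "my-project",
                    "datasetId": "analytics",
                    "tableId": "events",
                },
                "writeDisposition": "WRITE_APPEND",
                "useLegacySql": False,
            }
        },
    )
    quality = PythonOperator(
        task_id="quality_check",
        python_callable=check_row_count,
    )
    load >> quality
```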