Site Reliability Engineer

Join Phaidra - Pioneer the Future of Industrial Automation

Welcome to Phaidra, where we are revolutionizing the world of industrial automation.

About Phaidra

Today's industrial facilities, such as factories and power plants, rely on outdated control systems that can't adapt to new conditions. At Phaidra, we create AI-powered control systems that enable these facilities to learn and improve automatically over time.

  • Reinforcement Learning Algorithms: Transform raw sensor data into high-value decisions.
  • Industrial Applications: Ideal for sensorized environments with measurable KPIs.
  • Code-Free Configuration: Domain experts can set up AI control systems without coding.

Our dedicated team has a proven track record, from achieving superhuman performance with DeepMind's AlphaGo to reducing energy consumption at Google Data Centers.

We are a 100% remote company with a team spread across the USA, Canada, UK, Norway, Italy, Spain, Portugal, and India. We hire globally with the help of our partner, OysterHR.

Open Position: Site Reliability Engineer

Phaidra is looking for a passionate and innovative Site Reliability Engineer to join our engineering team. You will work on building and maintaining world-class infrastructure, ensuring the smooth operation and continuous improvement of our systems.

Location: North America/India

Responsibilities

As a Site Reliability Engineer, your core responsibilities will include:

  • Managing cloud infrastructure on AWS, GCP, or Azure
  • Setting up large-scale data ingestion and processing systems
  • Building distributed model training and evaluation platforms
  • Automating CI/CD pipelines and system improvements
  • Supporting deployments across multiple clouds
  • Utilizing Cloud Native technologies like Kubernetes, Prometheus, and gRPC
  • Applying SRE principles for observability, automation, and change management

Key Qualifications

  • 5+ years of relevant experience
  • Bachelor's or Master's in Computer Science or equivalent
  • Experience with AWS, GCP, or Azure
  • Proficiency in Linux, Docker, and Kubernetes
  • Familiarity with Terraform and monitoring stacks like Prometheus
  • Programming skills in Python, Go, or Bash
  • Understanding of DevOps and SRE principles

Preferred Skills & Experience

  • Multi-cloud environment expertise
  • Software engineering experience
  • Experience with scalable, multi-tenant systems

Our Tech Stack

Languages and frameworks: Python, Go, JavaScript/TypeScript (React), C# (.NET)

  • PyTorch
  • Docker, Kubernetes, Terraform, Kapitan
  • GitLab CI, ArgoCD, Atlantis, Vercel
  • GCP (GKE, Pub/Sub, Cloud SQL, etc.)
  • Ray, plus REST and gRPC microservices
  • Poetry, Pantsbuild

Your Onboarding Journey

First 30 Days

  • Introduction to Phaidra and our product
  • Engage with the Engineering team
  • Set up your development environment

By 60 Days

  • Develop a solid understanding of our operations
  • Complete the onboarding exercise

By 90 Days

  • Be fully integrated with the team
  • Take part in on-call monitoring and contribute improvements
  • Share knowledge across the organization

Interview Process

  • Initial Screening: People Operations (30 minutes)
  • Meeting: Director, Infrastructure Engineering (30 minutes)