We’re looking for a highly experienced Machine Learning Engineer with a strong infrastructure focus to design and build scalable, secure, and cost-efficient ML platforms. You’ll work with AWS, GCP, and modern MLOps tooling to enable smooth training, deployment, and monitoring workflows.
Must-haves:
- ML platform setup experience (SageMaker, Vertex AI, Azure ML, Kubeflow, etc.)
- Solid grasp of ML architecture patterns (training/serving, workflow orchestration, security, cost optimization)
- Hands-on AWS and GCP experience (compute, storage, networking, IAM, VPCs)
- IaC, CI/CD, and containers (Terraform, GitHub Actions, Jenkins, Docker, Kubernetes)
- Strong collaboration and communication skills
Nice-to-haves:
- Distributed training (Ray or similar)
- Databricks & data lake experience