This high-impact project will enable the client to scale AI agent deployments across multiple product lines and customer segments. The platform will be built on top of the client's existing Azure Kubernetes Platform.
Key Responsibilities
- Platform Architecture Design: Design a scalable, reliable infrastructure platform optimized specifically for running AI agents at scale
- Infrastructure Implementation: Leverage Azure, Kubernetes, and related cloud-native technologies to build a modern, production-ready execution environment
- Requirements Assessment: Work with the client's infrastructure and development teams to understand their specific needs and translate them into a clear technical roadmap
- Security & Isolation: Implement managed namespaces, data isolation, network segmentation, and service mesh architecture to meet enterprise security requirements
- Developer Experience: Design with developer productivity in mind, creating templates and abstractions that allow development teams to quickly build and deploy agents
- Observability & Governance: Implement monitoring, cost tracking (LLM Gateway integration), and feedback mechanisms for continuous agent improvement
Required Qualifications
- Strong experience in cloud infrastructure, platform engineering, or DevOps
- Deep expertise with Kubernetes and container orchestration at scale
- Strong experience with Microsoft Azure and cloud-native services
- Proven track record designing and implementing agent/AI execution platforms or similar distributed systems
- Experience with infrastructure-as-code and GitOps tooling (Terraform, Argo CD) and with CI/CD pipelines
- Understanding of AI/ML workload requirements and optimization
Project Context
The client is building customer-facing AI agents that analyze data to answer users' questions. These agents need a modern, efficient execution platform that can:
- Deploy multiple agents simultaneously
- Scale to serve enterprise customers
- Integrate with existing machine learning models
- Enable rapid agent development and iteration by internal teams
- Support governance, cost tracking, and quality monitoring