Infra AI Manager - Manager
Job description
At EY, we’re all in to shape your future with confidence.
We’ll help you succeed in a globally connected powerhouse of diverse teams and take your career wherever you want it to go.
Join EY and help to build a better working world.
Designation: Infra AI Manager
Job Description
- Architect and lead the development of AI-integrated infrastructure platforms, supporting both classical ML and generative AI workloads across Azure, AWS, or GCP.
- Drive end-to-end lifecycle management of AI/ML models, including deployment, monitoring, retraining, and scaling, using tools such as Kubeflow, MLflow, SageMaker, Azure ML, and TensorFlow Serving.
- Build and optimize data lakehouse architectures, pipelines, and distributed systems for high-throughput AI workloads.
- Lead the implementation of Infrastructure as Code (IaC) using Terraform, Pulumi, and Ansible, ensuring modularity, reusability, and compliance.
- Develop and integrate AI-enhanced observability frameworks using Prometheus, Grafana, ELK Stack, and OpenTelemetry for predictive alerting and anomaly detection.
- Oversee cloud migration and modernization initiatives, applying AI-driven insights for workload placement, performance tuning, and cost optimization.
- Establish and enforce DevSecOps practices, embedding security, governance, and compliance into infrastructure automation workflows.
- Collaborate with Data Science, DevOps, and Product teams to align infrastructure capabilities with AI model requirements and business objectives.
- Lead the adoption of MLOps best practices, including CI/CD for model deployment, versioning, and rollback strategies.
- Manage and mentor a team of engineers, fostering a culture of innovation, technical excellence, and continuous improvement.
- Balance short-term delivery goals with long-term infrastructure strategy, ensuring scalability, reliability, and performance across AI systems.
Desired Profile
- 10+ years of experience in infrastructure engineering, cloud automation, or AI platform development.
- Proven leadership in designing and scaling AI/ML infrastructure across cloud-native environments.
- Deep hands-on expertise in Python, Terraform, Kubernetes, and containerized AI workloads.
- Strong experience with AI/ML platforms: Kubeflow, MLflow, Azure ML, SageMaker, TensorFlow, PyTorch.
- Expertise in cloud platforms (Azure, AWS, GCP) and hybrid cloud architecture.
- Familiarity with data lakehouse technologies, distributed data processing, and scalable storage systems.
- Experience with CI/CD pipelines, GitOps, and automated model deployment workflows.
- Strong understanding of security frameworks, compliance standards, and infrastructure governance.
- Excellent communication and stakeholder management skills, with the ability to align technical strategy with business outcomes.
Experience
- 10 years and above
Education
- B.Tech. / BS in Computer Science
Technical Skills & Certifications
- Microsoft Certified: Azure Solutions Architect Expert, AWS Certified Machine Learning – Specialty, or Google Cloud Professional Machine Learning Engineer
- HashiCorp Certified Terraform Associate
- Certified Kubernetes Administrator (CKA)
- Certified MLOps Professional (e.g., MLflow, TFX)
EY | Building a better working world
EY is building a better working world by creating new value for clients, people, society and the planet, while building trust in capital markets.
Enabled by data, AI and advanced technology, EY teams help clients shape the future with confidence and develop answers for the most pressing issues of today and tomorrow.
EY teams work across a full spectrum of services in assurance, consulting, tax, strategy and transactions. Fueled by sector insights, a globally connected, multi-disciplinary network and diverse ecosystem partners, EY teams can provide services in more than 150 countries and territories.