Lead AI Platform Engineer
5 days ago
We are looking for a Lead AI Platform Engineer to architect, deploy, and manage scalable Databricks platforms on AWS that support advanced ML and analytics pipelines.
In this role, you will work closely with data scientists and ML engineers to enhance the Lakehouse developer environment and drive innovation in AI infrastructure. Join us to lead the development of state-of-the-art AI platform solutions.
Responsibilities
- Architect and deploy scalable Databricks platform solutions for analytics, machine learning, and GenAI workflows across multiple environments
- Manage and enhance Databricks workspaces, including cluster policies, autoscaling, GPU compute, and job clusters
- Oversee Unity Catalog governance by managing metastores, catalogs, schemas, data sharing, masking, lineage, and access control
- Develop and maintain Infrastructure as Code with Terraform to enable automated, consistent platform provisioning
- Establish CI/CD pipelines for notebooks, libraries, Delta Live Tables (DLT) processes, and ML assets using GitHub Actions and Databricks APIs
- Standardize experiment tracking and model registry workflows with MLflow and manage model serving endpoints with monitoring and rollback
- Optimize Delta Lake batch and streaming pipelines using Auto Loader, Structured Streaming, and DLT while ensuring data quality and SLA compliance
- Collaborate with cross-functional teams to integrate platform features and deliver an exceptional developer experience
- Monitor system performance, troubleshoot issues, and implement enhancements to guarantee platform reliability and scalability
- Document platform operations and maintain automation runbooks for governance and support
- Coordinate with security teams to enforce data governance, encryption, and compliance standards
- Champion best practices in coding, testing, and deployment across the platform engineering team
- Drive ongoing improvements in automation and operational efficiency for the platform
- Engage stakeholders to capture requirements and provide expert technical guidance
- Lead and mentor junior engineers, sharing expertise in platform technologies
Requirements
- At least 5 years in platform engineering, with proven expertise administering Databricks on AWS, including Unity Catalog governance and enterprise integrations
- Comprehensive knowledge of AWS services such as VPC, IAM, KMS, S3, CloudWatch, and network architecture
- Advanced skills with Terraform including the Databricks provider and experience with Infrastructure as Code for cloud environments
- Strong proficiency in Python and SQL, including packaging libraries and managing notebooks and repositories
- Experience using MLflow for experiment tracking, model registry, and model serving endpoints
- Familiarity with Delta Lake, Auto Loader, Structured Streaming, and DLT technologies
- Solid experience implementing DevOps automation, CI/CD pipelines, and using GitHub Actions or similar tools
- Expertise in Git and GitHub, including code review processes and branching strategies
- Working knowledge of REST APIs, Databricks CLI, and automation scripting
- Excellent communication and stakeholder management abilities
- Capacity to work autonomously and within distributed teams
- Detail-focused with strong problem-solving and organizational skills
- English language proficiency at B2 (Upper-Intermediate) level or above
Nice to have
- Hands-on experience with AWS EKS and Kubernetes
- Understanding of MLOps methodologies and pipeline automation
- Knowledge of attribute-based access control and enhanced data governance frameworks
- Experience with secrets management and SSO/SCIM provisioning
- Relevant certifications in AWS or Databricks platform engineering