Site Reliability Engineer
Posted 2 weeks ago
Our Purpose
Mastercard powers economies and empowers people in 200+ countries and territories worldwide. Together with our customers, we're helping build a sustainable economy where everyone can prosper. We support a wide range of digital payments choices, making transactions secure, simple, smart and accessible. Our technology and innovation, partnerships and networks combine to deliver a unique set of products and services that help people, businesses and governments realize their greatest potential.
Title and Summary
Site Reliability Engineer (Automation & Virtualization)
About the Role
We're looking for a passionate and skilled Site Reliability Engineer (SRE) to join our Platform Engineering team. This role is pivotal in automating and managing VMware ESXi hypervisors across Dell and Cisco UCS platforms, ensuring high reliability, scalability, and performance of our infrastructure.
You'll work at the intersection of infrastructure and software, driving automation, observability, and operational excellence across our virtualization stack.
---
Key Responsibilities
Hypervisor & Infrastructure Management
- Deploy, configure, and patch ESXi hosts using tools like VMware Update Manager, iDRAC, and UCS Central.
- Validate host readiness and enforce consistency across environments.
Automation & Infrastructure as Code
- Build and maintain automation pipelines using PowerCLI, Python, Terraform, and Ansible.
- Develop Infrastructure-as-Code (IaC) templates for scalable provisioning.
NSX & Network Integration
- Administer NSX-T and NSX-V for logical switching, routing, and micro-segmentation.
- Troubleshoot endpoint tagging and network performance issues between NSX and ESXi.
Monitoring & Observability
- Implement observability stacks using Prometheus, Grafana, Splunk, and Dynatrace.
- Define and track SLOs, SLIs, and error budgets.
Security & Compliance
Planning & Optimization
- Lead modernization efforts including UCS blade decommissioning and Dell R760 upgrades.
- Optimize cluster and VM sizing for performance and cost efficiency.
Collaboration & Stakeholder Engagement
- Partner with application, storage, and network teams to align infrastructure with workload needs.
- Communicate upgrade plans and maintenance schedules across teams.
Documentation & Knowledge Sharing
- Maintain build guides, validation checklists, and operational runbooks.
- Contribute to internal wikis and onboarding materials.
Required Skills
- 5+ years in SRE, DevOps, or Platform Engineering roles.
- Strong scripting in PowerCLI, Python, or Go.
- Experience with VMware ESXi, vCenter, NSX, and UCS Manager.
- Proficiency in Terraform, Ansible, and CI/CD pipeline tools.
- Familiarity with observability platforms and incident response workflows.
Preferred Qualifications
- Experience with REST API integration for ESXi and vCenter.
- Knowledge of GitOps, AIOps, and chaos engineering practices.
- Certifications: VMware VCP, CKA/CKAD, or equivalent.
Corporate Security Responsibility
All activities involving access to Mastercard assets, information, and networks come with an inherent risk to the organization. Every person working for, or on behalf of, Mastercard is therefore responsible for information security and must:
- Abide by Mastercard's security policies and practices;
- Ensure the confidentiality and integrity of the information being accessed;
- Report any suspected information security violation or breach; and
- Complete all periodic mandatory security trainings in accordance with Mastercard's guidelines.