Site Reliability Engineer

5 days ago


Remote, Mexico | EPAM Systems, Inc. | Full-time

Join our team as a **Site Reliability Engineer**, where you will focus on cloud infrastructure, containerization, and monitoring using Kubernetes and Microsoft Azure.

**Responsibilities**
- Deploy and maintain Kubernetes resource manifests in clusters such as Kind, GKE, or AKS
- Troubleshoot and analyze logs to identify and resolve system events and issues
- Develop and maintain Azure DevOps CI/CD pipelines and GitOps deployment workflows
- Collaborate with teams to improve system reliability and deployment automation
- Manage infrastructure as code using Terraform and other tools
- Configure and maintain observability tools and alerting systems
- Ensure compliance with client constraints and security standards
- Participate in incident response and root cause analysis
- Document system configurations, processes, and procedures
- Support continuous improvement of deployment and monitoring practices

**Requirements**
- At least 2 years of hands-on programming experience
- Proficiency in at least one scripting language
- Experience with Kubernetes container orchestration
- Knowledge of at least one cloud provider, such as Microsoft Azure or Google Cloud Platform
- Familiarity with Prometheus or similar monitoring tools for observability
- Experience with Azure DevOps CI/CD pipelines or GitOps tools like Helm and ArgoCD
- Understanding of distributed systems troubleshooting and log analysis
- Practical skills in containerization using Docker or Podman
- Experience creating and managing Kubernetes resource manifests
- Ability to deploy and monitor Prometheus agents
- Knowledge of infrastructure as code tools such as Terraform
- Strong problem-solving and analytical skills
- Effective communication and teamwork abilities
- English proficiency at B2 level or higher

**We offer**
- Career plan and real growth opportunities
- Unlimited access to LinkedIn learning solutions
- Constant training, mentoring, online corporate courses, eLearning and more
- English classes with a certified teacher
- Support for employee initiatives (algorithms club, Toastmasters, agile club, and more)
- Enjoyable working environment (gaming room, napping area, amenities, events, sports teams, and more)
- Flexible work schedule and dress code
- Collaborate in a multicultural environment and share best practices from around the globe
- Hired directly by EPAM & 100% under payroll
- Statutory benefits (IMSS, INFONAVIT, 25% vacation bonus)
- Insurance: life and major medical expenses, with dental and vision coverage (for the employee and direct family members)
- 13% employee savings fund, capped at the legal limit
- Grocery coupons
- 30-day December bonus
- Employee Stock Purchase Plan
- 12 vacation days
- Official Mexican holidays, plus 5 extra holidays (Maundy Thursday, Good Friday, November 2nd, December 24th and 31st)
- Monthly non-taxable amount for the electricity and internet bills

EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multinational teams, contribute to a myriad of innovative projects that deliver creative, cutting-edge solutions, and have the opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.