Senior Site Reliability Engineer

2 weeks ago


Remote, Mexico | EPAM Systems, Inc. | Full-time

We are seeking an experienced **Senior Site Reliability Engineer** to join our team.

As a key member of the Reliability Tooling team, you will be responsible for writing and reviewing code, contributing to critical technical decisions, and mentoring engineers within your squad. This role requires a deep understanding of SRE principles and best practices, as well as the ability to guide and support your team in achieving operational excellence.

**Responsibilities**
- Deploy and manage modern cloud technologies using Infrastructure as Code (IaC), self-healing mechanisms, and automated security patterns
- Create effective telemetry, alerts, and response mechanisms to reduce Mean Time to Recovery (MTTR)
- Collaborate within and across teams to provide technical leadership and ensure high-quality solutions
- Advise on best practices and develop tools to enable smooth adoption of service reliability methods, including sustainable incident response and blameless postmortems
- Identify opportunities to improve reliability, operational efficiency, and overall system performance
- Write code to enhance scalability, performance, maintainability, and security across systems
- Encourage team-wide participation in producing thoughtful, high-quality software solutions
- Mentor SREs in both technical and non-technical aspects of site reliability engineering

**Requirements**
- Bachelor’s degree in Computer Science, Computer Engineering, or a related field
- At least 3 years of experience as a Site Reliability Engineer
- Hands-on experience with CI/CD practices and tools
- Background in DevOps, technical operations, systems engineering, or software engineering
- Excellent verbal and written communication skills
- A passion for leveraging technology and a commitment to continuous learning
- Proven experience using containers in enterprise production environments, such as Docker, Kubernetes, or LXC
- Proficiency in one or more programming languages, such as Python, Go, or Rust
- Fluent English communication skills, both written and spoken, at a B2 level or higher

**Nice to have**
- Experience with Amazon Elastic Container Service (ECS) or Elastic Kubernetes Service (EKS)
- Knowledge of configuration management tools such as Ansible or Chef
- Hands-on experience with Docker in production environments
- Familiarity with engineering excellence tools like GitLab or Jenkins
- Understanding of cloud platforms like Microsoft Azure and their PaaS/SaaS solutions
- Experience with deployment tools such as Spinnaker
- Proficiency in Infrastructure as Code tools like Terraform

**We offer**
- Career plan and real growth opportunities
- Unlimited access to LinkedIn learning solutions
- International Mobility Plan within 25 countries
- Constant training, mentoring, online corporate courses, eLearning and more
- English classes with a certified teacher
- Support for employee initiatives (Algorithms club, Toastmasters, agile club and more)
- Enjoyable working environment (gaming room, napping area, amenities, events, sports teams and more)
- Flexible work schedule and dress code
- Collaborate in a multicultural environment and share best practices from around the globe
- Hired directly by EPAM & 100% under payroll
- Benefits required by law (IMSS, INFONAVIT, 25% vacation bonus)
- Insurance: life and major medical expenses with dental and vision coverage (for the employee and direct family members)
- 13% employee savings fund, capped at the legal limit
- Grocery coupons
- 30-day December bonus
- Employee Stock Purchase Plan
- 12 vacation days plus 4 floating days
- Official Mexican holidays, plus 5 extra holidays (Maundy Thursday and Good Friday, November 2nd, December 24th & 31st)
- Monthly non-taxable allowance for electricity and internet bills

EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.


  • Site Reliability Engineer

    2 weeks ago


    Remote, Mexico | thegetch mexico | Full-time

    **Role: Site Reliability Engineer** **Openings: more than 10 hires** **Location: any city with TCS Office presence (Queretaro, Guadalajara, Mexico City or Monterrey)** **Salary: 25-33 USD/hr** **English communication: advanced** **Experience: 4+ years** **Site Reliability Engineer responsibilities**: Gather and analyze system metrics...


  • Remote, Mexico | EPAM Systems, Inc. | Full-time

    Join our team as a **Senior Site Reliability Engineer** focused on delivering advanced support for critical Azure-based systems. **Responsibilities** - Troubleshoot and resolve complex incidents to maintain system uptime - Ensure reliability and performance of Azure-based enterprise infrastructure - Implement observability, monitoring, and logging solutions...


  • Remote, Mexico | EPAM Systems | Full-time

    **DESCRIPTION**: Join EPAM as a **Senior Site Reliability Engineer specializing in AWS!** In this role, you'll ensure fleet services reliability and availability under the SRE model. If you have a good track record of highly scalable, distributed systems projects and previous experience working as an SRE, we'd love to hear from you. EPAM is a leading global...


  • Remote, Mexico | Pinnacle | Full-time

    **Job Title**: Senior Azure Site Reliability Engineer **Reports To**: Azure Site Reliability Lead **About us**: Welcome to Pinnacle, the ultimate destination for sports enthusiasts seeking an exhilarating sportsbook and gaming experience! Established in 1998, we have solidified our position as one of the globe's foremost licensed online gaming...


  • Remote, Mexico | Luxoft | Full-time

    **Project description**: Do you like to work with existing and new software product development teams? This position is to instrument end-to-end observability and visibility for business-critical systems with log ingestion, metrics, and traces. You will function as a site reliability engineer (SRE) who will collaborate with product teams, infrastructure...


  • Remote, Mexico | Right Balance | Full-time

    **Overview** We're looking for a Site Reliability Engineer. Headquartered in Los Angeles, California, Right Balance provides top-tier technology talent for innovative companies in the US. We’re in the top 50 companies to watch in LA. **Engagement Details** Our client is a USA-based company producing video solutions with the mission to advance scientific...


  • Remote, Mexico | Tekshapers Inc | Full-time

    **Position: Lead Site Reliability Engineer** **Location: Remote** **Duration: Contract** - Lead and mentor a team of SREs to ensure operational excellence and maximize the reliability and availability of client systems. - Minimum 10 years of work experience in DevOps/SRE, including leadership roles. - Architect and design highly scalable and available...