Senior Site Reliability Engineer/DevOps

4 weeks ago


Ciudad de México Gila Software Full-time

We are hiring a development-oriented, collaborative, and detail-focused Site Reliability Engineer (SRE) responsible for solving operational, scalability, and reliability challenges. In this role, you will apply software engineering methodologies to system administration processes and collaborate with software engineers and product developers to optimize system performance, stability, and reliability. The ideal candidate will focus on improving and automating operational tasks while ensuring system availability and scalability. Alongside your team, you will manage critical aspects such as latency, performance efficiency, monitoring, emergency response, capacity planning, and change management.
We are seeking a proactive individual with strong leadership, resource administration, and communication skills who thrives in a team-oriented environment. A background in development, combined with hands-on SRE or DevOps experience, is essential.

What will you do?
Gain a deep understanding of our platform, how it serves our clients, and how they interact with it.
Monitor and maintain system availability, performance, and overall health.
Build tools and systems to automate infrastructure management and operations.
Run production environments with a holistic view of reliability, uptime, and scalability.
Implement Infrastructure as Code (IaC) using tools like Terraform.
Develop and manage CI/CD pipelines for seamless code integration and deployment.
Create and maintain robust monitoring, alerting, and logging frameworks using tools such as New Relic, SumoLogic, Pingdom, CloudWatch, and CloudTrail.
Lead incident response efforts, perform root cause analysis, and implement preventative measures.
Participate in on-call rotations and ensure proper incident management and escalation.
Collaborate with developers to enhance release processes, testing, and deployment automation.
Document operational processes and create detailed runbooks/playbooks for emergency response.
Measure and optimize system performance using SLOs, SLIs, and key metrics.

Requirements:
~ 5–7 years of proven experience in a Site Reliability Engineering or DevOps role.
~ Bachelor’s Degree in Computer Science, Engineering, or related field, or equivalent practical experience.
~ Advanced English communication skills, both verbal and written.
~ Background in software development (no longer a full-time developer but with hands-on past experience).
~ Expert-level experience with AWS (mandatory) and cloud-native technologies.
~ Strong understanding of Linux system internals, networking, distributed systems, and service-oriented architectures.
~ Proficiency in containerization and orchestration technologies (e.g., Docker, Kubernetes).
~ Hands-on experience with Infrastructure as Code (IaC), particularly with Terraform.
~ Experience with relational databases (MSSQL, MySQL, Aurora MySQL) and NoSQL (especially DynamoDB).
~ Knowledge of observability concepts, including metrics, logging, tracing, SLOs, and SLIs.
~ Familiarity with CI/CD tools (e.g., Jenkins, CodePipeline, CodeDeploy).
~ Ability to lead and influence technical decisions in a cross-functional team environment.
~ Proactive mindset with strong problem-solving and automation skills.
~ Passion for continuous improvement, scalability, and operational excellence.



  • Ciudad de México Royal Caribbean Group Full-time

    Senior Site Reliability Engineer role at Royal Caribbean Group. 1 week ago. Journey with us! Combine your career goals and sense of adventure by joining our incredible team at Royal Caribbean Group. We offer a competitive compensation and benefits package, along with excellent career development...


  • Ciudad de México Tata Consultancy Services Full-time

    We are looking for a Site Reliability Engineer (SRE) to join our team and help us ensure seamless, high-performing, and reliable technology operations. What you’ll work with: Azure DevOps (pipelines, repositories, and automation); ServiceNow (incident, change, and problem management); AppDynamics (application performance monitoring and alerting); Microsoft...


  • Ciudad de México ITJ Full-time

    Our customer is revolutionizing the cancer diagnostics space and is now looking for another Site Reliability Engineer (SRE) to join its incredible team. The Site Reliability Engineering team constantly practices the DevOps mindset to build and deploy distributed, fault-tolerant systems at scale. As part of this team, you will work with developers,...


  • Ciudad de México Royal Caribbean Group Full-time

    Senior Site Reliability Engineer. Journey with us! Combine your career goals and sense of adventure by joining our incredible team of employees at Royal Caribbean Group. We are proud to offer a competitive compensation and benefits package, and excellent career...


  • Ciudad de México GrainChain Inc Full-time

    We are looking for new talent! GrainChain is a technology company dedicated to closing the digital divide in the agricultural industry. Our platforms make transactions fast, secure, and simple for our users. We are looking for a Site Reliability Engineer able to integrate and automate the areas of...


  • Estado de México The Dignify Solutions, LLC Full-time

    A technology solutions firm is seeking a Senior AWS DevOps Engineer in Mexico. The role requires strong knowledge of AWS, CI/CD pipelines, and experience with automation tools. Candidates should have a deep understanding of Cloud infrastructure and Site Reliability Engineering concepts. This full-time position offers the opportunity to lead Agile-oriented...


  • Ciudad de México Thomson Reuters Full-time

    Location: Mexico City, Mexico. Category: Technology Careers. Job Id: JREQ. Job Type: Full time, Hybrid. Senior Site Reliability Engineer, Service Management. Are you passionate about the chance to bring your experience to a world-class company that is market-leading for both content and technology? If yes, we are looking for you! Join our team! Your mission will be...


  • Ciudad de México Thomson Reuters Full-time

    Location: Mexico City, Mexico. Category: Technology Careers. Job Id: JREQ. Job Type: Full time, Hybrid. Senior Site Reliability Engineer. Are you passionate about the chance to bring your experience to a world-class company that is market-leading for both content and technology? If yes, we’re looking for you. Join our team! We are looking for a Senior Site...


  • México GrainChain Inc Full-time

    We are looking for new talent! GrainChain is a technology company dedicated to closing the digital divide in the agricultural industry. Our platforms make transactions fast, secure, and simple for our users. We are looking for a Site Reliability Engineer able to integrate and automate the areas of...