Site Reliability Developer

7 days ago


Zapopan, México Oracle Full-time

As part of the Site Reliability Engineering (SRE) team, you’ll contribute to designing, automating, and evolving mission-critical systems. You'll combine deep systems expertise with modern software engineering practices to reduce operational toil and build resilient, self-healing services.

This is a high-impact role where your work directly affects the reliability of cloud services used by thousands of customers around the world.

**What You’ll Do**:

- Collaborate with SRE and development teams to ensure end-to-end reliability across a wide range of services and technology stacks.
- Design, write, and deploy software and automation tools that enhance availability, observability, and scalability.
- Own and evolve metrics, SLOs, SLAs, KPIs, and dashboards that track system health and customer experience.
- Build tooling to reduce manual operations and eliminate sources of toil.
- Improve CI/CD pipelines, deployment processes, and validation frameworks for reliability and efficiency.
- Review and influence architectural designs for distributed systems with a focus on resilience, performance, and fault tolerance.
- Lead and participate in post-incident reviews, capacity planning, and production-readiness assessments.
- Provide on-call support on a rotational basis (12-hour shifts, 7-day coverage).

**What We’re Looking For**:

- Advanced Linux systems administration
- Strong coding skills in Python (automation-focused)
- Intermediate experience with Bash/Shell scripting
- Familiarity with networking principles and distributed systems behavior
- Basic to intermediate knowledge of databases (e.g., SQL, NoSQL)
- Understanding of unit testing and modern software engineering practices
- Experience with CI/CD pipelines and deployment automation
- Comfortable working in Agile development environments

**Nice to Have**:

- Exposure to monitoring/observability tools (e.g., Prometheus, Grafana, New Relic)
- Experience building internal tools for operational efficiency
- Participation in SRE culture: blameless postmortems, runbooks, and service design reviews



  • Zapopan, México Oracle Full-time

    We are looking for a skilled and motivated Cloud Region Build Site Reliability Engineer (SRE) to join our Oracle Cloud Infrastructure Region Build team. In this role, you will be responsible for building, deploying, and maintaining compute cloud infrastructure services across multiple regions to ensure high availability, scalability, and performance. You...


  • Zapopan, México Oracle Full-time

    We are looking for a skilled and motivated Cloud Region Build Site Reliability Engineer (SRE) to join our Oracle Cloud Infrastructure Region Build team. In this role, you will be responsible for building, deploying, and maintaining compute cloud infrastructure services across multiple regions to ensure high availability, scalability, and performance. You...


  • Zapopan, México Oracle Full-time

    Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence. Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services. Design and develop architectures, standards, and methods for large-scale distributed systems. Facilitate...


  • Zapopan, Jalisco, México Oracle Full-time

    Description: We are looking for a skilled and motivated Cloud Region Build Site Reliability Engineer (SRE) to join our Oracle Cloud Infrastructure Region Build team. In this role, you will be responsible for building, deploying, and maintaining compute cloud infrastructure services across multiple regions to ensure high availability, scalability, and...

  • Site Reliability Engineer

    2 weeks ago


    Zapopan, México Oracle Full-time

    About The Job: At Oracle, we're seeking a talented and skilled Site Reliability Engineer to work on Oracle Cloud Observability and Management platform. As a Site Reliability Engineer, you will solve interesting technical challenges by designing, deploying, and troubleshooting key Cloud services, platforms, and infrastructure, always thinking about...


  • Zapopan, México Oracle Full-time

    **Job Description**: Work with the Site Reliability Engineering (SRE) team on the shared full stack ownership of a collection of services and/or technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services. Responsible for the design and delivery of the critical stack, with focus...


  • Zapopan, México Oracle Full-time

    Work with Site Reliability Engineering (SRE) team on the shared full stack ownership of a collection of services and/or technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services. Responsible for the design and delivery of the mission critical stack, with focus on security,...


  • Zapopan, México Oracle Full-time

    Work with Site Reliability Engineering (SRE) team on the shared full stack ownership of a collection of services and/or technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services. Responsible for the design and delivery of the mission critical stack, with focus on security,...


  • Zapopan, México Oracle Full-time

    Work with Site Reliability Engineering (SRE) team on the shared full stack ownership of a collection of services and/or technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services. Responsible for the design and delivery of the mission critical stack, with focus on security,...


  • Zapopan, México Oracle Full-time

    Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence. Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services. Design and develop architectures, standards, and methods for large-scale distributed systems. Facilitate...