Site Reliability Developer
4 weeks ago
As part of the Site Reliability Engineering (SRE) team, you’ll contribute to designing, automating, and evolving mission-critical systems. You'll combine deep systems expertise with modern software engineering practices to reduce operational toil and build resilient, self-healing services.

This is a high-impact role where your work directly affects the reliability of cloud services used by thousands of customers around the world.

**What You’ll Do**:
- Collaborate with SRE and development teams to ensure end-to-end reliability across a wide range of services and technology stacks.
- Design, write, and deploy software and automation tools that enhance availability, observability, and scalability.
- Own and evolve metrics, SLOs, SLAs, KPIs, and dashboards that track system health and customer experience.
- Build tooling to reduce manual operations and eliminate sources of toil.
- Improve CI/CD pipelines, deployment processes, and validation frameworks for reliability and efficiency.
- Review and influence architectural designs for distributed systems with a focus on resilience, performance, and fault tolerance.
- Lead and participate in post-incident reviews, capacity planning, and production-readiness assessments.
- Provide on-call support on a rotational basis (12-hour shifts, 7-day coverage).

**What We’re Looking For**:
- Advanced Linux systems administration
- Strong coding skills in Python (automation-focused)
- Intermediate experience with Bash/Shell scripting
- Familiarity with networking principles and distributed systems behavior
- Basic to intermediate knowledge of databases (e.g., SQL, NoSQL)
- Understanding of unit testing and modern software engineering practices
- Experience with CI/CD pipelines and deployment automation
- Comfortable working in Agile development environments

**Nice to Have**:
- Exposure to monitoring/observability tools (e.g., Prometheus, Grafana, New Relic)
- Experience building internal tools for operational efficiency
- Participation in SRE culture: blameless postmortems, runbooks, and service design reviews
-
Site Reliability Developer
21 hours ago
Zapopan, México | Oracle | Full-time
We are looking for a skilled and motivated Cloud Region Build Site Reliability Engineer (SRE) to join our Oracle Cloud Infrastructure Region Build team. In this role, you will be responsible for building, deploying, and maintaining compute cloud infrastructure services across multiple regions to ensure high availability, scalability, and performance. You...
-
Site Reliability Developer 4
1 week ago
Zapopan, Jalisco, México | Oracle | Full-time
Description: We are looking for a skilled and motivated Cloud Region Build Site Reliability Engineer (SRE) to join our Oracle Cloud Infrastructure Region Build team. In this role, you will be responsible for building, deploying, and maintaining compute cloud infrastructure services across multiple regions to ensure high availability, scalability, and...
-
Site Reliability Developer 3
23 hours ago
Zapopan, Jalisco, México | Oracle | Full-time
Description: We are looking for a skilled and motivated Cloud Region Build Site Reliability Engineer (SRE) to join our Oracle Cloud Infrastructure Region Build team. In this role, you will be responsible for building, deploying, and maintaining compute cloud infrastructure services across multiple regions to ensure high availability, scalability, and...
-
Site Reliability Engineer
3 days ago
Zapopan, México | Oracle | Full-time
About The Job: At Oracle, we're seeking a talented and skilled Site Reliability Engineer to work on the Oracle Cloud Observability and Management platform. As a Site Reliability Engineer, you will solve interesting technical challenges by designing, deploying, and troubleshooting key Cloud services, platforms, and infrastructure, always thinking about reliability,...
-
Site Reliability Developer
3 weeks ago
Zapopan, México | Oracle | Full-time
We are hiring for OCI Corporate Network and Security operations. You will be a member of the team responsible for network and security incident, change, and capacity management in support of the Oracle Corporate Network. You will resolve complex problems related to network infrastructure services and build automation to prevent problem recurrence. Design, write, and...
-
Site Reliability Developer
2 weeks ago
Zapopan, México | Oracle | Full-time
**Job Description**: Work with the Site Reliability Engineering (SRE) team on the shared full-stack ownership of a collection of services and/or technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services. Responsible for the design and delivery of the critical stack, with focus...
-
Site Reliability Developer
3 weeks ago
Zapopan, México | Oracle | Full-time
**GoldenGate Service** is a new multi-tenant, cloud-native service for real-time data integration and replication in heterogeneous IT environments. GoldenGate enables users to replicate and integrate data from different sources, including operational and analytics data, as well as real-time data streaming such as Apache Kafka. This service is expected to...