Site Reliability Developer
1 week ago
We are looking for a skilled and motivated Cloud Region Build Site Reliability Engineer (SRE) to join our Oracle Cloud Infrastructure Region Build team. In this role, you will be responsible for building, deploying, and maintaining compute cloud infrastructure services across multiple regions to ensure high availability, scalability, and performance. You will work closely with engineering, product, and operations teams to design and implement robust automation and monitoring solutions, and lead efforts to improve system reliability and efficiency.
**Responsibilities**:
- Work with the Site Reliability Engineering (SRE) team to build and maintain OCI compute cloud infrastructure and services across multiple geographic regions.
- Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of Oracle Cloud Region Build services.
- Demonstrate a clear understanding of automation and orchestration principles. Act as the ultimate escalation point for complex or critical issues that have not yet been documented as Standard Operating Procedures (SOPs).
- Take part in incident response to help remove blockers during the region build process.
- Continuously improve the compute cloud infrastructure region build process.
- Participate in on-call rotations and provide support for critical infrastructure issues.
- Automate infrastructure provisioning, configuration, and deployment using tools like Terraform.
- Collaborate with cross-functional teams to design and roll out new cloud region builds and expansions.
- Use a deep understanding of service topologies and their dependencies to troubleshoot issues and define mitigations.
- Collaborate with software engineers to build scalable, reliable, and highly available cloud-native systems.
- Monitor system health and performance using tools like Grafana.
- Document operational procedures and runbooks.
**Required Qualifications**:
- Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent experience).
- Proven experience (3+ years) as an SRE, Cloud Engineer, or DevOps Engineer in cloud environments.
- Strong knowledge of cloud platforms such as AWS, GCP, or Azure, with hands-on experience building and managing regional deployments.
- Expertise in Infrastructure as Code (Terraform, CloudFormation, Ansible, etc.).
- Proficient with scripting languages (Python, Bash, Go, etc.).
- Experience with monitoring, alerting, and logging tools (Prometheus, Grafana, ELK stack, Datadog, etc.).
- Solid understanding of networking, security, and distributed systems in cloud environments.
- Experience working in Agile teams and collaborating with software engineers and product teams.
- Strong troubleshooting and problem-solving skills.
- Excellent communication and documentation skills.