Principal Site Reliability Engineer
2 days ago
Job Description
As a senior member of the Site Reliability Engineering (SRE) team, you'll take ownership of highly available systems, influence service design, and work across teams to drive resiliency, automation, and operational excellence. This is a hands-on engineering role where deep infrastructure knowledge meets software engineering expertise, ideal for experienced SREs ready to take the lead.
Responsibilities
What You'll Do:
- Lead the design, automation, and support of OCI services with a focus on resiliency, security, scalability, and performance.
- Own and improve the end-to-end reliability metrics (SLOs, SLAs, KPIs) for your services.
- Design and implement high-availability architectures and standards for large-scale distributed systems.
- Serve as the ultimate escalation point for complex operational issues, using a deep understanding of service topologies and interdependencies.
- Architect and build automation and orchestration tools that reduce manual work and prevent problem recurrence.
- Collaborate with development teams to improve service designs, optimize deployments, and implement best practices for operational efficiency.
- Guide technical decision-making and mentor junior SREs and developers across teams.
- Participate in and lead postmortems, root cause analysis, and preventative design changes.
- Contribute to capacity planning, demand forecasting, and long-term service scalability strategies.
- Participate in a rotational on-call schedule to ensure the health and availability of production services.
What We're Looking For:
- Advanced experience with Linux systems administration
- Strong programming skills in Python (with automation libraries)
- Advanced Bash/Shell scripting
- Deep understanding of distributed systems, networking, and service architecture
- Solid knowledge of databases and how they behave in production (SQL or NoSQL)
- Strong understanding of CI/CD pipelines, Agile methodologies, and DevOps best practices
- Experience writing and maintaining unit tests and production-grade software
- Proven ability to lead cross-functional efforts and technical problem-solving in live environments
Nice to Have:
- Hands-on experience with monitoring and observability tools (Grafana, Prometheus, New Relic, etc.)
- Familiarity with Oracle Cloud Infrastructure (OCI) or other cloud platforms (AWS, Azure, GCP)
- Experience with Infrastructure-as-Code (Terraform, Ansible) and container orchestration (Kubernetes)
Qualifications
Career Level - IC4
About Us
As a world leader in cloud solutions, Oracle uses tomorrow's technology to tackle today's challenges. We've partnered with industry leaders in almost every sector, and continue to thrive after 40+ years of change by operating with integrity.
We know that true innovation starts when everyone is empowered to contribute. That's why we're committed to growing an inclusive workforce that promotes opportunities for all.
Oracle careers open the door to global opportunities where work-life balance flourishes. We offer competitive benefits based on parity and consistency and support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.
We're committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing accommodation- or by calling in the United States.
Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans' status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.
-
Principal Site Reliability Engineer
5 days ago
Zapopan, Jalisco, México - Oracle - Full-time
As a senior member of the Site Reliability Engineering (SRE) team, you'll take ownership of highly available systems, influence service design, and work across teams to drive resiliency, automation, and operational excellence. This is a hands-on engineering role where deep infrastructure knowledge meets software engineering expertise, ideal for...
-
Site Reliability Engineer
2 weeks ago
Zapopan, Jalisco, México - GrainChain - Full-time
We are looking for new talent. GrainChain is a technology company dedicated to closing the digital gap in the agricultural industry. Our platforms make transactions fast, secure, and simple for our users. We are looking for a Site Reliability Engineer able to integrate and automate the development areas...
-
Principal Site Reliability Developer
2 days ago
Zapopan, Jalisco, México - Oracle - Full-time
Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence. Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services. Design and develop architectures, standards, and methods for large-scale distributed systems....
-
Site Reliability Developer 3
5 days ago
Zapopan, Jalisco, México - Oracle - Full-time
We are looking for a skilled and motivated Cloud Region Build Site Reliability Engineer (SRE) to join our Oracle Cloud Infrastructure Region Build team. In this role, you will be responsible for building, deploying, and maintaining compute cloud infrastructure services across multiple regions to ensure high availability, scalability, and...
-
Site Reliability
2 weeks ago
Zapopan, Jalisco, México - Canonical - Jobs - Full-time
Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is very widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers include the world's leading public cloud and silicon providers,...
-
Site Reliability Developer 4
16 hours ago
Zapopan, Jalisco, México - Oracle - Full-time
We are looking for a skilled and motivated Cloud Region Build Site Reliability Engineer (SRE) to join our Oracle Cloud Infrastructure Region Build team. In this role, you will be responsible for building, deploying, and maintaining compute cloud infrastructure services across multiple regions to ensure high availability, scalability, and...
-
Principal Site Reliability Developer
5 days ago
Zapopan, Jalisco, México - Oracle - Full-time
Work with an elite team to provide Oracle Database Administration support for customer production systems in the Oracle Cloud, with the opportunity to work on the latest Oracle database releases and features as part of the cloud-first strategy. Provide DBA operational support with a high degree of customer service, technical expertise, and...
-
Senior Site Reliability
2 weeks ago
Zapopan, Jalisco, México - Canonical - Jobs - Full-time
Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is very widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers include the world's leading public cloud and silicon providers,...
-
Principal Product Engineer
16 hours ago
Zapopan, Jalisco, México - Oracle - Full-time
Be on the technology and supply chain forefront for the Manufacturing and Operations team within Supply Chain Operations. As the regional factory Product Engineer, you will engage directly and be co-located with our suppliers, ensuring a high-quality operation. In partnership with Design Engineering, External Manufacturing...
-
Principal Site Reliability Developer
5 days ago
Zapopan, Jalisco, México - Oracle - Full-time
Work with an elite team to provide Oracle Database Administration support for customer production systems in the Oracle Cloud, with the opportunity to work on the latest Oracle database releases and features as part of the cloud-first strategy. Provide DBA operational support with a high degree of customer service, technical expertise, and timeliness. ...