Site Reliability Engineering
6 days ago
SITE RELIABILITY ENGINEERING-220008CJ
**Applicants are required to read, write, and speak the following languages**: English
**Preferred Qualifications**
Oracle, the world leader in Enterprise Cloud, is hiring passionate technologists as we continue to add customer-centric, world-class, leading-edge, secure, hyper-scale solutions throughout all levels of the cloud stack. Oracle’s cloud ecosystem is the only complete business cloud platform on the planet, with market-leading and business-transforming solutions spanning SaaS, DaaS, PaaS and IaaS. If you are interested in developing solutions that ensure our world-class ERP services are fast, secure, reliable and scalable, then we invite you to explore the positions we have available in our group.
Key Tasks and Responsibilities
- **Service Ownership** - You will be part of the SRE team, whose mission is the shared full stack ownership of a collection of services, with our Service Development and Operations SRE partners.
- **Ownership Scope** - You will understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of the production services you own. In partnership with your Service Development and Operations SRE partners, you will have the responsibility to ensure that services are designed and delivered to be mission critical with focus on monitoring, telemetry, security, resiliency, scale and performance.
- **Service Requirements** - You will provide direction and prioritization to service Product Management and Service Development teams to engineer and add premier SRE capabilities to the Oracle SaaS/ERP services.
- **Incident Response** - You will be the primary author of technical content for both customer and internal communications used throughout the incident response process, e.g. postmortem/root cause analysis, end-to-end repair item definition, and fixes in production.
- **Prevention** - Using data-driven incident findings, you will work on solutions that will ultimately prevent the incident/problem from arising ever again, and develop interim solutions to more quickly resolve the problem next time.
- **Service Performance** - You will work with SaaS Operations and Product Development teams to triage performance issues (both reactive and proactive). You will work with central teams to define and drive monitoring tooling and process enhancements, including identification of service metrics to enhance performance issue triage, diagnostics and improvements.
- **Service Health Reviews** - You will represent ERP Development in periodic cross-organizational service health reviews. You will help to identify patterns that influence service performance and/or reliability. You will lead efforts to eliminate process deficiencies and drive simplification into processes and procedures.
- **Automation** - Our goal is to eliminate human intervention wherever possible. You will be responsible for driving automation into our monitoring and recovery processes, code delivery procedures and issue resolution processes.
Skills and Qualifications (5 or more desired)
- Minimum of 5 years of software development and demonstrated knowledge of professional software engineering best practices for the full software development life cycle, including coding standards, code reviews, source control, build and release processes, continuous deployment and test suite development and maintenance.
- Problem solving skills with abilities in analysis, problem identification and resolution.
- Experience with enterprise system components, architecture and deployments.
- Experience in deploying and running large-scale online systems built on cloud platforms such as Oracle Cloud, AWS, Azure, Google Cloud Platform and/or OpenStack.
- Experience with monitoring and alerting using technologies like Prometheus, Sensu, Nagios, Kafka, Wavefront, BigPanda, DataDog, and/or PagerDuty.
- Experience with Oracle Linux, Red Hat Linux, Ubuntu, CentOS, CoreOS, and/or Amazon Linux.
- Experience in designing and building automated tools and solutions, including programming and data model design skills.
- Hands-on with web protocols and Linux/Unix tools and architecture, from kernel to shell, file systems, and client-server protocols.
- Excellent written and verbal technical communications with technical and non-technical peers, customers and at times executive leadership.
- Proven success in contributing in a collaborative, team-oriented environment, with the ability to establish and nurture relationships at all levels.
- BS in Computer Science or related field and 5 years relevant experience.
**Detailed Description and Job Requirements**
As a member of the software engineering division, you will analyze and integrate external customer specifications. Specify, design and implement modest changes to existing software architecture. Build new products and development tools. Build and execute unit tests and unit test plans. Review integration and
-
Site Reliability Engineer
2 weeks ago
Guadalajara, México - Capgemini Engineering - Full-time
**Site Reliability Engineer (REMOTE)**: At Capgemini Engineering, the world leader in engineering services, we bring together a global team of engineers, scientists, and architects to help the world’s most innovative companies unleash their potential. From autonomous cars to life-saving robots, our digital and software technology experts think outside the...
-
SRE (Site Reliability Engineering) On Site Guadalajara
3 weeks ago
Guadalajara, México - GSB - Full-time
A major IT company in Latin America is hiring due to growth:
**SRE - Site Reliability Engineering**
**Job description**:
- We are looking for a Lead Site Reliability Engineer who takes the initiative on developing and maintaining the system and services for our Cash Management Platform, automating the deployment process, ensuring system scaling, investigating...
-
Senior Site Reliability Engineer
2 weeks ago
Guadalajara, México - Capgemini Engineering - Full-time
**Senior SRE - Capgemini**: We’re hiring a **Senior Site Reliability Engineer** to join a major telecom client through Capgemini Engineering. Join a collaborative team building and operating large-scale cloud platforms that power next‑generation connectivity and customer experiences. This is a hands‑on role where you’ll design, automate, secure, and...
-
Site Reliability Engineer
2 weeks ago
Guadalajara, México - Valce Talent Solutions - Full-time
We are looking for a Lead Site Reliability Engineer who takes the initiative on developing and maintaining the system and services for our Cash Management Platform, automating the deployment process, ensuring system scaling, investigating and resolving outages, identifying and implementing preventive measures proactively, collaborating with key stakeholders,...
-
Site Reliability Engineer
3 weeks ago
Guadalajara, México - Valce Talent Solutions - Full-time
We are looking for a Lead Site Reliability Engineer who takes the initiative on developing and maintaining the system and services for our Cash Management Platform, automating the deployment process, ensuring system scaling, investigating and resolving outages, identifying and implementing preventive measures proactively, collaborating with key stakeholders,...
-
Site Reliability Professional
3 weeks ago
Guadalajara, México - Intel - Full-time
Come and join a dynamic and challenging team within the Intel Data Center and Artificial Intelligence Group focused on engineering, developing, and supporting world-class platforms and component building blocks aligned to the Data Center roadmap and strategies. We are seeking a well-rounded Site Reliability Engineer to work with a team of architects and other...
-
Site Reliability Engineer
4 weeks ago
Guadalajara, México - Capgemini Engineering - Full-time
At Capgemini Engineering, the world leader in engineering services, we bring together a global team of engineers, scientists, and architects to help the world’s most innovative companies unleash their potential. From autonomous cars to life-saving robots, our digital and software technology experts think outside the box as they provide unique R&D and...
-
Site Reliability Engineer
3 days ago
Guadalajara, México - Capgemini Engineering - Full-time
At Capgemini Engineering, the world leader in engineering services, we bring together a global team of engineers, scientists, and architects to help the world’s most innovative companies unleash their potential. From autonomous cars to life-saving robots, our digital and software technology experts think outside the box as they provide unique R&D and...
-
Site Reliability Engineer
3 weeks ago
Guadalajara, México - Wizeline - Full-time
**The Company**: Wizeline is a global digital services company helping mid-size to Fortune 500 companies build, scale, and deliver high-quality digital products and services. We thrive in solving our customers' challenges through human-centered experiences, digital core modernization, and intelligence everywhere (AI/ML and data). We help them succeed in...