Service Reliability Engineer

2 weeks ago


Guadalajara, México Oracle Full-time

Service Reliability Engineer- J

**Applicants are required to read, write, and speak the following languages**: English

**Preferred Qualifications**

Oracle, the world leader in Enterprise Cloud, is hiring passionate technologists in the industry as we continue to add customer-centric, world-class, leading-edge, secure, hyper-scale based solutions throughout all levels of the cloud stack. Oracle’s cloud ecosystem is the only complete business cloud platform on the planet, with market-leading and business-transforming solutions spanning SaaS, DaaS, PaaS and IaaS. If you are interested in developing solutions that ensure our world-class ERP services are fast, secure, reliable and scalable, then we invite you to explore the positions we have available in our group.

**Key Tasks and Responsibilities**

- **Service Ownership** - You will be part of the SRE team, whose mission is the shared full-stack ownership of a collection of services, with our Service Development and Operations SRE partners.
- **Ownership Scope** - You will understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of the production services you own. In partnership with your Service Development and Operations SRE partners, you will be responsible for ensuring that services are designed and delivered to be mission critical, with a focus on monitoring, telemetry, security, resiliency, scale and performance.
- **Service Requirements** - You will provide direction and prioritization to service Product Management and Service Development teams to engineer and add premier SRE capabilities to the Oracle SaaS/ERP services.
- **Incident Response** - You will be the primary author of technical content for both customer and internal communications used throughout the incident response process, e.g. postmortem/root cause analysis, end-to-end repair item definition, and fixes in production.
- **Prevention** - Using data-driven incident findings, you will work on solutions that ultimately prevent the incident/problem from recurring, and develop interim solutions to resolve the problem more quickly next time.
- **Service Performance** - You will work with SaaS Operations and Product Development teams to triage performance issues (both reactive and proactive). You will work with central teams to define and drive monitoring tooling and process enhancements, including identification of service metrics to enhance performance issue triage, diagnostics and improvements.
- **Service Health Reviews** - You will represent ERP Development in periodic cross-organizational service health reviews. You will help to identify patterns that influence service performance and/or reliability. You will lead efforts to eliminate process deficiencies and drive simplification into processes and procedures.
- **Automation** - Our goal is to eliminate human intervention wherever possible. You will be responsible for driving automation into our monitoring and recovery processes, code delivery procedures and issue resolution processes.

**Skills and Qualifications (3 or more desired)**

- Minimum of 3 years of software development and demonstrated knowledge of professional software engineering best practices for the full software development life cycle, including coding standards, code reviews, source control, build and release processes, continuous deployment, and test suite development and maintenance.
- Problem-solving skills with abilities in analysis, problem identification and resolution.
- Experience with enterprise system components, architecture and deployments.
- Experience in deploying and running large-scale online systems built on Cloud platforms such as Oracle Cloud, AWS, Azure, Google Cloud Platform and/or OpenStack.
- Experience with monitoring and alerting using technologies like Prometheus, Sensu, Nagios, Kafka, Wavefront, BigPanda, DataDog, and/or PagerDuty.
- Experience with Oracle Linux, Red Hat Linux, Ubuntu, CentOS, CoreOS, and/or Amazon Linux.
- Experience in designing and building automated tools and solutions, including programming and data model design skills.
- Hands-on with web protocols and Linux/Unix tools and architecture, from kernel to shell, file systems, and client-server protocols.
- Excellent written and verbal technical communications with technical and non-technical peers, customers and, at times, executive leadership.
- Proven success in contributing in a collaborative, team-oriented environment, with the ability to establish and nurture relationships at all levels.
- BS in Computer Science or related field and 8+ years relevant experience.

**Detailed Description and Job Requirements**

As a member of the software engineering division, you will analyze and integrate external customer specifications. Specify, design and implement modest changes to existing software architecture. Build new products and development tools. Build and execute unit tests and unit test plans. Review integration and



  • Guadalajara, Jalisco, México NTT DATA North America Full-time

    SRE – Site Reliability Engineer. We are currently seeking a Site Reliability Engineer to join our team in GDL, Jalisco (MX-JAL), Mexico (MX). Perform L1.5 activities such as monitoring, deployment, rollback. Monitor the efficiency of the Azure cloud systems to prevent outages and initiate an Incident Management bridge in case of an outage. Troubleshoot Azure...


  • Guadalajara, Jalisco, México NTT DATA Full-time

    SRE - Site Reliability Engineer. We are currently seeking a Site Reliability Engineer to join our team in GDL, Jalisco (MX-JAL), Mexico (MX). Perform L1.5 activities such as monitoring, deployment, rollback. Monitor the efficiency of the Azure cloud systems to prevent outages and initiate an Incident Management bridge in case of an outage. Troubleshoot Azure...


  • Guadalajara, México Oracle Full-time

    Service Reliability Engineer-*******J **Applicants are required to read, write, and speak the following languages**: English **Preferred Qualifications** Oracle, the world leader in Enterprise Cloud, is hiring passionate technologists in the industry as we continue to add customer-centric, world-class, leading-edge, secure, hyper-scale based solutions...

  • Site Reliability Engineer

    2 weeks ago


    Guadalajara, México Oracle Full-time

    Site Reliability Engineer- EH **Applicants are required to read, write, and speak the following languages**: English **Preferred Qualifications** The Hospitality Cloud SRE team is focused on maximizing service reliability for our hotel product service offerings across global Oracle data centres. Our team runs with a start-up-like approach, leaving room for...


  • Guadalajara, Jalisco, México rctsglobal Full-time

    Site Reliability Engineer (SRE) Overview: We're looking for a passionate and hands-on Site Reliability Engineer (SRE) to join our team. This role is critical for ensuring the stability, performance, and scalability of our production services. You'll be the bridge between development and operations, with a strong focus on using code to manage infrastructure and...

  • Reliability Engineer

    1 week ago


    Guadalajara, México Flex Full-time

    At Flex, we welcome people of all backgrounds. Our employees thrive here by living our values: we support each other as we strive to find a better way, we move fast with discipline and purpose, and we do the right thing always. Through a respectful, inclusive and collaborative culture, a career at Flex offers the opportunity to make a difference, invest in...


  • Guadalajara, México Oracle Full-time

    Site Reliability Engineer-22000CGZ **Applicants are required to read, write, and speak the following languages**: English, Spanish **Preferred Qualifications** The Oracle Cloud Infrastructure (OCI) team can provide you the opportunity to build and operate a suite of massive scale, integrated cloud services in a broadly distributed, multi-tenant cloud...


  • Guadalajara, México Oracle Full-time

    Site Reliability Engineer-22000CGY **Applicants are required to read, write, and speak the following languages**: English, Spanish **Preferred Qualifications** The Oracle Cloud Infrastructure (OCI) team can provide you the opportunity to build and operate a suite of massive scale, integrated cloud services in a broadly distributed, multi-tenant cloud...

  • Site Reliability Engineer

    3 weeks ago


    Guadalajara, México f5 Full-time

    Everything we do centers around people. That means we obsess over how to make the lives of our customers, and their customers, better. And it means we prioritize a diverse F5 community where each individual can thrive. - Site Reliability Engineer III Why do you want to join our team? - Everything we do centers around people. That means we obsess over how to...

  • Site Reliability Engineer

    2 weeks ago


    Guadalajara, México F5 Full-time

    Everything we do centers around people. That means we obsess over how to make the lives of our customers, and their customers, better. And it means we prioritize a diverse F5 community where each individual can thrive. - Site Reliability Engineer III Why do you want to join our team? - Everything we do centers around people. That means we obsess over how to make...