Senior Reliability Engineer

3 days ago


Guadalajara, México Oracle Full-time

Senior Reliability Engineer-22000DX9

**Applicants are required to read, write, and speak the following languages**: English

**Preferred Qualifications**

Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence. Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services. Design and develop architectures, standards, and methods for large-scale distributed systems. Facilitate service capacity planning and demand forecasting, software performance analysis, and system tuning.

Work with the Site Reliability Engineering (SRE) team on shared full-stack ownership of a collection of services and/or technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services. Be responsible for the design and delivery of the mission-critical stack, with a focus on security, resiliency, scale, and performance. Serve as the authority for end-to-end performance and operability. Partner with development teams to define and implement improvements in service architecture. Articulate the technical characteristics of services and technology areas, and guide development teams to engineer and add premier capabilities to the Oracle Cloud service portfolio. Understand and communicate the scale, capacity, security, and performance attributes and requirements of the service and technology stack. Demonstrate a clear understanding of automation and orchestration principles. Act as the ultimate escalation point for complex or critical issues that have not yet been documented as Standard Operating Procedures (SOPs). Use a deep understanding of service topology and dependencies to troubleshoot issues and define mitigations. Understand and explain the effect of product architecture decisions on distributed systems. Bring professional curiosity and a desire to develop a deep understanding of services and technologies.

A BS or MS in Computer Science, or equivalent. Knowledge of server hardware and software configuration, networking, standard internet services, scripting languages, cloud computing patterns, and technology security and compliance. Understanding of load balancing technologies, and experience developing with programming languages, databases and big data stores, and container technologies. The work involves defining and documenting the technical architecture of complex, highly scalable products. A minimum of 5 years of experience running large-scale, customer-facing web services.

Be comfortable with mission-critical production issues and manage customer anxiety appropriately. We would like to see some combination of the following skills:

- 5+ years of software design or development experience, or a DevOps role, with distributed, highly scalable, maximum-availability (HA, brownout), multi-node environments (partitioning, isolation with VLAN, PKeys, QinQ, VRF, EVPN)
- On-call experience
- Knowledge of server virtualization technologies: Xen, KVM, Linux containers, and Docker, including vNUMA, domain groups, and SR-IOV
- Knowledge of Linux kernel internals (memory management, scheduler, builds), the TCP/IP networking stack, InfiniBand/OFED architecture (RDS, RoCE v2, OCFS2), and filesystems/volumes
- Familiarity with x86 systems; network switches from Cisco, Arista, Juniper, or Mellanox; L3 top-of-rack switch routing (OSPF, BGP); and the Mellanox HCA (CX3, CX5 and newer) programmer's guide
- Experience working with cloud infrastructure APIs and the REST API model, and developing REST APIs
- Demonstrated experience with Java, as well as strong experience with scripting languages such as Python and Bash
- Strong troubleshooting and performance tuning skills, with an operations or system administration background

Knowledge on any of the following areas is a plus:

- Understanding of the latest features of Exadata/Engineered Systems, Oracle Grid Infrastructure, and Oracle Database
- Familiarity with OpenStack and/or other cloud infrastructure products
- Understanding of and experience with Cloud Networking & Security architectures (such as application firewalls, IPsec VPN, NAT, IPv6, WebSockets, TLS, certificates, and tunneling protocols)
- Strong understanding of I/O characteristics and storage systems
- A background in multi-tenant service offerings and Service Level Availability concepts is a strong plus
- Experience with PCI and HIPAA audits, UK government compliance, and security vulnerability remediation




  • Guadalajara, México AstraZeneca Full-time

    **Senior Network Reliability Engineer** **About AstraZeneca** AstraZeneca is a global, innovation-driven biopharmaceutical business that focuses on the discovery, development, and commercialization of prescription medicines for some of the world's most serious diseases. But we're more than one of the world's leading pharmaceutical companies. At...


  • Guadalajara, México AstraZeneca Full-time

    **WHY JOIN US** We’re a network of entrepreneurial self-starters who contribute to something far bigger. There’s a diversity of expertise in our Technology group that’s unique to AstraZeneca - it allows us to dive deep into exploring new leading-edge technology. A place to be open and transparent - we speak up, think creatively, and share ideas. Our...


  • Guadalajara, Jalisco, México Tech Holding Full-time

    About Us: Tech Holding is a full-service consulting firm that delivers predictable outcomes and high-quality solutions to clients. Our team has industry experience and holds senior positions in various companies, including emerging startups and large Fortune 50 firms. Our unique approach is supported by the principles of deep expertise, integrity, transparency,...


  • Guadalajara, México F5 Full-time

    Everything we do centers around people. That means we obsess over how to make the lives of our customers, and their customers, better. And it means we prioritize a diverse F5 community where each individual can thrive. Business/Job Title: Senior Site Reliability Engineer Position Summary Software engineering is a core discipline at F5 for many roles. As a...


  • Guadalajara, México C3 AI Full-time

    We are looking for a Senior Site Reliability Engineer to join our team in Guadalajara. **Responsibilities**: - Maximize system uptime and availability, ensuring functional and performance SLAs. - Establish end-to-end monitoring and alerting on all critical aspects. - Solve complex problems for critical services and build automation to prevent problem...


  • Guadalajara, México Nextiva Full-time

    At Nextiva, we create connected communication tools that help businesses stay in touch with their customers and teams. Over 100,000 companies rely on Nextiva for phone service and customer management tools. We're not your parent's phone company. Founded in 2008, Nextiva took on the trillion-dollar telecom industry and succeeded in changing the game by...


  • Guadalajara, México Tech Holding Full-time

    **About us**: Working at Tech Holding isn't just a job, it's an opportunity to be a part of something bigger. We are a full-service consulting firm that was founded on the premise of delivering predictable outcomes and high-quality solutions to our clients. Our founders and team members have industry experience and have held senior positions in a wide...


  • Guadalajara, México Oracle Full-time

    Senior Reliability Engineer-22000E28 **Applicants are required to read, write, and speak the following languages**: English **Preferred Qualifications** The Database cloud service team can provide you the opportunity to build and operate a suite of massive scale, integrated cloud services in a broadly distributed, multi-tenant cloud environment. Oracle...


  • Guadalajara, México F5 Full-time

    Everything we do centers around people. That means we obsess over how to make the lives of our customers, and their customers, better. And it means we prioritize a diverse F5 community where each individual can thrive. - Site Reliability Engineer III Why do you want to join our team? - Everything we do centers around people. That means we obsess over how to...


  • Guadalajara, México Finastra USA Corporation Full-time

    **Responsibilities**: **What will you contribute?** As a Site Reliability Engineer your mission is to protect and advance the software & systems behind Finastra’s Cloud hosted services running on Fusion Operate. Finastra believes in a blameless culture where the primary objective is continuous improvement. You’ll be treating operations as a software...


  • Guadalajara, México AstraZeneca Full-time

    **Positions are open to Mexican Citizens and official residents of Mexico.** **Location: Guadalajara (hybrid)** **Strong English interpersonal skills required**: **About AstraZeneca**: **AstraZeneca is a global, science-led, patient-focused pharmaceutical company that focuses on the discovery, development, and commercialization of prescription medicines...


  • Guadalajara, Jalisco, México Capgemini Engineering Full-time

    About the Role: We are seeking a seasoned Senior Network Reliability Engineer to join our Capgemini Engineering team. In this role, you will be responsible for designing, implementing, and maintaining highly available and scalable network infrastructure for our production and development environments. Key Responsibilities: Design and implement large-scale network...

  • Site Reliability Engineer

    3 weeks ago


    Guadalajara, México Valce Talent Solutions Full-time

    We are looking for a Lead Site Reliability Engineer who takes the initiative on developing and maintaining the systems and services for our Cash Management Platform, automating the deployment process, ensuring system scaling, investigating and resolving outages, identifying and implementing preventive measures proactively, collaborating with key stakeholders,...


  • Guadalajara, México F5 Full-time

    Everything we do centers around people. That means we obsess over how to make the lives of our customers, and their customers, better. And it means we prioritize a diverse F5 community where each individual can thrive. Business/Job Title: Site Reliability Engineer - IAM - III Position Summary: Software engineering is a core discipline at F5 for many...


  • Guadalajara, Jalisco, México Cognizant Full-time

    Senior .NET Software Engineer - Mexico. We are seeking a highly skilled and experienced individual to join our team as a Senior .NET Software Engineer in Mexico City, Guadalajara, and Monterrey. Overview: Cognizant is a leading global company that provides consulting and IT services. As a Senior .NET Software Engineer, you will play a crucial role in supporting...

  • Senior iOS Engineer

    3 days ago


    Guadalajara, México Brillio Full-time

    **Senior iOS Engineer**: **About Brillio**: **Senior iOS Engineer** **Primary Skills**: - iOS Native **Secondary Skills**: - Jenkins, Objective C, Swift **Specialization**: - Mobile - iOS: Senior Engineer, XT **Job requirements**: - Experience building software with Redux or other unidirectional state management paradigms - Experience writing with...


  • Guadalajara, México F5 Full-time

    Everything we do centers around people. That means we obsess over how to make the lives of our customers, and their customers, better. And it means we prioritize a diverse F5 community where each individual can thrive. But our success isn’t driven solely by what we do. We also care deeply about how we do it. At F5, our culture is how we live, every single...


  • Guadalajara, México Finastra Full-time

    Your deliverables as a Site Reliability Engineer will include, but are not limited to, the following: - Work with containers and container orchestration systems such as Kubernetes - Capacity Planning to determine resource requirements of your service for it to be scalable, efficient, and reliable - Collaborate with other engineers to implement operational...


  • Guadalajara, Jalisco, México Broadridge Full-time

    Broadridge fosters a culture where innovation meets reliability, empowering associates to drive scalable solutions. **Job Overview** We are seeking an experienced Infrastructure Engineer - Site Reliability to join our team. As a key member of our SRE group, you will be responsible for designing and implementing scalable and highly reliable software...


  • Guadalajara, México Oracle Full-time

    Site Reliability Engineer-2200059K **Applicants are required to read, write, and speak the following languages**: English, Spanish **Preferred Qualifications** Are you a seasoned Site Reliability Engineer or Cloud DevOps guru? Are you a backup, restore and recovery expert? If you are, we are looking for you to join our exciting growing Cloud DevOps...