Principal Site Reliability Developer
1 week ago
Be comfortable with mission-critical production issues and manage customer anxiety appropriately. We would like to see some combination of the following skills:
- 5+ years of experience in software design or development, or in a DevOps role, with distributed, highly scalable, maximum-availability (HA, brownout), multi-node environments (partitioning, isolation with VLAN, pkeys, QinQ, VRF, EVPN)
- On-call
- Knowledge of server virtualization technologies: Xen, KVM, Linux containers, Docker, including vNUMA, domain groups, SR-IOV
- Knowledge of Linux kernel internals (memory management, scheduler, builds), TCP/IP networking stack, InfiniBand/OFED architecture (RDS, RoCE v2, OCFS2), filesystems/volumes
- Familiar with x86 systems, network switches from either Cisco, Arista, Juniper, Mellanox, L3 top of switch routing (OSPF, BGP), Mellanox HCAs (CX3, CX5 and newer) programmer's guide
- Experience working with Cloud infrastructure APIs, REST API model, and developing REST APIs
- Demonstrated experience with Java, as well as strong experience with scripting languages such as Python and Bash
- Strong troubleshooting and performance tuning skills, OPS or system administration
- Knowledge of any of the following areas is a plus:
- Understanding the latest features of Exadata / Engineered systems, Oracle Grid Infrastructure and Database is a plus
- Familiar with OpenStack and/or other Cloud infrastructure products is a plus
- Understanding of and experience with Cloud Networking & Security architectures (such as Application Firewall, IPsec VPN, NAT, IPv6, WebSockets, TLS, certificates, and tunneling protocols)
- Strong understanding of I/O characteristics and storage systems
- A background in multi-tenant service offerings and Service Level Availability concepts is a strong plus
- PCI, HIPAA audits, UK gov, security vulnerabilities remediation
Work with the Site Reliability Engineering (SRE) team on the shared full stack ownership of a collection of services and/or technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services. Responsible for the design and delivery of the mission-critical stack, with a focus on security, resiliency, scale, and performance. Authority for end-to-end performance and operability. Partner with development teams in defining and implementing improvements in service architecture. Articulate technical characteristics of services and technology areas and guide development teams to engineer and add premier capabilities to the Oracle Cloud service portfolio. Understand and communicate the scale, capacity, security, performance attributes, and requirements of the service and technology stack. Demonstrate a clear understanding of automation and orchestration principles. Act as the ultimate escalation point for complex or critical issues that have not yet been documented as Standard Operating Procedures (SOPs). Use a deep understanding of service topology and dependencies to troubleshoot issues and define mitigations. Understand and explain the effect of product architecture decisions on distributed systems. Professional curiosity and a desire to develop a deep understanding of services and technologies.
-
Principal Site Reliability Developer
4 weeks ago
Zapopan, México Oracle Full-time
Applicants are required to read, write, and speak the following languages: English **Role**: Site Reliability Engineer **Location**: Guadalajara preferred **Who are we looking for?** **Roles and Responsibilities** - Perform DevOps activities to support customers, engineers, and processes through our release cycles as well as production - Participate in a...
-
Principal Site Reliability Developer
2 weeks ago
Zapopan, Jalisco, México Oracle Full-time
Be comfortable with mission-critical production issues and manage customer anxiety appropriately. We would like to see some combination of the following skills: 5+ years of software design or development experience or DevOps role with distributed, highly scalable, maximum availability (HA, brownout), multi-node environments (partitioning, isolation with vlan,...
-
Principal Site Reliability Engineer
2 weeks ago
Zapopan, Jalisco, México Oracle Full-time
Responsibilities: Solve complex problems related to Linux infrastructure and Oracle Cloud Infrastructure. Act as a partner concern point for critical issues that may not have a detailed procedure and provide Root Cause Analysis (RCA). Understand the end-to-end configuration, technical dependencies, characteristics of production infrastructure and...
-
Principal Site Reliability Engineer
2 months ago
Zapopan, México Oracle Full-time
**Responsibilities** - Solve complex problems related to Linux infrastructure and Oracle Cloud Infrastructure - Act as a partner concern point for critical issues that may not have a detailed procedure and provide Root Cause Analysis (RCA) - Understand the end-to-end configuration, technical dependencies, characteristics of production infrastructure and...
-
Principal Site Reliability Engineer
4 weeks ago
Zapopan, México Oracle Full-time
**Responsibilities** - Solve complex problems related to Linux infrastructure and Oracle Cloud Infrastructure - Act as escalation point for critical issues that may not have a documented procedure and provide Root Cause Analysis (RCA) - Understand the end-to-end configuration, technical dependencies, characteristics of production infrastructure and...
-
Principal Site Reliability Engineer
2 weeks ago
Zapopan, Jalisco, México Oracle Full-time
Responsibilities: Solve complex problems related to Linux infrastructure and Oracle Cloud Infrastructure. Act as a partner concern point for critical issues that may not have a detailed procedure and provide Root Cause Analysis (RCA). Understand the end-to-end configuration, technical dependencies, characteristics of production infrastructure and services. Quickly...
-
Site Reliability Developer
2 weeks ago
Zapopan, México Oracle Full-time
**Job Description**: Work with Site Reliability Engineering (SRE) team on the shared full stack ownership of a collection of services and/or technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services. Responsible for the design and delivery of the critical stack, with focus...
-
Principal Site Reliability Engineer
1 month ago
Zapopan, Jalisco, México Oracle Full-time
Responsibilities, Job Description: Solve complex problems related to Linux infrastructure and Oracle Cloud Infrastructure. Act as a partner concern point for critical issues that may not have a detailed procedure and provide Root Cause Analysis (RCA). Understand the end-to-end configuration, technical dependencies, characteristics of production infrastructure and...
-
Principal Site Reliability Engineer
2 weeks ago
Zapopan, Jalisco, México myGwork - LGBTQ+ Business Community Full-time
This inclusive employer is a member of myGwork – the largest global platform for the LGBTQ+ business community. Responsibilities: Solve complex problems related to Linux infrastructure and Oracle Cloud Infrastructure. Act as a partner concern point for critical issues that may not have a detailed procedure and provide Root Cause Analysis (RCA). Understand the...
-
Principal Site Reliability Engineer
1 month ago
Zapopan, Jalisco, México myGwork - LGBTQ+ Business Community Full-time
This inclusive employer is a member of myGwork – the largest global platform for the LGBTQ+ business community. Responsibilities: Solve complex problems related to Linux infrastructure and Oracle Cloud Infrastructure. Act as a partner concern point for critical issues that may not have a detailed procedure and provide Root Cause Analysis (RCA). Understand the...
-
Senior Site Reliability Developer
2 months ago
Zapopan, México Oracle Full-time
Work with Site Reliability Engineering (SRE) team on the shared full stack ownership of a collection of services and/or technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services. Responsible for the design and delivery of the mission critical stack, with focus on security,...
-
Site Reliability Developer
2 weeks ago
Zapopan, Jalisco, México Oracle Full-time
Job Description: Work with Site Reliability Engineering (SRE) team on the shared full stack ownership of a collection of services and/or technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services. Responsible for the design and delivery of the critical stack, with focus on...
-
Senior Site Reliability Developer
2 weeks ago
Zapopan, Jalisco, México Ll Oefentherapie Full-time
Are you interested in the exciting challenges of building and operating large-scale distributed infrastructure for the cloud? Oracle's Cloud Infrastructure is building its next generation of cloud technologies that operate in a broadly distributed, highly available, highly scalable, multi-tenant environment. Our mission is to provide our customers with an...
-
Principal Site Reliability Engineer
1 month ago
Zapopan, Jalisco, México Oracle Full-time
Job Description: We are looking to recruit a Site Reliability Engineer to the established Oracle Cloud Infrastructure (OCI) Enterprise Engineering team. The successful candidate will be located in Mexico and will mainly be responsible for defining and deploying key services with deep focus on architecture, production operations, performance...
-
Site Reliability Developer
2 weeks ago
Zapopan, México Oracle Full-time
Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence. Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services. Design and develop designs, architectures, standards, and methods for large-scale distributed systems. Facilitate...
-
Site Reliability Engineer
4 weeks ago
Zapopan, México GrainChain Inc Full-time
We're looking for you, join GrainChain! We are seeking a Site Reliability Engineer able to integrate and automate the development and operations areas, ensuring the quality and delivery of software solutions. We are a technology company that helps the agricultural industry close the digital gap, with different platforms that...
-
Site Reliability Developer
2 weeks ago
Zapopan, Jalisco, México Oracle Full-time
Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence. Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services. Design and develop designs, architectures, standards, and methods for large-scale distributed systems. Facilitate...
-
Senior Site Reliability Developer
1 month ago
Zapopan, Jalisco, México Oracle Full-time
Job Description: Are you interested in the exciting challenges of building and operating large-scale distributed infrastructure for the cloud? Oracle's Cloud Infrastructure is building its next generation of cloud technologies that operate in a broadly distributed, highly available, highly scalable, multi-tenant environment. Our mission is to provide our...
-
Senior Site Reliability Developer
2 weeks ago
Zapopan, Jalisco, México myGwork - LGBTQ+ Business Community Full-time
This inclusive employer is a member of myGwork – the largest global platform for the LGBTQ+ business community. Are you interested in the exciting challenges of building and operating large-scale distributed infrastructure for the cloud? Oracle's Cloud Infrastructure is building its next generation of cloud technologies that operate in a broadly...
-
Senior Site Reliability Developer
1 month ago
Zapopan, Jalisco, México myGwork - LGBTQ+ Business Community Full-time
This inclusive employer is a member of myGwork – the largest global platform for the LGBTQ+ business community. Are you interested in the exciting challenges of building and operating large-scale distributed infrastructure for the cloud? Oracle's Cloud Infrastructure is building its next generation of cloud technologies that operate in a broadly...