Principal SRE

Posted 3 weeks ago

Guadalajara, México · Oracle · Full-time

Oracle’s Cloud Infrastructure (OCI) team builds and supports the Block Storage service. The work covers support, operations, and deployment at scale in a broadly distributed, multi-tenant cloud environment, in close collaboration with various engineering teams. Our customers run their businesses on our cloud, and our mission is to provide them with best-in-class block storage capabilities alongside our other compute, storage, networking, database, and security offerings.

We’re looking for hands-on engineers with a passion for solving problems in distributed systems, virtualized infrastructure, and highly available services. Joining Oracle will give you the opportunity to learn, help build innovative new systems from the ground up, and operate services at scale. Engineers at every level can have significant technical and business impact while delivering critical enterprise-level features across multiple parallel deployments.

As a member of the software reliability engineering team, you will take an active role in supporting and operating the Block Storage service.

As a **Principal SRE on the Block Storage team**, you will be required to:

- **Lead, mentor, and guide the team** and drive projects end to end.
- **Monitor** our service and proactively debug operational issues.
- Work with internal and external teams to diagnose **performance issues**.
- **Support automation** and maintain build and test systems, including systems for performance and scalability testing.
- Improve the efficiency of our **deployment** processes across a **fast-growing number of regions** through automation and scale improvements to tools and dashboards.
- Participate in our **on-call rotation** and resolve complex distributed issues through debugging, communication, and collaboration with multiple SRE teams across OCI.
- Improve our operational capabilities by developing **runbooks and alarms, building tools**, and writing documentation that enables customers to self-diagnose problems.
- **Deploy our service in new regions** and help automate this process.

**Basic Qualifications**:

- 10+ years of **SRE/DevOps/automation experience in a Linux-based environment**
- Familiarity with storage technologies such as **iSCSI, NVMe, SAN/NAS, and block storage**
- 2+ years of experience with Linux shell scripting and **Python**
- Proficiency with Linux-based build and analysis tools (e.g., make, SCons/Cons, Bazel)
- Familiarity with **CI/CD** environments
- Familiarity with Agile development
- Proficiency with commonly used networking protocols such as TCP/IP and HTTP
- Familiarity with Docker containers
- Familiarity with databases, NoSQL systems, and **storage and distributed persistence technologies**
- **Troubleshooting and performance-tuning skills**
- Bachelor’s degree in Computer Science and Engineering or a related engineering field