FBS Site Reliability Engineer

1 week ago

Ciudad de México, Ciudad de México Capgemini Full-time

Our Client is one of the United States' largest insurers, providing a wide range of insurance and financial services products with gross written premiums well over US$25 billion (P&C). They proudly serve more than 10 million U.S. households with more than 19 million individual policies across all 50 states through the efforts of over 48,000 exclusive and independent agents and nearly 18,500 employees. Finally, our Client is part of one of the largest insurance groups in the world.

Job Summary

This position focuses on infrastructure and code reviews to ensure that the solutions built and delivered are highly available and to minimize unplanned downtime.

Key Responsibilities

•Act as an expert troubleshooter with broad technical experience across multiple IT disciplines, willing to help our Incident and Problem Management teams.

•Understand the root cause of each incident and the tasks needed to ensure it does not recur.

•Validate the root cause of incidents in nonproduction regions, then work with teams to determine the best approach to resolution.

•Participate in chaos testing, where we use a third-party tool to disable functions on a server, verify that we can alert teams to the failure, and then assemble a technical troubleshooting call to identify the problem and restore the service.

•Leverage the observability toolset to define key transactions and observe their performance within systems.

•Create golden signal reporting and error budgets for development teams. Must know the framework.

•Perform failure analysis, using chaos testing practices to break nonproduction systems and find weak points, and work with infrastructure and development teams to improve the applications' resilience.

Requirements

•At least 6 years of experience in a similar role as a Reliability Engineer or Resilience Engineer

•Full English Fluency

•BS in Computer Science or similar

•Very strong coding experience (writing and testing code, leveraging observability processes), ideally in Java or C++

•Hands-on approach to troubleshooting and a very technical background

Technical & Business Skills

Site Reliability Engineer - Advanced
Trend & Pattern Analysis, Optimization – Advanced
Resilience Engineering – Advanced
Golden Signal Cyber Reliability (MUST)
Dynatrace - Intermediate (4-6 Years). Desirable, not a must; any other observability tool is acceptable
Gremlin - Entry Level (1-3 Years). Chaos testing, failure modeling experience, or similar (Very Desirable)
Cloud Infrastructure, Experience: AWS / Azure / GCP - Intermediate (4-6 Years)
Strong Coding experience

Benefits

This position comes with a competitive compensation and benefits package:

Competitive salary and performance-based bonuses
Comprehensive benefits package
Career development and training opportunities
Flexible work arrangements (remote and/or office-based)
Dynamic and inclusive work culture within a globally renowned group
Private Health Insurance
Pension Plan
Paid Time Off
Training & Development

About Capgemini

Capgemini is a global leader in partnering with companies to transform and manage their business by harnessing the power of technology. The Group is guided every day by its purpose of unleashing human energy through technology for an inclusive and sustainable future. It is a responsible and diverse organization of over 340,000 team members in more than 50 countries. With its strong 55-year heritage and deep industry expertise, Capgemini is trusted by its clients to address the entire breadth of their business needs, from strategy and design to operations, fueled by the fast-evolving and innovative world of cloud, data, AI, connectivity, software, digital engineering and platforms. The Group reported €22.5 billion in revenues in 2023.

Site Reliability Engineer

7 days ago

Ciudad de México, Ciudad de México Azkait Full-time

AZKAIT is a Mexican company that finds and connects the best IT talent with companies in Latin America and the United States. We are looking for your talent as a Site Reliability Engineer (SRE). Requirements: Bachelor's or engineering degree in Systems, Computer Science, or a related field. 5+ years of experience in SRE, DevOps, or Software Engineering roles. Experience...
Site Reliability Engineer

2 weeks ago

Ciudad de México, Ciudad de México Sur Full-time

As the Site Reliability Engineer you will support and scale the infrastructure powering their secure, mission-critical SaaS platform. You must be confident in operating and debugging both modern infrastructure (cloud-native, containerized services) and classic Windows production environments (IIS, SQL Server AlwaysOn, Service Broker), with the ability to...
Lead Site Reliability Engineer

6 days ago

Ciudad de México, Ciudad de México Pathlock Full-time

About Pathlock: Pathlock is a leader in application security, access governance, and compliance automation. Our cloud-based solutions help organizations secure critical applications, mitigate risk, and enforce policies across a diverse IT landscape. Job Summary: Join Pathlock, a fast-growing leader in Governance, Access and Compliance, where you'll help shape...
Site Reliability Engineer

2 weeks ago

Ciudad de México, Ciudad de México Tech Mahindra Full-time

We're hiring! We are seeking a talented Site Reliability Engineer (SRE) in CDMX with robust experience in Azure environments, Kubernetes, and DevOps practices. Your mission will be to ensure the reliability, scalability, and automation of our critical platforms. If you thrive on solving complex challenges, automating processes, and ensuring seamless operations,...
Senior Site Reliability Engineer

3 days ago

Ciudad de México, Ciudad de México Thomson Reuters México Full-time

Are you passionate about the chance to bring your experience to a world-class company that is market-leading in both content and technology? If yes, we're looking for you. Join our team. The Senior Site Reliability Engineer (SRE) will implement Site Reliability Engineering and DevOps best practices. Feed non-functional requirements into the product backlog,...
Linux Site Reliability Engineer

6 days ago

Ciudad de México, Ciudad de México AXA Group Operations Full-time

Main missions: Be part of our global team as a Linux Engineer and become a key member of the SRO Squad (Site Reliability Operations), collaborating with a diverse group of experts to ensure robust and secure Linux (RHEL) infrastructure worldwide. Engineer (build) and test solutions, document accordingly, and hand over to the operations team. Provide 3rd level...
Site Reliability Engineer

1 week ago

Santiago de Querétaro, Querétaro de Arteaga, México RELEX Solutions Full-time

Technical Service Consultant/Site Reliability Engineer. Based at: RELEX office in Mexico. Employment type: Permanent, full-time. Travel: Some ad hoc travel to client sites and the Atlanta office may be required. The RELEX team in the Americas is growing, and we're now looking for a Technical Consultant/Site Reliability Engineer. You'll join our global continuous...
Linux Site Reliability Engineer

19 hours ago

Ciudad de México, Ciudad de México AXA Full-time

About AXA: As a world-leading insurance company, we act for human progress by protecting what matters. With 153,000 employees in 54 countries working for 105 million customers, we've created a truly dynamic and vibrant community. Inclusion and diversity link closely with our values, and together we're nurturing a culture of respect, for each other, for our...
