Sr. SRE

2 weeks ago

México NTD Software Full-time

The Site Reliability Engineer (SRE) is responsible for ensuring the reliability, scalability, and performance of production systems. The role focuses on monitoring, alerting, and dashboard creation with a strong emphasis on SRE tools like Grafana, Prometheus, and Datadog. The ideal candidate should have hands-on experience with Python scripting and be able to collaborate effectively with cross-functional teams to address service issues and improve system reliability.

Requirements

4+ years of experience in similar roles
Fluent English
Experience with creating and modifying Grafana dashboards for system monitoring.
Knowledge of Prometheus for setting up and maintaining monitoring systems.
Experience with Datadog for user and system monitoring.
Hands-on experience with Python scripting for automation and other tasks.
Understanding of SRE practices, including monitoring, alerting, and incident response.
Ability to create and enhance runbooks for incident response and remediation.
Experience with DevOps practices, such as CI/CD and infrastructure automation, is a secondary desired skill set.
Strong communication skills to collaborate with cross-functional teams and stakeholders.
Ability to proactively identify and address service issues.
Familiarity with ITIL processes, including Service Management, Knowledge Management, and Incident Management.
Experience with user and system monitoring, remediation, and implementation to maintain service stability.

Responsibilities

Create and modify Grafana dashboards to monitor system performance and user experience.
Set up and maintain monitoring and alerting systems using Prometheus and Datadog.
Collaborate with cross-functional teams to improve service reliability and respond to incidents.
Develop and enhance runbooks for incident response and remediation.
Proactively manage alerting to ensure timely detection of issues and minimize downtime.
Implement monitoring, remediation, and other operational practices to maintain high service levels.


Senior Site Reliability Engineer

2 weeks ago

Ciudad de México SimCorp Full-time

Senior Site Reliability Engineer (SRE/Azure). Location: Manila. Posted: 30+ days ago. Job requisition ID: R-206253. Who we are: For over 50 years, we have worked closely with investment and asset managers to become the world’s leading...

Sr. DevOps Engineer

1 week ago

Ciudad de México Digital@FEMSA Careers Full-time

Digital@FEMSA is the technology innovation division that offers digital solutions to simplify our customers' lives. It comprises businesses that leverage technology to build practical, reliable tools, such as the Spin by OXXO payment method, as well as a diverse, multidisciplinary team focused on developing...

Sr. Site Reliability Engineer

2 weeks ago

Ciudad de México SimCorp Full-time

Sr. Site Reliability Engineer (Azure). Location: Manila. Time type: Full time. Posted: 30+ days ago. Job requisition ID: R-206416. Who we are: For over 50 years, we have worked closely with investment and asset managers to become the world’s leading provider of integrated investment...
