Sr. SRE
2 weeks ago
The Site Reliability Engineer (SRE) is responsible for ensuring the reliability, scalability, and performance of production systems. The role focuses on monitoring, alerting, and dashboard creation with a strong emphasis on SRE tools like Grafana, Prometheus, and Datadog. The ideal candidate should have hands-on experience with Python scripting and be able to collaborate effectively with cross-functional teams to address service issues and improve system reliability.
Requirements
- 4+ years of experience in similar roles
- Fluent English
- Experience with creating and modifying Grafana dashboards for system monitoring.
- Knowledge of Prometheus for setting up and maintaining monitoring systems.
- Experience with Datadog for user and system monitoring.
- Hands-on experience with Python scripting for automation and other tasks.
- Understanding of SRE practices, including monitoring, alerting, and incident response.
- Ability to create and enhance runbooks for incident response and remediation.
- Experience with DevOps practices, such as CI/CD and infrastructure automation, is a secondary desired skill set.
- Strong communication skills to collaborate with cross-functional teams and stakeholders.
- Ability to proactively identify and address service issues.
- Familiarity with ITIL processes, including Service Management, Knowledge Management, and Incident Management.
- Experience with user and system monitoring, remediation, and implementation to maintain service stability.
Responsibilities
- Create and modify Grafana dashboards to monitor system performance and user experience.
- Set up and maintain monitoring and alerting systems using Prometheus and Datadog.
- Collaborate with cross-functional teams to improve service reliability and respond to incidents.
- Develop and enhance runbooks for incident response and remediation.
- Proactively work with alerting to ensure timely detection of issues and minimize downtime.
- Implement monitoring, remediation, and other operational practices to maintain high service levels.
-
Senior Site Reliability Engineer
2 weeks ago
Ciudad de México, SimCorp, Full time. Senior Site Reliability Engineer (SRE/Azure). Location: Manila. Posted 30+ days ago. Job requisition ID: R-206253. Who we are: For over 50 years, we have worked closely with investment and asset managers to become the world’s leading...
-
Sr. DevOps Engineer
1 week ago
Ciudad de México, Digital@FEMSA Careers, Full time. Digital@FEMSA is the technology innovation division that offers digital solutions to simplify our customers' lives. It comprises businesses that use technology to build practical, reliable tools, such as the payment method Spin by OXXO, as well as a diverse, multidisciplinary team focused on developing...
-
Sr. Site Reliability Engineer
2 weeks ago
Ciudad de México, SimCorp, Full time. Sr. Site Reliability Engineer (Azure). Location: Manila. Posted 30+ days ago. Job requisition ID: R-206416. Who we are: For over 50 years, we have worked closely with investment and asset managers to become the world’s leading provider of integrated investment...