Site Reliability Engineer
4 weeks ago
**Project description**: Do you like to work with existing and new software product development teams? This position is to instrument end-to-end observability and visibility for business-critical systems with log ingestion, metrics, and traces. You will function as a site reliability engineer (SRE) who collaborates with product teams, infrastructure SMEs, DevOps engineers, and the proactive monitoring team to provide unique dashboards of relevant service-level analytics for various product stakeholders.

**Responsibilities**:
- Work closely with software product development teams (ITSO, Product Owner, SME) to implement monitoring & observability instrumentation within their platforms.
- Drive adoption of best practices in monitoring, alerting, automation, and site reliability.
- Lead or contribute to engineering efforts from design to implementation, focusing on instrumentation of logs, metrics, and traces.
- Drive use of automation in software instrumentation as well as in response to service degradation events.
- Identify and execute on opportunities to implement instrumentation in pre-production environments.
- Proactively pursue continuous improvement and expansion in observability coverage, service reliability best practices, incident management, and problem management.

**Skills**:

Must have
- Production support experience as a developer for an e-commerce platform
- Strong knowledge of and experience in Java
- SRE experience
- Scripting experience
- 5+ years of experience administering Linux and at least 2 years supporting production environments
- Experience designing large-scale distributed solutions, including their capacity planning
- Deep understanding of TCP/IP networking
- Familiar with SLA, SLO, and SLI terms
- Experience with monitoring and alerting tools like Grafana, Datadog, Prometheus, etc.
- Strong knowledge of virtualization and containerization principles, including orchestration tools
- Familiar with CaC and IaC tools (Ansible, Salt, Terraform, Packer)
- Familiar with CI/CD tools (Jenkins, Azure DevOps)
- Experience with relational and NoSQL DBMS
- A clear understanding of Agile and DevOps culture and the kinds of problems they are intended to solve
- Strong written and verbal communication skills
- Understanding of information security principles
- Understanding of popular deployment strategies (Feature flags, Blue/Green, Canary, Dark launch, etc.)
- "Critical thinker" and "problem solver"

Nice to have
- Experience working with Azure
- Previous experience working in SRE teams

**Other**:
- Languages: English, B2 Upper Intermediate
- Seniority: Senior
- Remote Mexico, Mexico
- Req. VR-
- Technical Support (SL2)
- Cross Industry Solutions
- 27/01/2025
-
Site Reliability Engineer
2 weeks ago
Work from home, Mexico | thegetch mexico | Full-time
**Role: Site Reliability Engineer** **Openings: more than 10 hires** **Location: any city with TCS Office presence (Queretaro, Guadalajara, Mexico City or Monterrey)** **Salary: 25-33 USD/hr** **English communication: advanced** **Experience: 4+ years** **Site Reliability Engineer responsibilities**: Gather and analyze metrics...
-
Site Reliability Engineer
4 days ago
Work from home, Mexico | Right Balance | Full-time
**Overview** We're looking for a Site Reliability Engineer. Headquartered in Los Angeles, California, Right Balance provides top-tier technology talent for innovative companies in the US. We’re in the top 50 companies to watch in LA. **Engagement Details** Our client is a USA-based company producing video solutions with the mission to advance scientific...
-
Senior Site Reliability Engineer
2 weeks ago
Work from home, Mexico | EPAM Systems, Inc. | Full-time
We are seeking an experienced **Senior Site Reliability Engineer** to join our team. As a key member of the Reliability Tooling team, you will be responsible for writing and reviewing code, contributing to critical technical decisions, and mentoring engineers within your squad. This role requires a deep understanding of SRE principles and best practices, as...
-
Lead Site Reliability Engineer
2 weeks ago
Work from home, Mexico | EPAM Systems, Inc. | Full-time
We are looking for an experienced **Lead Site Reliability Engineer** to join our team. In this role, you will play a pivotal part in the Reliability Tooling team, taking responsibility for writing and reviewing code, making key technical decisions, and mentoring engineers within your squad. This position requires a strong grasp of SRE principles and best...
-
Site Reliability Engineer
2 weeks ago
Work from home, Mexico | Synechron | Full-time
Synechron is a self-funded, leading digital transformation consulting firm focused on the financial services industry, working to accelerate digital initiatives for Banks, Asset Managers and Insurance. We achieve this by providing our clients with innovative solutions that solve their most complex business challenges and combining Synechron’s unique,...
-
Site Reliability Engineer
1 week ago
Work from home, Mexico | EPAM Systems, Inc. | Full-time
We are looking for a skilled **Site Reliability Engineer** to join our team. This position will focus on supporting the LatAm timezone, working closely with a team of SREs and a hands-on Lead SRE, while collaborating with a European-based SRE team. The role ensures seamless follow-the-sun 24/7 on-call support for a customer platform comprising multiple Java...
-
Site Reliability Engineer III
6 days ago
Work from home, Mexico | Cabify | Full-time
Do you want to change the world? At Cabify, that’s what we’re doing. We aim to make cities better places to live by improving mobility for the people living in them, connecting riders to drivers, providing mobility alternatives such as scooters and mopeds and many others to come, all at the touch of a button. Maybe one day cities will be places where...
-
Site Reliability Engineer
6 days ago
Work from home, Mexico | EPAM Systems, Inc. | Full-time
Join our team as a **Site Reliability Engineer**, where you will focus on cloud infrastructure, containerization, and monitoring using Kubernetes and Microsoft Azure.
**Responsibilities**
- Deploy and maintain Kubernetes resource manifests in clusters such as Kind, GKE, or AKS
- Troubleshoot and analyze logs to identify and resolve system events and issues
- ...
-
Senior Site Reliability Engineer
1 week ago
Work from home, Mexico | EPAM Systems, Inc. | Full-time
We are seeking an experienced **Senior Site Reliability Engineer** to join our team. This role will cover the LatAm timezone, working collaboratively with a team of SREs and a hands-on Lead SRE, while also coordinating with a European-based SRE team. The position ensures follow-the-sun 24/7 on-call support for a customer platform that includes multiple Java...
-
Senior Site Reliability Engineer
4 days ago
Work from home, Mexico | EPAM Systems, Inc. | Full-time
Join our team as a **Senior Site Reliability Engineer**, where you will maintain and improve our product monitoring system, manage incident responses, and facilitate collaboration between operations and development teams.
**Responsibilities**
- Maintain and improve the product monitoring system
- Manage incident response including troubleshooting,...