Site Reliability Engineer
14 hours ago
**Project description**: Do you like working with existing and new software product development teams? This position instruments end-to-end observability and visibility for business-critical systems through log ingestion, metrics, and traces. You will work as a site reliability engineer (SRE), collaborating with product teams, infrastructure SMEs, DevOps engineers, and the proactive monitoring team to provide unique dashboards of relevant service-level analytics for various product stakeholders.

**Responsibilities**:
- Work closely with software product development teams (ITSO, Product Owner, SME) to implement monitoring and observability instrumentation within their platforms.
- Drive adoption of best practices in monitoring, alerting, automation, and site reliability.
- Lead or contribute to engineering efforts from design to implementation, focusing on instrumentation of logs, metrics, and traces.
- Drive the use of automation in software instrumentation and in response to service degradation events.
- Identify and execute on opportunities to implement instrumentation in pre-production environments.
- Proactively pursue continuous improvement and expansion in observability coverage, service reliability best practices, incident management, and problem management.

**Skills**:

Must have
- Production support experience as a developer for an e-commerce platform
- Strong knowledge of and experience with Java
- SRE experience
- Scripting experience
- 5+ years of experience administering Linux and at least 2 years supporting production environments
- Experience designing large-scale distributed solutions, including their capacity planning
- Deep understanding of TCP/IP networking
- Familiarity with the terms SLA, SLO, and SLI
- Experience with monitoring and alerting tools such as Grafana, Datadog, and Prometheus
- Strong knowledge of virtualization and containerization principles, including orchestration tools
- Familiarity with CaC and IaC tools (Ansible, Salt, Terraform, Packer)
- Familiarity with CI/CD tools (Jenkins, Azure DevOps)
- Experience with relational and NoSQL DBMS
- A clear understanding of Agile and DevOps culture and the kinds of problems they are intended to solve
- Strong written and verbal communication skills
- Understanding of information security principles
- Understanding of popular deployment strategies (feature flags, blue/green, canary, dark launch, etc.)
- "Critical thinker" and "problem solver"

Nice to have
- Experience working with Azure
- Previous experience working in SRE teams

**Other**:
- Languages: English, B2 Upper Intermediate
- Seniority: Senior
- Remote Mexico, Mexico
- Req. VR-111205
- Technical Support (SL2)
- Cross Industry Solutions
- 27/01/2025
-
Site Reliability Engineer
1 week ago
Remote, Mexico | thegetch mexico | Full-time
**Role: Site Reliability Engineer**
**Openings: more than 10 hires**
**Location: any city with TCS Office presence (Queretaro, Guadalajara, Mexico City or Monterrey)**
**Salary: 25-33 USD/hr**
**English communication: advanced**
**Experience: 4+ years**
**Site Reliability Engineer responsibilities**: Gather and analyze system metrics...
-
Site Reliability Engineer
3 days ago
Remote, Mexico | Right Balance | Full-time
**Overview** We're looking for a Site Reliability Engineer. Headquartered in Los Angeles, California, Right Balance provides top-tier technology talent for innovative companies in the US. We’re in the top 50 companies to watch in LA.
**Engagement Details** Our client is a USA-based company producing video solutions with the mission to advance scientific...
-
Site Reliability Engineer
2 days ago
Remote, Mexico | Right Balance | Full-time
**Overview** We're looking for a Site Reliability Engineer. Headquartered in Los Angeles, California, Right Balance provides top-tier technology talent for innovative companies in the US. We’re in the top 50 companies to watch in LA.
**Engagement Details** Our client is a USA-based company producing video solutions with the mission to advance scientific...
-
Senior Site Reliability Engineer
1 week ago
Remote, Mexico | EPAM Systems, Inc. | Full-time
We are seeking an experienced **Senior Site Reliability Engineer** to join our team. As a key member of the Reliability Tooling team, you will be responsible for writing and reviewing code, contributing to critical technical decisions, and mentoring engineers within your squad. This role requires a deep understanding of SRE principles and best practices, as...
-
Lead Site Reliability Engineer
2 weeks ago
Remote, Mexico | Tekshapers Inc | Full-time
**Position: Lead Site Reliability Engineer**
**Location: Remote**
**Duration: Contract**
- Lead and mentor a team of SREs to ensure operational excellence and maximize the reliability and availability of client systems.
- Minimum 10 years of work experience in DevOps/SRE, including leadership roles.
- Architect and design highly scalable and available...
-
Lead Site Reliability Engineer
1 week ago
Remote, Mexico | EPAM Systems, Inc. | Full-time
We are looking for an experienced **Lead Site Reliability Engineer** to join our team. In this role, you will play a pivotal part in the Reliability Tooling team, taking responsibility for writing and reviewing code, making key technical decisions, and mentoring engineers within your squad. This position requires a strong grasp of SRE principles and best...
-
Senior Site Reliability Engineer
2 weeks ago
Remote, Mexico | EPAM Systems, Inc. | Full-time
Join our team as a **Senior Site Reliability Engineer** focused on delivering advanced support for critical Azure-based systems.
**Responsibilities**
- Troubleshoot and resolve complex incidents to maintain system uptime
- Ensure reliability and performance of Azure-based enterprise infrastructure
- Implement observability, monitoring, and logging solutions
- ...
-
Lead Site Reliability Engineer
2 weeks ago
Remote, Mexico | EPAM Systems, Inc. | Full-time
Join our team as a **Lead Site Reliability Engineer** dedicated to providing advanced support for critical Azure-based systems.
**Responsibilities**
- Resolve complex incidents to ensure system availability
- Maintain reliability and performance of Azure-based enterprise infrastructure
- Deploy observability, monitoring, and logging tools
- Automate...
-
Senior Site Reliability Engineer
2 weeks ago
Remote, Mexico | EPAM Systems, Inc. | Full-time
Join our team as a **Senior Site Reliability Engineer** focused on delivering advanced support for critical Azure-based systems.
**Responsibilities**
- Troubleshoot and resolve complex incidents to maintain system uptime
- Ensure reliability and performance of Azure-based enterprise infrastructure
- Implement observability, monitoring, and logging...
-
Lead Site Reliability Engineer
2 weeks ago
Remote, Mexico | EPAM Systems, Inc. | Full-time
Join our team as a **Lead Site Reliability Engineer** dedicated to providing advanced support for critical Azure-based systems.
**Responsibilities**
- Resolve complex incidents to ensure system availability
- Maintain reliability and performance of Azure-based enterprise infrastructure
- Deploy observability, monitoring, and logging tools
- Automate...