Site Reliability Engineer

4 weeks ago


Work from home, Mexico Luxoft Full-time

**Project description**: Do you like to work with existing and new software product development teams? This position is to instrument end-to-end observability and visibility for business-critical systems with log ingestion, metrics, and traces. You will function as a site reliability engineer (SRE) who collaborates with product teams, infrastructure SMEs, DevOps engineers, and the proactive monitoring team to provide unique dashboards of germane service level analytics for various product stakeholders.

**Responsibilities**:
- Work closely with software product development teams (ITSO, Product Owner, SME) to implement monitoring and observability instrumentation within their platforms.
- Drive adoption of best practices in monitoring, alerting, automation, and site reliability.
- Lead or contribute to engineering efforts from design to implementation, focusing on instrumentation of logs, metrics, and traces.
- Drive use of automation in software instrumentation as well as in response to service degradation events.
- Identify and execute on opportunities to implement instrumentation in pre-production environments.
- Proactively pursue continuous improvement and expansion in observability coverage, service reliability best practices, incident management, and problem management.

**Skills**:

Must have
- Production support experience as a developer for an e-commerce platform
- Strong knowledge of and experience in Java
- SRE experience
- Scripting experience
- 5+ years of experience administering Linux and at least 2 years supporting production environments
- Experience designing large-scale distributed solutions, including their capacity planning
- Deep understanding of TCP/IP networking
- Familiarity with the terms SLA, SLO, and SLI
- Experience with monitoring and alerting tools such as Grafana, Datadog, Prometheus, etc.
- Strong knowledge of virtualization and containerization principles, including orchestration tools
- Familiarity with CaC and IaC tools (Ansible, Salt, Terraform, Packer)
- Familiarity with CI/CD tools (Jenkins, Azure DevOps)
- Experience with relational and NoSQL DBMS
- A clear understanding of Agile and DevOps culture and the kinds of problems they are intended to solve
- Strong written and verbal communication skills
- Understanding of information security principles
- Understanding of popular deployment strategies (feature flags, blue/green, canary, dark launch, etc.)
- "Critical thinker" and "problem solver"

Nice to have
- Experience working with Azure
- Previous experience working in SRE teams

**Other**:
- Languages: English (B2 Upper Intermediate)
- Seniority: Senior
- Remote Mexico, Mexico
- Req. VR-
- Technical Support (SL2)
- Cross Industry Solutions
- 27/01/2025


  • Site Reliability Engineer

    2 weeks ago


    Work from home, Mexico thegetch mexico Full-time

    **Role: Site Reliability Engineer** **Openings: more than 10 hires** **Location: any city with TCS Office presence (Queretaro, Guadalajara, Mexico City or Monterrey)** **Salary: 25-33 USD/hr** **English communication: advanced** **Experience: 4+ years** **Site Reliability Engineer responsibilities**: Gather and analyze metrics...


  • Work from home, Mexico Right Balance Full-time

    **Overview** We're looking for a Site Reliability Engineer. Headquartered in Los Angeles, California, Right Balance provides top-tier technology talent for innovative companies in the US. We’re in the top 50 companies to watch in LA. **Engagement Details** Our client is a USA-based company producing video solutions with the mission to advance scientific...


  • Work from home, Mexico EPAM Systems, Inc. Full-time

    We are seeking an experienced **Senior Site Reliability Engineer** to join our team. As a key member of the Reliability Tooling team, you will be responsible for writing and reviewing code, contributing to critical technical decisions, and mentoring engineers within your squad. This role requires a deep understanding of SRE principles and best practices, as...


  • Work from home, Mexico EPAM Systems, Inc. Full-time

    We are looking for an experienced **Lead Site Reliability Engineer** to join our team. In this role, you will play a pivotal part in the Reliability Tooling team, taking responsibility for writing and reviewing code, making key technical decisions, and mentoring engineers within your squad. This position requires a strong grasp of SRE principles and best...

  • Site Reliability Engineer

    2 weeks ago


    Work from home, Mexico Synechron Full-time

    Synechron is a self-funded, leading digital transformation consulting firm focused on the financial services industry, working to accelerate digital initiatives for banks, asset managers, and insurers. We achieve this by providing our clients with innovative solutions that solve their most complex business challenges and combining Synechron’s unique,...


  • Work from home, Mexico EPAM Systems, Inc. Full-time

    We are looking for a skilled **Site Reliability Engineer** to join our team. This position will focus on supporting the LatAm timezone, working closely with a team of SREs and a hands-on Lead SRE, while collaborating with a European-based SRE team. The role ensures seamless follow-the-sun 24/7 on-call support for a customer platform comprising multiple Java...


  • Work from home, Mexico Cabify Full-time

    Do you want to change the world? At Cabify, that’s what we’re doing. We aim to make cities better places to live by improving mobility for the people living in them, connecting riders to drivers, providing mobility alternatives such as scooters and mopeds and many others to come, all at the touch of a button. Maybe one day cities will be places where...


  • Work from home, Mexico EPAM Systems, Inc. Full-time

    Join our team as a **Site Reliability Engineer**, where you will focus on cloud infrastructure, containerization, and monitoring using Kubernetes and Microsoft Azure. **Responsibilities** - Deploy and maintain Kubernetes resource manifests in clusters such as Kind, GKE, or AKS - Troubleshoot and analyze logs to identify and resolve system events and issues -...


  • Work from home, Mexico EPAM Systems, Inc. Full-time

    We are seeking an experienced **Senior Site Reliability Engineer** to join our team. This role will cover the LatAm timezone, working collaboratively with a team of SREs and a hands-on Lead SRE, while also coordinating with a European-based SRE team. The position ensures follow-the-sun 24/7 on-call support for a customer platform that includes multiple Java...


  • Work from home, Mexico EPAM Systems, Inc. Full-time

    Join our team as a **Senior Site Reliability Engineer**, where you will maintain and improve our product monitoring system, manage incident responses, and facilitate collaboration between operations and development teams. **Responsibilities** - Maintain and improve the product monitoring system - Manage incident response including troubleshooting,...