Lead Site Reliability Engineer

2 weeks ago


Remote, Mexico | Tekshapers Inc | Full-time

**Position: Lead Site Reliability Engineer**

**Location: Remote**

**Duration: Contract**

- Lead and mentor a team of SREs to ensure operational excellence and maximize the reliability and availability of client systems.
- Minimum 10 years of work experience in DevOps/SRE, including leadership roles.
- Architect and design highly scalable and available infrastructure solutions, integrating best practices in reliability engineering and automation.
- Collaborate with cross-functional teams (DevOps, Development, IT) to implement SRE principles throughout the software development life cycle.
- Establish and manage Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for critical services, monitoring and maintaining performance against defined targets.
- Implement and enhance observability, alerting, and incident response processes to proactively address issues and minimize downtime.
- Develop and maintain documentation related to system architecture, configuration, and procedures.
- Stay current with industry trends, recommending and adopting new tools and practices to enhance system reliability.

**Qualifications**:

- Strong background in designing and implementing highly available and scalable infrastructure.
- Proficiency in scripting and automation using Python or Shell.
- Experience with container orchestration platforms, serverless architectures, CI/CD pipelines, and IaC implementations (Ansible & Terraform).
- Experience with Observability tools (preferred: Datadog, CloudWatch).
- In-depth knowledge of cloud computing platforms (preferred: AWS).
- Solid understanding of SRE/DevOps principles and practices.
- Excellent problem-solving skills with the ability to troubleshoot complex issues in production environments.
- Strong communication and leadership skills, fostering effective collaboration with cross-functional teams.
- Relevant certifications in SRE, DevOps, Cloud, etc., are a plus.

**Job Types**: Full-time, Contract

**Salary**: From $65,000.00 per month

**Experience**:

- SRE: 4 years (required)

Work Location: Remote



  • Remote, Mexico | thegetch mexico | Full-time

    **Role: Site Reliability Engineer** **Openings: more than 10 hires** **Location: any city with TCS Office presence (Queretaro, Guadalajara, Mexico City or Monterrey)** **Salary: 25-33 USD/hr** **English communication: advanced** **Experience: 4+ years** **Site Reliability Engineer responsibilities**: Gather and analyze system metrics...


  • Remote, Mexico | EPAM Systems, Inc. | Full-time

    We are looking for an experienced **Lead Site Reliability Engineer** to join our team. In this role, you will play a pivotal part in the Reliability Tooling team, taking responsibility for writing and reviewing code, making key technical decisions, and mentoring engineers within your squad. This position requires a strong grasp of SRE principles and best...


  • Remote, Mexico | EPAM Systems, Inc. | Full-time

    Join our team as a **Lead Site Reliability Engineer** dedicated to providing advanced support for critical Azure-based systems. **Responsibilities** - Resolve complex incidents to ensure system availability - Maintain reliability and performance of Azure-based enterprise infrastructure - Deploy observability, monitoring, and logging tools - Automate...


  • Remote, Mexico | EPAM Systems, Inc. | Full-time

    Join our team as a **Lead Site Reliability Engineer** dedicated to providing advanced support for critical Azure-based systems. **Responsibilities** - Resolve complex incidents to ensure system availability - Maintain reliability and performance of Azure-based enterprise infrastructure - Deploy observability, monitoring, and logging tools - Automate...


  • Remote, Mexico | EPAM Systems, Inc. | Full-time

    We are looking for an experienced **Site Reliability Engineer (SRE)** to take a leadership role in ensuring the stability, scalability, and performance of our cloud infrastructure on **Google Cloud Platform (GCP)**. As an SRE, you will be at the forefront of optimizing system reliability, automating processes, and collaborating with engineering teams to...


  • Remote, Mexico | Right Balance | Full-time

    **Overview** We're looking for a Site Reliability Engineer. Headquartered in Los Angeles, California, Right Balance provides top-tier technology talent for innovative companies in the US. We’re in the top 50 companies to watch in LA. **Engagement Details** Our client is a USA-based company producing video solutions with the mission to advance scientific...


  • Remote, Mexico | Right Balance | Full-time

    **Overview** We're looking for a Site Reliability Engineer. Headquartered in Los Angeles, California, Right Balance provides top-tier technology talent for innovative companies in the US. We’re in the top 50 companies to watch in LA. **Engagement Details** Our client is a USA-based company producing video solutions with the mission to advance scientific...


  • Remote, Mexico | EPAM Systems, Inc. | Full-time

    We are seeking a Lead Site Reliability Engineer to join our team. In this role, you will help drive the reliability and performance of critical systems for a leading client. You will work in a collaborative environment focused on innovation and operational excellence. Please note, the client operates in the US Central Time Zone from 8 am CST to 5 pm...


  • Remote, Mexico | Luxoft | Full-time

    **Project description**: Do you like to work with existing and new software product development teams? This position is to instrument end-to-end observability and visibility for business-critical systems with log ingestion, metrics, and traces. You will function as a site reliability engineer (SRE) who will collaborate with product teams, infrastructure...


  • Remote, Mexico | EPAM Systems, Inc. | Full-time

    We are seeking an experienced **Senior Site Reliability Engineer** to join our team. As a key member of the Reliability Tooling team, you will be responsible for writing and reviewing code, contributing to critical technical decisions, and mentoring engineers within your squad. This role requires a deep understanding of SRE principles and best practices, as...