Site Reliability Engineer – Azure DevOps
2 weeks ago
EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.

Join our team as a Site Reliability Engineer, where you will ensure system reliability, manage incident responses, and enable seamless collaboration between operations and development teams. This role demands a background in Oil & Gas combined with expertise in automation and cloud technologies. Apply now to support critical infrastructure and drive operational excellence.

Responsibilities
- Oversee and enhance the product monitoring system
- Handle incidents, including troubleshooting, resolution, documentation, and analysis
- Distribute knowledge and insights across teams
- Facilitate collaboration between operations and development
- Create automation for log analysis, testing production systems, and alerting
- Track system health, performance, and SLIs/SLOs/SLAs
- Maintain documentation for incident management procedures
- Conduct incident analyses and implement corrective actions
- Respond to on-call support requests during and after business hours
- Collaborate with teams to enhance system efficiency and reliability
- Leverage tools such as PagerDuty, ELK/Kibana, SEQ logging, Prometheus, and Grafana for system monitoring
- Develop scripts and implement automation solutions using Python, C#, and Bash
- Manage orchestration and infrastructure through SaltStack and Docker
- Support project workflows using Azure DevOps and maintain a comprehensive Wiki
- Maintain code repositories and implement version control systems using Git

Requirements
- 1+ years of experience in creating solutions, particularly in Site Reliability Engineering
- Expertise in cloud services and automation scripting with Python and Bash
- Background in Oil & Gas operations and incident handling
- Skill in managing incident responses and providing on-call support
- Familiarity with monitoring tools such as Prometheus and Grafana
- Proficiency in logging tools like ELK/Kibana and SEQ logging
- Knowledge of orchestration and infrastructure solutions including SaltStack and Docker
- Understanding of fundamental networking concepts like inbound/outbound rules and firewalls
- Proficiency in tools for project management and issue tracking like Azure DevOps
- Capability to manage source code with Git
- Strong skills in creating documentation and disseminating knowledge
- Competency in conducting detailed post-incident reviews
- Excellent troubleshooting abilities and problem-solving skills
- Effective communication skills, with an English level of at least B2

Nice to have
- Experience using PagerDuty for incident handling
- Competency in C# programming
- Understanding of SQL and MongoDB databases
- Background in Zededa infrastructure
- Experience in supporting Oil & Gas field operations

We offer
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn

Seniority level: Associate
Employment type: Full-time
Job function: Engineering, Information Technology, and Business Development
Industries: Software Development, IT Services and IT Consulting, and Nanotechnology Research
-
Senior Site Reliability Engineer/DevOps
5 days ago
Mexico City | GrainChain Inc | Full-time
We are looking for new talent! GrainChain is a technology company dedicated to closing the digital divide in the agricultural industry. Our platforms make transactions fast, secure, and simple for our users. We are looking for a Site Reliability Engineer able to integrate and automate the areas of...
-
Site Reliability Engineer
2 weeks ago
Mexico City | GrainChain Inc | Full-time
We are looking for new talent! GrainChain is a technology company dedicated to closing the digital divide in the agricultural industry. Our platforms make transactions fast, secure, and simple for our users. We are looking for a Site Reliability Engineer able to integrate and automate the areas of...
-
Azure DevOps Engineer
1 week ago
Mexico City | TechBiz Global GmbH | Full-time
At TechBiz Global, we provide recruitment services to the top clients in our portfolio. We are currently seeking an **Azure DevOps Engineer** to join one of our **clients'** teams in Mexico. If you're looking for an exciting opportunity to grow in an innovative environment, this could be the perfect fit for you.
**Key Responsibilities**:
- Design,...
-
Site Reliability Engineer
7 minutes ago
Mexico City | GrainChain Inc | Full-time
We are looking for new talent! GrainChain is a technology company dedicated to closing the digital divide in the agricultural industry. Our platforms make transactions fast, secure, and simple for our users. We are looking for a Site Reliability Engineer able to integrate and automate the areas of...
-
Site Reliability Engineer
7 minutes ago
Mexico City | GrainChain Inc | Full-time
We are looking for new talent! GrainChain is a technology company dedicated to closing the digital divide in the agricultural industry. Our platforms make transactions fast, secure, and simple for our users. We are looking for a Site Reliability Engineer able to integrate and automate the areas of...
-
Site Reliability Engineer
2 weeks ago
Mexico City | Zenta group | Full-time
**Site Reliability Engineer | On-site - CDMX**
**Role Summary**: As a **Site Reliability Engineer (SRE)** at Zenta Group, you will be the bridge between development and operations, ensuring that services are **scalable, reliable, and resilient**. You will design and implement solutions that improve the stability and performance of the infrastructure,...
-
Site Reliability Engineer
4 weeks ago
Mexico City | Thomson Reuters | Full-time
Location: Mexico City, Mexico. Category: Technology Careers. Job Id: JREQ. Job Type: Full time Hybrid.
**Senior Site Reliability Engineer**
Are you passionate about the chance to bring your experience to a world-class company that is market-leading for both content and technology? If yes, we’re looking for you. Join our team! We are looking for a Senior Site...
-
Senior Site Reliability Engineer
4 weeks ago
Mexico City | Thomson Reuters | Full-time
Location: Mexico City, Mexico. Category: Technology Careers. Job Id: JREQ. Job Type: Full time Hybrid.
**Senior Site Reliability Engineer, Service Management.**
Are you passionate about the chance to bring your experience to a world-class company that is market-leading for both content and technology? If yes, we are looking for you! Join our team! Your mission will be...
-
Site Reliability Engineer
7 minutes ago
Mexico | ITJ | Full-time
Mid-level Site Reliability Engineer (SRE). The Site Reliability Engineering team constantly practices the DevOps mindset to build and deploy distributed, fault-tolerant systems at scale. As part of this team, you will work with developers, operations, and product sponsors to help design, build, and deploy the critical infrastructure needed. Essential Duties...
-
Azure DevOps Engineer
3 weeks ago
Mexico City | IQVIA Laboratories | Full-time
**Job Overview**
Designs, architects and implements automated build and release pipelines to support the company’s on-premises and cloud-based platforms and infrastructure. Creates and manages standardized software development environments. Develops automated systems for deployment, monitoring and fault tolerance.
**Key Responsibilities**:
Create and manage...