Site Reliability Engineer
1 week ago
JOB DESCRIPTION
Site Reliability Engineer (SRE) - Application Performance Monitoring (APM)
Location: Monterrey, Nuevo León, Mexico (Hybrid - candidates must reside in Monterrey or the metropolitan area)
Language requirement: Fluent English (spoken and written)

About the Role
We're looking for a Site Reliability Engineer (SRE) with a passion for Application Performance Monitoring (APM) and system optimization.
In this role, you'll be at the heart of ensuring the reliability, scalability, and performance of NOV's mission-critical applications. You'll work closely with software engineering and operations teams to design monitoring strategies, analyze performance, and proactively prevent issues before they affect users.
If you thrive in fast-paced environments, love solving complex technical challenges, and enjoy turning data into insight, this is the role for you.

What You'll Do
• Design and manage APM strategies using tools like Elastic APM, Datadog, Dynatrace, or similar platforms.
• Perform deep performance analysis, tracing distributed requests and identifying bottlenecks in both code and infrastructure.
• Build real-time dashboards and alerting systems using Grafana, Kibana, or equivalent tools to visualize system health.
• Proactively monitor systems to detect performance degradations, security threats, and system failures - before users are impacted.
• Define and track Service Level Objectives (SLOs) and Service Level Agreements (SLAs) to continuously improve reliability.
• Lead Root Cause Analysis (RCA) sessions after incidents and implement corrective actions to prevent recurrence.
• Automate repetitive tasks and monitoring setups using Python, Bash, or PowerShell.
• Collaborate with cross-functional teams to embed reliability, performance, and observability best practices into every stage of development.
• Continuously refine tools, processes, and APM strategies to enhance efficiency, reliability, and visibility across platforms.
• Engage with stakeholders to understand performance challenges and shape the platform roadmap.

What You Bring
• Bachelor's or Master's degree in Computer Science, Engineering, or related field.
• 5+ years of experience in Site Reliability, DevOps, or Performance Engineering roles.
• Proven hands-on experience with APM tools such as Elastic APM, Datadog, Dynatrace, New Relic, or AppDynamics.
• Expertise in the Elastic Stack (Elasticsearch, Logstash, Kibana, Beats) for logging, monitoring, and APM.
• Deep understanding of SRE principles, DevOps methodologies, and Production Support operations.
• Strong scripting ability in Python, Bash, or PowerShell for automation and analysis.
• Solid grasp of Linux/Unix systems, networking fundamentals, and distributed system architecture.
• Experience with containerization (Docker) and orchestration (Kubernetes).
• Excellent analytical, problem-solving, and collaboration skills, with the ability to communicate effectively in a global team.

Preferred Skills
• Fluent English (Mandatory)
• Experience with Infrastructure as Code (IaC) tools such as Terraform, Ansible, or Chef.
• Familiarity with cloud-native services (AWS, Azure, or GCP) and serverless architectures (AWS Lambda, Azure Functions).
• Knowledge of CI/CD tools like GitHub Actions, Azure DevOps, or Jenkins.
• Understanding of other observability pillars, including metrics (Prometheus) and logging.
• Experience working in agile environments.

Why NOV
At NOV, we combine over 150 years of innovation with cutting-edge technology to power the global energy industry.
You'll join a global engineering team that values collaboration, curiosity, and continuous improvement - giving you the opportunity to make a real impact on systems that matter.
-
Site Reliability Engineer
2 weeks ago
Monterrey, Nuevo León, México NOV Full-time
Job Description: Site Reliability Engineer (SRE) – Application Performance Monitoring (APM). Location: Monterrey, Nuevo León, Mexico (Hybrid – candidates must reside in Monterrey or the metropolitan area). Language requirement: Fluent English (spoken and written). About the Role: We're looking for a Site Reliability Engineer (SRE) with a passion for Application...
-
Site Reliability Engineer
1 day ago
Monterrey, Nuevo León, México NOV Full-time
Description: Site Reliability Engineer (SRE) – Application Performance Monitoring (APM). Location: Monterrey, Nuevo León, Mexico (Hybrid – candidates must reside in Monterrey or the metropolitan area). Language requirement: Fluent English (spoken and written). About the Role: We're looking for a Site Reliability Engineer (SRE) with a passion for Application...
-
Senior Site Reliability Engineer
2 weeks ago
Monterrey, México Perficient, Inc Full-time
We currently have a career opportunity for a Senior Site Reliability Engineer to join our team located in Mexico. At Perficient, we’re passionate about building software that solves problems. We count on our site reliability engineers (SREs) to empower users with a rich feature set, high availability, and stellar performance level to pursue their...
-
Site Reliability Engineer
3 days ago
Monterrey, Nuevo León, México NOV Full-time
Site Reliability Engineer (SRE) – Application Performance Monitoring (APM). Location: Monterrey, Nuevo León, Mexico (Hybrid – candidates must reside in Monterrey or the metropolitan area). Language requirement: Fluent English (spoken and written). About the Role: We're looking for a Site Reliability Engineer (SRE) with a passion for Application Performance...
-
Lead Site Reliability Engineer
2 weeks ago
Monterrey, México Interfell Full-time
They are an integrator of technology services and products focused on business intelligence, software quality assurance, and solutions that increase business agility. They provide a unique combination of experienced talent with technical and human skills that make a difference. Their...
-
Senior Site Reliability Engineer
4 weeks ago
Monterrey, México Datalogics Full-time
Senior Site Reliability Engineer (Hybrid from Monterrey)
• MXN $1,020,000 - $1,260,000/year (gross)
• Equity and comprehensive health benefits
• Hybrid from Monterrey, Mexico
• Full-time, Contract of Employment
Are you passionate about optimizing software release processes in a fast-paced environment? Do you enjoy building tools and systems...
-
Site Reliability Engineer
1 week ago
Monterrey, México National Oilwell Varco Full-time
Overview: We are seeking a highly motivated and experienced Site Reliability Engineer (SRE) with a specialization in Application Performance Monitoring (APM) to join our team. You will be a key player in ensuring the reliability, performance, and scalability of our mission-critical applications and systems. You will work closely with software engineering and...
-
Site Reliability Engineer ID45689
3 weeks ago
Monterrey, México AgileEngine Full-time
Overview: Site Reliability Engineer ID45689 at AgileEngine. AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us multiple Best Place to Work awards. WHY...
-
SRE - Site Reliability Engineering
2 weeks ago
Monterrey, México GSB Solutions Full-time
An important IT company at the Latin American level, due to its growth, requires: SRE - Site Reliability Engineering. Job description: We are looking for a Lead Site Reliability Engineer who takes the initiative in developing and maintaining the systems and services for our Cash Management Platform, automating the deployment process, ensuring system scaling,...