Site Reliability Engineer
6 days ago
Site Reliability Engineer (SRE) – Application Performance Monitoring (APM)
Location: Monterrey, Nuevo León, Mexico (Hybrid; candidates must reside in Monterrey or the metropolitan area)
Language requirement: Fluent English (spoken and written)

About the Role
We’re looking for a Site Reliability Engineer (SRE) with a passion for Application Performance Monitoring (APM) and system optimization. In this role, you’ll be at the heart of ensuring the reliability, scalability, and performance of NOV’s mission‑critical applications. You’ll work closely with software engineering and operations teams to design monitoring strategies, analyze performance, and proactively prevent issues before they affect users. If you thrive in fast‑paced environments, love solving complex technical challenges, and enjoy turning data into insight, this is the role for you.

What You’ll Do
- Design and manage APM strategies using tools like Elastic APM, Datadog, Dynatrace, or similar platforms.
- Perform deep performance analysis, tracing distributed requests and identifying bottlenecks in both code and infrastructure.
- Build real‑time dashboards and alerting systems using Grafana, Kibana, or equivalent tools to visualize system health.
- Proactively monitor systems to detect performance degradations, security threats, and system failures before users are impacted.
- Define and track Service Level Objectives (SLOs) and Service Level Agreements (SLAs) to continuously improve reliability.
- Lead Root Cause Analysis (RCA) sessions after incidents and implement corrective actions to prevent recurrence.
- Automate repetitive tasks and monitoring setups using Python, Bash, or PowerShell.
- Collaborate with cross‑functional teams to embed reliability, performance, and observability best practices into every stage of development.
- Continuously refine tools, processes, and APM strategies to enhance efficiency, reliability, and visibility across platforms.
- Engage with stakeholders to understand performance challenges and shape the platform roadmap.

What You Bring
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- 5+ years of experience in Site Reliability, DevOps, or Performance Engineering roles.
- Proven hands‑on experience with APM tools such as Elastic APM, Datadog, Dynatrace, New Relic, or AppDynamics.
- Expertise in the Elastic Stack (Elasticsearch, Logstash, Kibana, Beats) for logging, monitoring, and APM.
- Deep understanding of SRE principles, DevOps methodologies, and Production Support operations.
- Strong scripting ability in Python, Bash, or PowerShell for automation and analysis.
- Solid grasp of Linux/Unix systems, networking fundamentals, and distributed system architecture.
- Experience with containerization (Docker) and orchestration (Kubernetes).
- Excellent analytical, problem‑solving, and collaboration skills, with the ability to communicate effectively in a global team.

Preferred Skills
- Fluent English (Mandatory)
- Experience with Infrastructure as Code (IaC) tools such as Terraform, Ansible, or Chef.
- Familiarity with cloud‑native services (AWS, Azure, or GCP) and serverless architectures (AWS Lambda, Azure Functions).
- Knowledge of CI/CD tools like GitHub Actions, Azure DevOps, or Jenkins.
- Understanding of other observability pillars, including metrics (Prometheus) and logging.
- Experience working in agile environments.

Why NOV
At NOV, we combine over 150 years of innovation with cutting‑edge technology to power the global energy industry. You’ll join a global engineering team that values collaboration, curiosity, and continuous improvement, giving you the opportunity to make a real impact on systems that matter.

Seniority level: Mid‑Senior level
Employment type: Full‑time
Job function: Engineering and Information Technology
Industries: Oil and Gas
-
Site Reliability Engineer: Cloud, Automation
6 days ago
Baja California, Mexico | ITJ | Full-time
A leading technology company in Baja California is looking for a Site Reliability Engineer to enhance software architecture, automate infrastructure, and maintain monitoring solutions. This position requires experience with cloud-centric systems, particularly using Python and Terraform. The ideal candidate will collaborate with development teams and maintain...
-
Remote Site Reliability Engineer – Open Source Cloud
2 weeks ago
Baja California, Mexico | Canonical | Full-time
A leading open-source software provider is seeking a Site Reliability Engineer. In this role, you will deploy and manage OpenStack, Kubernetes, and various storage solutions while practicing DevOps. Ideal candidates will have a strong foundation in Python and Linux and the ability to operate mission-critical services globally. This is a remote position...
-
Site Reliability Engineer
6 days ago
Baja California, Mexico | ITJ | Full-time
Position Overview: SREs at ITJ support our mission by pushing out new features and applications every day. The Site Reliability Engineering team constantly practices the DevOps mindset to build and deploy distributed, fault‑tolerant systems at scale. As part of this team, you will work with developers, operations, and product sponsors to help design, build,...
-
Remote SRE
6 days ago
Baja California, Mexico | Canonical | Full-time
A leading open-source software company is seeking a Site Reliability / GitOps Engineer to join its Information Systems team. This role focuses on infrastructure automation and software operations for the company's extensive customer base, using the best of open-source technologies. Candidates should have a strong background in Linux, cloud computing, and...
-
Automation & Facilities Engineer – Uptime Leader
2 weeks ago
Baja California, Mexico | Medtronic | Full-time
A global leader in healthcare is seeking a Maintenance Excellence Engineer to drive the reliability and performance of automated systems. The ideal candidate will have over 7 years of experience in automation, robotics maintenance, and PLC programming. Responsibilities include complex troubleshooting, ensuring safety compliance, and leading improvement...
-
Cloud Distributed Systems Test Engineer
6 days ago
Baja California, Mexico | Canonical | Full-time
A leading technology firm in Mexico is seeking a Software Engineer for Distributed Systems Testing to ensure the reliability and performance of cloud orchestration tools. The role involves creating automated testing infrastructure, enhancing CI pipelines, and collaborating with a global team. Ideal candidates will have a strong background in modern testing...
-
Electrical Engineer
1 week ago
Baja California, Mexico | Alliance Air Products, Llc. | Full-time
Overview: Established in 2004, Alliance Air Products specializes in designing and manufacturing custom HVAC solutions tailored to the specific requirements of demanding commercial and industrial applications. Renowned for delivering high-quality, efficient, engineered-to-order equipment, the company is focused on exceeding customer expectations. In...
-
Senior Quality Engineer
6 days ago
Baja California, Mexico | Schlage De México, S.A. De C.V. | Full-time
Description and details of activities: The Senior Quality Engineer will work on complex and innovative projects to reduce internal and external quality issues. They will provide technical support to define inspection techniques, test procedures, equipment development, and training. They will prepare quality documentation such as control plans, FMEAs, inspection plans...
-
SRE/DevOps Engineer
3 weeks ago
La California, Mexico | VSG Business Solutions | Full-time
- 3+ years of experience in a Site Reliability Engineer (SRE) or similar role in a digital environment.
- Strong expertise with Dynatrace for performance monitoring, analysis, and optimization.
- Proficiency in Google Analytics, including the ability to derive insights and optimize user experiences based on data.
- Advanced data analysis skills
- ...
-
QA Software Engineer
6 days ago
Baja California, Mexico | Grupo Tress Internacional | Full-time
Overview: We are seeking a skilled Test Equipment Engineer / Section Manager (TE) to join our team in the electrical/electronic manufacturing industry. This role offers exciting opportunities for career growth and development, as well as the chance to collaborate with international teams from Taiwan, China, the Philippines, and beyond. We warmly welcome local...