Site Reliability Engineer

2 weeks ago

Santiago de Querétaro, México MedTrainer Full-time

Compensation: MXN 45,000 - MXN 70,000 monthly

Company Description
MedTrainer is an innovator in the healthcare industry, changing the landscape of technology offerings with its Platform Solution, comprised of our proprietary Learning Management System (LMS), our core focus on Compliance Training, and our Managed Services offering in Credentialing and Compliance Management. We impact thousands of healthcare providers, and we are building the future of healthcare through innovation, scale, and collaboration.

Job Description
We are looking for a Site Reliability Engineer who can build, scale, maintain, and monitor highly available, secure, and cost-efficient cloud platforms and Kubernetes workloads with a strong focus on reliability engineering practices (SLIs/SLOs, error budgets, incident response, postmortems). Own production readiness and operational excellence across infrastructure and delivery tooling. Ensure performance, uptime, and scalability while maintaining high standards of code quality and thoughtful design. Lead the transition and continuous improvement of applications and infrastructure toward resilient, automated, and observable systems.

Qualifications
Bachelor's in Computer Science, equivalent degree, or equivalent professional experience.
3+ years working on distributed systems and cloud operations.
Strong hands-on experience with at least two major cloud providers (Azure, AWS, GCP) and their managed Kubernetes services.
Deep experience architecting and/or operating large Kubernetes clusters: workload identity, networking, storage, autoscaling, upgrades, security, and multi-tenancy.
Container expertise (Docker/OCI), packaging and configuration; service mesh experience is a plus.
Advanced GitHub Actions expertise: reusable workflows/composites, concurrency/queueing, environments and approvals, OIDC federation, artifacts, caching, dependency review, and policy-as-code.
Strong Python skills (required) for Pulumi-based IaC, tooling, and automation; Golang knowledge is a plus.
Familiarity with CI/CD, change management, and experience in progressive delivery.
Observability stack experience and alerting practices tied to SLOs.
Configuration of cloud-native networking, storage, Linux, security controls, and cost governance.
Experience migrating and scaling infrastructure across clouds.
Relevant certifications (e.g., CKA) are a plus.
Advanced English (optional).

Responsibilities
Design, build, and operate production-grade Kubernetes (AKS) clusters and supporting services with high availability, security, and cost optimization.
Architect, implement, and maintain CI/CD using GitHub Actions (advanced), including reusable workflows, matrices, environments, required approvals, OIDC-based cloud auth, self-hosted runners, and policy controls.
Define, codify, and evolve Infrastructure as Code with Pulumi (Python) as the primary stack; create reusable components, enforce code reviews, testing, and documentation.
Develop and maintain configuration management with Ansible (roles, collections, inventories, playbooks) for OS, middleware, and app operations.
Implement progressive delivery and deployment strategies (blue/green, canary, feature flags) and automate rollback/roll-forward based on health checks and SLOs.
Establish comprehensive observability (metrics, logs, traces, profiles) with alerting tied to SLIs; drive capacity planning, performance tuning, and chaos/resiliency testing.
Lead incident management and on-call response; coordinate triage, communication, mitigation, root-cause analysis, and follow-through on corrective actions.
Partner with product and engineering to design for reliability (readiness/liveness probes, graceful shutdown, backpressure, retries/timeouts, circuit breakers).
Implement security best practices (least privilege, secrets management) and ensure compliance with internal policies and audits.
Continuously review existing systems, eliminate toil via automation, reduce technical debt, and document operational runbooks and standards.

Essential technologies and/or skills
Exceptional problem-solving, with the ability to anticipate and remediate issues before they affect business productivity.
Proven experience handling production environments and being available for emergencies.
Clear, calm communication with technical and non-technical audiences.
Passion for detail and a structured, methodical mindset in design, execution, and documentation.
Professional, positive approach with strong ethics and high working morale.
Curiosity to learn, bias for automation, and a true can-do attitude.
Version control tools (Git/GitHub)
Continuous Integration servers (GitHub Actions as primary)
Configuration management (Ansible)
Containers (Docker/OCI)
Monitoring and analytics (metrics/logs/traces, APM, alerting)
Secrets management and security scanning/signing
Incident management and on-call tooling
Python (scripting level)
MySQL

Additional Information
What We Offer
Competitive monthly net salary: $45,000 – $70,000 MXN.
100% remote work from anywhere in Mexico.
Major Medical Insurance and healthcare coverage.
Home office and ergonomics support (internet, electricity, office chair).
Professional development opportunities, including English classes.
Wellness benefits such as TotalPass gym discounts.
Savings plan.
Paid time off, including personal days.
A collaborative, international, and growth-oriented environment.

All your information will be kept confidential according to EEO guidelines.

Site Reliability Engineer

3 weeks ago

Santiago de Querétaro, México BairesDev Full-time

Site Reliability Engineer - Remote Work | REF#. At BairesDev, we've been leading the way in...
Senior Site Reliability Engineer

2 weeks ago

Santiago de Querétaro, México Canonical Full-time

Senior Site Reliability Engineer Canonical, a leading provider of open source software and the Ubuntu operating system, is hiring a Senior Site Reliability Engineer to join its distributed engineering team. Responsibilities Architect and run OpenStack, Kubernetes, and storage solutions across bare metal and container environments. Develop Python-based...
Senior Site Reliability Engineer

1 week ago

Santiago de Querétaro, México Canonical Full-time

Senior Site Reliability Engineer Canonical, a leading provider of open source software and the Ubuntu operating system, is hiring a Senior Site Reliability Engineer to join its distributed engineering team. Responsibilities Architect and run OpenStack, Kubernetes, and storage solutions across bare metal and container environments. Develop Python-based...
Site Reliability Engineer

2 weeks ago

Santiago de Querétaro, México BairesDev Full-time

Site Reliability Engineer - Remote Work | REF#282115. At BairesDev, we've been...
Remote Site Reliability Engineer: Kubernetes

6 days ago

Santiago de Querétaro, México MedTrainer Full-time

A leading technology provider in healthcare is seeking a Site Reliability Engineer to build, scale, and maintain cloud platforms. The ideal candidate will have 3+ years of experience in cloud operations and Kubernetes, with a strong focus on reliability engineering practices. This position offers a competitive monthly salary of MXN 45,000 to MXN 70,000 and...
Site Reliability Engineer

2 weeks ago

Ciudad de México UST Full-time

Site Reliability Engineer

3 days ago

Ciudad de México Royal Caribbean Group Full-time

Journey with us! Combine your career goals and sense of adventure by joining our incredible team...
Site Reliability Engineer

3 weeks ago

Ciudad de México Atos Full-time

Publication Date: Jan 14, 2025. Location: Mexico City, MX. Eviden, part of the Atos Group, with an annual revenue of circa €5 billion, is a global leader in data-driven, trusted and sustainable digital transformation. As a next generation digital business with worldwide leading...
