Site Reliability Engineer

3 weeks ago

Estado de México | Félix | Full-time

About Us

At Félix, we’re building the financial ecosystem for Latin immigrants in the U.S., starting with a revolution in remittances. Our core product is an AI-powered chatbot built on WhatsApp, allowing users to send money home as easily as sending a text message. We leverage cutting-edge technology like AI, blockchain, and stablecoins to make cross-border payments faster, more affordable, and more accessible than ever before.

We are a hyper-growth Series B company, backed by over $100 million in funding from top-tier global investors, including QED, Castle Island, Switch Ventures, HTwenty, Monashees, and General Catalyst Customer Value Fund. Félix was selected as an “Endeavour Entrepreneur” and received the CrossTech Fintech Startup Award.

Our team is composed of extremely talented, dedicated high-performers united by a single goal: empowering our customers. Joining Félix means you will be part of a team building a legacy that will outlive us all. We foster radical transparency, constructive feedback, and a bias for action, focused on creating sustainable value and a product that truly improves our users’ lives. We are building the future, today.

About The Role

We’re looking for a Site Reliability Engineer (SRE) to join our Engineering Operations team, reporting directly to Damian Finol, Head of EngOps. This new role strengthens the reliability, scalability, and security of the infrastructure that powers our fintech platform. The SRE will work closely with Engineering and SecOps to ensure our systems are highly available, observable, and cost-efficient. The role blends software engineering, systems operations, and security practices, with strong emphasis on automation, proactive monitoring, and continuous improvement.

Responsibilities

- Manage and optimize our infrastructure on Google Cloud Platform (GCP) and Google Kubernetes Engine (GKE).
- Automate provisioning and configuration using Terraform, Helm, and scripting languages such as Go, Python, and Bash.
- Build, maintain, and improve monitoring and alerting systems using Prometheus, Grafana, and centralized logging tools (e.g., ELK or Loki).
- Participate in on-call rotations, incident response, and post-mortem analyses, ensuring rapid recovery and continuous learning from failures.
- Define and track SLOs/SLIs and error budgets to monitor service health and performance.
- Implement cloud security best practices to protect sensitive data and maintain the integrity of our systems.
- Collaborate across Engineering, Security, and Product teams to embed reliability and automation in every phase of development and deployment.
- Contribute to GKE cost optimization and resource management strategies to enhance efficiency and control operational spend.

Requirements

- 4+ years of experience as an SRE, DevOps, Infrastructure, or Platform Engineer.
- Strong hands-on experience with GCP and GKE.
- Proficiency in Kubernetes (architecture, deployments, networking, and troubleshooting).
- Solid programming or scripting skills in Go, Python, or Bash.
- Experience with Terraform and Helm for Infrastructure as Code.
- Strong understanding of monitoring and observability using Prometheus, Grafana, and logging frameworks.
- Familiarity with incident management, on-call operations, and post-mortem processes.
- Knowledge of network fundamentals (TCP/IP, DNS, load balancing).
- Experience with PostgreSQL or distributed databases.
- Awareness of FinOps and cloud cost management principles.
- Excellent problem-solving, communication, and collaboration skills, with a proactive mindset.
- Certified Kubernetes Administrator (CKA).
- Experience in FinOps, cloud security, or regulated industries.
- Familiarity with PagerDuty or similar incident management tools.
- Background implementing SLOs/SLIs and error budgets in production environments.

These are the applicable requisites, although equivalent competencies in any of the above will also be considered.

What We Offer

- Competitive salary
- Initial stock-option grant
- Annual performance bonus
- Health, dental, and vision plans
- Remote work environment (hybrid model available)
- Continuous learning opportunities
- Unlimited PTO
- Paid parental leave
- Empowering opportunities for growth in a dynamic entrepreneurial environment

Equal Opportunity Employer

At Félix, we are committed to providing equal employment opportunities to all qualified employees and applicants without regard to race, religion, nationality, sex, sexual orientation, gender identity, age, or disability. This policy applies to all terms and conditions of employment, including recruitment, hiring, placement, promotion, training, compensation, benefits, and termination.

Site Reliability Engineer

5 days ago

Ciudad de México | Atos | Full-time

**Site Reliability Engineer**

- Publication Date: Jan 8, 2025
- Ref. No:
- Location: Mexico City, MX

**_Site Reliability Engineer_**

- Some scripting experience in languages like Java, Python, or Shell.
- 3+ years of significant experience working as a Site Reliability Engineer
- Strong in Terraform, Ansible, Packer,...
Site Reliability Engineer

7 days ago

Ciudad de México | Atos | Full-time

**Site Reliability Engineer**

- Publication Date: Jan 8, 2025
- Ref. No: 523940
- Location: Mexico City, MX

**_Site Reliability Engineer_**

- Some scripting experience in languages like Java, Python, or Shell.
- 3+ years of significant experience working as a Site Reliability Engineer
- Strong in Terraform,...
Site Reliability Engineer

2 weeks ago

Ciudad de México | Royal Caribbean Group | Full-time

Journey with us! Combine your career goals and sense of adventure by joining our incredible team...
Site Reliability Engineer

5 days ago

Ciudad de México | Atos | Full-time

**Site Reliability Engineer**

- Publication Date: Jan 14, 2025
- Ref. No: 523941
- Location: Mexico City, MX

Eviden, part of the Atos Group, with an annual revenue of circa €5 billion, is a global leader in data-driven, trusted and sustainable digital transformation. As a next generation digital business with worldwide...
Site Reliability Engineer

2 weeks ago

Ciudad de México | Royal Caribbean Group | Full-time

Journey with us! Combine your career goals and sense of adventure by joining our incredible team of employees at Royal Caribbean Group. We are proud to offer a competitive compensation and benefits package, and excellent career development...
Site Reliability Engineer

1 day ago

Ciudad de México | Zenta Group | Full-time

**Site Reliability Engineer | On-site - CDMX**

**Role Summary**: As a **Site Reliability Engineer (SRE)** at Zenta Group, you will be the bridge between development and operations, ensuring that services are **scalable, reliable, and resilient**. You will design and implement solutions that improve the stability and performance of the infrastructure,...
Site Reliability Engineer

3 days ago

Ciudad de México | The Functionary | Full-time

Senior Site Reliability Engineer

We are looking for a Senior Site Reliability Engineer to build and maintain reliable, high-capacity, and high-performing systems that support our mission to protect and improve customer platforms, with a strong focus on reliability, security, performance, cost, and operational excellence. As a Site Reliability Engineer on...
Site Reliability Engineer

2 weeks ago

Ciudad de México | Tata Consultancy Services | Full-time

We are looking for a Site Reliability Engineer (SRE) to join our team and help us ensure seamless, high-performing, and reliable technology operations.

What you’ll work with:

- Azure DevOps - Pipelines, repositories, and automation
- ServiceNow - Incident, change, and problem management
- AppDynamics - Application performance monitoring and alerting
- Microsoft...
Site Reliability Engineer

1 day ago

Estado de México | HCLTech | Full-time

Full-time permanent position with HCLTech

Key Responsibilities

- Design, build, and maintain highly available, scalable, and secure infrastructure
- Implement monitoring, alerting, and incident response strategies
- Optimize system reliability and performance across cloud and containerized environments
- Collaborate with...
