Site Reliability Engineer

2 weeks ago


Ciudad de México, Ciudad de México · Felix Technologies, Inc. · Full-time
About Us
At Félix, we're building the financial ecosystem for Latin immigrants in the U.S., starting with a revolution in remittances. Our core product is an AI-powered chatbot built on WhatsApp, allowing our users to send money home as easily as sending a text message. We leverage cutting-edge technology like AI, blockchain, and stablecoins to make cross-border payments faster, more affordable, and more accessible than ever before. 
We are a hyper-growth Series B company, backed by over $100 million in funding from top-tier global investors, including QED, Castle Island, Switch Ventures, HTwenty, Monashees, and General Catalyst Customer Value Fund. This isn't just about the numbers; it's a testament to the trust our investors have in our vision and our team. Additionally, Félix was selected as an "Endeavor Entrepreneur" and was a recipient of the CrossTech Fintech Startups Award. We are a group of extremely talented and dedicated high-performers, united by our shared obsession with a single goal: empowering our customers. We are all owners of Félix, driven by a bias for action and a true experimentation spirit to get shit done with urgency and focus.
Joining Félix means you will be part of a team building a legacy, a company that will outlive us all. This is a rare opportunity to apply your skills to a deeply meaningful mission: serving a community that has been underserved for too long. We are a team that is fiercely loyal to each other, where radical transparency and constructive feedback are how we grow and push for excellence. We are bold: we care less about what others are doing and more about creating sustainable value and a product that truly makes our users' lives better. We are building the future, today.
About the Role
We're looking for a Site Reliability Engineer (SRE) to join our Engineering Operations team, reporting directly to Damian Finol, Head of EngOps. This is a new role focused on strengthening the reliability, scalability, and security of the infrastructure that powers our fintech platform. You'll work closely with Engineering and SecOps to ensure our systems are highly available, observable, and cost-efficient. The role blends software engineering, systems operations, and security practices, with a strong emphasis on automation, proactive monitoring, and continuous improvement.
Responsibilities
• Manage and optimize our infrastructure on Google Cloud Platform (GCP) and Google Kubernetes Engine (GKE).
• Automate provisioning and configuration using Terraform, Helm, and scripting languages such as Go, Python, and Bash.
• Build, maintain, and improve monitoring and alerting systems using Prometheus, Grafana, and centralized logging tools (e.g., ELK or Loki).
• Participate in on-call rotations, incident response, and post-mortem analyses, ensuring rapid recovery and continuous learning from failures.
• Define and track SLOs/SLIs and error budgets to monitor service health and performance.
• Implement cloud security best practices to protect sensitive data and maintain the integrity of our systems.
• Collaborate across Engineering, Security, and Product teams to embed reliability and automation in every phase of development and deployment.
• Contribute to GKE cost optimization and resource management strategies to enhance efficiency and control operational spend.
Requirements
• 4+ years of experience as an SRE, DevOps, Infrastructure, or Platform Engineer.
• Strong hands-on experience with GCP and GKE.
• Proficiency in Kubernetes (architecture, deployments, networking, and troubleshooting).
• Solid programming or scripting skills in Go, Python, or Bash.
• Experience with Terraform and Helm for Infrastructure as Code.
• Strong understanding of monitoring and observability using Prometheus, Grafana, and logging frameworks.
• Familiarity with incident management, on-call operations, and post-mortem processes.
• Knowledge of network fundamentals (TCP/IP, DNS, load balancing).
• Experience with PostgreSQL or distributed databases.
• Awareness of FinOps and cloud cost management principles.
• Excellent problem-solving, communication, and collaboration skills, with a proactive mindset.
• Certified Kubernetes Administrator (CKA).
• Experience in FinOps, cloud security, or regulated industries.
• Familiarity with PagerDuty or similar incident management tools.
• Background implementing SLOs/SLIs and error budgets in production environments.
These are the applicable requirements, although equivalent competencies in any of the areas above will also be considered.
What We Offer
• Competitive salary
• Initial stock options grant
• Annual performance bonus
• Health, dental, and vision plans
• Remote work environment; we also have offices in Miami and Mexico City and would love for you to work in a hybrid model if you're up for it.
• Continuous learning opportunities
• Unlimited PTO
• Paid parental leave
• Empowering opportunities for growth in a dynamic entrepreneurial environment
Equal Opportunity Employer
At Félix, we are committed to providing equal employment opportunities to all qualified employees and applicants without regard to race, religion, nationality, sex, sexual orientation, gender identity, age, or disability. This policy applies to all terms and conditions of employment, including recruitment, hiring, placement, promotion, training, compensation, benefits, and termination.
Want to learn more about our privacy practices? Check out our Privacy Policy.

  • Ciudad de México, Ciudad de México · Azkait · Full-time

    AZKAIT is a Mexican company that finds and connects top IT talent with companies in Latin America and the United States. We are looking for your talent as a Site Reliability Engineer (SRE). Requirements: Bachelor's degree in Systems Engineering, Computer Science, or a related field. 5+ years of experience in SRE, DevOps, or Software Engineering roles. Experience...

  • Site Reliability Engineer

    2 weeks ago


    Ciudad de México, Ciudad de México · Encora · Full-time

    Important Information: Years of Experience: 5+ years. Job Mode: Full-time. Work Mode: Remote within Mexico. Job Summary: We are seeking a Site Reliability Engineer to ensure the reliability, scalability, and performance of custom platforms running on AWS infrastructure and Kubernetes. This role focuses on Tier 3 issue resolution, operational readiness for new...


  • Ciudad de México, Ciudad de México · itD Tech · Full-time

    itD is seeking a Site Reliability Engineer who will report to the Sr. Engineering Manager for a client in the gaming and entertainment space. As a Site Reliability Engineer, you will focus on designing, deploying, and operating resilient, secure, and globally scalable services in AWS, with , TypeScript, Kubernetes, GitLab, Argo CD (CI/CD). This long-term W2...

  • Site Reliability Engineer

    2 weeks ago


    Ciudad de México, Ciudad de México · Sur · Full-time

    As the Site Reliability Engineer you will support and scale the infrastructure powering their secure, mission-critical SaaS platform. You must be confident in operating and debugging both modern infrastructure (cloud-native, containerized services) and classic Windows production environments (IIS, SQL Server AlwaysOn, Service Broker), with the ability to...

  • Site Reliability Engineer

    2 weeks ago


    Ciudad de México, Ciudad de México · Tech Mahindra · Full-time

    We're hiring! We are seeking a talented Site Reliability Engineer (SRE) in CDMX with robust experience in Azure environments, Kubernetes, and DevOps practices. Your mission will be to ensure the reliability, scalability, and automation of our critical platforms. If you thrive on solving complex challenges, automating processes, and ensuring seamless operations,...


  • Ciudad de México, Ciudad de México · Thomson Reuters · Full-time

    Senior Site Reliability Engineer (SRE). Are you passionate about the chance to bring your experience to a world-class company that is market-leading in both content and technology? If yes, we're looking for you. As a Senior Site Reliability Engineer (SRE) on our team, you will implement Site Reliability Engineering and DevOps best practices. Feed non-functional...


  • Ciudad de México, Ciudad de México · Capgemini · Full-time

    Our client is one of the United States' largest insurers, providing a wide range of insurance and financial services products with gross written premiums well over US$25 billion (P&C). They proudly serve more than 10 million U.S. households with more than 19 million individual policies across all 50 states through the efforts of over 48,000 exclusive and...


  • Ciudad de México, Ciudad de México · Mastercard · Full-time

    Our Purpose: Mastercard powers economies and empowers people in 200+ countries and territories worldwide. Together with our customers, we're helping build a sustainable economy where everyone can prosper. We support a wide range of digital payments choices, making transactions secure, simple, smart and accessible. Our technology and innovation, partnerships...


  • Ciudad de México, Ciudad de México · AXA Group Operations · Full-time

    Main missions: Be part of our global team as a Linux Engineer and become a key member of the SRO Squad (Site Reliability Operations), collaborating with a diverse group of experts to ensure robust and secure Linux (RHEL) infrastructure worldwide. Engineer (build) and test solutions, document them accordingly, and hand them over to the operations team. Provide 3rd level...