Senior Reliability Engineer
2 weeks ago
At Truelogic, we are a leading provider of nearshore staff augmentation services headquartered in New York. For over two decades, we've been delivering top-tier technology solutions to companies of all sizes, from innovative startups to industry leaders, helping them achieve their digital transformation goals.
Our team of 600+ highly skilled tech professionals, based in Latin America, drives digital disruption by partnering with U.S. companies on their most impactful projects. Whether collaborating with Fortune 500 giants or scaling startups, we deliver results that make a difference.
By applying for this position, you're taking the first step in joining a dynamic team that values your expertise and aspirations. We aim to align your skills with opportunities that foster exceptional career growth and success while contributing to transformative projects that shape the future.
Our Client
Our client is a data-driven technology company that partners with high-growth brands to optimize customer acquisition and retention. It specializes in delivering high-LTV audiences and enrichment data to increase repeat purchase rates. The company collaborates with major platforms and agencies such as Shopify, Experian, TransUnion, and top media partners, all focused on driving profitable revenue growth.
Job Summary
The Site Reliability Engineer plays a key role in operating, observing, and improving the reliability of existing distributed systems running on AWS and Kubernetes, with a strong emphasis on observability, operational maturity, and automated responses to system behavior. Rather than focusing on provisioning infrastructure from scratch, this role concentrates on understanding how services behave in production, detecting when they are not operating correctly, and enabling automated scaling, recovery, and remediation using existing platforms and tooling. The engineer partners closely with backend and platform teams to evolve observability practices, define reliability signals, and improve how the platform responds to operational and performance concerns, driving overall system resilience and reliability.
Responsibilities
Designs, implements, and continuously improves observability strategies across services, including metrics, logs, traces, alerts, and dashboards.
Focuses on understanding system behavior in production, identifying failure modes, performance bottlenecks, and reliability risks.
Evolves and maintains shared AWS CDK and CDK8s constructs, with emphasis on observability, autoscaling, and operational safeguards rather than basic infrastructure provisioning.
Maintains and operates core platform components such as VPC, EKS clusters, RDS, OpenSearch, and MSK, ensuring they expose meaningful operational signals.
Operates and enhances Kubernetes cluster addons such as ingress controllers, cert-manager, autoscalers, and monitoring, logging, and tracing stacks.
Defines and maintains SLIs, SLOs, and alerting strategies that clearly distinguish between symptoms, root causes, and actionable operational events.
Improves automated operational responses, including autoscaling, self-healing mechanisms, and runbook-driven remediation.
Ensures high reliability through structured alerting systems (Prometheus, CloudWatch), noise reduction, alert quality improvements, and recovery mechanisms.
Collaborates with engineering teams to investigate production incidents, perform root cause analysis, and drive long-term reliability improvements.
Owns CI/CD pipelines for Infrastructure as Code (IaC) and observability-related platform components.
Applies Site Reliability Engineering (SRE) principles, including observability-first design, error budgets, and operational readiness, to shared platform services.
Supports IAM roles, secrets management, and tenant isolation best practices.
Qualifications
Has 5+ years of experience in Site Reliability Engineering, Platform Engineering, or Infrastructure roles, with significant hands-on experience operating and supporting production systems.
Demonstrates strong experience in observability operations, including defining metrics, logs, traces, dashboards, alerts, and reliability indicators for complex systems.
Has hands-on experience with AWS services such as VPC, IAM, RDS, MSK, S3, and CloudWatch, as well as Kubernetes components like Helm, RBAC, and ServiceAccounts.
Demonstrates fluency in Python and experience with Infrastructure-as-Code using AWS CDK, CDK8s, or equivalent frameworks.
Possesses a strong understanding of Prometheus, Grafana, alert tuning, alert fatigue reduction, and incident-driven monitoring improvements.
Has experience improving existing systems rather than building greenfield infrastructure, with a focus on operational excellence and system reliability.
Shows a proven track record of using observability data to drive automation, scaling decisions, and operational improvements.
Has experience designing reusable infrastructure or observability patterns, or contributing to internal developer or platform tooling.
Has experience supporting Spark on Kubernetes, Argo, or Kafka-based batch pipelines (nice to have).
Benefits
100% Remote Work: Enjoy the freedom to work from the location that helps you thrive. All it takes is a laptop and a reliable internet connection.
Highly Competitive USD Pay: Earn excellent, market-leading compensation in USD that goes beyond typical market offerings.
Paid Time Off: We value your well-being. Our paid time off policies ensure you have the chance to unwind and recharge when needed.
Work with Autonomy: Enjoy the freedom to manage your time as long as the work gets done. Focus on results, not the clock.
Work with Top American Companies: Grow your expertise working on innovative, high-impact projects with industry-leading U.S. companies.
A Culture That Values You: We prioritize well-being and work-life balance, offering engagement activities and fostering dynamic teams to ensure you thrive both personally and professionally.
Diverse, Global Network: Connect with over 600 professionals in 25+ countries, expand your network, and collaborate with a multicultural team from Latin America.
Team Up with Skilled Professionals: Join forces with senior talent. All of our team members are seasoned experts, ensuring you're working with the best in your field.
-
Senior Data Engineer
2 weeks ago
Ciudad de México, Ciudad de México Azkait Full-time
Azkait is a Mexican company that finds and connects the best IT talent with companies in Latin America and the United States. We are looking for your talent as Senior Data Engineer / BI Engineer.
Requirements:
Bachelor's or engineering degree (coursework completed or degree awarded).
Upper-intermediate conversational English (B2, C1, C2).
5 or more years of experience as a Data Engineer...
-
Senior Backend Engineer
2 days ago
Ciudad de México, Ciudad de México Alvos Full-time
Senior Backend Engineer (Fintech / Lending Platform)
We're looking for a Senior Backend Engineer who will help design and build a new platform from scratch.
What you'll do
Design and implement backend services for loan origination, core loans, payments, risk and internal backoffice tools.
Own end-to-end features: from architecture and data model to APIs, tests...
-
Senior Software Engineer
24 hours ago
Ciudad de México, Ciudad de México Magmalabs Full-time
Senior Software Engineer (Ruby on Rails)
Join MagmaLabs, a leading provider of expert software engineers, dedicated to helping companies achieve their goals across diverse and impactful industries. We are actively seeking a seasoned Senior Software Engineer (Ruby on Rails).
About the Role
Are you a seasoned Senior Software Engineer (Ruby on Rails) passionate...
-
Senior Prompt Engineer
1 week ago
Ciudad de México, Ciudad de México Welocalize Full-time
Welocalize is seeking a Senior Prompt Engineer with world-class Python expertise and a sharp eye for data quality, engineering rigor, and visualization fidelity. In this role, you will create realistic datasets, write intuitive prompts, and develop high-quality "golden plots" to power insightful data tasks. You'll bridge technical precision with real-world...
-
Senior Software Engineer
7 days ago
Ciudad de México, Ciudad de México Nelo Full-time
Senior Software Engineer | New York City | Hybrid
You like building things that don't fall apart when scale hits. We're a fintech building systems that move fast but stay reliable, and we need a Senior Software Engineer who can keep both in balance.
You'll work on distributed systems that power credit and marketplace products across LATAM. The problems are...
-
Senior Software Engineer
2 weeks ago
Ciudad de México, Ciudad de México Canals AI Full-time
Senior Software Engineer
Remote – Mexico | Full-Time | Canals AI
About Canals
Canals is a bootstrapped, profitable startup transforming wholesale distribution (a trillion-dollar industry) with AI. Our platform seamlessly integrates with the systems distributors already use, automating tedious tasks and reducing failure points in moving physical goods across the...
-
Senior AI Data Engineer
1 week ago
Ciudad de México, Ciudad de México Welocalize Full-time
Welocalize is seeking a Senior Data Engineer with world-class Python expertise and a sharp eye for data quality, engineering rigor, and visualization fidelity. In this role, you will create realistic datasets, write intuitive prompts, and develop high-quality "golden plots" to power insightful data tasks. You'll bridge technical precision with real-world...
-
Senior Cloud Engineer
2 weeks ago
Ciudad de México, Ciudad de México Value2Biz Full-time
Attention everyone
We're looking for a Senior Cloud Service Engineer to design, deploy, and support cloud infrastructure with a primary focus on Microsoft Azure and secondary expertise in AWS, covering setup, configuration, monitoring, security, automation, optimization, and documentation.
A must:
Azure-Focused Architecture, Deployment & Management
•...
-
Senior Software Engineer
4 days ago
Ciudad de México, Ciudad de México Tech9 Full-time
Senior Software Engineer (.NET Framework / WPF)
About Us
Tech9 is shaking up a 20-year-old industry, and we're not slowing down. Recognized by Inc. 5000 as one of the nation's fastest-growing companies, we are dedicated to building innovative, high-quality software solutions. Our team is passionate about delivering technology that makes an impact. We offer a...
-
Senior Software Engineer
24 hours ago
Ciudad de México, Ciudad de México Nir Yu Full-time
The Role:
Our Customer Experience team is looking for a kind and curious Senior Software Engineer (Full Stack) who enjoys solving challenging problems.
Our awesome team of friendly humans is responsible for making it easy and compelling for individuals to consign items. We place a strong focus on individual growth and personal development on our collaborative...