Senior Site Reliability Engineer, Observability

3 weeks ago

Distrito Federal, México | Chainlink | Full-time

About Chainlink

Chainlink is the industry-standard oracle platform bringing the capital markets on-chain and powering the majority of decentralized finance (DeFi). The Chainlink stack provides the essential data, interoperability, compliance, and privacy standards needed to power advanced blockchain use cases for institutional tokenized assets, lending, payments, stablecoins, and more. Since inventing decentralized oracle networks, Chainlink has enabled tens of trillions in transaction value and now secures the vast majority of DeFi. Many of the world’s largest financial services institutions have adopted Chainlink’s standards and infrastructure, including Swift, Euroclear, Mastercard, Fidelity International, UBS, S&P Dow Jones Indices, FTSE Russell, WisdomTree, ANZ, and top protocols such as Aave, Lido, GMX and many others.

The Observability Team enables Chainlink development and empowers engineers to continue building and supporting crucial products and services that have a profound impact in the blockchain industry. Reliability is vital to the success of our company. As a Senior SRE, you will help us accelerate and enable other engineering teams by increasing self-service and decreasing cognitive load. This job would be perfect for someone who has a strong DevOps mentality, is passionate about building and maintaining a mature GitOps environment, and has experience focusing on observability. The entire engineering team is expanding, and you would have plenty of opportunities to build, learn, and grow. We all have different backgrounds and are determined to help you succeed no matter where you are or who you are. If you think you would do a great job at Chainlink, we are looking forward to speaking with you, even if you don’t match 100% of the job requirements.

Your Impact

- Build and orchestrate a modern OTEL-based observability platform
- Support multiple telemetry types, like metrics, logs, and traces
- Define and support modern governance in observability and problems at scale
- Ensure reliability, security, and performance exceed our defined SLAs
- Work with engineers from across the company to help troubleshoot issues, deploy new products and services, and increase velocity while decreasing cognitive load
- Lead the design and deployment of monitoring/observability services to detect and alert the team of needed action
- Ingest, aggregate, transform, and utilize data from a multitude of sources in our real-time data pipeline
- Oversee the availability, performance, and supportability of our observability infrastructure
- Create processes around alert response operations and support the team to ensure the reliable delivery of oracle data
- Make recommendations to ensure sufficient metrics are collected to create alerts with every new feature release
- Champion reliability and security by taking the time to do your work right the first time

Requirements

- 7+ years of relevant professional experience, typically on a devops, infrastructure, SRE, or platform team
- Ability to develop software outside the scope of typical infrastructure requirements and configurations
- Experience programming in C, C++, Java, Python, Go, Perl, or Ruby
- Expert knowledge in all aspects of designing, developing, and managing large real-time systems
- Experience with monitoring and logging: export metrics using Prometheus, build Grafana dashboards, use a centralized logging solution such as an ELK stack, Splunk, or Grafana stack
- Experience with distributed systems and container orchestration, including building and maintaining Kubernetes clusters
- Strong communication skills, able to give and receive constructive feedback, and not shy away from planning meetings and code reviews

Desired Qualifications

- Excitement for blockchain, Web 3.0, and similar decentralized technologies
- Experience running infrastructure in the blockchain/Web3 space
- Ability to scale systems sustainably through automation and evolving systems to improve reliability and velocity
- Experience working remotely in a distributed team
- A strong desire to grow and challenge yourself, constantly finding ways to improve and automate services to reduce toil

Tools and Services Used Daily

AWS; Terraform/Terragrunt; Kubernetes, Calico and ArgoCD; Prometheus and Grafana; GitHub Actions; Packer

Commitment to Equal Opportunity

Chainlink Labs is an equal opportunity employer. All qualified applicants will receive equal consideration for employment in compliance with applicable laws, regulations, or ordinances. If you need assistance or accommodation due to a disability or special need when applying for a role or in our recruitment process, please contact us via this form.

Global Data Privacy Notice for Job Candidates and Applicants

Information collected and processed as part of your Chainlink Labs Careers profile, and any job applications you choose to submit, is subject to our Privacy Policy. By submitting your application, you are agreeing to our use and processing of your data as required.

Site Reliability Engineer

3 weeks ago

Distrito Federal, México | Sur Global | Full-time

Site Reliability Engineer - 100% Remote in Mexico As the Site Reliability Engineer you will support and scale the infrastructure powering their secure, mission‑critical SaaS platform. You must be confident in operating and debugging both modern infrastructure (cloud‑native, containerized services) and classic Windows production environments (IIS, SQL...

Site Reliability Engineer

4 weeks ago

Distrito Federal, México | W3Global | Full-time

Site Reliability Engineer. Required qualifications: AWS experience; GitLab; Terraform or AWS CDK; Python; familiarity with Go; Linux OS administration and advanced scripting (Bash); Windows OS administration and advanced scripting (PowerShell). Seniority level: Entry level. Employment type: Full-time. Job function...

Site Reliability Engineer ID45689

4 weeks ago

Distrito Federal, México | AgileEngine | Full-time

AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in application development and AI/ML, and our people-first culture has earned us multiple Best Place to Work awards....

Site Reliability Engineer

3 weeks ago

Distrito Federal, México | Coforge | Full-time

Job Title / Role: SRE Lead. Key Skills: Azure, AWS, Terraform, ARM templates. Experience: 10+. Location: Mexico City, Mexico. Shift: General. Mode: On-Site. We at Coforge are seeking an "SRE Lead" with the following skill set. Role Overview: We are seeking an experienced SRE and DevOps Lead to drive reliability, scalability, and automation across...

Site Reliability Engineer

3 weeks ago

Distrito Federal, México | Azka IT | Full-time

Site Reliability Engineer (SRE). AZKAIT is a Mexican company that finds and connects the best IT talent with companies in Latin America and the United States. Requirements: Bachelor's or engineering degree in Systems, Computer Science, or a related field. 5+ years of experience in SRE, DevOps, or Software Engineering roles. Experience programming in Python. Experience with Docker...

Senior SRE: Cloud-Native Reliability

4 weeks ago

Distrito Federal, México | AgileEngine | Full-time

A leading technology firm in Mexico City is looking for a Site Reliability Engineer to design secure, scalable cloud-native systems. You will develop infrastructure solutions, enhance CI/CD practices, and ensure system reliability through effective DevSecOps strategies. This role offers the chance to work with innovative technologies in a collaborative...

Senior AWS SRE: Cloud Reliability

2 weeks ago

Distrito Federal, México | HSBC | Full-time

A global financial institution is seeking an experienced AWS Site Reliability Engineer to support a resilient and scalable AWS infrastructure. Responsibilities include monitoring AWS services, conducting root cause analysis, and collaborating with engineering teams to enhance system performance. Ideal candidates must have a strong understanding of AWS...

Director, Site Reliability Engineering

2 weeks ago

Distrito Federal, México | Mastercard | Full-time

Our Purpose Mastercard powers economies and empowers people in 200+ countries and territories worldwide. Together with our customers, we’re helping build a sustainable economy where everyone can prosper. We support a wide range of digital payments choices, making transactions secure, simple, smart and accessible. Our technology and innovation, partnerships...

Remote Senior Backend Engineer

2 weeks ago

Distrito Federal, México | Lumenalta | Full-time

A tech company is seeking a Senior Backend Engineer specializing in Python to design and maintain scalable backend systems. The role involves implementing microservices, managing serverless applications, and ensuring system reliability. Ideal candidates have over 7 years of backend experience, are familiar with AWS, and can communicate effectively within a...

Senior Software Engineer II

2 weeks ago

Distrito Federal, México | RELX | Full-time

Are you ready to translate complex product needs into innovative software designs? How would you like to leverage your expertise in resolving technical issues and mentoring less-senior developers? About Our Team: LexisNexis Legal & Professional, which...
