Site Reliability Engineer

3 weeks ago


Ciudad de México, Ciudad de México Ford Motor Company Full-time
Job Summary

We are seeking a highly skilled Site Reliability Engineer to join our team at Ford Motor Company. As an SRE, you will be responsible for designing, implementing, and maintaining our observability solutions to ensure optimal performance and reliability of our IT systems and applications.

Main Responsibilities
  • Utilize Observability and Monitoring tools to detect and resolve issues affecting user experience.
  • Automate alerting and remediation processes to reduce mean time to resolution (MTTR) and improve system uptime.
  • Work with the Splunk query language: monitor database connection health using Splunk DB Connect health dashboards, parse logs, and write complex Splunk searches, including external table lookups. This requires an understanding of Splunk data flow, components, features, and product capabilities.
  • Observability:
    • Implement comprehensive monitoring and alerting solutions using GCP monitoring services and external services.
    • Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding.
    • Build efficient tooling that lowers the barrier to entry for engineering teams, so they can plug in and benefit from observability-focused reliability practices.
    • Configure dashboards, alerts, and notifications to ensure timely identification and resolution of issues.
    • Troubleshoot issues and outages, working closely with development and operations teams to identify root causes and develop solutions.
    • Monitor server, network infrastructure, and application performance metrics, and identify patterns and trends to improve system performance and reliability.
    • Develop and integrate tools for logging, monitoring, and alerting to enhance visibility into system performance.
    • Participate in strategic planning for the technology roadmap, including scalability, cost-effectiveness, and risk management considerations related to observability infrastructure.
Requirements
  • 6+ years of SRE observability engineering experience.
  • 6+ years of experience applying observability best practices with Dynatrace or similar tools (New Relic, Datadog, AppDynamics, or other APM suites), delivering solutions across all environments, and integrating platforms and applications with monitoring and APM tools.
  • Knowledge of CI/CD and automation tools such as Puppet, Jenkins, Terraform, and Ansible.
  • At least 4 to 5 years of working experience with OpenShift and Docker/Kubernetes.
  • Proficiency in implementing monitoring and observability solutions using GCP monitoring services such as Cloud Monitoring, Logging, and Tracing.
  • Deep understanding of IT infrastructure monitoring and observability best practices.
  • Experience with gathering and organizing large amounts of data to use for instrumentation into an Enterprise monitoring solution.
  • Experience with recommending baseline monitoring thresholds and performance monitoring KPIs and SLAs.
  • At least 4 years of experience developing Grafana dashboards and standardizing metrics and monitoring (metrics, collection, dashboards); Grafana experience is a must.
  • 3-5 years of experience with SQL and familiarity with at least one managed Kubernetes platform (EKS, AKS, GKE).
  • Strong background in software engineering, with expertise in relevant programming languages (like Python, Java, Go) and cloud platforms (like AWS, GCP, Azure).
  • Experience with container orchestration tools like Kubernetes.
Competencies and Skills
  • Strong interpersonal and organizational skills.
  • Strong verbal and written skills.
  • Attention to detail.
  • Excellent time management.
  • Extraordinary teamwork and collaborative skills.

Language: English (en-US)



  • Ciudad de México, Ciudad de México Thomson Reuters Full-time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Thomson Reuters. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based systems. Key Responsibilities: Design, implement, and maintain scalable and highly available cloud-based...

  • Site Reliability Engineer

    3 weeks ago


    Ciudad de México, Ciudad de México Atos SE Full-time

    Site Reliability Engineer: Eviden, a global leader in digital transformation, is seeking a skilled Site Reliability Engineer to join its team. As a trusted partner to the Atos Group, Eviden brings deep expertise in data-driven, trusted, and sustainable digital transformation. Key Responsibilities: Develop and maintain high-quality software using scripting...


  • Ciudad de México, Ciudad de México Azka IT Consulting Full-time

    Azka IT Consulting is a leading IT services company that connects top talent with Latin American and US companies. We are seeking a skilled Site Reliability Engineer to join our team. Job Summary: The Site Reliability Engineer plays a critical role in designing, implementing, and maintaining highly available, scalable, and reliable systems. Key...

  • Site Reliability Engineer

    3 weeks ago


    Ciudad de México, Ciudad de México Azka IT Consulting Full-time

    Azka IT Consulting is a leading IT talent connector between Latin America and the United States. We are seeking a skilled Site Reliability Engineer to join our team. Job Summary: The Site Reliability Engineer plays a critical role in designing, implementing, and maintaining highly available, scalable, and reliable systems. Key Responsibilities: Develop and maintain...


  • Ciudad de México, Ciudad de México Thales Full-time

    Job Description: Thales is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, availability, and performance of our large-scale ODC services. Responsibilities: Design, build, and maintain scalable and reliable infrastructure using Infrastructure as Code...

  • Site Reliability Engineer

    2 weeks ago


    Ciudad de México, Ciudad de México Azka IT Consulting Full-time

    Azka IT Consulting is a Mexican company that connects top IT talent with Latin American and United States companies. We are seeking a skilled Site Reliability Engineer to join our team. Job Requirements: The Site Reliability Engineer plays a crucial role in designing, implementing, and maintaining highly available, scalable, and reliable systems. Technical...

  • Site Reliability Engineer

    2 weeks ago


    Ciudad de México, Ciudad de México Thomson Reuters Full-time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Thomson Reuters. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based applications and infrastructure. Key Responsibilities: Design, implement, and maintain scalable and highly...


  • Ciudad de México, Ciudad de México Refinitiv Full-time

    Senior Site Reliability Engineer: We are seeking a highly skilled Senior Site Reliability Engineer to join our team at Refinitiv. As a key member of our Cloud Operations team, you will be responsible for designing, building, and maintaining scalable and highly available cloud-based systems. About the Role: In this exciting opportunity, you will: Develop and...

  • Site Reliability Engineer

    2 weeks ago


    Ciudad de México, Ciudad de México Ford Motor Company Full-time

    Job Title: Site Reliability Engineer. At Ford Motor Company, we are seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for designing, configuring, and maintaining our observability solutions to ensure optimal performance and reliability of our IT systems and applications. Key...


  • Ciudad de México, Ciudad de México LatamCent Full-time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at LatamCent. As a Site Reliability Engineer, you will be responsible for ensuring the reliability and scalability of our cloud-based infrastructure. Key Responsibilities: Design and implement scalable and highly available systems. Collaborate with cross-functional teams to...

  • Site Reliability Engineer

    2 weeks ago


    Ciudad de México, Ciudad de México Virtualent Full-time

    Site Reliability Engineer at Virtualent: At Virtualent, we're passionate about connecting top talent with the best opportunities. We're looking for a Site Reliability Engineer to join our team and help us deliver high-quality services to our clients. Design, implement, and maintain scalable and highly available...


  • Ciudad de México, Ciudad de México Thomson Reuters Full-time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Thomson Reuters. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Design, implement, and maintain scalable and highly available cloud-based...

  • Site Reliability Engineer

    3 weeks ago


    Ciudad de México, Ciudad de México Epam Full-time

    About the Role: As a Site Reliability Engineer at EPAM, you will play a crucial part in ensuring the reliability and efficiency of our cloud infrastructure. Your expertise in designing, building, testing, and deploying changes to existing software will be invaluable in guiding teams and enhancing our IT infrastructure security protocols. ...

  • Site Reliability Engineer

    3 weeks ago


    Ciudad de México, Ciudad de México Thomson Reuters Full-time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Thomson Reuters. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability and performance of our cloud-based applications. Key Responsibilities: Design, implement, and maintain scalable and highly available cloud infrastructure on...

  • Site Reliability Engineer

    3 weeks ago


    Ciudad de México, Ciudad de México Thomson Reuters Full-time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Thomson Reuters. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based systems. Key Responsibilities: Design, implement, and maintain scalable and highly available cloud-based...


  • Ciudad de México, Ciudad de México Thomson Reuters Full-time

    About the Role: In this opportunity as a Senior Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based systems. You will work closely with cross-functional teams to design, implement, and maintain our cloud infrastructure, ensuring that it meets the needs of our business. Key...


  • Ciudad de México, Ciudad de México Ford Motor Company Full-time

    Job Summary: Ford Motor Company is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for designing, configuring, monitoring, implementing, and maintaining our observability solutions and troubleshooting Ford Credit IT systems and applications to ensure optimal performance and...

  • Site Reliability Engineer

    2 weeks ago


    Ciudad de México, Ciudad de México Thomson Reuters Full-time

    About the Role: As a Site Reliability Engineer at Thomson Reuters, you will play a critical role in ensuring the high availability and performance of our cloud-based services. You will be responsible for designing, implementing, and maintaining scalable and reliable systems, as well as troubleshooting and resolving issues in a timely manner. Key...

  • Site Reliability Engineer

    4 weeks ago


    Ciudad de México, Ciudad de México Thales Full-time

    About Thales: Thales is a leading provider of digital security solutions, helping organizations protect their identities, data, and services in a rapidly changing digital landscape. Job Summary: We are seeking a highly skilled Site Reliability Engineer to join our team in Mexico City. As a Site Reliability Engineer, you will play a critical role in ensuring the...

  • Site Reliability Engineer

    3 weeks ago


    Ciudad de México, Ciudad de México Thomson Reuters Full-time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Thomson Reuters. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability and performance of our cloud-based applications and infrastructure. Key Responsibilities: Provide technical support and delivery capability for the design,...