Site Reliability Engineering Expert

2 weeks ago


Ciudad de México, Ciudad de México Ford Motor Company Full-time

As a Site Reliability Engineering Expert at Ford Motor Company, you will play a crucial role in designing, configuring, and maintaining our observability solutions.

Main Responsibilities

  • Use observability and monitoring tools to detect and resolve issues affecting user experience.
  • Automate alerting and remediation processes to reduce mean time to resolution (MTTR) and improve system uptime.
  • Implement comprehensive monitoring and alerting solutions using GCP monitoring services and external services.
  • Gather and analyze metrics from operating systems and applications to assist in performance tuning and fault finding.
  • Develop efficient tooling that lowers the barrier to entry for engineering teams, so they can plug in and benefit from observability-focused reliability practices.
  • Configure dashboards, alerts, and notifications to ensure timely identification and resolution of issues.
  • Troubleshoot issues and outages, working closely with development and operations teams to identify root causes and develop solutions.
  • Monitor server, network infrastructure, and application performance metrics, and identify patterns and trends to improve system performance and reliability.

Requirements

  • 6+ years of SRE observability engineering experience.
  • 6+ years of experience in observability best practices working with Dynatrace or similar tools.
  • Knowledge of CI/CD and infrastructure automation tools such as Puppet, Jenkins, Terraform, and Ansible.
  • At least 4 to 5 years of working experience with OpenShift and Docker/Kubernetes.
  • Proficiency in implementing monitoring and observability solutions using GCP monitoring services.
  • Deep understanding of IT infrastructure monitoring and observability best practices.
  • Experience gathering and organizing large amounts of data for instrumentation in an enterprise monitoring solution.
  • Experience recommending baseline monitoring thresholds, performance monitoring KPIs, and SLAs.
  • 3-5 years of experience with SQL and familiarity with at least one managed Kubernetes platform.
  • Strong background in software engineering, with expertise in relevant programming languages and cloud platforms.

Competencies and Skills

  • Strong interpersonal and organizational skills.
  • Strong verbal and written communication skills.
  • Attention to detail.
  • Excellent time management.
  • Extraordinary teamwork and collaborative skills.


  • Ciudad de México, Ciudad de México FPC Franchise Full-time

    Reliability Engineering Expert: FPC Bangor is seeking a Reliability Engineering Expert to provide technical and strategic expertise in life cycle asset management. The successful candidate will report to the Production Manager and be responsible for process mapping and improvements in equipment reliability, availability, and maintainability. Key...


  • Ciudad de México, Ciudad de México Thomson Reuters Full-time

    About the Role: This is an exciting opportunity to work as a Senior Site Reliability Engineer at Thomson Reuters. As a key member of our team, you will be responsible for designing, implementing, and maintaining highly available and scalable systems that meet the needs of our business. Key Responsibilities: Develop and maintain monitoring and alerting systems to...

  • Site Reliability Engineer

    4 weeks ago


    Ciudad de México, Ciudad de México Thomson Reuters Full-time

    Unlock the Power of Cloud Operations: Thomson Reuters is seeking a skilled Site Reliability Engineer to join our team. As a key member of our Cloud Operations team, you will be responsible for ensuring the reliability and performance of our cloud-based services. About the Role: We are looking for a highly motivated and experienced Site Reliability Engineer who...


  • Ciudad de México, Ciudad de México Crunchyroll Full-time

    About the Role: We are seeking a skilled Staff Site Reliability Engineer to join our Data Engineering team in Mexico City. This is an exceptional opportunity to shape the future of anime and contribute to the success of Crunchyroll.


  • Ciudad de México, Ciudad de México Ellation US Full-time

    About the Role: As a Staff Site Reliability Engineer for the Data Engineering team at Crunchyroll, you will be responsible for maintaining and enhancing the reliability of our data infrastructure. Your work will directly impact the availability and performance of our data services, enabling the organization to make better decisions. Crunchyroll is growing and...

  • Site Reliability Engineer

    4 weeks ago


    Ciudad de México, Ciudad de México 1210 Kyndryl Mexico S. de R.L. de C.V. Full-time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Kyndryl Mexico. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, resiliency, and innovation of our information systems and ecosystems. Key Responsibilities: Design and implement application monitoring to ensure reliability and...

  • Site Reliability Engineer

    3 weeks ago


    Ciudad de México, Ciudad de México Epam Full-time

    About the Role: We are seeking a skilled Site Reliability Engineer to join our team at EPAM. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud infrastructure. Responsibilities: Design, build, test, and deploy changes to our existing cloud infrastructure. Enhance the security...


  • Ciudad de México, Ciudad de México Oracle Full-time

    Site Reliability and Automation Expertise: We are seeking a seasoned site reliability engineer to join our team at Oracle. As a key member of our infrastructure team, you will be responsible for ensuring the reliability, scalability, and performance of our critical systems. Key Responsibilities: Solve complex problems related to Linux infrastructure and Oracle...


  • Ciudad de México, Ciudad de México Svitla Systems Full-time

    Job Title: Site Reliability Engineer. We are seeking a highly skilled Site Reliability Engineer to join our team at Svitla Systems. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Responsibilities: Design and implement automation to reduce toil and improve...

  • Site Reliability Engineer

    4 weeks ago


    Ciudad de México, Ciudad de México Ford Motor Company Full-time

    Job Title: Site Reliability Engineer. At Ford Motor Company, we are seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for designing, configuring, and maintaining our observability solutions to ensure optimal performance and reliability of our IT systems and applications. Key...


  • Ciudad de México, Ciudad de México Crunchyroll, LLC Full-time

    About Crunchyroll: At Crunchyroll, we're committed to delivering the art and culture of anime to our global community. As a Staff Site Reliability Engineer on our Data Engineering team, you'll play a pivotal role in ensuring the reliability, scalability, and performance of our data infrastructure. About the Role: We're looking for a highly skilled engineer to...


  • Ciudad de México, Ciudad de México FICO Full-time

    Site Reliability Engineering - Engineer II: At FICO, we're seeking a skilled Site Reliability Engineering Specialist to join our global team. As a key member of our operations team, you'll be responsible for providing 24x7 support for our cloud-based solutions. Your exceptional diagnostic and problem-solving skills will enable you to drive incidents to...

  • Site Reliability Engineer

    2 weeks ago


    Ciudad de México, Ciudad de México Thales Full-time

    Thales is a global leader in digital security. Our solutions empower organizations to securely interact with people, objects, and services. As a Site Reliability Engineer, you will contribute to the development and maintenance of our large-scale ODC services. Your focus will be on ensuring the reliability, availability, and performance of these systems. This...

  • Site Reliability Engineer

    3 weeks ago


    Ciudad de México, Ciudad de México Thomson Reuters Full-time

    About the Role: In this opportunity as a Site Reliability Engineer, you will be responsible for ensuring the reliability and performance of our cloud-based infrastructure. This includes designing, implementing, and maintaining scalable and secure systems that meet the needs of our business. Key Responsibilities: Design and implement scalable and secure...

  • Site Reliability Engineer

    4 weeks ago


    Ciudad de México, Ciudad de México Thomson Reuters Full-time

    About the Role: In this exciting opportunity as a Site Reliability Engineer, you will play a crucial role in ensuring the smooth operation of our cloud-based services. Your primary responsibility will be to design, test, deliver, support, and maintain production services in our technical operations environment. Key Responsibilities: Provide skilled technical...


  • Ciudad de México, Ciudad de México FPC Franchise Full-time

    Reliability Engineer: FPC Bangor is seeking a highly skilled Reliability Engineer to provide technical expertise in life cycle asset management. The successful candidate will be responsible for process mapping and improvements in equipment reliability, availability, and maintainability. Key Responsibilities: Develop equipment system and location hierarchy. Analyze...


  • Ciudad de México, Ciudad de México Medallia Full-time

    About the Role: Medallia is the pioneer and market leader in Experience Management. Our award-winning SaaS platform, Medallia Experience Cloud, leads the market in understanding and managing experience for candidates, customers, employees, patients, citizens, and residents. We are committed to creating a culture that values every person and every experience....

  • Reliability Expert

    4 weeks ago


    Ciudad de México, Ciudad de México FPC Franchise Full-time

    Reliability Engineer Position: FPC Bangor is seeking a highly skilled Reliability Engineer to provide technical expertise in life cycle asset management. The successful candidate will report to the Production Manager and be responsible for process mapping and improvements in equipment reliability, availability, and maintainability. Key Responsibilities: Develop...

  • Site Reliability Engineer

    2 weeks ago


    Ciudad de México, Ciudad de México Thales Full-time

    At Thales, we rely on talented individuals to architect digital security solutions. As a Site Reliability Engineer, you will play a vital role in ensuring the reliability, availability, and performance of our large-scale services. Collaborating closely with development teams, you will design, build, and maintain scalable infrastructure, automate processes,...

  • Site Reliability Engineer

    4 weeks ago


    Ciudad de México, Ciudad de México Epam Full-time

    About the Role: We are seeking a skilled Site Reliability Engineer to join our team at EPAM. As a key member of our infrastructure team, you will be responsible for designing, building, testing, and deploying changes to our existing software. Responsibilities: Guide teams in designing, building, testing, and deploying changes to existing software. Enhance the...