Site Reliability Engineer

1 month ago


Ciudad de México, Ciudad de México · Ford Motor Company · Full-time
Job Title: Site Reliability Engineer

At Ford Motor Company, we are seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for designing, configuring, and maintaining our observability solutions to ensure optimal performance and reliability of our IT systems and applications.

Key Responsibilities:
  • Use observability and monitoring tools to detect and resolve issues affecting user experience.
  • Automate alerting and remediation processes to reduce mean time to resolution (MTTR) and improve system uptime.
  • Implement comprehensive monitoring and alerting solutions using GCP monitoring services and external services.
  • Gather and analyze metrics from operating systems and applications to assist in performance tuning and fault finding.
  • Build efficient tooling that lowers the barrier to entry for engineering teams, making it easy for them to onboard and benefit from observability-driven reliability practices.
  • Configure dashboards, alerts, and notifications to ensure timely identification and resolution of issues.
  • Troubleshoot issues and outages, working closely with development and operations teams to identify root causes and develop solutions.
  • Monitor server, network infrastructure, and application performance metrics, and identify patterns and trends to improve system performance and reliability.
  • Develop and integrate tools for logging, monitoring, and alerting to enhance visibility into system performance.
  • Participate in strategic planning for the technology roadmap, including scalability, cost-effectiveness, and risk management considerations related to observability infrastructure.
Requirements:
  • 6+ years of experience in SRE and observability engineering.
  • 6+ years of experience in observability best practices working with Dynatrace or similar tools, delivering solutions across all environments, and integrating platforms and applications with monitoring and APM tools.
  • Knowledge of CI/CD and infrastructure-as-code/configuration management tools such as Jenkins, Terraform, Puppet, and Ansible.
  • Minimum of 4 to 5 years of hands-on experience with OpenShift and Docker/Kubernetes.
  • Proficiency in implementing monitoring and observability solutions using GCP monitoring services such as Cloud Monitoring, Logging, and Tracing.
  • Deep understanding of IT infrastructure monitoring and observability best practices.
  • Experience gathering and organizing large amounts of data for instrumentation in an enterprise monitoring solution.
  • Experience recommending baseline monitoring thresholds, performance KPIs, and SLAs.
  • 4+ years of experience developing Grafana dashboards and standardizing metrics collection, monitoring, and dashboards (Grafana experience is a must).
  • 3 to 5 years of experience with SQL, and familiarity with at least one managed Kubernetes platform (EKS, AKS, GKE).
  • Strong background in software engineering, with expertise in relevant programming languages (such as Python, Java, Go) and cloud platforms (such as AWS, GCP, Azure).
  • Experience with container orchestration tools such as Kubernetes.
Competencies and Skills:
  • Strong interpersonal and organizational skills.
  • Strong verbal and written communication skills.
  • Attention to detail.
  • Excellent time management.
  • Extraordinary teamwork and collaboration skills.


  • Ciudad de México, Ciudad de México · Thomson Reuters · Full-time

    Unlock the Power of Cloud Operations. Thomson Reuters is seeking a skilled Site Reliability Engineer to join our team. As a key member of our Cloud Operations team, you will be responsible for ensuring the reliability and performance of our cloud-based services. About the Role: We are looking for a highly motivated and experienced Site Reliability Engineer who...


  • Ciudad de México, Ciudad de México · Svitla Systems · Full-time

    Job Title: Site Reliability Engineer. We are seeking a highly skilled Site Reliability Engineer to join our team at Svitla Systems. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Responsibilities: Design and implement automation to reduce toil and improve...


  • Ciudad de México, Ciudad de México · Ellation US · Full-time

    About the Role: As a Staff Site Reliability Engineer for the Data Engineering team at Crunchyroll, you will be responsible for maintaining and enhancing the reliability of our data infrastructure. Your work will directly impact the availability and performance of our data services, enabling the organization to make better decisions. Crunchyroll is growing and...

  • Site Reliability Engineer

    4 weeks ago


    Ciudad de México, Ciudad de México · Thomson Reuters · Full-time

    About the Role: In this role as a Site Reliability Engineer, you will be responsible for ensuring the reliability and performance of our cloud-based infrastructure. This includes designing, implementing, and maintaining scalable and secure systems that meet the needs of our business. Key Responsibilities: Design and implement scalable and secure...


  • Ciudad de México, Ciudad de México · Oracle · Full-time

    Site Reliability and Automation Expertise. We are seeking a seasoned site reliability engineer to join our team at Oracle. As a key member of our infrastructure team, you will be responsible for ensuring the reliability, scalability, and performance of our critical systems. Key Responsibilities: Solve complex problems related to Linux infrastructure and Oracle...

  • Site Reliability Engineer

    4 weeks ago


    Ciudad de México, Ciudad de México · FICO · Full-time

    Site Reliability Engineering - Engineer II. At FICO, we're seeking a skilled Site Reliability Engineer to join our global team responsible for providing 24x7 operational support of our Cloud, SaaS, ASP, and hosted solutions. As a Site Reliability Engineer, you'll have the opportunity to work with a diverse set of technologies and resolve incidents that can...


  • Ciudad de México, Ciudad de México · Thales · Full-time

    About the Role: Thales is seeking a highly skilled Site Reliability Engineer L2 to join our team. As a key member of our organization, you will be responsible for ensuring the reliability, availability, and performance of our large-scale ODC services. You will work closely with our development teams to design, build, and maintain scalable and reliable...


  • Ciudad de México, Ciudad de México · Azka IT Consulting · Full-time

    Azka IT Consulting is a leading IT talent connector in Latin America and the United States. We are seeking a skilled Site Reliability Engineer to join our team. Job Requirements: The ideal candidate will have experience in programming languages such as Java, Python, Perl, or Ruby. Familiarity with production supervision systems and monitoring tools is also...


  • Ciudad de México, Ciudad de México · F5 · Full-time

    At F5, we're looking for a skilled Site Reliability Engineer III to join our team. As a software engineer specializing in site reliability, you will bring a software engineering and automated solution mindset to your work. The Site Reliability Engineer III will be responsible for ensuring the reliability, availability, and scalability of critical systems,...

  • Site Reliability Engineer

    3 weeks ago


    Ciudad de México, Ciudad de México · Thomson Reuters · Full-time

    About the Role: In this exciting opportunity, we're seeking a highly skilled Site Reliability Engineer to join our team. As a key member of our technical operations environment, you will be responsible for the design, testing, delivery, support, and maintenance of production services. Key Responsibilities: Provide skilled technical support and delivery...

  • Site Reliability Engineer

    4 weeks ago


    Ciudad de México, Ciudad de México · 1210 Kyndryl Mexico S. de R.L. de C.V. · Full-time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Kyndryl Mexico. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, resiliency, and innovation of our information systems and ecosystems. Key Responsibilities: Design and implement application monitoring to ensure reliability and...


  • Ciudad de México, Ciudad de México · Medallia · Full-time

    About the Role: Medallia is a pioneer in Experience Management, and we're looking for a skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability and scalability of our global SaaS platform. Key Responsibilities: Educate application and infrastructure management teams on SRE best...


  • Ciudad de México, Ciudad de México · Epam · Full-time

    About the Role: We are seeking a skilled Site Reliability Engineer to join our team at EPAM. As a key member of our infrastructure team, you will be responsible for designing, building, testing, and deploying changes to our existing software. Responsibilities: Guide teams in designing, building, testing, and deploying changes to existing software. Enhance the...


  • Ciudad de México, Ciudad de México · Crunchyroll, LLC · Full-time

    About Crunchyroll: At Crunchyroll, we're committed to delivering the art and culture of anime to our global community. As a Staff Site Reliability Engineer on our Data Engineering team, you'll play a pivotal role in ensuring the reliability, scalability, and performance of our data infrastructure. About the Role: We're looking for a highly skilled engineer to...

  • Site Reliability Engineer

    4 weeks ago


    Ciudad de México, Ciudad de México · Epam · Full-time

    About the Role: We are seeking a skilled Site Reliability Engineer to join our team at EPAM. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud infrastructure. Responsibilities: Design, build, test, and deploy changes to our existing cloud infrastructure. Enhance the security...

  • Site Reliability Engineer

    4 weeks ago


    Ciudad de México, Ciudad de México · Thomson Reuters · Full-time

    About the Role: In this role as a Site Reliability Engineer, you will be responsible for delivering high-quality solutions for SRE teams. Your primary focus will be on ensuring the reliability, scalability, and performance of cloud-based systems. Develop and implement automation scripts to improve efficiency and reduce downtime. Collaborate with...

  • Site Reliability Engineer

    2 weeks ago


    Ciudad de México, Ciudad de México · Thales · Full-time

    Thales is a global leader in digital security. Our solutions empower organizations to securely interact with people, objects, and services. As a Site Reliability Engineer, you will contribute to the development and maintenance of our large-scale ODC services. Your focus will be on ensuring the reliability, availability, and performance of these systems. This...


  • Ciudad de México, Ciudad de México · Thomson Reuters · Full-time

    About the Role: This is an exciting opportunity to work as a Senior Site Reliability Engineer at Thomson Reuters. As a key member of our team, you will be responsible for designing, implementing, and maintaining highly available and scalable systems that meet the needs of our business. Key Responsibilities: Develop and maintain monitoring and alerting systems to...


  • Ciudad de México, Ciudad de México · Medallia · Full-time

    About the Role: Medallia is the pioneer and market leader in Experience Management. Our award-winning SaaS platform, Medallia Experience Cloud, leads the market in understanding and managing experience for candidates, customers, employees, patients, citizens, and residents. We are committed to creating a culture that values every person and every experience....


  • Ciudad de México, Ciudad de México · Trax · Full-time

    About Trax: Trax is a global company that enables brands and retailers to harness the power of digital technologies to produce the best shopping experiences imaginable. Our retail platform allows customers to understand what is happening on shelf, in every store, all the time, so they can focus on what they do best – delighting shoppers. Job Description: The...