Senior Cloud Reliability Engineer

7 days ago


San Pedro Garza García, Nuevo León, México | SAP | Full-time

Unleash Your Potential

SAP is at the forefront of innovation, empowering over four hundred thousand clients globally to collaborate more effectively and leverage business insights with greater efficacy. Renowned for our leadership in enterprise resource planning (ERP) software, SAP has transformed into a premier provider of comprehensive business application software and associated services in database management, analytics, intelligent technologies, and experience management. As a cloud-centric organization with two hundred million users and a workforce exceeding one hundred thousand employees worldwide, we are driven by purpose and focused on the future, fostering a highly collaborative team culture and a commitment to individual growth. Our mission is to connect global industries, individuals, and platforms, ensuring that every challenge is met with the right solution. At SAP, you have the opportunity to unleash your potential.

Global Cloud Infrastructure & Delivery (GCID) is responsible for developing and delivering cloud infrastructure and operational services to SAP Lines of Business (LoB) and, by extension, our external clientele. We facilitate the cloud adoption journey for LoBs and their customers across four major hyperscaler public clouds and SAP's Infrastructure-as-a-Service offerings.

Service Reliability Engineering (SRE) is a vital team within the GCID organization, dedicated to ensuring the reliability and availability of SAP cloud services—both internal and external. This is achieved through the development and enhancement of observability tools designed to prevent or isolate incidents. The SRE team proactively automates and optimizes processes, operating globally in a follow-the-sun model.

We are seeking a Senior AI Observability Engineer (SRE) who will focus on both the software and hardware layers of our global operations.

Role Overview:
You will join a global, multidisciplinary SRE team of DevOps engineers and help build AI solutions that power a suite of observability services using Machine Learning and Large Language Models. This position involves rethinking our approach to managing alerts, metrics, and logs by integrating deep learning and natural language processing to enhance our reliability services. You will also play a key role in troubleshooting major incidents related to our global cloud infrastructure, ensuring excellence in triage and resolution. Your contributions will help the team improve critical KPIs, reducing Mean Time to Detect (MTTD) and Mean Time to Recovery (MTTR) and raising the Signal to Noise Ratio, among other relevant metrics.

Key Responsibilities:

  • Collaborate with engineering and product management teams, adhering to Agile methodologies such as SCRUM.
  • Prioritize and deliver high-quality developments within tight deadlines.
  • Ensure seamless operations and maximize service uptime.
  • Participate in on-call rotational coverage, including weekends and holidays, with compensation aligned with local policies. We operate on a global follow-the-sun model with local daytime coverage.
  • Share knowledge and expertise across the team.
  • Engage in data analysis and generation activities.
  • Support AI research and development initiatives.
  • Train and fine-tune AI models.

Required Qualifications:

  • Rapid adaptation to cutting-edge technologies.
  • Advanced analytical and problem-solving capabilities.
  • Strong team player with exceptional communication skills.
  • Self-motivated individual who acts with urgency to efficiently and effectively address issues.
  • Proficient in spoken and written English.

Required Experience:

  • Development: A minimum of 4 years of experience in professional or enterprise development. Strong proficiency in Python and JavaScript programming languages. Proven experience in REST API implementation using Flask or FastAPI. Familiarity with microservice-based development.
  • DevOps: Understanding of CI/CD pipelines using Azure, Jenkins, Travis, or similar tools. Hands-on experience with Docker containers and Kubernetes. Experience with public cloud environments such as GCP, AWS, or Azure. Solid understanding of JSON, YAML, and GitHub. Strong familiarity with enterprise-class fault monitoring and performance management tools.
  • Artificial Intelligence: Experience with ML frameworks like PyTorch, TensorFlow, or similar. Knowledge of prompt engineering, large language models, retrieval-augmented generation (RAG), and embeddings. Good understanding of supervised and unsupervised machine learning models, as well as algorithms, data structures, and data patterns.
  • Education: Bachelor's degree or equivalent in Software Engineering, Computer Science, or a related field.

Preferred Qualifications:

  • Familiarity with knowledge graphs, graph databases, and graph theory.
  • Experience with Elasticsearch, Splunk, or similar technologies.
  • Experience in web development frameworks.
  • Knowledge of Terraform, Helm charts, Ansible, or similar tools.
  • Understanding of Kubeflow, MLflow, Dataflow, or similar technologies.
  • Industry technical certifications (CKA, Elastic Certified Engineer, RHCE, CCNA, AZ-900, etc.) and ITIL-related coursework are advantageous.



