Site Reliability Engineer
3 weeks ago
Job Family: Software

We are a leading global software company dedicated to the world of computer-aided design, 3D modeling, and simulation, helping innovative global manufacturers design better products, faster. With the resources of a large company and the energy of a software start-up, we have fun together while creating a world-class software portfolio. Our culture encourages creativity, welcomes fresh thinking, and focuses on growth, so our people, our business, and our customers can achieve their full potential.

Organization Overview
The DISW SRE organization is dedicated to enhancing service and application availability, optimizing processes by automating manual and repetitive tasks, and addressing complex technical challenges in a dynamic, collaborative, inclusive, and iterative environment. This position plays a crucial role in developing automated solutions and processes that support and sustain best-in-class cloud-based applications.

Position Overview
The candidate will support the Siemens Xcelerator platform and will be responsible for coordinating major incident response, maintaining stakeholder communication during service-impacting events, and facilitating resolution in compliance with service level agreements (SLAs). Strong communication and coordination skills are necessary to support core objectives.
This role's success will be defined by product teams within DISW business units meeting their SLAs.

Responsibilities/Tasks
Incident Management: Act as the primary point of contact and leader during major incidents, coordinating the response, communication, and resolution efforts across all involved teams.
Incident Response: Quickly assess the severity of incidents, determine the impact, and drive the appropriate response to restore services as quickly as possible.
Communication: Ensure clear, concise, and timely communication with stakeholders, including technical teams, management, and customers, throughout the incident lifecycle.
Post-Incident Analysis: Lead post-incident reviews to identify root causes, drive improvements, and implement preventive measures to reduce the likelihood of recurrence.
Collaboration: Work closely with SRE, DevOps, Development, and other relevant teams to ensure that incident management processes are well-defined and continuously improved.
Training & Preparedness: Conduct regular incident response drills, train teams on incident management processes, and ensure readiness for handling high-severity incidents.
Documentation: Maintain and update incident management documentation, ensuring that all procedures are up to date and accessible to all relevant teams.
Monitoring & Alerts: Collaborate with SRE and monitoring teams to define and refine alerting criteria, ensuring that incidents are detected and escalated promptly.
Continuous Improvement: Identify opportunities to improve system reliability, scalability, and performance based on lessons learned from incidents.
24x7 On-call Rotation: Participate in the 24x7 on-call rotation.

Required Knowledge/Skills, Education, and Experience
Communication: Outstanding English communication skills, both verbal and written, as well as listening and synthesis skills.
Incident Response: Ability to quickly assess the severity of incidents, determine the impact, and drive the appropriate response to restore services as quickly as possible.
Problem-Solving: Excellent troubleshooting and problem-solving skills, with the ability to quickly analyze complex systems.
Calm Under Pressure: Ability to remain calm, focused, and effective in high-pressure situations, and to make quick, confident decisions.
Leadership: Demonstrated experience in leading incident response efforts and managing cross-functional teams during critical situations.
Technical Skills: Familiarity with Jira Service Management (or an equivalent, e.g., ServiceNow), Datadog (or an equivalent, e.g., Grafana), PagerDuty (or equivalent), and Atlassian Statuspage (or equivalent).
Driven Learner: Highly motivated to learn new technologies, skill sets, and methodologies, continuously seeking to expand your knowledge and adapt to evolving industry trends.

Preferred Knowledge/Skills, Education, and Experience
Stakeholder Management: Experience aligning with cross-functional teams, including business and product stakeholders, during and after incidents.
Metrics Ownership: Ability to define and track incident-related KPIs (e.g., MTTR, MTTD) to drive accountability and improvement.
Experience: Experience in an enterprise IT setting with distributed environments.
Technical Skills: Familiarity with cloud infrastructure (AWS, GCP, Azure) and containerization (Docker, Kubernetes).
Certifications: Relevant certifications (e.g., AWS Certified Solutions Architect, Certified Kubernetes Administrator) are a plus.
Automation: Experience with automation tools and scripting languages (e.g., Python, Bash) to streamline incident response and remediation.

Why us?
Working at Siemens Software means flexibility: choosing to work from home at some times and from the office at others is the norm here. We offer great benefits and rewards, as you'd expect from a world leader in industrial software.

We are a collection of over 377,000 minds building the future, one day at a time, in over 200 countries. We're dedicated to equality, and we welcome applications that reflect the diversity of the communities we work in.
All employment decisions at Siemens are based on qualifications, merit, and business need. Bring your curiosity and creativity and help us shape tomorrow.

Siemens Software. Transform the Everyday.

Siemens is dedicated to quality, equality, and valuing diversity, and we welcome applications that reflect the diversity of the communities within which we work.

Please note that, due to the current integration framework, this opportunity is currently available exclusively to employees of Altair and DISW. While there is a possibility that the position may be made available to all Siemens employees through a future external posting, this is not guaranteed. We appreciate your understanding and cooperation during this transitional period. This communication does not constitute a promise or guarantee of future employment opportunities beyond the current scope.

Organization: Digital Industries
Job Type: Full-time
Category: Information Technology
-
Site Reliability Engineer
3 weeks ago
Xico, México. Quantum World Technologies Inc. Full-time.
Role: Site Reliability Engineer (SRE) – Database Services. Location: Open to LATAM. About the Role: We are looking for a Site Reliability Engineer (SRE) to join the Database Engineering team and contribute to the reliability, resilience, and automation of mission-critical PostgreSQL environments. This role is ideal for an SRE who wants to grow into database...
-
Site Reliability Engineer Lead
3 weeks ago
Xico, México. Royal Caribbean Group. Full-time.
Combine your career goals and sense of adventure by joining our incredible team of employees at Royal Caribbean Group. We are proud to be the vacation-industry leader with global brands (including Royal Caribbean International, Celebrity Cruises, and Silversea Cruises), the most innovative fleet and private destinations, and the best people. Royal...
-
Site Reliability Engineer
3 weeks ago
Xico, México. Quantum World Technologies Inc. Full-time.
Role: Site Reliability Engineer (SRE) – Database Services. Location: Open to LATAM. About the Role: We are looking for a Site Reliability Engineer (SRE) to join the Database Engineering team and contribute to the reliability, resilience, and automation of mission-critical PostgreSQL environments. This role is ideal for an SRE who wants to grow into...
-
Site Reliability Engineer
4 weeks ago
Xico, México. Royal Caribbean Group. Full-time.
Talent Acquisition @Royal Caribbean Group. Journey with us! Combine your career goals and sense of adventure by joining our incredible team of employees at Royal Caribbean Group. We are proud to offer a competitive compensation and benefits package, and excellent career development opportunities, each offering unique ways to explore the world. We are proud to...
-
Site Reliability Engineer
3 weeks ago
Xico, México. Royal Caribbean Group. Full-time.
Site Reliability Engineer. Journey with us! Combine your career goals and sense of adventure by joining our incredible team of employees at Royal Caribbean Group. We are proud to offer a competitive compensation and benefits package, and excellent career development...
-
Senior Site Reliability Engineer
3 weeks ago
Xico, México. Royal Caribbean Group. Full-time.
Journey with us! Combine your career goals and sense of adventure by joining our incredible team at Royal Caribbean Group. We offer a competitive compensation and benefits package, along with excellent career development...
-
Site Reliability Engineer
4 days ago
Xico, México. Coderoad Inc. Full-time.
Overview: Senior Site Reliability Engineer / Observability Engineer. At CodeRoad, we're more than just a software development company; we're your gateway to the global tech world. We offer end-to-end software development services and give you the opportunity to work on exciting, real-world projects in a supportive environment. Whether it's staff augmentation,...