Lead Site Reliability Engineer
2 weeks ago
Journey with us
Combine your career goals and sense of adventure by joining our incredible team of employees at Royal Caribbean Group. We are proud to offer a competitive compensation and benefits package, and excellent career development opportunities, each offering unique ways to explore the world.
We are proud to be the vacation-industry leader with global brands — including Royal Caribbean International, Celebrity Cruises and Silversea Cruises — the most innovative fleet and private destinations, and the best people. Together, we are dedicated to turning the vacation of a lifetime into a lifetime of vacations for our guests.
Royal Caribbean Group's Global eCommerce has an exciting career opportunity for a full-time Lead Site Reliability Engineer reporting to the Sr. Manager, Site Reliability Engineer. This position will work on-site in Mexico City.
Position Summary:
The Lead Site Reliability Engineer (Lead SRE) will assist the SRE Manager in supporting the Royal Caribbean website ($183M gross revenue in 2021), using application and user performance data to guide informed decision-making. The Lead SRE will use site performance metrics collected from various sources and tools to support the following tasks, among other operational initiatives: initial triage of critical production incidents, analysis of bugs, implementation of site reliability engineering best practices, infrastructure optimization, and seamless collaboration between internal teams and external service providers.
Essential Duties and Responsibilities:
Critical Incident Support
- Review ticket analysis and approve closure of tickets/incidents.
- Understand the architecture of the Royal Caribbean website and escalate incidents as needed to the appropriate team for further triage.
- Synthesize and communicate incident details to the production team and stakeholders, including executive-level stakeholders.
- Review postmortem/RCA documents and follow up.
Monitor and Optimize Systems
- Build the case for prioritizing bug and enhancement tickets.
- Create reports on the performance of new deployment builds so product teams can ensure quality.
Ensure System Reliability and Performance
- Adjust health thresholds and other monitoring settings based on historical performance.
- Create and maintain performance dashboards used by support and product teams.
- Maintain the alerting, communication, and documentation toolchain to ensure it is up to date and efficient.
Collaboration with Cross-Functional Teams
- Establish and maintain clear communication channels (e.g., Slack, Teams) with the scrum and marketing teams.
- Ensure all team members are informed about relevant updates and changes that may affect the website.
Qualifications, Knowledge and Skills:
Experience
- Minimum Years of Experience: 10+ years in Site Reliability Engineering (SRE), DevOps, or a related IT operations role.
- Management Experience: At least 3 years of experience managing teams and collaborating with external service providers.
Skills and Abilities
Technical Expertise:
- Proficiency in cloud platforms such as AWS, including AWS Elastic Beanstalk.
- Understanding of API design principles: REST, SOAP, GraphQL.
- Advanced knowledge of monitoring and logging tools (AppDynamics, DataDog, Splunk, New Relic, etc.).
Problem-Solving Skills:
- Strong analytical and troubleshooting skills to diagnose and resolve complex production issues swiftly.
- Ability to develop and implement effective incident response plans.
Communication and Collaboration:
- Excellent written and verbal communication skills for effective interaction with cross-functional teams and documentation.
- Ability to collaborate with Development, QA, IT, and external managed service providers to ensure seamless operations.
Education
- Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field.
Certifications
Preferred Certifications:
- Any certification in, or equivalent knowledge of, monitoring and alerting tools
- Any certification in, or equivalent knowledge of, IT service management
We know there's a lot to consider.
As you go through the application process, our recruiters will be glad to provide guidance and more relevant details to answer any additional questions. Thank you again for your interest in Royal Caribbean Group. We hope to see you onboard soon.
It is the policy of the Company to ensure equal employment and promotion opportunity to qualified candidates without discrimination or harassment on the basis of race, color, religion, sex, age, national origin, disability, sexual orientation, sexuality, gender identity or expression, marital status, or any other characteristic protected by law. Royal Caribbean Group and each of its subsidiaries prohibit and will not tolerate discrimination or harassment.