Staff Site Reliability Engineer
1 week ago
**About Crunchyroll**:
WE HELP EVERYONE BELONG. IT'S OUR PURPOSE.
Founded by fans, Crunchyroll delivers the art and culture of anime to a passionate community. We super-serve over 100 million anime and manga fans across 200+ countries and territories, and help them connect with the stories and characters they crave. Whether that experience is online or in-person, streaming video, theatrical, games, merchandise, events and more, it's powered by the anime content we all love.
Join our team and help us shape the future of anime.
**Who We Are**:
We're a cast of characters working to shine a spotlight on anime. Crunchyroll is an international business focused on creating both online and offline experiences for fans through content (licensed, co-produced, originals, distribution), merchandise, events, gaming, news, and more. Visit our About Us pages for more information about our collection of brands.
**About the Team**:
The Site Reliability Engineering (SRE) team is dedicated to ensuring the reliability, scalability, and performance of our data infrastructure. We focus on standardizing and implementing monitoring and alerting across all datastores to track key metrics like errors, latency, and throughput, and to ensure critical systems are covered. Our team also leads efforts to keep databases up-to-date, implements Infrastructure as Code (IaC) for high availability and performance, and automates key processes to enhance operational efficiency.
We lead and evangelize the principle of 100% automation. Additionally, we define and document operational requirements, develop incident response processes, and automate monitoring and compliance checks to maintain a secure and reliable data environment. By continuously improving load testing and optimizing data governance practices, we support the overall health and efficiency of our data systems.
**About the Role**:
Crunchyroll is growing and changing, presenting unique challenges and opportunities to support millions of anime fans around the world. The Data Engineering team provides seamless help to our internal stakeholders, ensuring an exceptional experience for all Crunchyroll fans.
As a Staff Site Reliability Engineer for the Data Engineering team, you will be responsible for maintaining and enhancing the reliability of our data infrastructure. Your work will directly impact the availability and performance of our data services, enabling the organization to make better decisions. You will collaborate closely with data engineers and software engineers to drive 100% automation and develop best practices for deep monitoring and alerting. This role will report to our Director of Data Engineering and will be based out of our Mexico City office.
**About You**:
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- 12+ years of experience in site reliability engineering, database operations, or a related role with a focus on data platforms, data stores, and data operations.
- Extensive experience with the AWS cloud platform and its data-related services.
- Proficiency in monitoring tools (e.g., Datadog, CloudWatch, DevOps Guru, DB Performance Insights).
- Proficiency in one or more programming languages (e.g., Python, Java).
- Proficiency in automation frameworks (e.g., Terraform, CloudFormation).
- Strong understanding of performance metrics at both a high level and a low level (e.g., disk I/O saturation).
- Experience identifying and eliminating system bottlenecks.
- Strong understanding of database internals such as index types, schemas, and query plans.
- Strong understanding of database systems (e.g., SQL, NoSQL) and experience in managing large-scale data infrastructures.
- Strong understanding and hands-on implementation of CI/CD pipelines and DataOps practices.
- Experience with data governance, compliance, and lifecycle management.
- Ability to own and execute projects while effectively collaborating with the team to influence and shape the vision of the data engineering organization.
#LifeAtCrunchyroll #LI-Hybrid
**About our Values**:
We want to be everything for someone rather than something for everyone, and we do this by living and modeling our values in all that we do. We value:
- Courage. We believe that when we overcome fear, we enable our best selves.
- Curiosity. We are curious, which is the gateway to empathy, inclusion, and understanding.
- Service. We serve our community with humility, enabling joy and belonging for others.
- Kaizen. We have a growth mindset committed to constant forward progress.
**Our commitment to diversity and inclusion**:
Our mission of helping people belong reflects our commitment to diversity & inclusion. It's just the way we do business.
We are an equal opportunity employer and value diversity at Crunchyroll. Pursuant to applicable law, we do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran statu
-
Staff Site Reliability Engineer
3 days ago
Ciudad de México, Ciudad de México | Ellation US | Full-time
About the Role: As a Staff Site Reliability Engineer for the Data Engineering team at Crunchyroll, you will be responsible for maintaining and enhancing the reliability of our data infrastructure. Your work will directly impact the availability and performance of our data services, enabling the organization to make better decisions. Crunchyroll is growing and...
-
Site Reliability Engineer
3 weeks ago
Ciudad de México, Ciudad de México | Thomson Reuters | Full-time
Unlock the Power of Cloud Operations: Thomson Reuters is seeking a skilled Site Reliability Engineer to join our team. As a key member of our Cloud Operations team, you will be responsible for ensuring the reliability and performance of our cloud-based services. About the Role: We are looking for a highly motivated and experienced Site Reliability Engineer who...
-
Staff Site Reliability Engineer
7 days ago
Ciudad de México, Ciudad de México | Crunchyroll, LLC | Full-time
About Crunchyroll: At Crunchyroll, we're committed to delivering the art and culture of anime to our global community. As a Staff Site Reliability Engineer on our Data Engineering team, you'll play a pivotal role in ensuring the reliability, scalability, and performance of our data infrastructure. About the Role: We're looking for a highly skilled engineer to...
-
Site Reliability Engineer
1 month ago
Ciudad de México, Ciudad de México | Thomson Reuters | Full-time
About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Thomson Reuters. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based systems. Key Responsibilities: Design, implement, and maintain scalable and highly available cloud-based...
-
Site Reliability Engineer
1 month ago
Ciudad de México, Ciudad de México | Azka IT Consulting | Full-time
Azka IT Consulting is a leading IT services company that connects top talent with Latin American and US companies. We are seeking a skilled Site Reliability Engineer to join our team. Job Summary: The Site Reliability Engineer plays a critical role in designing, implementing, and maintaining highly available, scalable, and reliable systems. Key...
-
Site Reliability Engineer
3 weeks ago
Ciudad de México, Ciudad de México | Svitla Systems | Full-time
Job Title: Site Reliability Engineer. We are seeking a highly skilled Site Reliability Engineer to join our team at Svitla Systems. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Responsibilities: Design and implement automation to reduce toil and improve...
-
Site Reliability Engineer
2 months ago
Ciudad de México, Ciudad de México | Azka IT Consulting | Full-time
Azka IT Consulting is a leading IT talent connector between Latin America and the United States. We are seeking a skilled Site Reliability Engineer to join our team. Job Summary: The Site Reliability Engineer plays a critical role in designing, implementing, and maintaining highly available, scalable, and reliable systems. Key Responsibilities: Develop and maintain...
-
Site Reliability Engineer
4 weeks ago
Ciudad de México, Ciudad de México | Azka IT Consulting | Full-time
Azka IT Consulting is a leading IT services company that connects top talent with Latin American and US companies. We are seeking a skilled Site Reliability Engineer to join our team. Job Summary: The Site Reliability Engineer plays a critical role in designing, implementing, and maintaining highly available, scalable, and reliable systems. Key...
-
Site Reliability Engineer
1 month ago
Ciudad de México, Ciudad de México | Thales | Full-time
Job Description: Thales is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, availability, and performance of our large-scale ODC services. Responsibilities: Design, build, and maintain scalable and reliable infrastructure using Infrastructure as Code...
-
Site Reliability Engineer
2 months ago
Ciudad de México, Ciudad de México | Azka IT Consulting | Full-time
Azka IT Consulting is a Mexican company that connects top IT talent with Latin American and United States companies. We are seeking a skilled Site Reliability Engineer to join our team. Job Requirements: The Site Reliability Engineer plays a crucial role in designing, implementing, and maintaining highly available, scalable, and reliable systems. Technical...
-
Site Reliability Engineer
3 weeks ago
Ciudad de México, Ciudad de México | Thales | Full-time
Job Description: Thales is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, availability, and performance of our large-scale ODC services. Responsibilities: Design, build, and maintain scalable and reliable infrastructure using Infrastructure as Code...
-
Site Reliability Engineer
4 weeks ago
Ciudad de México, Ciudad de México | Thales | Full-time
Job Description: Thales is a leading provider of digital security solutions, and we're seeking a skilled Site Reliability Engineer to join our team. About the Role: As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, availability, and performance of our large-scale ODC services. You will work closely with development teams...
-
Site Reliability Engineer
3 weeks ago
Ciudad de México, Ciudad de México | Ford Motor Company | Full-time
Job Title: Site Reliability Engineer. At Ford Motor Company, we are seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for designing, configuring, and maintaining our observability solutions to ensure optimal performance and reliability of our IT systems and applications. Key...
-
Site Reliability Engineer
1 month ago
Ciudad de México, Ciudad de México | Thomson Reuters | Full-time
About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Thomson Reuters. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Design, implement, and maintain scalable and highly available cloud-based...
-
Site Reliability Engineer
3 weeks ago
Ciudad de México, Ciudad de México | Epam | Full-time
About the Role: We are seeking a skilled Site Reliability Engineer to join our team at EPAM. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Responsibilities: Design, build, test, and deploy changes to existing software; enhance the company's IT infrastructure...
-
Site Reliability Engineer
3 weeks ago
Ciudad de México, Ciudad de México | Thomson Reuters | Full-time
About the Role: In this exciting opportunity as a Site Reliability Engineer, you will play a crucial role in ensuring the smooth operation of our cloud-based services. Your primary responsibility will be to design, test, deliver, support, and maintain production services in our technical operations environment. Key Responsibilities: Provide skilled technical...
-
Senior Site Reliability Engineer
4 weeks ago
Ciudad de México, Ciudad de México | Medallia | Full-time
Overview: Medallia is a pioneer in Experience Management, offering a leading SaaS platform, Medallia Experience Cloud, that helps organizations understand and manage experiences for various stakeholders. Our mission is to create a culture that values every person and experience, fostering a diverse and inclusive workforce. The Role and Team: The Site Reliability...
-
Site Reliability Engineer (SRE)
4 weeks ago
Ciudad de México, Ciudad de México | Hitachi Vantara Corporation | Full-time
About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Hitachi Vantara Corporation. As a Site Reliability Engineer, you will be responsible for ensuring the stability and performance of our cloud infrastructure, particularly on the Azure platform. Key Responsibilities: Manage and troubleshoot pipelines for client onboarding...
-
Site Reliability Engineer
4 weeks ago
Ciudad de México, Ciudad de México | Thomson Reuters | Full-time
About the Role: As a Site Reliability Engineer at Thomson Reuters, you will play a critical role in ensuring the reliability and scalability of our cloud-based systems. You will work closely with cross-functional teams to design, implement, and maintain high-quality software systems that meet the needs of our customers. Key Responsibilities: Design and implement...
-
Site Reliability Engineer
2 months ago
Ciudad de México, Ciudad de México | Thomson Reuters | Full-time
About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Thomson Reuters. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based systems. Key Responsibilities: Design, implement, and maintain scalable and highly available cloud-based...