Staff Site Reliability Engineer
4 weeks ago
About Crunchyroll
WE HELP EVERYONE BELONG. IT'S OUR PURPOSE. Founded by fans, Crunchyroll delivers the art and culture of anime to a passionate community. We super-serve over 100 million anime and manga fans across 200+ countries and territories, and help them connect with the stories and characters they crave. Whether that experience is online or in-person, streaming video, theatrical, games, merchandise, events and more, it's powered by the anime content we all love. Join our team, and help us shape the future of anime.
Who We Are
We're a cast of characters working to shine a spotlight on anime. Crunchyroll is an international business focused on creating both online and offline experiences for fans through content (licensed, co-produced, originals, distribution), merchandise, events, gaming, news, and more. Visit our About Us pages for more information about our collection of brands.
About the Team
The Site Reliability Engineering (SRE) team is dedicated to ensuring the reliability, scalability, and performance of our data infrastructure. We focus on standardizing and implementing monitoring and alerting across all datastores to track key metrics like errors, latency, and throughput, and to ensure critical systems are covered. Our team also leads efforts to keep databases up-to-date, implements Infrastructure as Code (IaC) for high availability and performance, and automates key processes to enhance operational efficiency. We lead and evangelize the principle of 100% automation. Additionally, we define and document operational requirements, develop incident response processes, and automate monitoring and compliance checks to maintain a secure and reliable data environment. By continuously improving load testing and optimizing data governance practices, we support the overall health and efficiency of our data systems.
About the Role
Crunchyroll is growing and changing, presenting unique challenges and opportunities to support millions of anime fans around the world. The Data Engineering team provides seamless help to our internal stakeholders, ensuring an exceptional experience for all Crunchyroll fans. As a Staff Site Reliability Engineer for the Data Engineering team, you will be responsible for maintaining and enhancing the reliability of our data infrastructure. Your work will directly impact the availability and performance of our data services, enabling the organization to make better decisions. You will collaborate closely with data engineers and software engineers to develop and drive 100% automation and best practices for deep monitoring and alerting. This role will report to our Director of Data Engineering and will be based out of our Mexico City office.
About You
Bachelor's degree in Computer Science, Information Technology, or a related field.
12+ years of experience in site reliability engineering, database operations, or a related role with a focus on data platforms, data stores, and data operations.
Extensive experience with the AWS cloud platform and its data-related services.
Proficiency in monitoring tools (e.g., Datadog, CloudWatch, DevOps Guru, DB Performance Insights).
Proficiency in one or more programming languages (e.g., Python, Java).
Proficiency in automation frameworks (e.g., Terraform, CloudFormation).
Strong understanding of performance metrics at both a high level and a low level, such as disk I/O saturation.
Experience in identifying and eliminating system bottlenecks.
Strong understanding of database internals such as index types, schemas, and query plans.
Strong understanding of database systems (e.g., SQL, NoSQL) and experience in managing large-scale data infrastructures.
Strong understanding and hands-on implementation of CI/CD pipelines and DataOps practices.
Experience with data governance, compliance, and lifecycle management.
Ability to own and execute projects while effectively collaborating with the team to influence and shape the vision of the data engineering organization.
About our Values
We want to be everything for someone rather than something for everyone and we do this by living and modeling our values in all that we do.
We Value
Courage. We believe that when we overcome fear, we enable our best selves.
Curiosity. We are curious, which is the gateway to empathy, inclusion, and understanding.
Service. We serve our community with humility, enabling joy and belonging for others.
Kaizen. We have a growth mindset committed to constant forward progress.
Our Commitment to Diversity and Inclusion
Our mission of helping people belong reflects our commitment to diversity & inclusion. It's just the way we do business. We are an equal opportunity employer and value diversity at Crunchyroll. Pursuant to applicable law, we do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status...
-
Site Reliability Engineer
4 weeks ago
WorkFromHome, México Epam Full-time
A leading digital services company in Mexico City seeks a Site Reliability Engineer to enhance communication between operational and developmental sides of software. You will guide teams in designing, building, testing, and deploying software changes while maintaining and improving cloud infrastructure. Ideal candidates are proficient in Site Reliability...
-
Staff Data SRE
4 weeks ago
WorkFromHome, México Crunchyroll, Llc Full-time
A leading media company in Veracruz, Mexico seeks a Staff Site Reliability Engineer to enhance data infrastructure reliability. This role involves collaboration with data and software engineers to develop automation and monitoring best practices. The ideal candidate has 12+ years in site reliability engineering and extensive AWS experience, ensuring...
-
Site Reliability Engineer
4 weeks ago
WorkFromHome, México Canonical Full-time
Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers include the world's leading public cloud and...
-
Site Reliability Engineer
3 weeks ago
WorkFromHome, México Mastercard Full-time
Our Purpose: Mastercard powers economies and empowers people in 200+ countries and territories worldwide. Together with our customers, we're helping build a sustainable economy where everyone can prosper. We support a wide range of digital payments choices, making transactions secure, simple, smart and accessible. Our technology and innovation, partnerships...
-
Senior Site Reliability Engineer
3 weeks ago
WorkFromHome, México AgileEngine Full-time
A leading software development company in Mexico City is seeking a Site Reliability Engineer to enhance cloud-native systems. This role focuses on designing reliable infrastructure, mentoring teams in best practices, and optimizing CI/CD workflows. Ideal candidates have 8-10 years of experience, strong skills in AWS and Terraform, and a passion for driving...
-
Site Reliability Engineer
4 weeks ago
WorkFromHome, México Canonical Full-time
Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers include the world's...
-
Remote Site Reliability Engineer: Kubernetes
4 weeks ago
WorkFromHome, México Canonical Full-time
A leading tech firm is seeking a Site Reliability Engineer for a globally remote role. The successful candidate will enhance DevOps practices and automate infrastructure for customers utilizing Kubernetes and OpenStack. Key qualifications include a degree in software engineering, Python experience, and operational Linux skills. The position offers a...
-
Site Reliability Engineer
4 weeks ago
WorkFromHome, México Canonical Full-time
Canonical is a leading provider of open-source software and operating systems. Our flagship platform, Ubuntu, powers breakthrough enterprises in public cloud, data science, AI, engineering, and IoT. As a Site Reliability Engineer you will help make our world-scale infrastructure robust and automated. Your work will span the full stack, from bare-metal...
-
Senior Site Reliability Engineer
2 weeks ago
WorkFromHome, México Canonical Full-time
A leading open source software company is seeking a Senior Site Reliability Engineer to enhance their cloud operations. This remote role requires Python software development expertise, experience with Linux, and operational capabilities in high-pressure environments. Successful candidates will contribute to automation and infrastructure management,...
-
Site Reliability Engineer
2 weeks ago
WorkFromHome, México F5 Networks, Inc. Full-time
At F5, we strive to bring a better digital world to life. Our teams empower organizations across the globe to create, secure, and run applications that enhance how we experience our evolving digital world. We are passionate about cybersecurity, from protecting consumers from fraud to enabling companies to modernize innovation. Reliability Engineer – Site...