Site Reliability Engineer
3 days ago
**Overview**

We're looking for a Site Reliability Engineer. Headquartered in Los Angeles, California, Right Balance provides top-tier technology talent for innovative companies in the US. We’re in the top 50 companies to watch in LA.

**Engagement Details**

Our client is a USA-based company producing video solutions with the mission to advance scientific research and education. Their institutional clients comprise over 1,000 universities, colleges, and biopharma companies, including such leaders as Harvard, MIT, Yale, and Stanford. As a rapidly growing company with offices in the USA, UK, Australia, and India, servicing clients in over 60 countries, our client is seeking talented individuals to join their company.

Our client is looking for an amazing Site Reliability Engineer who will be part of their centralized Site Reliability Team. You will play an integral role in the deployment of highly scalable systems and in the optimization, documentation, and support of the infrastructure components of their software products hosted on AWS. Cloud Infrastructure and Operations are critical in enabling the client to provide users with its technology offerings.

**What’s in it for you?**

- Learn and evolve your skills using the latest and greatest technology tools in a rapidly growing company.
- Learn from the best people around you. We constantly challenge the status quo and invent new ways of building a great product.
- 100% remote. Work anywhere, whether it is remotely in the comfort of your home, in a shared co-working space, in an RV on the beach, or while being a nomad in another country.
- Work on challenging problems, innovate, and positively impact many people's lives while having fun doing it.

**Required Qualifications**

- Upper-intermediate to fluent spoken and written English; able to hold a real-time conversation.
- 3+ years of full-time hands-on Site Reliability Engineer experience.
- 3+ years of full-time hands-on DevOps experience.
- 3+ years of full-time hands-on AWS experience.
- 2+ years of full-time hands-on Docker experience.
- 2+ years of full-time hands-on Kubernetes experience.
- 2+ years of full-time hands-on IaC (Infrastructure as Code) experience.
- 2+ years of full-time hands-on Software Developer experience.
- 2+ years of full-time hands-on JavaScript experience.
- 1+ years of full-time hands-on Terraform experience.
- 1+ years of full-time hands-on PHP experience.
- Extensive in-depth experience with cloud-based provisioning, monitoring, troubleshooting, and related SRE and DevOps technologies, in addition to networking knowledge.
- MUST have working experience with cloud-native infrastructure such as AWS or GCP (ideally AWS).
- MUST understand AWS VPC, subnets, Network ACLs, Security Groups, IAM Roles, and EKS.
- Experience configuring Kubernetes RBAC Authorization, Ingress controllers, ServiceAccounts, and AWS role annotations.
- Strong experience with CI/CD automation and configuration management.
- Experience with monitoring and observability systems such as New Relic, DataDog, Grafana, Kibana, CloudWatch, and Kafka.
- Ability to triage and resolve incidents and lead incident investigations.
- Experience with security practices, credential rotations, and secrets management systems like the Vault project.
- Must be able to ensure Agile/Scrum concepts and principles are adhered to and be a voice of reason.
- Experience working in a 24/7 on-call, highly transactional, or streaming production environment.

**Nice to Haves**

- Working knowledge of GitOps, FluxCD, or ArgoCD.
- Experience building Kubernetes Operators is a plus.
- Go (programming language) expertise.
- Crossplane experience.
- Bachelor’s degree in Computer Science or equivalent demonstrated ability.

**Frequently Asked Questions**

- What are your typical clients?
  _The majority of our clients are venture-backed startups at the growth stage. Usually, at this stage, the company has already achieved product-market fit and is looking to expand rapidly. That’s where we bring the best engineering practices, strong architecture, the latest technologies, and consistent processes to help companies scale._
- What is the length of your engagements?
  _Most of our long-term full-time engagements last multiple years. This allows you to evolve your career with the client company, taking on more responsibilities._
- What’s your company size?
  _The Right Balance team is 60+ engineers, going to 100+ by the end of the year. The client's team currently numbers 584+ people. The timing is great to be a part of a rapidly growing team making meaningful contributions._
- What happens if the engagement is completed?
  _Most of our engagements are long-term in nature. That said, if the current engagement is ramping down, we’ll present you with more long-term opportunities to transition into._
- What are your core values?
  - Client First: we only win when our clients win. We treat client challenges as our own.
  - Ownership: we embrace responsibility, taking on challenges, getti
-
Site Reliability Engineer
1 week ago
Remote, Mexico · thegetch mexico · Full-time
**Role: Site Reliability Engineer**
**Openings: more than 10 hires**
**Location: any city with TCS Office presence (Queretaro, Guadalajara, Mexico City or Monterrey)**
**Salary: 25-33 USD/hr**
**English communication: advanced**
**Experience: 4+ years**
**Site Reliability Engineer responsibilities**: Gather and analyze system metrics...
-
Site Reliability Engineer
4 days ago
Remote, Mexico · Right Balance · Full-time
**Overview** We're looking for a Site Reliability Engineer. Headquartered in Los Angeles, California, Right Balance provides top-tier technology talent for innovative companies in the US. We’re in the top 50 companies to watch in LA. **Engagement Details** Our client is a USA-based company producing video solutions with the mission to advance scientific...
-
Senior Site Reliability Engineer
2 weeks ago
Remote, Mexico · EPAM Systems, Inc. · Full-time
We are seeking an experienced **Senior Site Reliability Engineer** to join our team. As a key member of the Reliability Tooling team, you will be responsible for writing and reviewing code, contributing to critical technical decisions, and mentoring engineers within your squad. This role requires a deep understanding of SRE principles and best practices, as...
-
Lead Site Reliability Engineer
3 weeks ago
Remote, Mexico · Tekshapers Inc · Full-time
**Position: Lead Site Reliability Engineer**
**Location: Remote**
**Duration: Contract**
- Lead and mentor a team of SREs to ensure operational excellence and maximize the reliability and availability of client systems.
- Minimum 10 years of work experience in DevOps/SRE, including leadership roles.
- Architect and design highly scalable and available...
-
Lead Site Reliability Engineer
2 weeks ago
Remote, Mexico · EPAM Systems, Inc. · Full-time
We are looking for an experienced **Lead Site Reliability Engineer** to join our team. In this role, you will play a pivotal part in the Reliability Tooling team, taking responsibility for writing and reviewing code, making key technical decisions, and mentoring engineers within your squad. This position requires a strong grasp of SRE principles and best...
-
Site Reliability Engineer
1 day ago
Remote, Mexico · Luxoft · Full-time
**Project description**: Do you like to work with existing and new software product development teams? This position is to instrument end-to-end observability and visibility for business-critical systems with log ingestion, metrics, and traces. You will function as a site reliability engineer (SRE) who will collaborate with product teams, infrastructure...
-
Senior Site Reliability Engineer
2 weeks ago
Remote, Mexico · EPAM Systems, Inc. · Full-time
Join our team as a **Senior Site Reliability Engineer** focused on delivering advanced support for critical Azure-based systems.
**Responsibilities**
- Troubleshoot and resolve complex incidents to maintain system uptime
- Ensure reliability and performance of Azure-based enterprise infrastructure
- Implement observability, monitoring, and logging solutions
- ...
-
Lead Site Reliability Engineer
2 weeks ago
Remote, Mexico · EPAM Systems, Inc. · Full-time
Join our team as a **Lead Site Reliability Engineer** dedicated to providing advanced support for critical Azure-based systems.
**Responsibilities**
- Resolve complex incidents to ensure system availability
- Maintain reliability and performance of Azure-based enterprise infrastructure
- Deploy observability, monitoring, and logging tools
- Automate...