Site Reliability Engineer with Java
1 week ago
**DESCRIPTION**:
Join EPAM as a remote **Site Reliability Engineer specializing in Java.**
In this role, you'll provide 24/7 on-call support for Java backend services, prepare and deploy patches, and assist in establishing top-of-the-line metrics and dashboards.
If you have 5-8 years of experience as a DevOps engineer or SRE, proficiency in Java, and experience with Amazon DynamoDB, Amazon ElastiCache, and Amazon Web Services, we'd love to hear from you.
EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.
**RESPONSIBILITIES**:
- Provide follow-the-sun, 24/7 on-call support for all Java backend services currently owned by the customer backend team, including ownership of API Gateway observability
- Prepare and deploy patches for issues found in both the Java code and the related service cloud infrastructure
- Assist in establishing top-of-the-line metrics and dashboards that enable this group and the customer backend team to quickly assess overall platform health
- Assist in establishing and improving runbooks for all EOS Backend services
- Assist in monitoring the SLOs of all involved backend services, submitting code changes that improve SLO attainment as errors occur
**REQUIREMENTS**:
- 5-8 years of experience as a DevOps engineer or SRE
- Proficiency in Java
- Experience with Amazon DynamoDB, Amazon ElastiCache, and Amazon Web Services
- Experience troubleshooting complex systems efficiently using logs and telemetry, identifying and resolving root causes
- Able to communicate operational issues clearly and concisely in writing during live incident response
- Motivated to track and improve SLOs across several systems through repeatable processes
**WE OFFER**:
- Career plan and real growth opportunities
- Unlimited access to LinkedIn Learning
- International Mobility Plan across 25 countries
- Constant training, mentoring, online corporate courses, eLearning and more
- English classes with a certified teacher
- Support for employee initiatives (Algorithms Club, Toastmasters, Agile Club, and more)
- Enjoyable working environment (Gaming room, napping area, amenities, events, sport teams and more)
- Flexible work schedule and dress code
- Collaborate in a multicultural environment and share best practices from around the globe
- Hired directly by EPAM, 100% on payroll
- Statutory benefits (IMSS, INFONAVIT, 25% vacation bonus)
- Insurance: life, and major medical expenses with dental and vision coverage (for the employee and direct family members)
- 13% employee savings fund, capped at the legal limit
- Grocery coupons
- 30-day December bonus
- Employee Stock Purchase Plan
- 12 vacation days plus 4 floating days
- Official Mexican holidays, plus 5 extra holidays (Maundy Thursday, Good Friday, November 2nd, December 24th & 31st)
- Relocation bonus: transportation, 2 weeks of accommodation for you and your family and more
- Monthly non-taxable allowance for electricity and internet bills
**CONDITIONS**:
-
Site Reliability Engineer
2 weeks ago
Remote, Mexico · Right Balance · Full-time
**Overview** We're looking for a Site Reliability Engineer. Headquartered in Los Angeles, California, Right Balance provides top-tier technology talent for innovative companies in the US. We’re among the top 50 companies to watch in LA.
**Engagement Details** Our client is a USA-based company producing video solutions with the mission to advance scientific...
-
Lead Site Reliability Engineer
36 seconds ago
Remote, Mexico · Tekshapers Inc · Full-time
**Position: Lead Site Reliability Engineer**
**Location: Remote**
**Duration: Contract**
- Lead and mentor a team of SREs to ensure operational excellence and maximize the reliability and availability of client systems.
- Minimum 10 years of work experience in DevOps/SRE, including leadership roles.
- Architect and design highly scalable and available...
-
Senior Site Reliability Engineer
3 weeks ago
Remote, Mexico · EPAM Systems, Inc. · Full-time
We are seeking an experienced **Senior Site Reliability Engineer** to join our team. As a key member of the Reliability Tooling team, you will be responsible for writing and reviewing code, contributing to critical technical decisions, and mentoring engineers within your squad. This role requires a deep understanding of SRE principles and best practices, as...
-
Lead Site Reliability Engineer
3 weeks ago
Remote, Mexico · EPAM Systems, Inc. · Full-time
We are looking for an experienced **Lead Site Reliability Engineer** to join our team. In this role, you will play a pivotal part in the Reliability Tooling team, taking responsibility for writing and reviewing code, making key technical decisions, and mentoring engineers within your squad. This position requires a strong grasp of SRE principles and best...
-
Lead Site Reliability Engineer
4 hours ago
Remote, Mexico · EPAM Systems, Inc. · Full-time
We are seeking a Lead Site Reliability Engineer to join our team. In this role, you will help drive the reliability and performance of critical systems for a leading client. You will work in a collaborative environment focused on innovation and operational excellence. Please note, the client operates in the US Central Time Zone from 8 am CST to 5 pm...
-
Lead Site Reliability Engineer
4 minutes ago
Remote, Mexico · EPAM Systems, Inc. · Full-time
We are looking for an experienced **Site Reliability Engineer (SRE)** to take a leadership role in ensuring the stability, scalability, and performance of our cloud infrastructure on **Google Cloud Platform (GCP)**. As an SRE, you will be at the forefront of optimizing system reliability, automating processes, and collaborating with engineering teams to...
-
Site Reliability Engineer
3 weeks ago
Remote, Mexico · Synechron · Full-time
Synechron is a self-funded, leading digital transformation consulting firm focused on the financial services industry, working to accelerate digital initiatives for banks, asset managers, and insurance companies. We achieve this by providing our clients with innovative solutions that solve their most complex business challenges and combining Synechron’s unique,...
-
Site Reliability Engineer III
2 weeks ago
Remote, Mexico · Cabify · Full-time
Do you want to change the world? At Cabify, that’s what we’re doing. We aim to make cities better places to live by improving mobility for the people living in them, connecting riders to drivers, and providing mobility alternatives such as scooters, mopeds, and many others to come, all at the touch of a button. Maybe one day cities will be places where...
-
Site Reliability Engineer
2 weeks ago
Remote, Mexico · EPAM Systems, Inc. · Full-time
Join our team as a **Site Reliability Engineer**, where you will focus on cloud infrastructure, containerization, and monitoring using Kubernetes and Microsoft Azure.
**Responsibilities**
- Deploy and maintain Kubernetes resource manifests in clusters such as Kind, GKE, or AKS
- Troubleshoot and analyze logs to identify and resolve system events and issues
- ...
-
Java Software Engineer with GCP
4 hours ago
Remote, Mexico · EPAM Systems · Full-time
We are looking for a skilled Software Engineer with strong Java development experience and hands-on expertise in Google Cloud Platform (GCP). The ideal candidate is comfortable building, deploying, and maintaining cloud-native services and enjoys working in a collaborative, fast-paced environment. Experience with modern data and API technologies is a strong...