Senior Azure Site Reliability Engineer

2 weeks ago


Remote, Mexico · Pinnacle · Full-time

**Job Title**: Senior Azure Site Reliability Engineer
**Reports To**: Azure Site Reliability Lead

**About us**:
Welcome to Pinnacle, the ultimate destination for sports enthusiasts seeking an exhilarating sportsbook and gaming experience! Established in 1998, we have solidified our position as one of the globe's foremost licensed online gaming companies. With our cutting-edge offerings, we guarantee an electrifying experience that will keep you on the edge of your seat.

Pinnacle invites you to join our team and become an instrumental figure in the exciting realm of sports betting. Our vibrant team is fueled by passion and driven by innovation, working together to redefine the landscape of sports betting and gaming. Together, we constantly strive to surpass limitations and deliver unparalleled experiences to sports enthusiasts worldwide. Prepare yourself for a thrilling journey and discover sports in an entirely new dimension with Pinnacle.

**Role Overview**:
**Key Responsibilities**:

- Deploy and manage Azure cloud services including Virtual Machines, Storage, Redis, Azure SQL databases, virtual networks, and Azure Kubernetes Service (AKS) clusters.
- Automate provisioning, configuration, and deployments using PowerShell, Bash, and Ansible.
- Deliver and deploy Azure infrastructure using Infrastructure as Code (IaC), specifically Azure Bicep.
- Review, configure, and implement monitoring capabilities that give Level 1 support teams the best possible visibility and transparency.
- Maintain system reliability using Azure Monitor, Application Insights, Log Analytics, Prometheus/Grafana, Splunk, Opsgenie, and Slack.
- Optimize performance and cost efficiency of Azure resources.
- Train junior team members to deliver best-of-breed solutions on the Azure public cloud.
- Review, manage, and troubleshoot Azure Kubernetes Service (AKS) clusters.
- Review and manage cloud and on-prem servers, including AKS, covering OS and RabbitMQ (RMQ) upgrades, security patches, and application service support.
- Respond to system alerts, failures, and security incidents; perform root cause analysis (RCA) and implement preventive measures.
- Provide Level 2 on-call support according to a pre-approved schedule (including weekends).
- Continuously promote better ways to deliver infrastructure solutions on the Azure cloud.
- Propose the adoption of new approaches, patterns, techniques, and ideas recommended by industry standards and trends.
- Work closely with software development and network teams to enhance platform reliability and identify better approaches.

**Experience and Qualifications**:

- 5+ years of proven experience in delivering infrastructure solutions on Azure cloud.
- 5+ years of hands-on experience with infrastructure design and deployment using PaaS, SaaS, and IaaS cloud offerings.
- 5+ years of hands-on experience with Linux and Windows Server, both cloud and on-prem.
- 3+ years working with Azure ARM templates and Azure Bicep.
- 3+ years of hands-on experience designing, building, and deploying containerized runtime environments based on Azure Kubernetes Service (AKS).
- 1+ years of proven experience administering RabbitMQ clusters and Nginx.
- Proven experience with scripting languages such as PowerShell, Python, JavaScript, and Bash.
- Experience using Splunk, Grafana, and Opsgenie is an asset.

**Skills**:

- AZ-104 (Microsoft Certified: Azure Administrator Associate) - required
- AZ-305 (Designing Microsoft Azure Infrastructure Solutions) - required
- CKA (Certified Kubernetes Administrator) - required
- LPIC-1 (Linux Essentials) - required
- AZ-800 (Windows Server Hybrid Core Infrastructure) - preferred

Work Location: Remote




