Site Reliability Engineer
4 weeks ago
The SRE Software Engineer is responsible for designing, configuring, implementing, monitoring, and maintaining our observability solutions, and for troubleshooting Ford Credit IT systems and applications to ensure optimal performance and reliability.
MAJOR RESPONSIBILITIES
- Utilizing observability and monitoring tools to detect and resolve issues affecting the user experience.
- Automating alerting and remediation processes to reduce mean time to resolution (MTTR) and improve system uptime.
- Working with the Splunk query language and monitoring database connection health using Splunk DB Connect health dashboards, log parsing, and complex Splunk searches (including external table lookups), with knowledge of Splunk data flow, components, features, and product capabilities.
- Observability:
- Implement comprehensive monitoring and alerting solutions using GCP monitoring services and external services.
- Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding.
- Build vital and efficient tooling that lowers the barrier to entry for engineering teams to plug in and benefit from observability-focused reliability practices.
- Configure dashboards, alerts, and notifications to ensure timely identification and resolution of issues.
- Troubleshoot issues and outages, working closely with development and operations teams to identify root causes and develop solutions.
- Monitor server, network infrastructure, and application performance metrics, and identify patterns and trends to improve system performance and reliability.
- Develop and integrate tools for logging, monitoring, and alerting to enhance visibility into system performance.
- Participate in strategic planning for the technology roadmap, including scalability, cost-effectiveness, and risk management considerations related to observability infrastructure.
- 6+ years of SRE observability engineering experience.
- 6+ years of experience in observability best practices working with Dynatrace or similar tools (New Relic, Datadog, AppDynamics, or other APM suites), delivering solutions across all environments, and integrating platforms and applications with monitoring and APM tools.
- Knowledge of CI/CD and infrastructure automation tools such as Puppet, Jenkins, Terraform, and Ansible.
- Minimum 4 to 5 years' working experience in OpenShift and Docker/K8s.
- Proficiency in implementing monitoring and observability solutions using GCP monitoring services such as Cloud Monitoring, Logging, and Tracing.
- Deep understanding of IT infrastructure monitoring and observability best practices.
- Experience with gathering and organizing large amounts of data to use for instrumentation into an Enterprise monitoring solution.
- Experience with recommending baseline monitoring thresholds and performance monitoring KPIs and SLAs.
- At least 4 years of experience developing Grafana dashboards and standardizing metrics and monitoring (metrics, collection, dashboards); Grafana experience is a must.
- 3-5 years of experience with SQL and familiarity with at least one managed Kubernetes platform (EKS, AKS, GKE).
- Strong background in software engineering, with expertise in relevant programming languages (like Python, Java, Go) and cloud platforms (like AWS, GCP, Azure).
- Experience with container orchestration tools like Kubernetes.
- Strong interpersonal and organizational skills.
- Strong verbal and written skills.
- Attention to detail.
- Excellent time management.
- Extraordinary teamwork and collaborative skills.
-
Senior Site Reliability Engineer
4 weeks ago
Distrito Federal, Mexico Trimble Full time
Your Title: Senior Site Reliability Engineer. Location: Mexicali, Mexico. We are seeking a skilled and motivated Senior Site Reliability Engineer to join our team in Trimble’s Core Cloud Platform. The ideal candidate will have a strong background in cloud platforms, infrastructure as code, and automation via programming/scripting languages. You will embed...
-
Site Reliability Engineer
4 weeks ago
Distrito Federal, Mexico Trax Full time
About the Position: The Site Reliability Engineer (SRE) is responsible for implementing and maintaining the Cloud Infrastructure which runs services developed by Trax. SREs are responsible for the reliability and scalability of Trax services. This includes supporting both our production-critical systems...
-
Senior Site Reliability Engineer
1 week ago
Distrito Federal, Mexico Refinitiv Full time
Senior Site Reliability Engineer. Remote type: Hybrid. Location: MEX-Distrito Federal-Reforma 26. Time type: Full time. Posted: 13 days ago. End date: November 8, 2024 (4 days left to apply). Job requisition ID: JREQ177645 ...
-
Site Reliability Engineer III/Network
4 weeks ago
Distrito Federal, Mexico F5 Full time
Position Summary: Software engineering is a core discipline at F5 for many roles. As a software engineer specializing in site reliability, you will bring a software engineering and automated solution mindset to your work. The Site Reliability Engineer III will be responsible for ensuring the reliability, availability, and scalability of critical systems,...
-
Site Reliability Engineer
4 weeks ago
Distrito Federal, Mexico Thales Full time
Thales people architect identity management and data protection solutions at the heart of digital security. Businesses and governments rely on us to bring trust to the billions of digital interactions they have with people. Our technologies and services help banks exchange funds, people cross borders, energy become smarter and much more. More than 30,000...
-
Site Reliability Engineer
3 weeks ago
Distrito Federal, Mexico Kyndryl Full time
Who We Are: At Kyndryl, we design, build, manage and modernize the mission-critical technology systems that the world depends on every day. So why work at Kyndryl? We are always moving forward – always pushing ourselves to go further in our efforts to build a more equitable, inclusive world for our employees, our customers and our communities. The Role ...
-
Staff Site Reliability Engineer
1 week ago
Distrito Federal, Mexico Crunchyroll, LLC Full time
About Crunchyroll: WE HELP EVERYONE BELONG. IT'S OUR PURPOSE. Founded by fans, Crunchyroll delivers the art and culture of anime to a passionate community. We super-serve over 100 million anime and manga fans across 200+ countries and territories, and help them connect with the stories and characters they crave. Whether that experience is online or...
-
Site Reliability Engineer
4 weeks ago
Distrito Federal, Mexico Epam Full time
Are you a DevOps expert with a passion for improving communication between the operational and developmental sides of the software development process? Do you thrive in dynamic, collaborative environments? If so, we have an exciting opportunity for you! We're currently seeking a Site Reliability Engineer to join our vibrant team. This role offers the chance to...
-
Senior Site Reliability Engineer
3 weeks ago
Distrito Federal, Mexico Valtech Full time
Valtech is a Global Digital Company with offices in 21 countries, focused on business transformation through digital innovation. We design and build unique experiences, execute continuous improvement efforts and live, nurture and drive business transformation across the digital world. We support our clients in the research, design and commercialization of...
-
Principal Site Reliability Engineer
4 weeks ago
Distrito Federal, Mexico https://www.energyjobline.com/sitemap.xml Full time
Responsibilities: Solve complex problems related to Linux infrastructure and Oracle Cloud Infrastructure. Act as a partner concern point for critical issues that may not have a detailed procedure and provide Root Cause Analysis (RCA). Understand the end-to-end configuration, technical dependencies, and characteristics of production infrastructure and...
-
Site Reliability Engineer
3 weeks ago
Distrito Federal, Mexico Thales Full time
Thales people architect identity management and data protection solutions at the heart of digital security. Businesses and governments rely on us to bring trust to the billions of digital interactions they have with people. Our technologies and services help banks exchange funds, people cross borders, energy become smarter and much more. More than 30,000...
-
Infrastructure Specialist
3 weeks ago
Distrito Federal, Mexico 1210 Kyndryl Mexico S. de R.L. de C.V. Full time
Infrastructure Specialist (Site Reliability Engineer). Who We Are: At Kyndryl, we design, build, manage and modernize the mission-critical technology systems that the world depends on every day. So why work at Kyndryl? We are always moving forward, always pushing ourselves to go further in our efforts to build a more equitable, inclusive world for our...
-
Staff Site Reliability Engineer
6 days ago
Distrito Federal, Mexico Ellation US Full time
WE HELP EVERYONE BELONG. IT’S OUR PURPOSE. Founded by fans, Crunchyroll delivers the art and culture of anime to a passionate community. We super-serve over 100 million anime and manga fans across 200+ countries and territories, and help them connect with the stories and characters they crave. Whether that experience is online or in-person, streaming...
-
Site Reliability Engineer
6 days ago
Distrito Federal, Mexico Improving Full time
Improving is committed to building a great place to work by cultivating an environment that fosters professional and personal relationships. We value open communication, personal growth, and shared rewards, which result in sustainable success. Voted “best place to work” numerous times, Improving strives to create and maintain a culture that exemplifies...
-
Service Reliability Engineer
4 weeks ago
Distrito Federal, Mexico Thales Group Full time
Service Reliability Engineer. This is a hybrid position within Mexico City, Mexico. Thales is looking for a Service Reliability Engineer who is primarily responsible for ensuring the best customer experience by assuring service reliability from the customer's perspective and making sure Incident/Service Requests are resolved in the...
-
Reliability Engineer
4 weeks ago
Distrito Federal, Mexico Wipro Full time
Role: Reliability Engineer. Great opportunity to work at a global company in the education/publishing sector. Location: Hybrid in Mexico City (1 or 2 days per week in office). Required Skills and Experience: Bachelor’s degree in Computer Systems, Informatics, or a similar field. 3 to 5 years of experience/knowledge in a similar role: Cloud Engineering: Hands-on design,...
-
Infrastructure Specialist
4 weeks ago
Distrito Federal, Mexico Kyndryl Inc. Full time
Infrastructure Specialist (Site Reliability Engineer). Remote Type: Partially Remote. Location: Mexico City, Distrito Federal, Mexico. Time Type: Full time. Posted On: Yesterday. Job Requisition ID: R-24815. Who We Are: At Kyndryl, we design, build, manage, and modernize the mission-critical technology systems that the world depends on every day....
-
Service Reliability Engineer
4 weeks ago
Distrito Federal, Mexico Thales Full time
Position Summary: This is a hybrid position within Mexico City, Mexico. Thales is looking for a Service Reliability Engineer who is primarily responsible for ensuring the best customer experience by assuring service reliability from the customer's perspective and making sure Incident/Service Requests are resolved in the shortest timeframe. In this position, you will also...
-
Senior Site Reliability Engineer
6 days ago
Distrito Federal, Mexico Thomson Reuters Full time
Thomson Reuters is the Answer Company. We provide authoritative content, advanced technologies, and human expertise to help our customers find trusted answers. We enable professionals in the legal, tax and accounting, and media markets to make the decisions that matter most, all powered by the world's most trusted news organization. The Legal group is...
-
Senior Site Reliability Engineer
4 weeks ago
Distrito Federal, Mexico Refinitiv Full time
Sr Support Engineer L3 (.Net). Thomson Reuters is seeking a Sr Support Engineer L3 (.Net). This role will be part of a high-performing team of talented SRE specialists who provide world-class support for Commercial Engineering. This team manages ongoing incident detection and resolution, change planning and implementation, and...