Databricks Data Engineer
7 months ago
About Us:
At Derevo, we are dedicated to empowering businesses and individuals to unleash the value of data within organizations. We do this by implementing analytics processes and platforms with a comprehensive approach that covers the full cycle required to get there.
Derevo started in 2010 with a simple idea: to create not just a company, but a community and a space where everyone has the opportunity to build a dream.
At Derevo, we believe in human talent that is free and creative. Being human is our superpower.
**Databricks Data Engineer**
**Summary**:
The ideal candidate has at least 5 years of hands-on experience designing, establishing, and maintaining data management and storage systems. They are skilled in collecting, processing, cleaning, and deploying large datasets, understanding ER data models, and integrating with multiple data sources. They can analyze, communicate, and propose different ways of building Data Warehouses, Data Lakes, End-to-End Pipelines, and Big Data solutions for clients, using either batch or streaming strategies.
**Technical Proficiencies**:
- SQL:
Data Definition Language, Data Manipulation Language, Intermediate/advanced queries for analytical purposes, Subqueries, CTEs, Data types, Joins with business rules applied, Grouping and Aggregates for business metrics, Indexing and optimizing queries for efficient ETL processes, Stored Procedures for transforming and preparing data, SSMS, DBeaver
- Python:
Experience in object-oriented programming, Managing and processing datasets, Use of variables, lists, dictionaries and tuples, Conditional and iterative statements, Optimization of memory consumption, Data structures and data types, Data ingestion from various structured and semi-structured data sources, Knowledge of libraries such as pandas, numpy, sqlalchemy, Good coding practices
- Databricks / PySpark:
Intermediate knowledge in:
Understanding of narrow and wide transformations, actions, and lazy evaluation
How DataFrames are transformed, executed, and optimized in Spark
Use DataFrame API to explore, preprocess, join, and ingest data in Spark
Use Delta Lake to improve the quality and performance of data pipelines
Use SQL and Python to write production data pipelines to extract, transform, and load data into tables and views in the Lakehouse
Understand the most common performance problems associated with data ingestion and how to mitigate them
Monitor Spark UI: Jobs, Stages, Tasks, Storage, Environment, Executors, and Execution Plans
Configure a Spark cluster for maximum performance given specific job requirements
Configure Databricks to access Blob, ADL, SAS, user tokens, Secret Scopes and Azure Key Vault
Configure governance solutions through Unity Catalog and Delta Sharing
Use Delta Live Tables to manage an end-to-end pipeline with unit and integration tests
- Azure:
Intermediate/Advanced knowledge in:
Azure Storage Account:
Provision Azure Blob Storage or Azure Data Lake instances
Build efficient file systems for storing data into folders with static or parametrized names, considering possible security rules and risks
Experience identifying use cases for open-source file formats like Parquet, Avro, and ORC
Understanding optimized column-oriented file formats vs optimized row-oriented file formats
Implementing security configurations through Access Keys, SAS, AAD, RBAC, ACLs
Azure Data Factory:
Provision Azure Data Factory instances
Use Azure IR, Self-Hosted IR, and Azure-SSIS IR to establish connections to distinct data sources
Use of Copy or PolyBase activities for loading data
Build efficient and optimized ADF Pipelines using linked services, datasets, parameters, triggers, data movement activities, data transformation activities, control flow activities and mapping data flows
Build Incremental and Re-Processing Loads
- CI/CD (desirable):
**Process Automation**: Automate the deployment and scaling (up and down) of Azure Databricks clusters using tools like ARM Templates, Terraform, or Azure DevOps Pipelines.
**Monitoring and Performance Optimization**: Set up alerts and monitor key performance metrics in Azure Databricks using Azure Monitor and other monitoring tools. Optimize cluster and workload performance to ensure efficiency and scalability.
**Security and Compliance**: Implement security controls and compliance policies in Azure Databricks.
**Integration with Azure Services**: Integrate Azure Databricks with other Azure services such as Azure Data Lake Storage, Azure SQL Database, Azure Synapse Analytics, and Azure DevOps to create end-to-end data analytics solutions.
**Configuration and Secrets Management**: Manage configurations and sensitive secrets using Azure Key Vault or other secrets management solutions. Ensure the security of credentials and access keys.
**Training and Support**: Provide training and technical support to development and data analytics teams in the effective use of Azure Databricks. Documen