Databricks Data Engineer
3 weeks ago
About Us: At Derevo, we are dedicated to empowering businesses and individuals to unleash the value of data within organizations. We achieve this by implementing analytics processes and platforms with a comprehensive approach covering the entire cycle necessary to achieve it.

Derevo started in 2010 with a simple idea - to create more than a company, but a community and a space where everyone has the opportunity to build a dream.

At Derevo, we believe in human talent that is free and creative. Being human is our superpower.

**Databricks Data Engineer**

**Summary**: The desired profile should have at least 5 years of hands-on experience in designing, establishing, and maintaining data management and storage systems. Skilled in collecting, processing, cleaning, and deploying large datasets, understanding ER data models, and integrating with multiple data sources. Efficient in analyzing, communicating, and proposing different ways of building Data Warehouses, Data Lakes, End-to-End Pipelines, and Big Data solutions to clients, either in batch or streaming strategies.

**Technical Proficiencies**:

- SQL: Data Definition Language, Data Manipulation Language, Intermediate/advanced queries for analytical purposes, Subqueries, CTEs, Data types, Joins with business rules applied, Grouping and Aggregates for business metrics, Indexing and optimizing queries for efficient ETL processes, Stored Procedures for transforming and preparing data, SSMS, DBeaver
- Python: Experience in object-oriented programming, Management and processing of datasets, Use of variables, lists, dictionaries and tuples, Conditional and iterating functions, Optimization of memory consumption, Structures and data types, Data ingestion through various structured and semi-structured data sources, Knowledge of libraries such as pandas, numpy, sqlalchemy, Must have good practices when writing code
- Databricks / PySpark: Intermediate knowledge in:
  - Understanding of narrow and wide transformations, actions, and lazy evaluation
  - How DataFrames are transformed, executed, and optimized in Spark
  - Use DataFrame API to explore, preprocess, join, and ingest data in Spark
  - Use Delta Lake to improve the quality and performance of data pipelines
  - Use SQL and Python to write production data pipelines to extract, transform, and load data into tables and views in the Lakehouse
  - Understand the most common performance problems associated with data ingestion and how to mitigate them
  - Monitor Spark UI: Jobs, Stages, Tasks, Storage, Environment, Executors, and Execution Plans
  - Configure a Spark cluster for maximum performance given specific job requirements
  - Configure Databricks to access Blob, ADL, SAS, user tokens, Secret Scopes and Azure Key Vault
  - Configure governance solutions through Unity Catalog and Delta Sharing
  - Use Delta Live Tables to manage an end-to-end pipeline with unit and integration tests
- Azure: Intermediate/Advanced knowledge in:
  - Azure Storage Account:
    - Provision Azure Blob Storage or Azure Data Lake instances
    - Build efficient file systems for storing data into folders with static or parametrized names, considering possible security rules and risks
    - Experience identifying use cases for open-source file formats like Parquet, Avro, ORC
    - Understanding optimized column-oriented file formats vs optimized row-oriented file formats
    - Implementing security configurations through Access Keys, SAS, AAD, RBAC, ACLs
  - Azure Data Factory:
    - Provision Azure Data Factory instances
    - Use Azure IR, Self-Hosted IR, Azure-SSIS to establish connections to distinct data sources
    - Use of Copy or PolyBase activities for loading data
    - Build efficient and optimized ADF Pipelines using linked services, datasets, parameters, triggers, data movement activities, data transformation activities, control flow activities and mapping data flows
    - Build Incremental and Re-Processing Loads
- CI/CD (desirable)

**Process Automation**: Automate the deployment, scaling, and de-scaling of Azure Databricks clusters using tools like ARM Templates, Terraform, or Azure DevOps Pipelines.

**Monitoring and Performance Optimization**: Set up alerts and monitor key performance metrics in Azure Databricks using Azure Monitor and other monitoring tools. Optimize cluster and workload performance to ensure efficiency and scalability.

**Security and Compliance**: Implement security controls and compliance policies in Azure Databricks.

**Integration with Azure Services**: Integrate Azure Databricks with other Azure services such as Azure Data Lake Storage, Azure SQL Database, Azure Synapse Analytics, and Azure DevOps to create end-to-end data analytics solutions.

**Configuration and Secrets Management**: Manage configurations and sensitive secrets using Azure Key Vault or other secrets management solutions. Ensure the security of credentials and access keys.

**Training and Support**: Provide training and technical support to development and data analytics teams in the effective use of Azure Databricks. Documen
-
Data Engineer Snowflake
2 weeks ago
México | Perficient | Full-time
The Data Engineer will be responsible for designing, developing, and maintaining data pipelines and solutions to enable our client to harness the power of the data for further analytics. We’re looking for passionate individuals with knowledge on...
-
Databricks Engineer
2 weeks ago
México | Everscale Group | Full-time
As a Databricks Engineer, you will be responsible for designing, implementing, and maintaining scalable data processing and analytics solutions using Databricks Unified Analytics Platform. The ideal candidate possesses a deep understanding of big data technologies, proficient coding skills, and a strong background in data engineering and analytics....
-
Databricks Engineer
2 weeks ago
México | Everscale Group | Full-time
A global technology company is looking for a Databricks Engineer to design and implement scalable data processing solutions. The ideal candidate is proficient in big data technologies, especially Databricks, with strong coding abilities in Python, Scala, or Java. This role involves collaborating with cross-functional teams to enhance data processing...
-
Data Engineer
2 weeks ago
Ciudad de México | Information Technologies Consultant Home S.A. de C.V. | Full-time
Data Engineer (Azure Databricks) «Senior» Experience Required in: As a Data Engineer, you will be responsible for designing, developing, and maintaining data pipelines and data models using your expertise in Data Modeling Techniques and Methodologies. You will work closely with the Analytics team to ensure that data is properly collected, processed, and...
-
Databricks Data Engineer
3 weeks ago
México | Derevo | Full-time
**Databricks Data Engineer** **Summary**: The desired profile should have at least 5 years of hands-on experience in designing, establishing, and maintaining data management and storage systems. Skilled in collecting, processing, cleaning, and deploying large datasets, understanding ER data models, and integrating with multiple data sources. Efficient in...
-
Databricks Data Engineer
1 week ago
México | Derevo | Full-time
About Us: At Derevo, we are dedicated to empowering businesses and individuals to unleash the value of data within organizations. We achieve this by implementing analytics processes and platforms with a comprehensive approach covering the entire cycle necessary to achieve it. Derevo started in 2010 with a simple idea - to create more than a company, but a...
-
Lead Data Platform Engineer | AI-Driven, AWS
2 weeks ago
México | HCLTech | Full-time
A global technology company is seeking a Principal Data Engineer to work within the Data Engineering teams. This role focuses on developing robust IT solutions and database architectures for advanced analytics. Candidates should have strong skills in AWS, Databricks, PySpark/Python, and SQL. The position includes a competitive benefits package, such as life...
-
Data Engineer
2 weeks ago
11650, Ciudad de México, CDMX | Finvero | Full-time
Job opportunity - Data Engineer - 2 days in the office - Build and maintain data infrastructure, including distributed storage systems, databases, processing clusters, and monitoring tools based on Azure solutions. - Microsoft Azure, including Azure Data Factory, Azure Databricks, Azure Synapse Analytics,...
-
Databricks Engineer
3 weeks ago
Ciudad de México | Cognizant | Full-time
**We’re hiring!** At Cognizant we have an ideal opportunity for you to be part of one of the largest companies in the digital sector worldwide. A Great Place To Work where we look for people who contribute new ideas, experiencing a dynamic and growing environment. At Cognizant we promote an inclusive culture, where we value different perspectives providing...
-
Data Engineer
57 minutes ago
México | Neoris | Full-time
NEORIS is a digital accelerator that helps companies step into the future, with 20 years of experience as Digital Partners to some of the world's largest companies. We are more than 4,000 professionals in 11 countries, with a multicultural startup culture where we cultivate innovation and continuous learning to create solutions...