Cloud Data Engineer

Tech Stack

Python
AWS
Databricks
ETL
SQL
PySpark
Azure

Job Description

Responsibilities:
- Implement and manage data ingestion pipelines from diverse sources such as Kafka, RDBMS (Postgres) using CDC (Change Data Capture), and file systems (CSV), following Medallion Architecture principles (an illustrative PySpark sketch appears at the end of this posting)
- Develop and optimize data transformations using PySpark and SQL, handling data volumes ranging from megabytes to gigabytes depending on the source
- Conduct unit and integration testing to ensure the accuracy and reliability of data transformations and pipelines (see the test sketch at the end of this posting)
- Work with AWS technologies, including S3 for data storage and Docker on AWS for containerized applications
- Implement and manage infrastructure as code with Terraform, such as creating S3 buckets and managing Databricks Service Principals
- Deploy and manage solutions through CI/CD pipelines, particularly CircleCI, to ensure seamless and automated deployment processes

Requirements:
- Minimum 4-5 years of professional experience
- Proficiency in SQL and Python
- Strong experience with AWS cloud services
- Hands-on experience with Databricks
- Knowledge of ETL processing
- Effective communication skills in English (minimum B2 level)
- Knowledge of system design
- Understanding of Medallion Architecture

Nice to have:
- Familiarity with Kedro and Airbyte
- Knowledge of Machine Learning

Offer:
- Private medical care
- Co-financing for the sports card
- Training & learning opportunities
- Constant support of a dedicated consultant
- Team-building events organized by DCG
- Employee referral program
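
For illustration, a minimal sketch of the Medallion-style bronze-to-silver step with CDC deduplication described in the responsibilities above. All paths, table names, and columns (customer_id, _cdc_seq, op) are hypothetical assumptions for this sketch, not details of the actual role:

# A minimal sketch of a bronze-to-silver Medallion step in PySpark.
# Paths and columns (customer_id, _cdc_seq, op) are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F, Window

spark = SparkSession.builder.appName("medallion-sketch").getOrCreate()

# Bronze: land raw CDC events (e.g. from Postgres CDC or Kafka) as-is.
bronze = spark.read.option("header", "true").csv("s3://example-bucket/bronze/customers/")

# Silver: keep only the latest event per key and drop CDC delete records.
latest = Window.partitionBy("customer_id").orderBy(F.col("_cdc_seq").cast("long").desc())
silver = (
    bronze
    .withColumn("_rn", F.row_number().over(latest))
    .filter((F.col("_rn") == 1) & (F.col("op") != "d"))  # 'd' marks deletes in many CDC formats
    .drop("_rn")
)

silver.write.mode("overwrite").parquet("s3://example-bucket/silver/customers/")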
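
And a short sketch of how such a transformation might be unit tested with pytest and a local SparkSession; the function under test (clean_emails) is a made-up example, not code from this role:

# A hedged sketch of a PySpark unit test; clean_emails is a hypothetical
# transformation used only to show the testing pattern.
import pytest
from pyspark.sql import SparkSession, functions as F

def clean_emails(df):
    # Example transformation: trim whitespace and lowercase an email column.
    return df.withColumn("email", F.lower(F.trim(F.col("email"))))

@pytest.fixture(scope="module")
def spark():
    return SparkSession.builder.master("local[1]").appName("tests").getOrCreate()

def test_clean_emails(spark):
    df = spark.createDataFrame([("  Alice@Example.COM ",)], ["email"])
    assert clean_emails(df).first()["email"] == "alice@example.com"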