Description of Project
The main goal of this project is to prepare developers for future ETL projects on a Spark/Kubernetes stack. We developed two Python applications. The first connects to a BigQuery instance on Google Cloud and generates a table with sample data. The second extracts data from BigQuery, transforms it with PySpark, and loads the results as a file into Google Cloud Storage. The PySpark application was run on Minikube and deployed to a Google Kubernetes Engine cluster.
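The following is a minimal sketch of how the two applications might fit together. It is not the project's actual code. The table ID, bucket path, column names, and the per-customer aggregation are hypothetical, because the description does not specify them. The sketch assumes the `google-cloud-bigquery` client library for the first application. For the second, it assumes that the spark-bigquery connector and the GCS connector are on the Spark classpath. The cloud calls use lazy imports, so the pure data generator can be used and tested on its own.

```python
import random
from datetime import date, timedelta

# Hypothetical identifiers: the project description does not name its table or bucket.
SOURCE_TABLE = "my-project.demo_dataset.orders"
OUTPUT_URI = "gs://my-bucket/etl-output/customer_totals"

# Hypothetical sample customers used to fill the synthetic table.
CUSTOMERS = ["alice", "bob", "carol", "dave"]


def generate_sample_rows(n, seed=42):
    """Build n synthetic order rows as JSON-serializable dicts (assumed schema)."""
    rng = random.Random(seed)  # seeded so the sample data is reproducible
    start = date(2019, 1, 1)
    rows = []
    for i in range(n):
        rows.append({
            "order_id": i + 1,
            "customer": rng.choice(CUSTOMERS),
            "amount": round(rng.uniform(1, 500), 2),
            "order_date": (start + timedelta(days=rng.randrange(365))).isoformat(),
        })
    return rows


def load_sample_table(rows, table_id=SOURCE_TABLE):
    """Application 1: create or overwrite the BigQuery table from generated rows."""
    from google.cloud import bigquery  # requires the google-cloud-bigquery package

    client = bigquery.Client()  # uses Application Default Credentials
    job_config = bigquery.LoadJobConfig(
        schema=[
            bigquery.SchemaField("order_id", "INTEGER"),
            bigquery.SchemaField("customer", "STRING"),
            bigquery.SchemaField("amount", "FLOAT"),
            bigquery.SchemaField("order_date", "DATE"),
        ],
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    job = client.load_table_from_json(rows, table_id, job_config=job_config)
    job.result()  # block until the load job finishes


def run_etl(spark, source_table=SOURCE_TABLE, output_uri=OUTPUT_URI):
    """Application 2: extract from BigQuery, aggregate with PySpark, write CSV to GCS."""
    from pyspark.sql import functions as F

    # Extract: the "bigquery" data source comes from the spark-bigquery connector.
    orders = spark.read.format("bigquery").option("table", source_table).load()
    # Transform: illustrative per-customer totals (the real transformation is unspecified).
    totals = orders.groupBy("customer").agg(
        F.round(F.sum("amount"), 2).alias("total_amount"),
        F.count("*").alias("order_count"),
    )
    # Load: Spark writes a directory of CSV part files; gs:// needs the GCS connector.
    totals.write.mode("overwrite").option("header", True).csv(output_uri)
    return totals
```

A typical flow calls `load_sample_table(generate_sample_rows(1000))` once to seed the table. It then calls `run_etl(SparkSession.builder.appName("bq-etl").getOrCreate())` inside the job that `spark-submit` launches against a Kubernetes master (`--master k8s://https://<api-server>` with `spark.kubernetes.container.image` set). The same container image works on Minikube and GKE. Only the master URL and image registry differ.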
Responsibilities
Develop Python applications.
Technology Stack
Python, PySpark, Minikube, Docker, GitHub, BigQuery, Google Cloud Storage, Google Kubernetes Engine, Artifact Registry.
Period
01.2019 – until now (6 years 5 months)