1.1 - About this course
- Content
- Scope
1.2 - Prerequisites
- What do you need to take this course?
- Registration on the Superprof website
1.3 - Costs
1.4 - Resources
- Course resources
1.5 - Setup
- Software installation
2.1 - Virtual environments (1 hour)
- How to use virtual environments
- [DEMO]
- [ACTIVITY] Version corrections
2.2 - Introduction to Python (1 hour)
2.2.1 - Python basics recap
- Data types
- Functions
- [DEMO] Download CDMX files
- [ACTIVITY] Refactor code
2.2.2 - Documentation
- Coding conventions (PEP8)
- Annotations
- Best practices in code
- Docstrings
2.3 - Task automation (2 hours)
- Organize files
- Read and write files
- Debugging
2.4 - OOP (Optional)
3.1 - Introduction
- Fundamentals of software development
- Gather requirements
- Design and development of a data engineering project
- Software deployment
3.2 - Jupyter Notebooks
- Google Colab development
- Local development
3.3 - Folder structure
- Working folder structure
3.4 - Version control
- Intro to GitHub
- Git fundamentals
- Collaborating on GitHub
- README.md
4.1 - Concepts
- Intro to CI
4.2 - Tools for CI
- Linting and formatting
- Pre-commit hooks
- Unit tests (pytest)
5.1 - Cloud providers
- Google Cloud Platform
- Amazon Web Services
- Azure
5.2 - Common Cloud Resources for Data Engineering (GCP)
- Google Cloud Storage
- Google Compute Engine
- BigQuery
- Cloud Functions
6.1 - Data Pipelines
- ETL vs ELT
- Extraction
- Transformation
- Load
6.2 - Batch processing
- Pandas
- PySpark
- Spark DataFrames
- Spark SQL
- Dataproc
6.3 - Orchestration
- Google Cloud Scheduler
- Shipyard
- Alerting (email)
6.4 - Data Warehousing
- Fundamentals of Data Warehouse
- Data Warehousing with BigQuery
- Partitioning
6.5 - Reporting
- Looker Studio (formerly Data Studio)
7.1 - Kafka
- Intro to Kafka
- Schemas
- Kafka Streams
7.2 - Streaming on cloud
- Pub/Sub
- Data Enrichment (BigQuery + Pub/Sub)
- Apache Beam
8.1 - Concepts
- Intro to CD
8.2 - Docker
- Docker
- Artifact Registry (GCP)
8.3 - Infrastructure as Code
- Terraform
8.4 - Tools for CD
- GitHub Actions
- Automated deployment
9.1 - Concepts
- Intro to Analytics Engineering
9.2 - dbt
9.3 - dbt & BigQuery
9.4 - dbt models
9.5 - Testing and documentation
9.6 - Cloud and local environments