Data Platform Guide
This is a collection of books and courses I can personally recommend. They are great for any data engineering learner, and I own or have used them in my professional work.
The goal is to implement a robust Data Platform Design framework, combining Data Engineering and Automation for Data Platform Operations and Analytics.
Data Engineering Fundamentals
What is Data Platform Design?
The Data Platform Design Framework goes beyond traditional data scaling.
Data Platform Design is a set of practices and processes for managing the data lifecycle, from data ingestion to processing and analysis, in a way that ensures high quality and reliability.
The Data Platform Design framework provides a variety of tools to manage the data lifecycle, automate data processing and analysis, and maintain high data quality. It helps companies reduce the effort of data operations and take advantage of data insights.
Goals: Improve collaboration between data professionals, enhance data quality, and speed up data-related tasks.
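The lifecycle described above (ingestion, then processing with a quality gate, then analysis) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Reading` record, the function names, the sample rows, and the 0 to 100 valid range are all hypothetical and are not part of the framework or of DataPods.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Reading:
    # Hypothetical record type used only for this sketch.
    sensor: str
    value: float


def ingest(raw_rows):
    """Ingestion: parse raw (sensor, value) pairs and set aside rows that fail to parse."""
    good, rejected = [], []
    for sensor, value in raw_rows:
        try:
            good.append(Reading(sensor, float(value)))
        except (TypeError, ValueError):
            rejected.append((sensor, value))
    return good, rejected


def process(readings, low=0.0, high=100.0):
    """Processing: a simple quality gate that drops out-of-range values (range is an assumption)."""
    return [r for r in readings if low <= r.value <= high]


def analyze(readings):
    """Analysis: average value per sensor."""
    by_sensor = {}
    for r in readings:
        by_sensor.setdefault(r.sensor, []).append(r.value)
    return {sensor: mean(values) for sensor, values in by_sensor.items()}


# Made-up sample input: one unparseable row and one out-of-range row.
raw = [("a", "10"), ("a", "20"), ("b", "oops"), ("b", "50"), ("a", "999")]
readings, rejected = ingest(raw)
report = analyze(process(readings))
print(report)    # {'a': 15.0, 'b': 50.0}
print(rejected)  # [('b', 'oops')]
```

Keeping rejected rows instead of silently discarding them is one way to make data quality observable. A real platform would typically route them to a dead-letter store or a monitoring system.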
Set of expectations for data platform design:
- Highly available, redundant configuration services run within the platform.
- Zero-downtime capability with granular monitoring.
- Auto-scaling across services.
- All services are maintained and governed by Governance, which is the backbone of the platform.
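To make the auto-scaling and redundancy expectations above more concrete, here is a minimal sketch of a proportional scaling rule. It follows the basic formula documented for the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current replicas × current metric / target metric)) and leaves out the tolerance and stabilization behaviour of the real controller. The function name, the default bounds, and the choice of a minimum of 2 replicas for redundancy are assumptions made for illustration.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=2, max_replicas=10):
    """Proportional scaling rule, clamped to [min_replicas, max_replicas].

    min_replicas=2 is an assumed floor so that one replica can fail
    without taking the service down.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Load above target: scale out (3 * 90 / 60 = 4.5, rounded up to 5).
print(desired_replicas(3, 90, 60))   # 5
# Load far below target: scale in, but never below the redundancy floor.
print(desired_replicas(4, 10, 60))   # 2
# Load far above target: capped at max_replicas.
print(desired_replicas(8, 120, 60))  # 10
```

The metric can be anything that grows with load, such as average CPU utilization or queue depth per replica. The clamp is what keeps the service both redundant and bounded in cost.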
I describe the Data Platform Design Framework in 5 layers to help readers conceptualize and contextualize it.
I’ve started DataPods (Open Source Data Platform Ops) to help readers understand the Data Platform Design Framework and how to implement it. DataPods is a comprehensive starter kit that provides:
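def datapods(feature):
    """Return what DataPods provides for a given feature (needs Python 3.10+ for match)."""
    match feature:
        case "configurations":
            return "Production-like configurations"
        case "deployment":
            return "Easy deployment options (K8s/Docker)"
        case "tools":
            return "Best-in-class open source tools"
    # Any other feature name falls through and returns None.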
Resources: From the Internet
- Designing Data-Intensive Applications - Legit
- Building the Data Warehouse, Bill Inmon - Legit
- Data Engineering Nanodegree (Udacity) - Overview, Demo
- Big Data Specialization (Coursera)
- Learning Spark
- The Data Warehouse Lifecycle Toolkit by Ralph Kimball and Laura Reeves - Legit
- Data Engineering
- Pattern of DE - Online Data Engineering Design Pattern by Simon
- Use cases of DE - Vu Trinh (Substack)
- Open Modern Data Platform - Starburst Galaxy
- Summary of Books I have read - DEH-Books
… The sections below are being updated. Please check back soon!
Design Data Platform
Case studies
Bonus
Additional Recommendations:
- Certifications: Consider certifications like AWS Certified Data Engineer, Google Cloud Professional Data Engineer, or Azure Data Engineer Associate.
- Open-source projects: Contribute to open-source data engineering projects to gain practical experience.
- Online communities: Engage with data engineering communities on platforms like Stack Overflow, Reddit, and LinkedIn.
- Networking: Build relationships with other data engineers to learn from their experiences.
- Remember: This is a general roadmap. The specific courses, books, and practices may vary depending on your experience level, industry, and technology stack.