Data Lake vs Data Warehouse Comparison
| Aspect | Data Lake | Data Warehouse |
|---|---|---|
| Data types | Structured, semi-structured, unstructured | Structured only |
| Schema | Schema-on-read (defined at query time) | Schema-on-write (defined at ingestion) |
| Processing | Raw storage; processing at query time | Pre-processed; optimised for queries |
| Users | Data scientists; ML engineers; analysts | Business analysts; BI users; executives |
| Query language | SQL + Python/Spark/ML frameworks | SQL |
| Cost | Low storage cost; high processing cost | Higher storage cost; lower query cost |
| Governance risk | Data swamp risk without governance | Lower risk; structured by design |
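
The schema row in the table above can be made concrete with a small sketch. It is illustrative only: the sample records, the `sales` table, and the function names are assumptions invented for this example, and an in-memory SQLite database stands in for a warehouse while a plain Python list stands in for lake storage.

```python
import json
import sqlite3

# Hypothetical raw events, as they might arrive from a source system.
RAW_EVENTS = [
    '{"customer": "ana", "amount": "19.90", "channel": "web"}',
    '{"customer": "ben", "amount": "5.00"}',
    '{"customer": "cy", "note": "free-text record with no amount"}',
]


def load_warehouse(raw_lines):
    """Schema-on-write: validate and type each record before storing it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (customer TEXT NOT NULL, amount REAL NOT NULL)"
    )
    rejected = []
    for line in raw_lines:
        record = json.loads(line)
        try:
            row = (record["customer"], float(record["amount"]))
        except (KeyError, ValueError):
            # The record does not fit the schema, so it is never loaded.
            rejected.append(line)
            continue
        conn.execute("INSERT INTO sales VALUES (?, ?)", row)
    return conn, rejected


def load_lake(raw_lines):
    """Schema-on-read: keep every record exactly as received."""
    return list(raw_lines)


def total_from_lake(lake):
    """Apply a schema only at query time, skipping records without an amount."""
    total = 0.0
    for line in lake:
        record = json.loads(line)
        if "amount" in record:
            total += float(record["amount"])
    return total


warehouse, rejected = load_warehouse(RAW_EVENTS)
lake = load_lake(RAW_EVENTS)
warehouse_total = warehouse.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
lake_total = total_from_lake(lake)
```

In this sketch the warehouse rejects the third record at ingestion, while the lake keeps all three and defers interpretation to the query. Both totals agree, but only the lake still holds the `channel` and `note` fields for later analysis, which is also why it needs governance to stay usable.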
CDMP Exam Relevance
Data lakes are tested in two knowledge areas: Data Warehousing & Business Intelligence (10% of the CDMP exam) and Big Data & Data Science (2%). Key exam topics include the definition and architecture of a data lake; the difference between a data lake and a data warehouse; schema-on-read versus schema-on-write; the risk of a "data swamp" (a poorly governed data lake); and the governance requirements for data lakes (metadata management, data quality, and access control).
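
The three governance requirements named above can be pictured as a simple catalog check. This is a minimal sketch, not a real catalog tool: the `CatalogEntry` fields, the dataset paths, and the `governance_gaps` function are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CatalogEntry:
    """Hypothetical catalog record for one dataset stored in the lake."""
    path: str
    owner: Optional[str] = None           # metadata management
    description: Optional[str] = None     # metadata management
    quality_checked: bool = False         # data quality
    allowed_roles: Tuple[str, ...] = ()   # access control


def governance_gaps(entry):
    """Return the governance requirements this dataset does not yet meet."""
    gaps = []
    if not entry.owner or not entry.description:
        gaps.append("metadata")
    if not entry.quality_checked:
        gaps.append("data quality")
    if not entry.allowed_roles:
        gaps.append("access control")
    return gaps


catalog = [
    CatalogEntry(
        path="raw/sales/2024",
        owner="finance",
        description="Daily sales extracts",
        quality_checked=True,
        allowed_roles=("analyst",),
    ),
    CatalogEntry(path="raw/misc/dump_01"),  # undocumented: a swamp candidate
]

# Map each dataset path to its list of unmet requirements.
report = {entry.path: governance_gaps(entry) for entry in catalog}
```

A dataset with an empty list is governed on all three counts. One that fails every check, like the second entry here, is the kind of undocumented data that turns a lake into a swamp.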