Data Lake vs Data Warehouse Comparison
| Characteristic | Data Lake | Data Warehouse |
|---|---|---|
| Data format | Raw (any format) | Processed (structured) |
| Schema | Schema-on-read | Schema-on-write |
| Data types | Structured, semi-structured, unstructured | Primarily structured |
| Users | Data scientists, engineers | Business analysts, executives |
| Query type | Exploratory, ML, ad-hoc | Predefined, structured reporting |
| Cost | Low (object storage) | Higher (specialised compute) |
| Data quality | Variable (raw data) | High (processed, validated) |
| Governance risk | High (data swamp risk) | Low (structured governance) |
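The schema row of the table can be made concrete with a small sketch. The example below is illustrative only: the event records, the `ORDER_SCHEMA` mapping, and all function names are hypothetical, and in-memory lists stand in for real storage. It shows a warehouse-style loader that validates on write and a lake-style loader that stores raw records and applies the schema only when they are queried.

```python
import json

# Hypothetical raw events as they might land in a data lake: no schema enforced.
RAW_EVENTS = [
    '{"order_id": 1, "amount": "19.99", "channel": "web"}',
    '{"order_id": 2, "amount": 5}',
    '{"order_id": "3", "amount": "not-a-number"}',
]

# Hypothetical schema: field name -> type used to coerce the value.
ORDER_SCHEMA = {"order_id": int, "amount": float}


def apply_schema(record, schema):
    """Coerce a dict to the schema; raise ValueError on a missing or invalid field."""
    typed = {}
    for field, cast in schema.items():
        if field not in record:
            raise ValueError(f"missing field: {field}")
        try:
            typed[field] = cast(record[field])
        except (TypeError, ValueError) as exc:
            raise ValueError(f"bad value for {field}: {record[field]!r}") from exc
    return typed


def load_schema_on_write(lines, schema):
    """Warehouse style: validate at load time, so only conforming rows are stored."""
    stored, rejected = [], []
    for line in lines:
        try:
            stored.append(apply_schema(json.loads(line), schema))
        except ValueError:
            rejected.append(line)
    return stored, rejected


def load_schema_on_read(lines):
    """Lake style: keep every record as-is; nothing is validated at load time."""
    return list(lines)


def query_lake(raw_store, schema):
    """Apply the schema only when reading; non-conforming rows are skipped here."""
    rows = []
    for line in raw_store:
        try:
            rows.append(apply_schema(json.loads(line), schema))
        except ValueError:
            continue
    return rows
```

With these sample records, the schema-on-write loader stores two rows and rejects one at load time, while the lake keeps all three raw records (including the extra `channel` field) and defers the same validation to query time. That deferral is what gives the lake its flexibility and also its variable data quality.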
CDMP Exam Relevance
Data lakes and the lakehouse architecture are tested in the Data Warehousing & BI (10%) and Data Architecture (6%) knowledge areas. Key exam topics include the differences between data lakes and data warehouses, the schema-on-read vs schema-on-write distinction, the data swamp problem and how governance prevents it, and the role of metadata management in making data lakes usable. The lakehouse, which layers warehouse-style structure and governance over data lake storage, is increasingly relevant as a modern data architecture pattern.
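To illustrate how metadata management guards against a data swamp, here is a minimal sketch of a catalog completeness check. Everything in it is an assumption made for illustration: the `CatalogEntry` fields, the `REQUIRED_METADATA` policy, and the dataset paths do not come from any particular catalog product or from the CDMP material.

```python
from dataclasses import dataclass

# Hypothetical governance policy: every dataset in the lake must carry these fields.
REQUIRED_METADATA = ("owner", "description", "source_system")


@dataclass
class CatalogEntry:
    """One dataset registered in a (hypothetical) data lake catalog."""
    path: str
    owner: str = ""
    description: str = ""
    source_system: str = ""


def missing_metadata(entry):
    """Return the required metadata fields that are empty for this entry."""
    return [name for name in REQUIRED_METADATA if not getattr(entry, name).strip()]


def find_undocumented(entries):
    """Map each under-documented dataset path to the fields it is missing."""
    report = {}
    for entry in entries:
        gaps = missing_metadata(entry)
        if gaps:
            report[entry.path] = gaps
    return report


# Usage: one documented dataset and one that was dropped in without any metadata.
entries = [
    CatalogEntry("raw/orders/", "sales-eng", "Order events", "webshop"),
    CatalogEntry("raw/tmp_export/"),
]
print(find_undocumented(entries))
```

A check like this could run whenever data is ingested, so that datasets without an owner or description are flagged before they accumulate into an unusable swamp.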