What Is Data Modeling?
Data modeling is the process of creating a visual representation of an information system or parts of it to communicate connections between data points and structures. A data model defines the structure of data — the entities (things we want to store data about), their attributes (properties), and the relationships between them.
Data modeling is fundamental to database design, system integration, and data governance. A well-designed data model makes data easier to understand, query, and maintain. A poorly designed model leads to data redundancy, update anomalies, and performance problems.
The Three Levels of Data Modeling
Data models exist at three levels of abstraction, each serving a different audience and purpose:
1. Conceptual Data Model
The conceptual data model is the highest level of abstraction. It represents the business view of data — the key entities and their high-level relationships — without any technical detail. Its purpose is communication with business stakeholders, not technical implementers.
A conceptual model typically shows:
- Major business entities (e.g., Customer, Product, Order)
- High-level relationships between entities (e.g., "Customer places Order")
- No attributes, no data types, no keys
The conceptual model answers the question: What are the key things the business needs to store data about, and how are they related?
2. Logical Data Model
The logical data model adds detail to the conceptual model — defining the attributes of each entity, the data types, and the relationships in precise terms — but remains independent of any specific database technology.
A logical model typically shows:
- All entities and their attributes
- Primary keys and foreign keys
- Relationship cardinalities (one-to-one, one-to-many, many-to-many)
- Data types (but not database-specific types)
- Normalisation applied (typically to 3NF or BCNF)
The logical model answers the question: What data needs to be stored, in what structure, and with what constraints?
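A logical model can be captured in many tools and notations; as a minimal sketch, the Customer/Order example above can be written down as plain data structures. The entity names, attributes, and type labels here are illustrative, not drawn from any particular modeling standard.

```python
# A logical data model sketched as plain Python dictionaries: entities,
# attributes with platform-neutral types, primary keys, and foreign keys.
logical_model = {
    "Customer": {
        "attributes": {"customer_id": "integer", "name": "string",
                       "email": "string"},
        "primary_key": ["customer_id"],
        "foreign_keys": {},
    },
    "Order": {
        "attributes": {"order_id": "integer", "customer_id": "integer",
                       "order_date": "date", "total": "decimal"},
        "primary_key": ["order_id"],
        # Each Order references exactly one Customer: a one-to-many
        # relationship from Customer to Order.
        "foreign_keys": {"customer_id": "Customer.customer_id"},
    },
}

def referenced_entities(model, entity):
    """Return the entities that `entity` depends on via foreign keys."""
    return sorted({target.split(".")[0]
                   for target in model[entity]["foreign_keys"].values()})

print(referenced_entities(logical_model, "Order"))  # ['Customer']
```

Representing the model as data, rather than only as a diagram, makes checks like "does every foreign key point at a defined entity?" straightforward to automate.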
3. Physical Data Model
The physical data model is the implementation-specific version of the logical model. It translates the logical design into the specific syntax and features of the target database platform (e.g., MySQL, Oracle, PostgreSQL).
A physical model typically shows:
- Table names and column names (following database naming conventions)
- Database-specific data types (e.g., VARCHAR(255), BIGINT, TIMESTAMP)
- Indexes, partitioning, and storage parameters
- Stored procedures, triggers, and views
- Performance optimisations
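To make the jump from logical to physical concrete, here is the same Customer/Order model expressed as DDL. SQLite is used as a stand-in platform so the sketch is self-contained; the column types, naming conventions, and index choice are illustrative and would differ on MySQL, Oracle, or PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id  INTEGER PRIMARY KEY,
        name         VARCHAR(255) NOT NULL,
        email        VARCHAR(255) NOT NULL UNIQUE
    );

    -- "order" is a reserved word in SQL, so the physical table
    -- gets a platform-friendly name.
    CREATE TABLE customer_order (
        order_id     INTEGER PRIMARY KEY,
        customer_id  INTEGER NOT NULL REFERENCES customer(customer_id),
        order_date   TIMESTAMP NOT NULL,
        total        DECIMAL(10, 2) NOT NULL
    );

    -- A secondary index supporting "orders for a given customer" lookups:
    -- a physical-level performance decision absent from the logical model.
    CREATE INDEX idx_order_customer ON customer_order(customer_id);
""")

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['customer', 'customer_order']
```

Note how details like the reserved-word rename and the index only appear at this level; the logical model stays platform-neutral.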
Entity-Relationship (ER) Diagrams
The most widely used notation for data modeling is the Entity-Relationship (ER) diagram, developed by Peter Chen in 1976. ER diagrams use standardised symbols to represent entities (rectangles), attributes (ovals), and relationships (diamonds or lines with cardinality notation).
Modern ER diagrams typically use Crow's Foot notation, which uses lines with symbols at each end to show cardinality:
- A single vertical line (|) means "exactly one"
- A crow's foot (three lines) means "many"
- A circle means "zero"
So a line with a crow's foot at one end and a single line at the other represents a one-to-many relationship.
Normalisation
Normalisation is the process of organising data in a relational database to reduce redundancy and improve data integrity. It involves decomposing tables into smaller, more focused tables and defining relationships between them.
The normal forms are:
- First Normal Form (1NF): Each column contains atomic (indivisible) values; no repeating groups
- Second Normal Form (2NF): Satisfies 1NF, and every non-key attribute is fully functionally dependent on the entire primary key (eliminates partial dependencies)
- Third Normal Form (3NF): Satisfies 2NF, with no transitive dependencies (non-key attributes do not depend on other non-key attributes)
- Boyce-Codd Normal Form (BCNF): A stricter version of 3NF — every determinant is a candidate key
Most production databases are normalised to 3NF or BCNF. Higher normal forms (4NF, 5NF) exist but are rarely used in practice.
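A small worked example shows the mechanics. In the flat rows below, the customer name is repeated on every order (a transitive dependency: it depends on `customer_id`, not on the key `order_id`), so normalisation moves it into its own relation. The data is invented for illustration.

```python
# Unnormalised rows: customer_name is stored redundantly per order,
# so renaming a customer would require updating many rows.
flat_orders = [
    {"order_id": 1, "customer_id": 10, "customer_name": "Acme",  "total": 50},
    {"order_id": 2, "customer_id": 10, "customer_name": "Acme",  "total": 75},
    {"order_id": 3, "customer_id": 11, "customer_name": "Binco", "total": 20},
]

# Decomposition toward 3NF: customer_name depends on customer_id,
# so it moves into a Customer relation keyed by customer_id...
customers = {row["customer_id"]: row["customer_name"] for row in flat_orders}

# ...and the Order relation keeps only attributes that depend on order_id,
# plus the foreign key back to Customer.
orders = [{k: row[k] for k in ("order_id", "customer_id", "total")}
          for row in flat_orders]

print(customers)  # {10: 'Acme', 11: 'Binco'} — each name stored once
print(orders[0])  # {'order_id': 1, 'customer_id': 10, 'total': 50}
```

After the split, a customer rename is a single-row update, which is exactly the update anomaly normalisation exists to prevent.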
Dimensional Modeling
Dimensional modeling, developed by Ralph Kimball, is a data modeling technique optimised for analytical (OLAP) workloads — data warehouses and business intelligence systems. Unlike normalisation (which minimises redundancy), dimensional modeling deliberately denormalises data to optimise query performance.
The two key components of dimensional modeling are:
- Fact tables: Store the measurable, quantitative data about business events (e.g., sales transactions). Each row represents one event. Fact tables contain foreign keys to dimension tables and numeric measures (e.g., quantity, revenue).
- Dimension tables: Store the descriptive attributes that provide context for the facts (e.g., Customer, Product, Date, Store). Dimension tables are typically wide (many columns) and relatively small.
The star schema arranges one fact table at the centre surrounded by dimension tables — like a star. The snowflake schema normalises dimension tables into multiple related tables — like a snowflake. Star schemas are simpler and faster to query; snowflake schemas use less storage but are more complex.
Slowly Changing Dimensions (SCD)
Slowly Changing Dimensions (SCD) handle the challenge of tracking how dimension attributes change over time. The most common SCD types are:
- Type 0: Retain original — the attribute never changes
- Type 1: Overwrite — update the existing record; no history is kept
- Type 2: Add new row — create a new record for each change; full history is preserved
- Type 3: Add new attribute — add a column for the previous value; only one level of history
- Type 4: History table — maintain a separate history table
- Type 6: Hybrid — combines Types 1, 2, and 3
Type 2 is the most commonly used SCD type in data warehouses because it preserves full history while allowing point-in-time analysis.
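The Type 2 pattern can be sketched in a few lines: when an attribute changes, the current row is closed off and a new row is appended, so every historical value remains queryable. The field names here (`valid_from`, `valid_to`, `is_current`) are a common convention, not a standard.

```python
from datetime import date

# A customer dimension with one current row per customer.
dim_customer = [
    {"customer_id": 10, "city": "London",
     "valid_from": date(2023, 1, 1), "valid_to": None, "is_current": True},
]

def apply_scd2(rows, customer_id, new_city, change_date):
    """SCD Type 2: close the current row, then append a new one."""
    for row in rows:
        if row["customer_id"] == customer_id and row["is_current"]:
            row["valid_to"] = change_date
            row["is_current"] = False
    rows.append({"customer_id": customer_id, "city": new_city,
                 "valid_from": change_date, "valid_to": None,
                 "is_current": True})

apply_scd2(dim_customer, 10, "Paris", date(2024, 6, 1))

def city_on(rows, customer_id, on_date):
    """Point-in-time lookup: the city that was current on a given date."""
    for row in rows:
        if (row["customer_id"] == customer_id
                and row["valid_from"] <= on_date
                and (row["valid_to"] is None or on_date < row["valid_to"])):
            return row["city"]

print(city_on(dim_customer, 10, date(2023, 6, 1)))  # London
print(city_on(dim_customer, 10, date(2024, 7, 1)))  # Paris
```

The point-in-time lookup at the end is exactly the analysis that Type 1 (overwrite) would make impossible: with history preserved as rows, any past state of the dimension can be reconstructed.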