What Is Metadata?
Metadata is data about data. It provides the context, structure, and meaning that make raw data understandable, discoverable, and usable. Without metadata, a column of numbers in a database is meaningless; metadata tells you that those numbers represent "Annual Revenue in USD for fiscal year 2024, sourced from the ERP system, last updated 2024-12-31".
The DAMA DMBOK v2 describes metadata as data that describes other data, giving it context and meaning. This seemingly simple definition encompasses an enormous range of information, from the technical specifications of a database column to the business definition of a KPI to the access history of a sensitive data asset.
The Three Types of Metadata
Business Metadata
Business metadata describes the business meaning, context, and rules associated with data. It is the metadata that business users need to understand and use data effectively.
Examples include:
- Business definitions of data elements (e.g., "Active Customer: a customer who has made at least one purchase in the last 12 months")
- Data ownership (who is responsible for this data?)
- Data stewardship (who maintains the quality of this data?)
- Business rules and constraints
- Data classification (public, internal, confidential, restricted)
- Regulatory and compliance tags
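Capturing business metadata as a structured record, rather than free text, lets a catalog check it for completeness and apply policies automatically. The following Python sketch is illustrative only: the `BusinessMetadata` and `Classification` names, the blank-field check, and the rule that confidential and restricted data need access approval are assumptions, not a schema prescribed by the DMBOK.

```python
from dataclasses import dataclass, field
from enum import Enum


class Classification(Enum):
    """Sensitivity levels, ordered from least to most restrictive."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4


@dataclass
class BusinessMetadata:
    term: str
    definition: str
    owner: str      # who is accountable for this data
    steward: str    # who maintains its quality day to day
    classification: Classification
    business_rules: list[str] = field(default_factory=list)
    compliance_tags: set[str] = field(default_factory=set)

    def missing_fields(self) -> list[str]:
        """Names of required descriptive fields that are still blank."""
        required = {"definition": self.definition, "owner": self.owner, "steward": self.steward}
        return [name for name, value in required.items() if not value.strip()]

    def needs_access_approval(self) -> bool:
        """Hypothetical policy: confidential and restricted data require approval."""
        return self.classification.value >= Classification.CONFIDENTIAL.value


active_customer = BusinessMetadata(
    term="Active Customer",
    definition="A customer who has made at least one purchase in the last 12 months",
    owner="Head of Sales",      # hypothetical owner
    steward="",                 # no steward assigned yet
    classification=Classification.CONFIDENTIAL,
    business_rules=["Last purchase date must fall within the trailing 12 months"],
    compliance_tags={"GDPR"},
)

print(active_customer.missing_fields())         # ['steward']
print(active_customer.needs_access_approval())  # True
```

A blank steward is flagged rather than rejected, so incomplete entries can still be catalogued while the gap stays visible.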
Technical Metadata
Technical metadata describes the technical characteristics of data — how it is structured, stored, and processed.
Examples include:
- Data types, formats, and lengths (e.g., VARCHAR(255), DATE, DECIMAL(10,2))
- Table and column names, primary keys, foreign keys
- Data lineage — where data came from, how it was transformed, where it flows to
- ETL/ELT job definitions and transformation logic
- Database schemas and entity-relationship diagrams
- API specifications
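Much technical metadata does not need to be typed in by hand: catalog tools typically harvest it from the database's own system catalog. As a minimal sketch, Python's built-in `sqlite3` module can read declared column types, nullability, primary keys, and foreign keys through SQLite's `PRAGMA table_info` and `PRAGMA foreign_key_list`. The `customer` and `sale` tables and the `describe_table` helper are hypothetical; other databases expose similar information through views such as `information_schema.columns`.

```python
import sqlite3

# Hypothetical schema, created in memory for the example
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        VARCHAR(255) NOT NULL
    );
    CREATE TABLE sale (
        sale_id     INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),
        sale_date   DATE,
        amount      DECIMAL(10,2)
    );
    """
)


def describe_table(conn: sqlite3.Connection, table: str) -> dict:
    """Harvest column-level technical metadata for one table."""
    columns = []
    # Each row: (cid, name, declared type, notnull flag, default value, pk position)
    for _cid, name, col_type, notnull, _default, pk in conn.execute(f"PRAGMA table_info({table})"):
        columns.append({
            "column": name,
            "type": col_type,
            "nullable": not notnull,
            "primary_key": pk > 0,
        })
    # Each row: (id, seq, referenced table, from column, to column, on_update, on_delete, match)
    foreign_keys = [
        {"column": row[3], "references": f"{row[2]}.{row[4]}"}
        for row in conn.execute(f"PRAGMA foreign_key_list({table})")
    ]
    return {"table": table, "columns": columns, "foreign_keys": foreign_keys}


# Table names come from SQLite's own catalog, never from user input
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name").fetchall()]
for table in tables:
    print(describe_table(conn, table))
```

The table names are interpolated directly into the PRAGMA text, which is acceptable here only because they are read from `sqlite_master` rather than supplied by a user.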
Operational Metadata
Operational metadata describes the operational history and usage of data: how it has been used, when it was last updated, and how the processes that load and serve it are performing.
Examples include:
- Last updated timestamp and update frequency
- Data volume and row counts
- Access logs — who accessed what data, when
- ETL job run history — success/failure, duration, records processed
- Data quality scores and quality check history
- Usage statistics — which reports and dashboards consume this data
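Operational metadata is usually derived from logs rather than written by people. The sketch below, assuming a hypothetical `JobRun` record for each ETL execution and a hypothetical `summarise_runs` helper, rolls a run history up into the figures a catalog might display next to an asset: success rate, average duration, records processed, and a last-updated timestamp.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import Optional


@dataclass
class JobRun:
    job: str
    started: datetime
    finished: datetime
    succeeded: bool
    records_processed: int


def summarise_runs(runs: list[JobRun]) -> dict:
    """Roll raw ETL run history up into catalog-level operational metadata."""
    successful = [r for r in runs if r.succeeded]
    durations = [(r.finished - r.started).total_seconds() for r in runs]
    latest_ok: Optional[JobRun] = max(successful, key=lambda r: r.finished, default=None)
    return {
        "runs": len(runs),
        "success_rate": len(successful) / len(runs) if runs else 0.0,
        "avg_duration_s": mean(durations) if durations else 0.0,
        # Freshness comes from the most recent *successful* load only
        "last_updated": latest_ok.finished if latest_ok else None,
        "records_last_load": latest_ok.records_processed if latest_ok else 0,
    }


# Hypothetical run history for one nightly job
runs = [
    JobRun("load_sales", datetime(2024, 12, 29, 2, 0), datetime(2024, 12, 29, 2, 10), True, 1200),
    JobRun("load_sales", datetime(2024, 12, 30, 2, 0), datetime(2024, 12, 30, 2, 5), False, 0),
    JobRun("load_sales", datetime(2024, 12, 31, 2, 0), datetime(2024, 12, 31, 2, 15), True, 1350),
]

summary = summarise_runs(runs)
print(round(summary["success_rate"], 2), summary["avg_duration_s"])  # 0.67 600.0
print(summary["last_updated"])                                         # 2024-12-31 02:15:00
```

Taking `last_updated` from the last successful run is a deliberate choice: a failed load does not make the data any fresher.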
What Is a Data Catalog?
A data catalog is an organised inventory of an organisation's data assets, enriched with metadata to make those assets discoverable, understandable, and trustworthy. Think of it as the library catalogue for your organisation's data — it tells you what data exists, where it is, what it means, and whether it is fit for your purpose.
A modern enterprise data catalog typically provides:
- Search and discovery: Business users can search for data assets using business terms, not just technical names
- Business glossary integration: Links data assets to their business definitions
- Data lineage visualisation: Shows where data comes from and where it goes
- Data quality scores: Displays quality metrics alongside data assets
- Collaboration features: Users can rate, comment on, and certify data assets
- Access request workflows: Users can request access to data assets directly from the catalog
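Search and discovery is what lets a business user type "revenue" instead of guessing a table name. A minimal sketch, assuming hypothetical asset names and a naive substring match in place of the full-text index a real catalog would use, shows the idea: assets are matched on their technical name, description, and linked glossary terms, and certified assets rank ahead of uncertified ones with the same number of matches.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CatalogEntry:
    technical_name: str                      # e.g. schema.table
    description: str
    glossary_terms: set[str] = field(default_factory=set)
    quality_score: Optional[float] = None    # 0..1 if the asset has been profiled; shown alongside results
    certified: bool = False


def search(catalog: list[CatalogEntry], query: str) -> list[CatalogEntry]:
    """Rank assets by how many query words appear in their name, description or glossary terms."""
    words = query.lower().split()
    scored = []
    for entry in catalog:
        text = " ".join([entry.technical_name, entry.description, *entry.glossary_terms]).lower()
        hits = sum(word in text for word in words)   # naive substring match
        if hits:
            scored.append((hits, entry.certified, entry))
    # More matching words first; certified assets win ties
    scored.sort(key=lambda s: (s[0], s[1]), reverse=True)
    return [entry for _, _, entry in scored]


catalog = [
    CatalogEntry("fin.gl_rev_net", "Net revenue by month from the general ledger",
                 {"Revenue", "Net Revenue"}, quality_score=0.97, certified=True),
    CatalogEntry("crm.cust_master", "Customer master records", {"Active Customer"}, quality_score=0.88),
    CatalogEntry("sales.rev_gross_stg", "Staging copy of gross sales revenue", {"Revenue", "Gross Revenue"}),
]

print([e.technical_name for e in search(catalog, "net revenue")])
# ['fin.gl_rev_net', 'sales.rev_gross_stg']
```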
The Business Glossary
The business glossary is one of the most valuable components of a metadata management programme. It is a curated collection of agreed, authoritative definitions for business terms — the vocabulary that the organisation uses to talk about its data.
Without a business glossary, different departments use the same term to mean different things. "Revenue" might mean gross revenue to the sales team, net revenue to finance, and recognised revenue to accounting. These definitional inconsistencies cause data conflicts, reporting discrepancies, and poor decision-making.
A well-maintained business glossary resolves these conflicts by providing a single, agreed definition for each term, along with the business context, related terms, and the data assets that implement the concept.
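The glossary's core guarantee, one agreed definition per term, can be enforced in code. This hypothetical `BusinessGlossary` class rejects a second definition for an existing term, resolves departmental synonyms to the canonical term, and records the data assets that implement each concept. The revenue definitions and asset names are illustrative, not authoritative accounting definitions.

```python
from dataclasses import dataclass, field


@dataclass
class GlossaryTerm:
    name: str
    definition: str
    related_terms: list[str] = field(default_factory=list)
    implementing_assets: list[str] = field(default_factory=list)  # tables, reports, etc.


class BusinessGlossary:
    """One agreed definition per term, with synonyms resolved to the canonical name."""

    def __init__(self) -> None:
        self._terms: dict[str, GlossaryTerm] = {}
        self._synonyms: dict[str, str] = {}

    def add(self, term: GlossaryTerm, synonyms: tuple[str, ...] = ()) -> None:
        key = term.name.lower()
        if key in self._terms or key in self._synonyms:
            raise ValueError(f"'{term.name}' already has an agreed definition")
        self._terms[key] = term
        for synonym in synonyms:
            self._synonyms[synonym.lower()] = key

    def lookup(self, name: str) -> GlossaryTerm:
        key = name.lower()
        return self._terms[self._synonyms.get(key, key)]   # KeyError if the term is undefined


glossary = BusinessGlossary()
glossary.add(
    GlossaryTerm("Gross Revenue", "Total sales value before returns, discounts and allowances",
                 related_terms=["Net Revenue"], implementing_assets=["sales.rev_gross_stg"]),
    synonyms=("Sales Revenue",),
)
glossary.add(
    GlossaryTerm("Net Revenue", "Gross revenue less returns, discounts and allowances",
                 related_terms=["Gross Revenue"], implementing_assets=["fin.gl_rev_net"]),
)

print(glossary.lookup("sales revenue").name)   # Gross Revenue
try:
    glossary.add(GlossaryTerm("net revenue", "A competing definition"))
except ValueError as err:
    print(err)                                 # 'net revenue' already has an agreed definition
```

Rejecting the competing definition, instead of silently overwriting it, forces the disagreement back to the data stewards, which is where the glossary expects it to be resolved.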
Data Lineage
Data lineage is the metadata that tracks the origin, movement, and transformation of data throughout its lifecycle. End-to-end lineage shows: where data was created (source systems), how it was transformed (ETL/ELT jobs, business rules), and where it is consumed (reports, dashboards, downstream systems).
Data lineage is critical for:
- Impact analysis: If a source system changes, lineage shows which downstream reports and systems will be affected
- Root cause analysis: When a data quality issue is detected, lineage helps trace it back to its source
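- Regulatory compliance: GDPR obliges organisations to keep records of how they process personal data (Article 30) and to locate an individual's data when answering access or erasure requests; lineage shows where personal data is stored and how it flows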
- Trust: Users trust data more when they can see where it came from and how it was processed
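Impact analysis and root cause analysis are both graph traversals over lineage metadata: impact analysis follows flows downstream from a changed source, and root cause analysis follows them upstream from the asset where a problem was spotted. The sketch below models lineage as a directed graph with hypothetical asset names and a hypothetical `LineageGraph` class; unlike production lineage tools, it records only that data flows between two assets, not the transformation logic applied along the way.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Directed lineage graph: an edge A -> B means B is derived from A."""

    def __init__(self) -> None:
        self.downstream = defaultdict(set)
        self.upstream = defaultdict(set)

    def add_flow(self, source: str, target: str) -> None:
        self.downstream[source].add(target)
        self.upstream[target].add(source)

    @staticmethod
    def _reachable(start: str, edges: dict) -> set:
        """Breadth-first search; the `seen` set also guards against cycles."""
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in edges.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        return seen

    def impact_of(self, asset: str) -> set:
        """Impact analysis: every asset downstream of a changed one."""
        return self._reachable(asset, self.downstream)

    def sources_of(self, asset: str) -> set:
        """Root cause analysis: every asset upstream of one with a quality issue."""
        return self._reachable(asset, self.upstream)


lineage = LineageGraph()
for source, target in [
    ("erp.invoices", "etl.clean_invoices"),
    ("crm.accounts", "etl.clean_invoices"),
    ("etl.clean_invoices", "dw.fact_revenue"),
    ("dw.fact_revenue", "report.monthly_revenue"),
    ("dw.fact_revenue", "dashboard.exec_kpis"),
]:
    lineage.add_flow(source, target)

print(sorted(lineage.impact_of("erp.invoices")))
# ['dashboard.exec_kpis', 'dw.fact_revenue', 'etl.clean_invoices', 'report.monthly_revenue']
print(sorted(lineage.sources_of("dashboard.exec_kpis")))
# ['crm.accounts', 'dw.fact_revenue', 'erp.invoices', 'etl.clean_invoices']
```

Keeping both edge directions costs a little memory but makes upstream and downstream queries equally cheap, which matters when the same graph serves change planning and incident investigation.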
Metadata Management and the CDMP Exam
Metadata Management carries an 11% weighting in the CDMP exam, one of the highest of any knowledge area. Key exam topics include: the three types of metadata and examples of each, the role of the data catalog and business glossary, data lineage and its uses, metadata architecture styles (centralised, distributed, hybrid, and bi-directional repositories), and the relationship between metadata management and data governance.