By the end of Day 8 you had quality gates that block bad data from reaching your XBRL generator. But a quality gate that fails only helps you if someone on the team can quickly answer three questions: What is this column? Where did this data come from? Who is allowed to see it?
If your answer to any of those questions is “open a Confluence page, search Jira, ask the data engineer who built it three years ago” — you have a metadata problem. And in regulated EU banking, a metadata problem is a supervisory finding waiting to happen.
EBA’s IRRBB Guidelines, BCBS 239 Principles 1–5, and the ECB’s SREP data quality expectations all require that banks maintain documented, machine-readable metadata for every data element used in regulatory reports. “Machine-readable” is the operative phrase. A Word document is not machine-readable. OpenMetadata is.
This post builds the catalog layer for the COREP pipeline: auto-discovery of all tables, an EBA regulatory glossary, GDPR PII tagging, lineage federation from Marquez, and the connector configuration that wires it all together.
1. The Three Layers of Metadata
The financial industry’s metadata problem is always described as “we don’t know what our data means.” That statement hides three separate and distinct problems that require different tooling to solve.
| Layer | Name | The question it answers | Who cares | Tool |
|---|---|---|---|---|
| Layer 1 | Technical metadata | “What tables exist? What are the columns and types?” | Data engineers, DBAs | OpenMetadata auto-discovery |
| Layer 2 | Business / semantic metadata | “What does tier mean? Is ‘CET1’ the same as ‘Common Equity Tier 1’ in the EBA glossary?” | Risk officers, auditors, regulators | OpenMetadata Glossary + EBA concept IDs |
| Layer 3 | Operational metadata | “When was this table last refreshed? What pipeline produced it? Did quality pass?” | Compliance, audit, ECB SREP inspectors | OpenMetadata + Marquez lineage + GX quality |
Most banks have Layer 1 covered — they have a data dictionary somewhere. Very few have Layer 2 automated at the column level. Almost none have Layer 3 accessible to non-engineers. BCBS 239 requires all three, and ECB SREP Q&A teams ask questions that span all three in a single question: “Show me the definition of Own Funds in your data dictionary, trace it to your COREP C 01.00 submission, and show me the last time it was quality-checked.”
LAYER 1 — Technical Metadata ┌─────────────────────────────────────────────────────────────────┐ │ PostgreSQL schemas: raw · staging · intermediate · mart │ │ Tables: capital_instruments, rwa_exposures, corep_c0100 ... │ │ Columns: instrument_id VARCHAR, tier TEXT, amount NUMERIC ... │ │ Constraints: PK, NOT NULL, CHECK │ │ ← OpenMetadata PostgreSQL connector auto-discovers all of this │ └─────────────────────────────────────────────────────────────────┘ │ ▼ LAYER 2 — Business / Semantic Metadata ┌─────────────────────────────────────────────────────────────────┐ │ column tier → Glossary term "Capital Tier Classification" │ │ column amount → EBA concept c0010 · unit EUR · decimals -3 │ │ column cet1_ratio→ EBA concept c0050 · floor 4.5% · CRR Art.92│ │ table corep_c0100→ EBA DPM 4.0 Template C 01.00 Own Funds │ │ tag: PII=FALSE → GDPR Article 4 — no personal data in mart │ │ ← Manually curated EBA glossary + column-level tag assignments │ └─────────────────────────────────────────────────────────────────┘ │ ▼ LAYER 3 — Operational Metadata ┌─────────────────────────────────────────────────────────────────┐ │ Last refreshed: 2026-05-07T08:14:33Z (Airflow DAG run) │ │ Lineage: capital_instruments.csv → raw → stg → int → mart │ │ Quality: raw_capital_suite PASS 12/12 · mart_corep_suite PASS │ │ Column lineage: stg.amount → int.total_amount → c0100.own_funds│ │ ← OpenMetadata Lineage API + Marquez federation + GX metadata │ └─────────────────────────────────────────────────────────────────┘
2. OpenMetadata Architecture in the COREP Stack
OpenMetadata runs as a single service on port 8585 backed by MySQL (metadata store) and Elasticsearch (full-text search index). It talks to your data sources via ingestion connectors that run as Python processes — either scheduled inside OpenMetadata itself or triggered from Airflow.
Data Sources OpenMetadata 1.3 Consumers ────────────── ──────────────── ────────── PostgreSQL:5432 ──────────► Metadata Server :8585 ───────► Risk officers dbt project ──────────────► │ MySQL :3306 │ Auditors Airflow DAGs ─────────────► │ Elasticsearch :9200 │ Regulators Marquez :5000 ─────────────► │ OpenMetadata UI │ Compliance Great Expectations ────────► │ │ Data engineers │ Connectors (Python): │ MinIO :9000 ──────────────► │ • postgres_connector │ │ • dbt_connector │ │ • airflow_connector │ │ • openlineage_connector │ └──────────────────────────┘ │ Glossary │ Tags │ Lineage EBA DPM ──►│◄── PII ──│◄── Marquez
OpenMetadata is already in your docker-compose.yml from Day 1. Verify it is running:
docker compose --project-directory C:\reg_repo\platform-solution\corep-governance-pipeline ps | grep openmetadata # Expected: openmetadata running 0.0.0.0:8585->8585/tcp # Default credentials (change immediately in production) # URL: http://localhost:8585 # User: admin Password: admin
3. Connecting OpenMetadata to PostgreSQL
The PostgreSQL connector reads the information schema and pg_catalog to discover every schema, table, column, data type, constraint, and row-count statistic. It runs on a schedule and keeps the catalog current automatically.
3.1 Install the OpenMetadata ingestion library
# In your virtual environment
pip install "openmetadata-ingestion[postgres]==1.3.4"
pip install "openmetadata-ingestion[dbt]==1.3.4"
pip install "openmetadata-ingestion[airflow]==1.3.4"
The ingestion library version must match the server version exactly. If your OpenMetadata Docker image is 1.3.4 then install openmetadata-ingestion==1.3.4. A mismatch produces silent failures where ingestion runs but no metadata appears in the UI.
3.2 PostgreSQL Connector Configuration
# catalog/connectors/postgres_connector.yaml source: type: postgres serviceName: corep-postgres serviceConnection: config: type: Postgres username: ${POSTGRES_USER} authType: password: ${POSTGRES_PASSWORD} hostPort: localhost:5432 database: corep sourceConfig: config: type: DatabaseMetadata schemaFilterPattern: includes: - ^raw$ - ^staging$ - ^intermediate$ - ^mart$ - ^audit$ - ^secure$ - ^mapping$ excludes: - ^pg_.* # exclude internal Postgres schemas - ^information_schema$ tableFilterPattern: includes: - .* # discover all tables in included schemas includeViews: true includeTags: true markDeletedTables: true # flag tables dropped from DB as Deleted in catalog includeDDL: false # DDL may contain sensitive info — do not store sink: type: metadata-rest config: api_endpoint: http://localhost:8585/api auth_provider: openmetadata security_config: jwt_token: ${OM_JWT_TOKEN} # generated in OpenMetadata Settings → Bots workflowConfig: openMetadataServerConfig: hostPort: http://localhost:8585/api authProvider: openmetadata securityConfig: jwtToken: ${OM_JWT_TOKEN}
3.3 Run the PostgreSQL Ingestion
# Run connector metadata ingest -c catalog/connectors/postgres_connector.yaml # Expected output INFO Ingesting metadata for service: corep-postgres INFO Processing schema: raw — 6 tables found INFO Processing schema: staging — 5 tables found INFO Processing schema: intermediate— 4 tables found INFO Processing schema: mart — 4 tables found INFO Processing schema: audit — 2 tables found INFO Writing 21 tables to OpenMetadata... INFO ✓ Ingestion completed successfully.
After this runs, open http://localhost:8585, navigate to Explore → Tables and you will see all 21 tables with column names, data types, and cardinality statistics. This is Layer 1 complete — in under five minutes, with no manual entry.
3.4 What OpenMetadata Discovers Automatically
| Metadata item | Source | Auto-discovered? |
|---|---|---|
| Table name, schema, owner | pg_catalog.pg_tables | Yes |
| Column names and data types | information_schema.columns | Yes |
| Primary key, NOT NULL constraints | information_schema.table_constraints | Yes |
| Row count (approximate) | pg_stat_user_tables.n_live_tup | Yes |
| Column-level cardinality / null fraction | pg_stats | Yes after ANALYZE |
| Table description (COMMENT ON TABLE) | pg_description | Yes |
| Column description (COMMENT ON COLUMN) | pg_description | Yes |
| Business glossary terms | Manual or API | No — curated |
| EBA concept IDs | Manual or API | No — curated |
| PII tags | Manual or policy rule | No — curated |
4. Building the EBA Regulatory Glossary
A glossary in OpenMetadata is a hierarchical dictionary of terms where each term has a definition, synonyms, related terms, and — critically — links to the columns in your tables that represent that concept. This is what bridges the gap between a column called amount in raw.capital_instruments and the EBA concept c0010 — Own funds instruments in the COREP C 01.00 template.
4.1 Glossary Structure for COREP
# Create the EBA glossary via OpenMetadata REST API # catalog/scripts/create_eba_glossary.py import requests, os BASE = os.environ.get("OM_API", "http://localhost:8585/api") HEADERS = { "Authorization": f"Bearer {os.environ['OM_JWT_TOKEN']}", "Content-Type": "application/json", } # ── 1. Create the top-level glossary ───────────────────────────── glossary_payload = { "name": "eba-corep-glossary", "displayName": "EBA COREP Regulatory Glossary", "description": ( "Canonical definitions for all data elements used in COREP regulatory " "reporting. Terms are aligned to EBA Data Point Model 4.0 concept IDs. " "Reference: https://www.eba.europa.eu/risk-analysis-and-data/reporting-frameworks" ), "owners": [], "reviewers": [], "tags": ["regulatory", "eba-dpm-4.0"], "mutuallyExclusive": False, } r = requests.post(f"{BASE}/v1/glossaries", json=glossary_payload, headers=HEADERS) r.raise_for_status() glossary_id = r.json()["id"] print(f"Glossary created: {glossary_id}") # ── 2. Create glossary terms ────────────────────────────────────── EBA_TERMS = [ { "name": "own-funds", "displayName": "Own Funds", "description": ( "Total regulatory capital comprising CET1, AT1, and Tier 2 instruments. " "Defined in CRR Article 4(1)(118). EBA DPM concept: c0010. " "Reported in COREP template C 01.00 row 010." ), "synonyms": ["Total Capital", "Regulatory Capital", "Eligible Capital"], "relatedTerms": [], "references": [ {"name": "CRR Article 4(1)(118)", "endpoint": "https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32013R0575"}, {"name": "EBA DPM 4.0 — c0010", "endpoint": "https://www.eba.europa.eu/sites/default/documents/files/document_library/EBA_DPM_4.0.pdf"}, ], }, { "name": "cet1-capital", "displayName": "Common Equity Tier 1 Capital", "description": ( "Highest quality regulatory capital: ordinary shares, retained earnings, " "other comprehensive income. CRR Article 26. Minimum ratio: 4.5% of RWA. " "EBA DPM concept: c0020. COREP C 01.00 row 020." ), "synonyms": ["CET1", "Core Tier 1"], "references": [ {"name": "CRR Article 26", "endpoint": "https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32013R0575"}, ], }, { "name": "risk-weighted-assets", "displayName": "Risk-Weighted Assets", "description": ( "Assets adjusted for credit, market, and operational risk. The denominator " "of all Basel III capital ratios. Calculated under CRR Part Three. " "EBA DPM concept: c0060. COREP C 02.00." ), "synonyms": ["RWA", "Risk Weighted Exposures"], "references": [ {"name": "CRR Part Three Title II", "endpoint": "https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32013R0575"}, ], }, { "name": "hqla", "displayName": "High Quality Liquid Assets", "description": ( "Assets that can be converted to cash with minimal loss in stressed market " "conditions. Classified as Level 1, 2A, or 2B under Delegated Regulation 2015/61. " "Numerator of the Liquidity Coverage Ratio. EBA DPM: c0010 in C 47.00." ), "synonyms": ["High-Quality Liquid Assets", "LCR Buffer", "Liquidity Buffer"], "references": [ {"name": "Del. Reg. 2015/61 Article 7", "endpoint": "https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32015R0061"}, ], }, { "name": "lcr", "displayName": "Liquidity Coverage Ratio", "description": ( "Ratio of HQLA stock to net cash outflows over a 30-day stress period. " "Minimum requirement: 100%. Del. Reg. 2015/61 Article 4. " "EBA DPM concept: c0090 in COREP C 47.00." ), "synonyms": ["Liquidity Ratio", "30-Day Liquidity"], }, { "name": "exposure-at-default", "displayName": "Exposure at Default", "description": ( "The expected outstanding loan balance at the time of borrower default. " "Input to RWA calculation. CRR Article 166. EBA DPM: ead column in C 02.00." ), "synonyms": ["EAD", "Credit Exposure"], }, ] for term in EBA_TERMS: term["glossary"] = {"id": glossary_id, "type": "glossary"} r = requests.post(f"{BASE}/v1/glossaryTerms", json=term, headers=HEADERS) r.raise_for_status() print(f" Term created: {term['displayName']}") print("EBA glossary complete.")
4.2 Linking Glossary Terms to Columns
Once terms exist, you link them to the columns they describe. This is the step that makes your catalog machine-readable: a query to the OpenMetadata API can now return “column mart.corep_c0100.cet1_capital is defined as EBA concept c0020 — Common Equity Tier 1 Capital.”
# catalog/scripts/tag_columns.py — attach glossary terms to columns import requests, os BASE = os.environ.get("OM_API", "http://localhost:8585/api") HEADERS = { "Authorization": f"Bearer {os.environ['OM_JWT_TOKEN']}", "Content-Type": "application/json", } # Maps (schema.table, column) → glossary term name COLUMN_TERM_MAP = [ ("mart.corep_c0100", "own_funds", "own-funds"), ("mart.corep_c0100", "cet1_capital", "cet1-capital"), ("mart.corep_c0100", "total_rwa", "risk-weighted-assets"), ("mart.corep_c0200", "ead", "exposure-at-default"), ("mart.corep_c0200", "rwa", "risk-weighted-assets"), ("mart.corep_c4700", "hqla_buffer", "hqla"), ("mart.corep_c4700", "lcr_ratio", "lcr"), ("raw.capital_instruments", "amount", "own-funds"), ("raw.rwa_exposures", "ead", "exposure-at-default"), ("raw.liquidity_assets", "market_value", "hqla"), ] def get_table_fqn(schema_table: str) -> str: """Build fully qualified table name for OpenMetadata API.""" schema, table = schema_table.split(".") return f"corep-postgres.corep.{schema}.{table}" def get_table_id(fqn: str) -> str: r = requests.get(f"{BASE}/v1/tables/name/{fqn}", headers=HEADERS) r.raise_for_status() return r.json()["id"] def get_term_id(term_name: str) -> str: r = requests.get( f"{BASE}/v1/glossaryTerms/name/eba-corep-glossary.{term_name}", headers=HEADERS, ) r.raise_for_status() return r.json()["id"] for schema_table, column, term_name in COLUMN_TERM_MAP: fqn = get_table_fqn(schema_table) table_id = get_table_id(fqn) term_id = get_term_id(term_name) # PATCH the column with the glossary term tag patch = [{ "op": "add", "path": f"/columns/{column}/tags/-", "value": { "tagFQN": f"eba-corep-glossary.{term_name}", "source": "Glossary", "labelType": "Automated", "state": "Confirmed", } }] r = requests.patch( f"{BASE}/v1/tables/{table_id}", json=patch, headers={**HEADERS, "Content-Type": "application/json-patch+json"}, ) r.raise_for_status() print(f" Tagged: {schema_table}.{column} → {term_name}")
4.3 GDPR PII Tags
Your stg_counterparties.sql from Day 6 deliberately dropped name and lei columns (data minimisation). But the raw layer still has them. You must tag those raw columns as PII so that Ranger column masking policies (Day 10) can be applied automatically.
Under GDPR, a Legal Entity Identifier (LEI) is personal data when it can be used to identify a natural person who controls or is associated with the legal entity. The EBA’s Validation Rules require LEI reporting in C 02.00 for counterparties above certain thresholds. The raw table holds the LEI; the mart does not — this is your GDPR-compliant architecture. The tag is the documented evidence of that design decision.
# catalog/scripts/apply_pii_tags.py — create classification + apply to raw PII columns # Step 1: Create PII classification tag (one-time setup) classification_payload = { "name": "GDPR", "displayName": "GDPR Data Classification", "description": "Tags for GDPR Article 4 personal data classification.", "mutuallyExclusive": True, } r = requests.post(f"{BASE}/v1/classifications", json=classification_payload, headers=HEADERS) r.raise_for_status() # Step 2: Create tags within the classification PII_TAGS = [ {"name": "PII", "description": "Directly identifies a natural or legal person. Access restricted."}, {"name": "Pseudonymised", "description": "Personal data that has been pseudonymised per GDPR Article 4(5)."}, {"name": "Non-PII", "description": "Confirmed non-personal data — safe to aggregate and share."}, ] for tag in PII_TAGS: tag["classification"] = "GDPR" r = requests.post(f"{BASE}/v1/tags", json=tag, headers=HEADERS) r.raise_for_status() print(f" Tag created: GDPR.{tag['name']}") # Step 3: Apply PII tag to raw columns that contain personal data PII_COLUMNS = [ ("raw.counterparties", "name"), # legal entity name — PII ("raw.counterparties", "lei"), # LEI — PII under GDPR ("raw.loans", "borrower_name"), ("raw.loans", "borrower_id"), ] NON_PII_COLUMNS = [ ("mart.corep_c0100", "own_funds"), ("mart.corep_c0100", "cet1_capital"), ("mart.corep_c0300", "cet1_ratio"), ("mart.corep_c4700", "lcr_ratio"), ] def apply_tag(schema_table, column, tag_fqn): fqn = get_table_fqn(schema_table) table_id = get_table_id(fqn) patch = [{ "op": "add", "path": f"/columns/{column}/tags/-", "value": {"tagFQN": tag_fqn, "source": "Classification", "labelType": "Manual", "state": "Confirmed"}, }] r = requests.patch( f"{BASE}/v1/tables/{table_id}", json=patch, headers={**HEADERS, "Content-Type": "application/json-patch+json"}, ) r.raise_for_status() print(f" Applied {tag_fqn} → {schema_table}.{column}") for schema_table, column in PII_COLUMNS: apply_tag(schema_table, column, "GDPR.PII") for schema_table, column in NON_PII_COLUMNS: apply_tag(schema_table, column, "GDPR.Non-PII")
5. Connecting dbt to OpenMetadata
The dbt connector reads your dbt/target/manifest.json and catalog.json — the artefacts produced by dbt docs generate — and pushes column-level lineage, model descriptions, and test results into OpenMetadata. This is where Layer 3 becomes automated.
# catalog/connectors/dbt_connector.yaml source: type: dbt serviceName: corep-postgres # attach dbt metadata to the existing Postgres service sourceConfig: config: type: DBT dbtConfigSource: dbtCatalogFilePath: dbt/target/catalog.json dbtManifestFilePath: dbt/target/manifest.json dbtRunResultsFilePath: dbt/target/run_results.json # test pass/fail status dbtUpdateDescriptions: true # push dbt model descriptions into OM table descriptions includeDbtTags: true # push dbt tags (e.g. tag: corep, tag: pii_free) sink: type: metadata-rest config: api_endpoint: http://localhost:8585/api auth_provider: openmetadata security_config: jwt_token: ${OM_JWT_TOKEN} workflowConfig: openMetadataServerConfig: hostPort: http://localhost:8585/api authProvider: openmetadata securityConfig: jwtToken: ${OM_JWT_TOKEN}
# Generate dbt artefacts first, then run the connector cd corep-governance-pipeline dbt docs generate --project-dir dbt --profiles-dir dbt # Push dbt metadata to OpenMetadata metadata ingest -c catalog/connectors/dbt_connector.yaml
After this runs, open any mart table in OpenMetadata and click Lineage. You will see the full column-level DAG from CSV source file through to the mart column — driven by dbt’s manifest.json which records every ref() and source() call.
6. Connecting Marquez Lineage to OpenMetadata
Marquez holds the OpenLineage events emitted by your Python modules and Airflow DAG. OpenMetadata can federate this lineage via its OpenLineage connector, showing pipeline-level job-to-dataset edges alongside dbt column-level edges in a single graph.
# catalog/connectors/openlineage_connector.yaml source: type: OpenLineage serviceName: corep-pipeline-lineage serviceConnection: config: type: OpenLineage openLineageApiEndpoint: http://localhost:5000/api/v1 # Marquez API lineageNamespace: corep-governance-pipeline sourceConfig: config: type: PipelineMetadata includeLineage: true sink: type: metadata-rest config: api_endpoint: http://localhost:8585/api auth_provider: openmetadata security_config: jwt_token: ${OM_JWT_TOKEN} workflowConfig: openMetadataServerConfig: hostPort: http://localhost:8585/api authProvider: openmetadata securityConfig: jwtToken: ${OM_JWT_TOKEN}
7. Adding Table Freshness to Layer 3
OpenMetadata can query your tables and compute “time since last updated” by reading the audit.pipeline_run_log table you built in Day 2. This appears as a freshness indicator on every table in the catalog — no extra infrastructure needed.
# Add table profile + freshness config to postgres_connector.yaml source: type: postgres serviceName: corep-postgres serviceConnection: config: type: Postgres # ... (same as before) sourceConfig: config: type: DatabaseMetadata # ... (same as before) # Add a SEPARATE profiler workflow for freshness --- source: type: postgres serviceName: corep-postgres sourceConfig: config: type: Profiler generateSampleData: true computeMetrics: true profileSample: 30.0 # profile 30% of rows (fast) timeoutSeconds: 300 tableConfig: - fullyQualifiedName: corep-postgres.corep.mart.corep_c0100 profileSample: 100.0 # always full-scan mart tables (small) - fullyQualifiedName: corep-postgres.corep.mart.corep_c0300 profileSample: 100.0
8. The catalog.py Module
""" modules/catalog.py — Push pipeline run metadata to OpenMetadata after each run. Responsibilities: - Trigger PostgreSQL connector ingestion (refresh table stats) - Trigger dbt connector ingestion (push manifest + lineage) - Write pipeline run completion time to audit.pipeline_run_log so OpenMetadata freshness checks have a timestamp to read """ import logging, os, subprocess from pathlib import Path from modules.base import BaseModule log = logging.getLogger(__name__) CATALOG_DIR = Path(os.environ.get("CATALOG_DIR", "catalog")) class CatalogModule(BaseModule): MODULE_NAME = "catalog" _CONNECTORS = [ "connectors/postgres_connector.yaml", "connectors/dbt_connector.yaml", "connectors/openlineage_connector.yaml", ] def input_check(self) -> None: """Verify connector YAML files and OM_JWT_TOKEN are present.""" if not os.environ.get("OM_JWT_TOKEN"): raise RuntimeError( "[catalog] OM_JWT_TOKEN not set. " "Generate a bot token in OpenMetadata Settings → Bots." ) missing = [ str(CATALOG_DIR / c) for c in self._CONNECTORS if not (CATALOG_DIR / c).exists() ] if missing: raise RuntimeError("[catalog] Missing connector files: " + ", ".join(missing)) def _execute(self) -> None: for connector_path in self._CONNECTORS: full_path = CATALOG_DIR / connector_path log.info("[catalog] Running ingestion: %s", connector_path) result = subprocess.run( ["metadata", "ingest", "-c", str(full_path)], capture_output=True, text=True, timeout=300, ) if result.returncode != 0: log.warning( "[catalog] Ingestion failed for %s (non-fatal):\n%s", connector_path, result.stderr[-2000:] ) else: log.info("[catalog] Ingestion complete: %s", connector_path) def emit_lineage(self) -> None: log.info("[catalog] No lineage event — catalog is metadata, not data.") def output_check(self) -> None: """Verify OM API responds with tables present.""" import requests api = os.environ.get("OM_API", "http://localhost:8585/api") headers = {"Authorization": f"Bearer {os.environ['OM_JWT_TOKEN']}"} r = requests.get(f"{api}/v1/tables?service=corep-postgres&limit=1", headers=headers, timeout=10) r.raise_for_status() count = r.json().get("paging", {}).get("total", 0) if count == 0: raise RuntimeError("[catalog] OpenMetadata reports 0 tables for corep-postgres service.") log.info("[catalog] OpenMetadata has %d tables indexed for corep-postgres.", count)
9. What the Catalog Looks Like After Setup
| OpenMetadata screen | What you see | Metadata layer |
|---|---|---|
| Explore → Tables → mart.corep_c0100 | Columns, types, row count (1 row), description from dbt | Layer 1 |
| corep_c0100 → Schema tab → cet1_capital column | Glossary term “CET1 Capital”, EBA concept c0020, GDPR.Non-PII tag | Layer 2 |
| corep_c0100 → Lineage tab | Full DAG: CSV → raw → staging → intermediate → mart (dbt column-level) + Airflow job node (Marquez) | Layer 3 |
| corep_c0100 → Activity Feed | Last ingested timestamp, dbt run result (PASS/FAIL), GX quality badge | Layer 3 |
| raw.counterparties → Schema → name column | GDPR.PII tag in red, tooltip: “This column is masked in Trino for non-privileged roles” | Layer 2 |
| Glossary → EBA COREP Regulatory Glossary | 6 terms with definitions, legal references, synonyms, linked columns | Layer 2 |
| Explore → Search “LCR” | Returns corep_c4700, hqla glossary term, raw.liquidity_assets, dbt model int_lcr_hqla | Layer 2 |
10. Answering the ECB SREP Inspector Question in 30 Seconds
The question that triggered this blog post: “Show me the definition of Own Funds in your data dictionary, trace it to your COREP C 01.00 submission, and show me the last time it was quality-checked.”
| What the inspector wants | Where it lives in OpenMetadata | How it got there |
|---|---|---|
| “Definition of Own Funds” | Glossary → EBA COREP → own-funds term with CRR Article 4(1)(118) reference | create_eba_glossary.py |
| “Trace to COREP C 01.00” | mart.corep_c0100 → own_funds column → glossary tag: own-funds | tag_columns.py |
| “Where does the number come from?” | mart.corep_c0100 → Lineage tab → traces to raw.capital_instruments.amount via dbt + Airflow | dbt connector + Marquez connector |
| “Last quality check” | mart.corep_c0100 → Activity Feed → GX checkpoint mart_corep_suite PASS with timestamp | quality.py + OM audit write |
| “Who is allowed to see this?” | raw.counterparties.lei → GDPR.PII tag → Ranger policy link (Day 10) | apply_pii_tags.py |
The inspector does not need to call a data engineer. They can navigate OpenMetadata themselves. Every answer links to evidence. Every piece of evidence has a timestamp. The entire chain from raw CSV to XBRL submission is visible in one screen. This is the practical definition of BCBS 239 Principle 1 — Governance and Architecture.
11. Full Metadata Coverage Map for the COREP Pipeline
| Data asset | Technical meta | Semantic meta | Operational meta |
|---|---|---|---|
raw.capital_instruments | Auto-discovered | EBA glossary + GDPR tags | GX Layer 1 results |
staging.stg_capital_instruments | Auto-discovered | dbt descriptions | dbt lineage + run results |
intermediate.int_capital_by_tier | Auto-discovered | dbt descriptions | dbt lineage |
mart.corep_c0100 | Auto-discovered | EBA glossary + Non-PII tags + C 01.00 link | dbt lineage + GX Layer 2 results |
mart.corep_c0300 | Auto-discovered | EBA glossary (cet1_ratio, lcr) | GX pair checks logged |
raw.counterparties | Auto-discovered | GDPR.PII on name, lei | Airflow job lineage |
Airflow DAG corep_pipeline | Airflow connector | Manual description | Run history + SLA |
| dbt models (all) | dbt connector (manifest) | dbt model descriptions | dbt run_results.json tests |
12. Which Regulation Requires Which Metadata Layer
| Regulation / Principle | Specific requirement | Metadata layer | How OpenMetadata satisfies it |
|---|---|---|---|
| BCBS 239 Principle 1 | Board-approved data governance framework with documented data dictionaries | Layer 2 | EBA glossary with CRR references and owner fields |
| BCBS 239 Principle 3 | Completeness — all material risk data captured | Layer 3 | GX completeness results linked to tables via Activity Feed |
| BCBS 239 Principle 6 | Accuracy — data reflects actual risk positions; reconciliation possible | Layer 3 | Column-level lineage proves the mart number derives from the source file |
| GDPR Article 30 | Records of processing activities — document what personal data you hold | Layer 2 | GDPR.PII tags on raw columns with column-level granularity |
| GDPR Article 25 | Data protection by design — minimise personal data in processing | Layer 2 | GDPR.Non-PII tag on mart confirms minimisation was applied |
| EBA IRRBB GL 2022/14 | Data lineage documented for interest rate risk data elements | Layer 3 | Marquez OpenLineage federation in OpenMetadata lineage tab |
| ECB SREP supervision | Bank can demonstrate data quality and lineage on demand | All three | Single OpenMetadata URL per table answers all three questions |
📚 Day 9 Key Takeaways
- Three distinct metadata layers — Technical (what exists), Semantic (what it means), Operational (what happened to it) — require different tools and different processes to populate.
- Layer 1 is free after five minutes — the PostgreSQL connector auto-discovers all 21 tables, all column types, all row counts with a single
metadata ingestcall. - Layer 2 requires curation — EBA glossary terms must be written by a human who understands CRR and EBA DPM. This is the most valuable and most under-invested layer in every bank.
- GDPR.PII tags at column level are the bridge between OpenMetadata (catalog) and Apache Ranger (enforcement on Day 10). The tag triggers the masking policy automatically.
- The dbt connector pushes column-level lineage from manifest.json directly into OpenMetadata — no OpenLineage event required for dbt models.
- The ECB question — definition + lineage + quality timestamp — is answerable in under 30 seconds when all three layers are populated.
- Next: Day 10 — Apache Ranger + Trino: enforcing the GDPR.PII tags as column masking policies so that no
SELECT * FROM raw.counterpartiesever returns an unmasked LEI to an unauthorised analyst.

