Every piece of work in this series — the dbt models, the quality gates, the catalog, the Ranger masking policies — has been building towards a single deliverable: a valid XBRL instance document that can be submitted to a National Competent Authority.
XBRL is the format the EBA mandates for all COREP submissions. It is XML with a precise vocabulary defined by the EBA taxonomy. Each number in your mart.corep_c0100 table must become an XML element with the correct concept ID, unit reference, decimal precision, period, and entity identifier. Get any of these wrong and the NCA’s validation engine rejects the filing before a human ever reads it.
Arelle is the open-source Python library that understands EBA XBRL taxonomies. It loads the taxonomy, validates instance documents against it, and — with your code — generates the XML from your mart data. This post walks through every step: taxonomy download, Arelle installation, concept resolution, instance XML generation, and the xbrl_gen.py module.
1. What XBRL Actually Is (and Why It Is Hard)
XBRL stands for eXtensible Business Reporting Language. The name suggests it is a language. It is better understood as a structured tagging system for financial facts with a strict schema enforced by a taxonomy.
| XBRL concept | What it is | COREP example |
|---|---|---|
| Taxonomy | The authoritative schema defining every valid concept, its data type, and the rules for using it | EBA COREP taxonomy v3.3 (2024 version) |
| Concept (element) | A named data point — like a column definition with data type, unit type, and period type | ei:c0020 — CET1 capital, monetary, instant |
| Context | Defines who (entity) and when (period) a fact applies to | Entity: LEIDENTIFIER, Period: 2026-03-31 |
| Unit | The measurement unit for monetary or percentage facts | iso4217:EUR for monetary, xbrli:pure for ratios |
| Fact | The actual data value, tagged with a concept, context, and unit | <ei:c0020 ...>450000000</ei:c0020> |
| Instance document | The XML file submitted to the regulator — contains all facts for one reporting period | COREP_2026Q1_submission.xbrl |
| decimals attribute | Precision indicator — how many decimal places the value is reliable to | decimals="-3" means value is in thousands |
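The EBA taxonomy validation engine checks decimals on every monetary fact. This series reports monetary facts in thousands of euros with decimals="-3", so 450000 means €450 million. If you report 450000000 under that convention, the value reads as €450 billion and may trigger a consistency error against your capital ratio. Strictly, the XBRL 2.1 specification defines decimals as a statement of precision (-3 means accurate to the nearest thousand) and does not itself scale the value, so confirm the unit convention against the EBA filing rules for your framework version. The mapping file mart_to_xbrl_mapping.yaml you built on Day 7 encodes the correct decimals for every concept.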
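A magnitude slip of this kind can be caught before the file is ever generated. The sketch below compares the ratio implied by the reported amounts with the reported ratio; the function names and the 1% tolerance are illustrative assumptions, not part of the pipeline.

```python
from decimal import Decimal


def implied_ratio_gap(cet1, total_rwa, reported_ratio):
    """Factor between the ratio implied by the amounts and the reported ratio.

    A result near 1 means the amounts and the ratio agree; a result near
    1000 suggests one amount is in euros while the other is in thousands.
    """
    implied = Decimal(str(cet1)) / Decimal(str(total_rwa))
    return implied / Decimal(str(reported_ratio))


def looks_like_scale_error(cet1, total_rwa, reported_ratio,
                           tolerance=Decimal("0.01")):
    """True when the implied and reported ratios differ by more than `tolerance`.

    The tolerance is a hypothetical default; tune it to your rounding rules.
    """
    gap = implied_ratio_gap(cet1, total_rwa, reported_ratio)
    return abs(gap - 1) > tolerance


if __name__ == "__main__":
    # Both amounts in thousands: consistent with a 11.25% ratio.
    print(looks_like_scale_error(450000, 4000000, "0.1125"))
    # CET1 accidentally left in euros: the gap is a factor of 1000.
    print(looks_like_scale_error(450000000, 4000000, "0.1125"))
```

Running a check like this in the mart quality gate is cheaper than discovering the mismatch in a validator report.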
2. The EBA Taxonomy — Download and Structure
2.1 Download the EBA Reporting Framework
```bash
# WSL or Linux terminal: download the EBA COREP taxonomy
# EBA publishes taxonomy packages at:
# https://www.eba.europa.eu/risk-analysis-and-data/reporting-frameworks
cd corep-governance-pipeline/data/taxonomy

# EBA Reporting Framework 3.3 (ITS 2024, valid for submissions from 2024-Q1)
wget https://www.eba.europa.eu/sites/default/documents/files/document_library/\
EBA_taxonomy_3.3.zip -O eba_taxonomy_3.3.zip
unzip eba_taxonomy_3.3.zip -d eba_3.3/

# The taxonomy package structure after extraction:
# eba_3.3/
#   www.eba.europa.eu/
#     eu/fr/xbrl/crr/fws/corep/
#       cor/2024-01-31/
#         mod/
#           c_01.00.xsd       ← C 01.00 Own Funds schema
#           c_02.00.xsd       ← C 02.00 RWA
#           c_03.00.xsd       ← C 03.00 Capital Ratios
#           c_47.00.xsd       ← C 47.00 LCR
#         tab/
#           c_01.00.tab       ← human-readable cell list
#         val/
#           c_01.00_val.xml   ← EBA validation rules
```
The taxonomy is a collection of XSD (XML Schema Definition) files, linkbase files, and validation rule files. The XSD files define the concepts (data elements). The linkbase files define relationships — which concepts are allowed in which templates, which concepts are summation-children of other concepts (the calculation linkbase), and which concepts have human-readable labels (the label linkbase). Arelle loads all of these automatically when you point it at the entry point XSD.
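Before pointing Arelle at the entry point, it can help to confirm that the archive extracted the mix of schema and linkbase files you expect. This is a minimal sketch using only the standard library; `inventory_taxonomy` is a hypothetical helper name, and the default path assumes the directory layout shown earlier.

```python
from collections import Counter
from pathlib import Path


def inventory_taxonomy(root):
    """Count the files under `root` by lowercase extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower() or "(none)"] += 1
    return dict(counts)


if __name__ == "__main__":
    # Assumed location, taken from the unzip step in this post.
    root = Path("data/taxonomy/eba_3.3")
    if root.exists():
        for ext, n in sorted(inventory_taxonomy(root).items()):
            print(f"{ext:8} {n}")
    else:
        print(f"{root} not found; extract the taxonomy first")
```

An extraction that yields no .xsd files, or only a handful of .xml files, usually means the archive was truncated or unpacked into the wrong directory.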
2.2 Key Taxonomy Files for COREP
| File | Purpose | Used by Arelle for |
|---|---|---|
| corep-full-entry-point.xsd | Master entry point; links to all template schemas | Loading the complete taxonomy in one call |
| c_01.00.xsd | C 01.00 Own Funds concept definitions | Validating c0010–c0060 fact elements |
| c_01.00-cal.xml | Calculation linkbase: c0010 = c0020 + c0030 + c0040 | Summation consistency checks |
| c_01.00-lab.xml | Label linkbase: human-readable names in EN, DE, FR | Generating readable error messages |
| c_01.00_val.xml | EBA business validation rules (e.g. CET1 ≥ 4.5%) | XBRL validation (Day 12) |
3. Arelle Installation
```bash
# Install Arelle, the open-source XBRL processor
pip install arelle-release==2.30.3

# Verify installation
python -c "from arelle import Cntlr; print('Arelle OK')"

# Add to requirements.txt
# arelle-release==2.30.3    # Day 11: XBRL generation and validation
```
The PyPI package arelle (no suffix) is an unofficial community mirror that may lag behind. Always use arelle-release which is the official Arelle project’s PyPI distribution. The import name is the same: from arelle import Cntlr.
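Because both distributions expose the same import name, a successful `from arelle import Cntlr` does not tell you which one is installed. A small sketch using the standard library's `importlib.metadata` can report the distribution explicitly; the helper name `installed_version` is illustrative.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("arelle-release", "arelle"):
        version = installed_version(name)
        print(f"{name}: {version if version else 'not installed'}")
```

If the second line reports a version and the first does not, the environment was built from the wrong package name.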
4. What a Valid COREP XBRL Instance Looks Like
Before writing Python to generate the instance document, you need to understand what you are generating. Here is a minimal but complete valid COREP XBRL instance fragment for C 01.00:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<xbrl
  xmlns="http://www.xbrl.org/2003/instance"
  xmlns:xbrli="http://www.xbrl.org/2003/instance"
  xmlns:link="http://www.xbrl.org/2003/linkbase"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns:iso4217="http://www.xbrl.org/2003/iso4217"
  xmlns:ei="http://www.eba.europa.eu/xbrl/crr/dict/con"
  xmlns:find="http://www.eurofiling.info/xbrl/ext/filing-indicators"
>

  <!-- ── Taxonomy reference ───────────────────────────────── -->
  <link:schemaRef
    xlink:type="simple"
    xlink:href="http://www.eba.europa.eu/eu/fr/xbrl/crr/fws/corep/cor/2024-01-31/mod/corep-full-entry-point.xsd"
  />

  <!-- ── Context: entity + reporting period (balance sheet date) ── -->
  <context id="C_2026-03-31_instant">
    <entity>
      <!-- LEI of the reporting entity; no whitespace inside the element -->
      <identifier scheme="http://standards.iso.org/iso/17442">529900T8BM49AURSDO55</identifier>
    </entity>
    <period>
      <instant>2026-03-31</instant> <!-- Q1 2026 balance sheet date -->
    </period>
  </context>

  <!-- ── Unit: EUR (for monetary amounts) ── -->
  <unit id="EUR">
    <measure>iso4217:EUR</measure>
  </unit>

  <!-- ── Unit: pure (for ratios) ── -->
  <unit id="pure">
    <measure>xbrli:pure</measure>
  </unit>

  <!-- ── Filing indicators: declare which templates are included ── -->
  <find:fIndicators>
    <find:filingIndicator contextRef="C_2026-03-31_instant">C 01.00</find:filingIndicator>
    <find:filingIndicator contextRef="C_2026-03-31_instant">C 02.00</find:filingIndicator>
    <find:filingIndicator contextRef="C_2026-03-31_instant">C 03.00</find:filingIndicator>
    <find:filingIndicator contextRef="C_2026-03-31_instant">C 47.00</find:filingIndicator>
  </find:fIndicators>

  <!-- ══════════════════════════════════════════════════════════ -->
  <!-- C 01.00: Own Funds facts                                   -->
  <!-- ══════════════════════════════════════════════════════════ -->

  <!-- c0010: Own Funds, total (CRR Article 4(1)(118)) -->
  <ei:c0010
    contextRef="C_2026-03-31_instant"
    unitRef="EUR"
    decimals="-3"
  >900000</ei:c0010> <!-- €900 million (in thousands) -->

  <!-- c0020: CET1 capital -->
  <ei:c0020
    contextRef="C_2026-03-31_instant"
    unitRef="EUR"
    decimals="-3"
  >450000</ei:c0020> <!-- €450 million -->

  <!-- c0050: CET1 ratio (C 03.00), pure unit; value carries 6 decimal
       places and is declared accurate to 4 -->
  <ei:c0050
    contextRef="C_2026-03-31_instant"
    unitRef="pure"
    decimals="4"
  >0.112583</ei:c0050> <!-- 11.2583% CET1 ratio -->

</xbrl>
```
Look at the <ei:c0020> element above. Like every fact, it has four mandatory parts: three attributes and the value itself.
1. contextRef — links to a <context> element identifying the entity and period
2. unitRef — links to a <unit> element — EUR for monetary, pure for ratios
3. decimals — precision indicator: -3 = thousands, 4 = accurate to four decimal places
4. The value itself — the raw number from your mart table
Miss any one of these and Arelle’s validator returns an XBRL error. The NCA’s filing system will reject the document before checking the numbers.
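The four parts listed above can be checked mechanically before the file goes anywhere near Arelle. The following is a minimal pre-flight sketch using the standard library; `check_facts` is a hypothetical helper, and it assumes facts are direct children of the root element in the ei namespace used in this post. It is not a substitute for taxonomy validation.

```python
import xml.etree.ElementTree as ET

XBRLI = "http://www.xbrl.org/2003/instance"
EI = "http://www.eba.europa.eu/xbrl/crr/dict/con"  # namespace used in this post


def check_facts(xml_text):
    """Return a list of problems found on facts in the EI namespace."""
    root = ET.fromstring(xml_text)
    context_ids = {c.get("id") for c in root.findall(f"{{{XBRLI}}}context")}
    unit_ids = {u.get("id") for u in root.findall(f"{{{XBRLI}}}unit")}
    problems = []
    for fact in root:
        if not fact.tag.startswith(f"{{{EI}}}"):
            continue  # contexts, units, schemaRef, filing indicators
        name = fact.tag.split("}", 1)[1]
        if fact.get("contextRef") not in context_ids:
            problems.append(f"{name}: contextRef missing or unresolved")
        if fact.get("unitRef") not in unit_ids:
            problems.append(f"{name}: unitRef missing or unresolved")
        if fact.get("decimals") is None:
            problems.append(f"{name}: decimals missing")
        if not (fact.text or "").strip():
            problems.append(f"{name}: empty value")
    return problems


SAMPLE = """<xbrl xmlns="http://www.xbrl.org/2003/instance"
      xmlns:ei="http://www.eba.europa.eu/xbrl/crr/dict/con">
  <context id="C1">
    <entity><identifier scheme="http://standards.iso.org/iso/17442">529900T8BM49AURSDO55</identifier></entity>
    <period><instant>2026-03-31</instant></period>
  </context>
  <unit id="EUR"><measure>iso4217:EUR</measure></unit>
  <ei:c0010 contextRef="C1" unitRef="EUR" decimals="-3">900000</ei:c0010>
  <ei:c0020 contextRef="C1" unitRef="USD">450000</ei:c0020>
</xbrl>"""

if __name__ == "__main__":
    for problem in check_facts(SAMPLE):
        print(problem)
```

In the sample, c0010 passes while c0020 is reported twice: its unitRef points at a unit that was never declared, and it has no decimals attribute.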
5. Loading and Exploring the Taxonomy with Arelle
Before generating facts you need to confirm the exact concept QName (qualified name) for each data point. The EBA taxonomy uses the namespace prefix ei: for its concepts but you must verify this programmatically — namespace bindings can change between taxonomy versions.
```python
# xbrl/explore_taxonomy.py: run this once to verify concept names
import os

from arelle import Cntlr, ModelManager
from arelle.ModelValue import qname

TAXONOMY_ENTRY = os.environ.get(
    "EBA_TAXONOMY_ENTRY",
    "data/taxonomy/eba_3.3/www.eba.europa.eu/eu/fr/xbrl/crr/fws/corep/"
    "cor/2024-01-31/mod/corep-full-entry-point.xsd",
)

# ── 1. Initialise Arelle controller ──────────────────────────────
cntlr = Cntlr.Cntlr(logFileName="arelle_explore.log")
cntlr.startLogging(logFileName="arelle_explore.log")
modelMgr = ModelManager.ModelManager(cntlr)

# ── 2. Load the taxonomy (this parses all XSD + linkbase files) ──
print("Loading EBA taxonomy — this takes 20-60 seconds on first load...")
modelXbrl = modelMgr.load(TAXONOMY_ENTRY)
print(f"Taxonomy loaded. Concepts: {len(modelXbrl.qnameConcepts)}")

# ── 3. Look up specific concepts by local name ───────────────────
EBA_NAMESPACE = "http://www.eba.europa.eu/xbrl/crr/dict/con"
CONCEPTS_TO_CHECK = [
    "c0010",  # Own Funds
    "c0020",  # CET1
    "c0030",  # AT1
    "c0040",  # T2
    "c0050",  # CET1 ratio (in C 03.00)
    "c0060",  # Total RWA
    "c0090",  # LCR ratio (in C 47.00)
]

print("\n── Concept Inspection ──────────────────────────────────────")
for local_name in CONCEPTS_TO_CHECK:
    qn = qname(EBA_NAMESPACE, local_name)
    concept = modelXbrl.qnameConcepts.get(qn)
    # Compare with None: Arelle model objects are lxml elements, and an
    # element without children evaluates as False in a boolean test.
    if concept is not None:
        print(f"\n  {local_name}:")
        print(f"    QName     : {concept.qname}")
        print(f"    TypeName  : {concept.typeQname}")
        print(f"    PeriodType: {concept.periodType}")  # instant or duration
        print(f"    Balance   : {concept.balance}")     # debit / credit
        print(f"    Abstract  : {concept.isAbstract}")  # abstract = not a fact
        # Get the English label from the label linkbase
        labels = concept.label(lang="en", fallbackToQname=True)
        print(f"    Label (EN): {labels}")
    else:
        print(f"  {local_name}: NOT FOUND in taxonomy — check namespace or version")

# ── 4. Inspect the calculation linkbase for C 01.00 ─────────────
# This shows the summation tree: c0010 = c0020 + c0030 + c0040
print("\n── Calculation Linkbase (C 01.00 summations) ───────────────")
SUMMATION_ITEM = "http://www.xbrl.org/2003/arcrole/summation-item"
for rel in modelXbrl.relationshipSet(SUMMATION_ITEM).modelRelationships:
    parent = rel.fromModelObject
    child = rel.toModelObject
    # Filter on the extended link role, which identifies the template;
    # linkQname is only the link element name and never contains "c_01".
    if parent is not None and child is not None and "c_01" in str(rel.linkrole):
        print(f"  {parent.qname.localName} → {child.qname.localName}  weight={rel.weight}")

modelMgr.close()
cntlr.close()
```
Expected output from explore_taxonomy.py:

```text
Loading EBA taxonomy — this takes 20-60 seconds on first load...
Taxonomy loaded. Concepts: 18420

── Concept Inspection ──────────────────────────────────────

  c0010:
    QName     : {http://www.eba.europa.eu/xbrl/crr/dict/con}c0010
    TypeName  : xbrli:monetaryItemType
    PeriodType: instant
    Balance   : credit
    Abstract  : False
    Label (EN): Own funds

  c0020:
    QName     : {http://www.eba.europa.eu/xbrl/crr/dict/con}c0020
    TypeName  : xbrli:monetaryItemType
    PeriodType: instant
    Balance   : credit
    Abstract  : False
    Label (EN): Common Equity Tier 1 capital

  c0050:
    QName     : {http://www.eba.europa.eu/xbrl/crr/dict/con}c0050
    TypeName  : xbrli:pureItemType
    PeriodType: instant
    Balance   : None
    Abstract  : False
    Label (EN): CET1 capital ratio

── Calculation Linkbase (C 01.00 summations) ───────────────
  c0010 → c0020  weight=1.0
  c0010 → c0030  weight=1.0
  c0010 → c0040  weight=1.0
```
The calculation linkbase output confirms: c0010 = c0020 + c0030 + c0040. Your mart.corep_c0100 must produce numbers that satisfy this identity or Arelle’s calculation validation will fail on Day 12.
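The identity can be tested on the mart row itself, long before Day 12. This sketch assumes the column-to-concept pairing implied by the mart query in this post (own_funds for c0010, cet1_capital for c0020, at1_capital for c0030, t2_capital for c0040); `own_funds_gap` is a hypothetical helper name.

```python
from decimal import Decimal

# Children of c0010 in the calculation linkbase, by assumed mart column name.
COMPONENT_COLUMNS = ("cet1_capital", "at1_capital", "t2_capital")


def own_funds_gap(row):
    """Return own_funds minus the sum of its components.

    Zero means the row satisfies c0010 = c0020 + c0030 + c0040.
    Decimal is used so that rounding noise from floats cannot hide
    or invent a one-unit difference.
    """
    parts = sum(Decimal(str(row[col])) for col in COMPONENT_COLUMNS)
    return Decimal(str(row["own_funds"])) - parts


if __name__ == "__main__":
    row = {
        "own_funds": 900000,
        "cet1_capital": 450000,
        "at1_capital": 150000,
        "t2_capital": 300000,
    }
    print(own_funds_gap(row))
```

A non-zero gap of one or two units is the typical signature of components that were rounded independently of the total.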
6. Reading the Mart Data from Trino
The XBRL generator reads from Trino — not directly from PostgreSQL. This enforces the architectural rule that all consumers go through Trino, and it means Ranger’s column masking policies are applied even during XBRL generation. The corep_reporting role has the ALLOW policy on PII columns where needed, so this is not a problem — it is the correct behaviour.
```python
# xbrl/read_mart.py: read mart tables via Trino for XBRL generation
import logging
import os

from trino.dbapi import connect as trino_connect

log = logging.getLogger(__name__)

TRINO_HOST = os.environ.get("TRINO_HOST", "localhost")
TRINO_PORT = int(os.environ.get("TRINO_PORT", "8080"))
TRINO_USER = os.environ.get("TRINO_XBRL_USER", "corep_reporting_svc")


def _fetch_dicts(cur, sql: str) -> list:
    """Run a query and return each row as a {column: value} dict."""
    cur.execute(sql)
    rows = cur.fetchall()
    cols = [d[0] for d in cur.description]
    return [dict(zip(cols, r)) for r in rows]


def _first(rows: list) -> dict:
    """Return the first row, or an empty dict when the table is empty."""
    return rows[0] if rows else {}


def fetch_mart_data() -> dict:
    """
    Query all four COREP mart tables via Trino.
    Returns a dict keyed by template code.
    """
    conn = trino_connect(
        host=TRINO_HOST,
        port=TRINO_PORT,
        user=TRINO_USER,
        catalog="postgresql",
        schema="mart",
    )
    mart_data = {}
    try:
        cur = conn.cursor()

        # ── C 01.00: Own Funds ───────────────────────────────────
        # ORDER BY makes LIMIT 1 deterministic if several periods exist.
        mart_data["c0100"] = _first(_fetch_dicts(cur, """
            SELECT own_funds, cet1_capital, at1_capital, t2_capital,
                   total_rwa, reporting_date
            FROM mart.corep_c0100
            ORDER BY reporting_date DESC
            LIMIT 1
        """))

        # ── C 02.00: RWA by exposure class ───────────────────────
        mart_data["c0200"] = _fetch_dicts(cur, """
            SELECT exposure_class, ead, rwa, reporting_date
            FROM mart.corep_c0200
            ORDER BY
                CASE exposure_class
                    WHEN 'central_governments' THEN 1
                    WHEN 'institutions'        THEN 2
                    WHEN 'corporates'          THEN 3
                    WHEN 'retail'              THEN 4
                    WHEN 'real_estate'         THEN 5
                    WHEN 'equity'              THEN 6
                    WHEN 'other'               THEN 7
                    ELSE 99
                END
        """)

        # ── C 03.00: Capital Ratios ──────────────────────────────
        mart_data["c0300"] = _first(_fetch_dicts(cur, """
            SELECT cet1_ratio, tier1_ratio, total_capital_ratio,
                   leverage_ratio, total_rwa, reporting_date
            FROM mart.corep_c0300
            ORDER BY reporting_date DESC
            LIMIT 1
        """))

        # ── C 47.00: LCR ─────────────────────────────────────────
        mart_data["c4700"] = _first(_fetch_dicts(cur, """
            SELECT hqla_level1, hqla_level2a, hqla_level2b, hqla_buffer,
                   net_outflows, lcr_ratio, reporting_date
            FROM mart.corep_c4700
            ORDER BY reporting_date DESC
            LIMIT 1
        """))

        cur.close()
    finally:
        # Release the connection even when a query fails
        conn.close()

    # Validate all four templates have data
    empty = [k for k, v in mart_data.items() if not v]
    if empty:
        raise RuntimeError(f"[xbrl_gen] Empty mart tables for templates: {empty}")

    log.info("[xbrl_gen] Mart data fetched for templates: %s", list(mart_data.keys()))
    return mart_data
```
7. Generating the XBRL Instance Document
```python
# modules/xbrl_gen.py: full implementation
"""
modules/xbrl_gen.py: generate an EBA COREP XBRL instance document from mart tables.

Reads mart data via Trino, loads mart_to_xbrl_mapping.yaml, and produces
a valid XBRL instance document for submission.
"""
import logging
import os
from datetime import date, datetime, timedelta, timezone
from decimal import Decimal, ROUND_HALF_UP
from pathlib import Path
from xml.etree import ElementTree as ET

import yaml

from modules.base import BaseModule
from xbrl.read_mart import fetch_mart_data

log = logging.getLogger(__name__)

# ── Namespace map ─────────────────────────────────────────────────
NS = {
    "xbrli":   "http://www.xbrl.org/2003/instance",
    "link":    "http://www.xbrl.org/2003/linkbase",
    "xlink":   "http://www.w3.org/1999/xlink",
    "iso4217": "http://www.xbrl.org/2003/iso4217",
    "ei":      "http://www.eba.europa.eu/xbrl/crr/dict/con",
    "find":    "http://www.eurofiling.info/xbrl/ext/filing-indicators",
}

TAXONOMY_HREF = (
    "http://www.eba.europa.eu/eu/fr/xbrl/crr/fws/corep/cor/2024-01-31/mod/corep-full-entry-point.xsd"
)
MAPPING_FILE = Path(os.environ.get("XBRL_MAPPING_FILE", "mart_to_xbrl_mapping.yaml"))
OUTPUT_DIR = Path(os.environ.get("XBRL_OUTPUT_DIR", "output/xbrl"))


def _tag(prefix: str, local: str) -> str:
    """Build an ElementTree '{namespace}local' tag from a prefix in NS."""
    return f"{{{NS[prefix]}}}{local}"


class XbrlGenModule(BaseModule):
    MODULE_NAME = "xbrl_gen"

    def input_check(self) -> None:
        if not MAPPING_FILE.exists():
            raise RuntimeError(f"[xbrl_gen] Mapping file not found: {MAPPING_FILE}")
        OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
        log.info("[xbrl_gen] Mapping file present. Output dir: %s", OUTPUT_DIR)

    @staticmethod
    def _last_quarter_end(today: date) -> date:
        """Most recent quarter-end strictly before `today`."""
        first_month = 3 * ((today.month - 1) // 3) + 1  # first month of this quarter
        return date(today.year, first_month, 1) - timedelta(days=1)

    def _execute(self) -> None:
        mapping = yaml.safe_load(MAPPING_FILE.read_text())
        mart = fetch_mart_data()

        # Reporting date comes from the mart. The fallback is the latest
        # quarter-end; date.today().replace(day=31) would raise ValueError
        # in any month shorter than 31 days.
        reporting_date = (
            mart["c0100"].get("reporting_date")
            or self._last_quarter_end(date.today())
        )
        entity_lei = os.environ.get("ENTITY_LEI", "529900T8BM49AURSDO55")

        # Register all namespaces (prevents ET from generating ns0: prefixes)
        for prefix, uri in NS.items():
            ET.register_namespace(prefix, uri)

        # ── Build the XBRL root element ───────────────────────────
        xbrl = ET.Element(_tag("xbrli", "xbrl"))
        # ElementTree only declares namespaces that appear in element or
        # attribute names. iso4217 is used only inside <measure> text, so
        # it has to be declared explicitly for "iso4217:EUR" to resolve.
        xbrl.set("xmlns:iso4217", NS["iso4217"])

        # Schema reference
        schema_ref = ET.SubElement(xbrl, _tag("link", "schemaRef"))
        schema_ref.set(_tag("xlink", "type"), "simple")
        schema_ref.set(_tag("xlink", "href"), TAXONOMY_HREF)

        # ── Context ───────────────────────────────────────────────
        ctx_id = f"C_{reporting_date}_instant"
        ctx = ET.SubElement(xbrl, _tag("xbrli", "context"), id=ctx_id)
        entity = ET.SubElement(ctx, _tag("xbrli", "entity"))
        ident = ET.SubElement(entity, _tag("xbrli", "identifier"))
        ident.set("scheme", "http://standards.iso.org/iso/17442")
        ident.text = entity_lei
        period = ET.SubElement(ctx, _tag("xbrli", "period"))
        instant = ET.SubElement(period, _tag("xbrli", "instant"))
        instant.text = str(reporting_date)

        # ── Units ─────────────────────────────────────────────────
        unit_eur = ET.SubElement(xbrl, _tag("xbrli", "unit"), id="EUR")
        ET.SubElement(unit_eur, _tag("xbrli", "measure")).text = "iso4217:EUR"
        unit_pure = ET.SubElement(xbrl, _tag("xbrli", "unit"), id="pure")
        ET.SubElement(unit_pure, _tag("xbrli", "measure")).text = "xbrli:pure"

        # ── Filing indicators ─────────────────────────────────────
        fi_root = ET.SubElement(xbrl, _tag("find", "fIndicators"))
        for tmpl in ["C 01.00", "C 02.00", "C 03.00", "C 47.00"]:
            fi = ET.SubElement(fi_root, _tag("find", "filingIndicator"))
            fi.set("contextRef", ctx_id)
            fi.text = tmpl

        # ── Generate facts from mapping ───────────────────────────
        facts_written = 0
        for template_code, template_mapping in mapping.items():
            # "C 01.00" → "c0100"
            mart_key = template_code.lower().replace(" ", "").replace(".", "")
            data = mart.get(mart_key, {})
            if isinstance(data, list):
                # C 02.00 is a multi-row template
                for row in data:
                    facts_written += self._write_facts(xbrl, template_mapping, row, ctx_id)
            else:
                facts_written += self._write_facts(xbrl, template_mapping, data, ctx_id)

        # ── Write output file ─────────────────────────────────────
        ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
        out_path = OUTPUT_DIR / f"COREP_{reporting_date}_{ts}.xbrl"
        tree = ET.ElementTree(xbrl)
        ET.indent(tree, space="  ")  # Python 3.9+ pretty-print
        tree.write(str(out_path), xml_declaration=True, encoding="utf-8")
        log.info("[xbrl_gen] Instance written: %s (%d facts)", out_path, facts_written)

        self._upload_to_minio(out_path)
        self._instance_path = out_path  # stored for output_check

    def _write_facts(
        self,
        xbrl: ET.Element,
        template_mapping: list,
        data: dict,
        ctx_id: str,
    ) -> int:
        """
        Write one set of facts from a mapping list + a data dict.
        Returns count of facts written.
        """
        written = 0
        for mapping_entry in template_mapping:
            mart_col = mapping_entry["mart_column"]
            concept_id = mapping_entry["xbrl_concept"]  # e.g. "c0020"
            unit = mapping_entry["unit"]                # "EUR" or "pure"
            decimals = mapping_entry["decimals"]        # -3 or 4

            value = data.get(mart_col)
            if value is None:
                log.warning("[xbrl_gen] Missing mart value: %s → skipping concept %s",
                            mart_col, concept_id)
                continue

            # Apply decimal rounding per EBA DPM precision rules
            value_str = self._format_value(value, decimals, unit)

            fact = ET.SubElement(xbrl, _tag("ei", concept_id))
            fact.set("contextRef", ctx_id)
            fact.set("unitRef", unit)
            fact.set("decimals", str(decimals))
            fact.text = value_str
            written += 1
        return written

    @staticmethod
    def _format_value(value, decimals: int, unit: str) -> str:
        """
        Format a mart value for XBRL output.

        Convention used in this series:
        - Monetary (EUR): decimals=-3 means report value in thousands,
          so divide by 1000 and round to nearest integer.
        - Ratios (pure): decimals=4 declares accuracy to 4 decimal places;
          report the raw decimal (e.g. 0.112583).
        """
        d = Decimal(str(value))
        if unit == "EUR":
            # Report in thousands (decimals=-3)
            d_thousands = (d / Decimal("1000")).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
            return str(d_thousands)
        # Pure ratio: keep 6 decimal places as produced by dbt ROUND(6)
        return str(d.quantize(Decimal("0.000001"), rounding=ROUND_HALF_UP))

    def _upload_to_minio(self, out_path: Path) -> None:
        try:
            from minio import Minio
            client = Minio(
                os.environ.get("MINIO_ENDPOINT", "minio:9000"),
                access_key=os.environ.get("MINIO_ROOT_USER", "minioadmin"),
                secret_key=os.environ.get("MINIO_ROOT_PASSWORD", "minioadmin"),
                secure=False,
            )
            bucket = "corep-xbrl-output"
            if not client.bucket_exists(bucket):
                client.make_bucket(bucket)
            client.fput_object(
                bucket, out_path.name, str(out_path),
                content_type="application/xbrl+xml",
            )
            log.info("[xbrl_gen] Uploaded → minio://%s/%s", bucket, out_path.name)
        except Exception as exc:
            log.warning("[xbrl_gen] MinIO upload failed (non-fatal): %s", exc)

    def emit_lineage(self) -> None:
        from openlineage.client.run import RunEvent, RunState, Run, Job, Dataset

        event = RunEvent(
            eventType=RunState.COMPLETE,
            eventTime=datetime.now(timezone.utc).isoformat(),
            run=Run(runId=self._run_id),
            job=Job(namespace=self._namespace, name="xbrl_gen"),
            inputs=[
                Dataset(namespace="trino://corep", name="mart.corep_c0100"),
                Dataset(namespace="trino://corep", name="mart.corep_c0200"),
                Dataset(namespace="trino://corep", name="mart.corep_c0300"),
                Dataset(namespace="trino://corep", name="mart.corep_c4700"),
            ],
            outputs=[
                Dataset(
                    namespace="minio://corep-xbrl-output",
                    name=str(getattr(self, "_instance_path", "COREP_unknown.xbrl")),
                )
            ],
            producer="https://github.com/your-org/corep-governance-pipeline/modules/xbrl_gen.py",
        )
        try:
            self._ol_client.emit(event)
            log.info("[xbrl_gen] OpenLineage COMPLETE event emitted.")
        except Exception as exc:
            log.warning("[xbrl_gen] OpenLineage emit failed (non-fatal): %s", exc)

    def output_check(self) -> None:
        """Verify the XBRL file exists and is well-formed XML."""
        out_path = getattr(self, "_instance_path", None)
        if not out_path or not Path(out_path).exists():
            raise RuntimeError("[xbrl_gen] Output XBRL file not found.")
        try:
            ET.parse(str(out_path))
            log.info("[xbrl_gen] Output XBRL is well-formed XML: %s", out_path)
        except ET.ParseError as exc:
            raise RuntimeError(f"[xbrl_gen] Output XBRL is malformed XML: {exc}") from exc
```
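The generator reads four keys from every mapping entry (mart_column, xbrl_concept, unit, decimals) and fails with a KeyError if one is absent. A small pre-check over the loaded mapping gives a clearer message. The sketch below works on the dict that yaml.safe_load would return; the mapping shape is inferred from the generator code, and `validate_mapping` is a hypothetical helper.

```python
REQUIRED_KEYS = ("mart_column", "xbrl_concept", "unit", "decimals")
ALLOWED_UNITS = {"EUR", "pure"}  # the two unit ids the generator declares


def validate_mapping(mapping):
    """Return a list of problems found in a template -> entries mapping."""
    problems = []
    for template_code, entries in mapping.items():
        for i, entry in enumerate(entries):
            missing = [k for k in REQUIRED_KEYS if k not in entry]
            if missing:
                problems.append(f"{template_code}[{i}]: missing {', '.join(missing)}")
                continue
            if entry["unit"] not in ALLOWED_UNITS:
                problems.append(f"{template_code}[{i}]: unknown unit {entry['unit']!r}")
            # bool is a subclass of int, so exclude it explicitly
            if isinstance(entry["decimals"], bool) or not isinstance(entry["decimals"], int):
                problems.append(f"{template_code}[{i}]: decimals must be an integer")
    return problems


if __name__ == "__main__":
    example = {
        "C 01.00": [
            {"mart_column": "own_funds", "xbrl_concept": "c0010",
             "unit": "EUR", "decimals": -3},
            {"mart_column": "cet1_capital", "xbrl_concept": "c0020",
             "unit": "EUR"},  # decimals forgotten
        ],
    }
    for problem in validate_mapping(example):
        print(problem)
```

Calling a check like this from input_check would surface a broken mapping before any Trino query runs.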
8. Common XBRL Generation Errors and How to Fix Them
| Error | Cause | Fix |
|---|---|---|
| Unknown concept: ei:c0010 | Wrong namespace URI or wrong taxonomy version | Run explore_taxonomy.py to confirm the EBA namespace for your taxonomy version. It changes between framework releases. |
| Calculation inconsistency: c0010 ≠ c0020 + c0030 + c0040 | Rounding in dbt models causes 1–2 unit difference | Use ROUND(..., 0) consistently across all mart models. Ensure the parent concept is the sum, not independently computed. |
| decimals attribute required | Missing decimals on a fact element | Every fact must have decimals. Check mart_to_xbrl_mapping.yaml: every entry must have a decimals key. |
| Value out of allowed range | Ratio reported as percentage (e.g. 11.25 instead of 0.1125) | EBA ratios are decimals (0–1), not percentages. Check your dbt mart model for ROUND(ratio * 100, ...) errors. |
| ns0: prefix in generated XML | Namespace not registered before creating elements | Call ET.register_namespace(prefix, uri) for all namespaces before creating any element. |
| contextRef not found | Context ID in fact does not match any <context id="..."> | Generate the context ID string once and reuse it. Do not use f-strings inline on each fact. |
| Rejected by NCA: Invalid LEI format | Entity LEI in context is not 20 characters | LEI is exactly 20 alphanumeric characters. Validate with: assert len(lei) == 20 and lei.isalnum() |
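The length check in the last row catches truncation but not typos. ISO 17442 also defines two trailing check digits computed with the ISO 7064 MOD 97-10 scheme, which can be verified offline. The sketch below is one way to do that; the function names are illustrative, and passing it says nothing about whether the LEI is actually registered.

```python
def _to_digits(text):
    """Map 0-9 to themselves and A-Z to 10-35, concatenated as a digit string."""
    return "".join(str(int(ch, 36)) for ch in text)


def lei_check_digits(first_18):
    """Compute the two check digits for the first 18 characters of an LEI."""
    remainder = int(_to_digits(first_18.upper() + "00")) % 97
    return f"{98 - remainder:02d}"


def is_valid_lei(lei):
    """Structural LEI check: 20 ASCII alphanumerics with valid check digits."""
    lei = lei.strip().upper()
    if len(lei) != 20 or not lei.isascii() or not lei.isalnum():
        return False
    if not lei[18:].isdigit():
        return False
    # MOD 97-10: the full number must leave remainder 1 when divided by 97
    return int(_to_digits(lei)) % 97 == 1


if __name__ == "__main__":
    import os
    # ENTITY_LEI is the environment variable the generator reads
    lei = os.environ.get("ENTITY_LEI", "")
    print("LEI check passed" if is_valid_lei(lei) else "LEI check failed")
```

Running this in input_check would turn an NCA rejection into a local failure a few seconds into the run.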
9. Running the Generator
```bash
# Run just the XBRL generator
python pipeline.py --module xbrl_gen

# Expected output
INFO [xbrl_gen] Mapping file present. Output dir: output/xbrl
INFO [xbrl_gen] Mart data fetched for templates: ['c0100', 'c0200', 'c0300', 'c4700']
INFO [xbrl_gen] Instance written: output/xbrl/COREP_2026-03-31_20260507T081433Z.xbrl (26 facts)
INFO [xbrl_gen] Uploaded → minio://corep-xbrl-output/COREP_2026-03-31_20260507T081433Z.xbrl
INFO [xbrl_gen] OpenLineage COMPLETE event emitted.
INFO [xbrl_gen] Output XBRL is well-formed XML: output/xbrl/COREP_2026-03-31_20260507T081433Z.xbrl

# Inspect the generated file
cat output/xbrl/COREP_2026-03-31_*.xbrl | head -60

# Quick fact count check
grep -c "<ei:" output/xbrl/COREP_2026-03-31_*.xbrl
# Expected: 26 (matches the 26 data points in mart_to_xbrl_mapping.yaml)
```
10. Full Pipeline Flow to This Point
```text
Day 5        Day 6–7           Day 8          Day 9–10        Day 11
──────────   ─────────────     ─────────      ───────────     ──────────────
CSV files    dbt transform     GX quality     Catalog +       XBRL instance
    ↓             ↓            gates          Security             ↓
ingest.py →  raw.*          →  Layer 1     →  OpenMetadata    xbrl_gen.py
             staging.*         raw check      + Ranger             ↓
             intermediate.*        ↓          masking         COREP_*.xbrl
             mart.*         →  Layer 2     →                       ↓
               corep_c0100     mart check                     minio://
               corep_c0200                                    corep-xbrl-output/
               corep_c0300
               corep_c4700
                   │
                   └─── mart_to_xbrl_mapping.yaml ────────────►
                        (26 concept IDs, units, decimals)
```
📚 Day 11 Key Takeaways
- XBRL is not just XML — it is typed, constrained XML governed by a taxonomy that defines every valid concept, unit, and calculation relationship. A well-formed XML file can still be an invalid XBRL instance.
- Four mandatory parts per fact: contextRef, unitRef, decimals, and the value. Miss any one and the NCA filing system rejects the document before validation starts.
- decimals="-3" means thousands: monetary facts are reported in thousands of euros. A value of 450000 with decimals="-3" means €450 million. This is the most common source of magnitude errors in COREP submissions.
- Ratios are decimals, not percentages: 0.1125 not 11.25. Ratio concepts are typed xbrli:pureItemType in the EBA taxonomy, so 11.25 is read as a ratio of 1125%.
- The calculation linkbase is your friend: explore it with Arelle before generating facts. If c0010 ≠ c0020 + c0030 + c0040 the validator will fail on Day 12 and you will hunt a rounding bug in your dbt models.
- XBRL generation reads from Trino, not directly from PostgreSQL. Ranger masking policies apply. This is correct behaviour: it proves the submission data went through the governed query layer.
- Next: Day 12 — XBRL validation with Arelle: running the EBA validation rules engine, interpreting error codes, and handling the two classes of validation failure — structural errors and business rule violations.

