Cracking the EBA XBRL Taxonomy with Arelle — a Python Walkthrough

📅 Day 11 of 18 · COREP Governance Pipeline Series · XBRL Generation

Every piece of work in this series — the dbt models, the quality gates, the catalog, the Ranger masking policies — has been building towards a single deliverable: a valid XBRL instance document that can be submitted to a National Competent Authority.

XBRL is the format the EBA mandates for all COREP submissions. It is XML with a precise vocabulary defined by the EBA taxonomy. Each number in your mart.corep_c0100 table must become an XML element with the correct concept ID, unit reference, decimal precision, period, and entity identifier. Get any of these wrong and the NCA’s validation engine rejects the filing before a human ever reads it.

Arelle is the open-source Python library that understands EBA XBRL taxonomies. It loads the taxonomy and validates instance documents against it; the instance XML itself is produced by your own code from the mart data. This post walks through every step: taxonomy download, Arelle installation, concept resolution, instance XML generation, and the xbrl_gen.py module.

1. What XBRL Actually Is (and Why It Is Hard)

XBRL stands for eXtensible Business Reporting Language. The name suggests it is a language. It is better understood as a structured tagging system for financial facts with a strict schema enforced by a taxonomy.

XBRL concept	What it is	COREP example
Taxonomy	The authoritative schema defining every valid concept, its data type, and the rules for using it	EBA COREP taxonomy v3.3 (2024 version)
Concept (element)	A named data point — like a column definition with data type, unit type, and period type	`ei:c0020` — CET1 capital, monetary, instant
Context	Defines who (entity) and when (period) a fact applies to	Entity: LEIDENTIFIER, Period: 2026-03-31
Unit	The measurement unit for monetary or percentage facts	`iso4217:EUR` for monetary, `xbrli:pure` for ratios
Fact	The actual data value, tagged with a concept, context, and unit	`<ei:c0020 ...>450000000</ei:c0020>`
Instance document	The XML file submitted to the regulator — contains all facts for one reporting period	`COREP_2026Q1_submission.xbrl`
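decimals attribute	Precision indicator: how many decimal places the value is accurate to (negative values count places to the left of the decimal point)	`decimals="-3"` means accurate to the nearest thousand; the value itself is still in full euros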

⚠ The decimals Attribute Is Not Optional

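The EBA taxonomy validation engine checks decimals on every numeric fact. In XBRL, decimals never rescales a value: decimals="-3" states that the reported figure is accurate to the nearest thousand euros, while the figure itself is always expressed in full euros. €450 million is therefore reported as 450000000 with decimals="-3". If you divide by 1000 first (a habit carried over from reports presented in thousands) and report 450000, the validator reads €450 thousand, and your capital ratios will no longer reconcile. The mapping file mart_to_xbrl_mapping.yaml you built on Day 7 encodes the correct decimals for every concept.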
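A minimal sketch of how a consumer interprets decimals, assuming round-half-up for simplicity (the XBRL specifications define their own rounding and tolerance rules): the value stays in full euros, and decimals only says how far it can be trusted. The helper name round_to_decimals is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to_decimals(value: Decimal, decimals: int) -> Decimal:
    """Round a fact value to the accuracy declared by its decimals attribute.

    decimals=-3 rounds to the nearest thousand, decimals=4 to four decimal
    places. The result is still expressed in full units (euros or a fraction).
    """
    quantum = Decimal(1).scaleb(-decimals)  # -3 -> 1E+3, 4 -> 1E-4
    return value.quantize(quantum, rounding=ROUND_HALF_UP)


# €450,123,456 reported with decimals="-3": the fact text is the full amount
print(int(round_to_decimals(Decimal("450123456"), -3)))  # 450123000
# A CET1 ratio reported with decimals="4"
print(round_to_decimals(Decimal("0.112583"), 4))  # 0.1126
```

Note that rounding never changes the unit: the result is still about 450 million, not 450 thousand.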

2. The EBA Taxonomy — Download and Structure

2.1 Download the EBA Reporting Framework

# WSL or Linux terminal — download EBA COREP taxonomy
# EBA publishes taxonomy packages at:
# https://www.eba.europa.eu/risk-analysis-and-data/reporting-frameworks

cd corep-governance-pipeline/data/taxonomy

# EBA Reporting Framework 3.3 (ITS 2024, valid for submissions from 2024-Q1)
# The download URL changes with each release; copy the current link from the page above.
wget https://www.eba.europa.eu/sites/default/documents/files/document_library/\
EBA_taxonomy_3.3.zip -O eba_taxonomy_3.3.zip

unzip eba_taxonomy_3.3.zip -d eba_3.3/

# The taxonomy package structure after extraction:
# eba_3.3/
#   www.eba.europa.eu/
#     eu/fr/xbrl/crr/fws/corep/
#       cor/2024-01-31/
#         mod/
#           c_01.00.xsd   ← C 01.00 Own Funds schema
#           c_02.00.xsd   ← C 02.00 RWA
#           c_03.00.xsd   ← C 03.00 Capital Ratios
#           c_47.00.xsd   ← C 47.00 LCR
#         tab/
#           c_01.00.tab   ← human-readable cell list
#         val/
#           c_01.00_val.xml ← EBA validation rules

📄 What Is in the Taxonomy Package

The taxonomy is a collection of XSD (XML Schema Definition) files, linkbase files, and validation rule files. The XSD files define the concepts (data elements). The linkbase files define relationships — which concepts are allowed in which templates, which concepts are summation-children of other concepts (the calculation linkbase), and which concepts have human-readable labels (the label linkbase). Arelle loads all of these automatically when you point it at the entry point XSD.

2.2 Key Taxonomy Files for COREP

File	Purpose	Used by Arelle for
`corep-full-entry-point.xsd`	Master entry point — links to all template schemas	Loading the complete taxonomy in one call
`c_01.00.xsd`	C 01.00 Own Funds concept definitions	Validating c0010–c0060 fact elements
`c_01.00-cal.xml`	Calculation linkbase — c0010 = c0020 + c0030 + c0040	Summation consistency checks
`c_01.00-lab.xml`	Label linkbase — human-readable names in EN, DE, FR	Generating readable error messages
`c_01.00_val.xml`	EBA validation rules (data consistency checks, e.g. a total equals the sum of its components)	XBRL validation (Day 12)
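Before pointing Arelle at an entry point, it can help to confirm that the kinds of files in the table above actually exist in your extracted package. A minimal sketch, assuming the suffix conventions shown in the table (`-cal.xml`, `-lab.xml`, `_val.xml`, `.xsd`); real EBA releases may name their linkbases differently, so treat the patterns as placeholders to adjust.

```python
from collections import defaultdict
from pathlib import Path
from typing import Iterable, Optional

# Suffix conventions copied from the table above (an assumption: check them
# against the release you actually downloaded).
CATEGORIES = [
    ("-cal.xml", "calculation linkbase"),
    ("-lab.xml", "label linkbase"),
    ("_val.xml", "validation rules"),
    (".xsd", "schema"),
]


def classify(name: str) -> Optional[str]:
    """Return the category implied by a file name, or None if unrecognised."""
    for suffix, category in CATEGORIES:
        if name.endswith(suffix):
            return category
    return None


def inventory(paths: Iterable[str]) -> dict:
    """Group relative file paths by category, skipping unrecognised files."""
    grouped = defaultdict(list)
    for p in paths:
        category = classify(Path(p).name)
        if category is not None:
            grouped[category].append(p)
    return dict(grouped)


def inventory_dir(root: Path) -> dict:
    """Inventory every file below an extracted taxonomy package."""
    return inventory(
        p.relative_to(root).as_posix() for p in sorted(root.rglob("*")) if p.is_file()
    )
```

For example, inventory_dir(Path("data/taxonomy/eba_3.3")) groups the package contents by category; an empty or missing category hints that the path or release is not what you expected.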

3. Arelle Installation

# Install Arelle — the open-source XBRL processor
pip install arelle-release==2.30.3

# Verify installation
python -c "from arelle import Cntlr; print('Arelle OK')"

# Add to requirements.txt
# arelle-release==2.30.3    # Day 11 — XBRL generation and validation

⚠ Arelle vs arelle-release

The PyPI package arelle (no suffix) is an unofficial community mirror that may lag behind. Always use arelle-release which is the official Arelle project’s PyPI distribution. The import name is the same: from arelle import Cntlr.
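If you are unsure which distribution ended up in a virtual environment, the standard library can tell you. A small sketch using importlib.metadata; the check that flags the unsuffixed package assumes, as stated above, that both distributions provide the same arelle import name and could therefore shadow each other.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import List, Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def check_arelle_install() -> List[str]:
    """List problems with the Arelle installation; an empty list means none found."""
    problems = []
    if installed_version("arelle-release") is None:
        problems.append("arelle-release is not installed")
    if installed_version("arelle") is not None:
        problems.append(
            "the unsuffixed 'arelle' distribution is also installed and may "
            "shadow arelle-release's 'arelle' package; consider uninstalling it"
        )
    return problems


for message in check_arelle_install() or ["Arelle installation looks OK"]:
    print(message)
```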

📄 Understanding XBRL Instance Document Structure

4. What a Valid COREP XBRL Instance Looks Like

Before writing Python to generate the instance document, you need to understand what you are generating. Here is a minimal instance fragment, centred on C 01.00, that shows every structural piece a COREP submission needs:

<?xml version="1.0" encoding="UTF-8"?>
<xbrl
  xmlns="http://www.xbrl.org/2003/instance"
  xmlns:xbrli="http://www.xbrl.org/2003/instance"
  xmlns:link="http://www.xbrl.org/2003/linkbase"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns:iso4217="http://www.xbrl.org/2003/iso4217"
  xmlns:ei="http://www.eba.europa.eu/xbrl/crr/dict/con"
  xmlns:find="http://www.eurofiling.info/xbrl/ext/filing-indicators"
>

  <!-- ── Taxonomy reference ───────────────────────────────── -->
  <link:schemaRef
    xlink:type="simple"
    xlink:href="http://www.eba.europa.eu/eu/fr/xbrl/crr/fws/corep/cor/2024-01-31/mod/corep-full-entry-point.xsd"
  />

  <!-- ── Context: entity + reporting period (balance sheet date) ── -->
  <context id="C_2026-03-31_instant">
    <entity>
      <identifier scheme="http://standards.iso.org/iso/17442">
        529900T8BM49AURSDO55        <!-- LEI of the reporting entity -->
      </identifier>
    </entity>
    <period>
      <instant>2026-03-31</instant>   <!-- Q1 2026 balance sheet date -->
    </period>
  </context>

  <!-- ── Unit: EUR (for monetary amounts) ── -->
  <unit id="EUR">
    <measure>iso4217:EUR</measure>
  </unit>

  <!-- ── Unit: pure (for ratios) ── -->
  <unit id="pure">
    <measure>xbrli:pure</measure>
  </unit>

  <!-- ── Filing indicators: declare which templates are included ── -->
  <find:fIndicators xmlns:find="http://www.eurofiling.info/xbrl/ext/filing-indicators">
    <find:filingIndicator contextRef="C_2026-03-31_instant">C 01.00</find:filingIndicator>
    <find:filingIndicator contextRef="C_2026-03-31_instant">C 02.00</find:filingIndicator>
    <find:filingIndicator contextRef="C_2026-03-31_instant">C 03.00</find:filingIndicator>
    <find:filingIndicator contextRef="C_2026-03-31_instant">C 47.00</find:filingIndicator>
  </find:fIndicators>

  <!-- ══════════════════════════════════════════════════════════ -->
  <!-- C 01.00 — Own Funds facts                                 -->
  <!-- ══════════════════════════════════════════════════════════ -->

  <!-- c0010: Own Funds — total (CRR Article 4(1)(118)) -->
  <ei:c0010
    contextRef="C_2026-03-31_instant"
    unitRef="EUR"
    decimals="-3"
  >900000000</ei:c0010>           <!-- €900 million, accurate to the nearest thousand -->

  <!-- c0020: CET1 capital -->
  <ei:c0020
    contextRef="C_2026-03-31_instant"
    unitRef="EUR"
    decimals="-3"
  >450000000</ei:c0020>           <!-- €450 million -->

  <!-- c0050: CET1 ratio (C 03.00): pure unit, decimals="4" = accurate to 4 decimal places -->
  <ei:c0050
    contextRef="C_2026-03-31_instant"
    unitRef="pure"
    decimals="4"
  >0.112583</ei:c0050>           <!-- 11.2583% CET1 ratio -->

</xbrl>

📄 Four Things Every Fact Needs

Look at the fact elements above (<ei:c0010>, <ei:c0020>, <ei:c0050>). Each numeric fact needs three mandatory attributes plus its value:
1. contextRef: links to a <context> element identifying the entity and period
2. unitRef: links to a <unit> element, EUR for monetary amounts and pure for ratios
3. decimals: precision indicator; -3 = accurate to the nearest thousand, 4 = accurate to 4 decimal places
4. The value itself: the raw number from your mart table, in full units (euros, or a fraction for ratios)

Miss any one of these and Arelle’s validator returns an XBRL error. The NCA’s filing system will reject the document before checking the numbers.
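You can catch these omissions before Arelle does. A minimal sketch with the standard-library ElementTree, assuming (as in this walkthrough) that every fact in the ei: namespace is numeric; non-numeric facts would legitimately carry no unitRef or decimals. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

XBRLI = "http://www.xbrl.org/2003/instance"
# Concept namespace used throughout this walkthrough; confirm it per taxonomy version
EI = "http://www.eba.europa.eu/xbrl/crr/dict/con"


def check_fact_attributes(instance_xml) -> list:
    """Return problems found on numeric ei: facts (an empty list means none)."""
    root = ET.fromstring(instance_xml)
    context_ids = {c.get("id") for c in root.iter(f"{{{XBRLI}}}context")}
    unit_ids = {u.get("id") for u in root.iter(f"{{{XBRLI}}}unit")}
    problems = []
    for fact in root:  # facts are direct children of the xbrl root
        if not fact.tag.startswith(f"{{{EI}}}"):
            continue
        name = fact.tag.split("}", 1)[1]
        for attr in ("contextRef", "unitRef", "decimals"):
            if fact.get(attr) is None:
                problems.append(f"{name}: missing {attr}")
        ctx, unit = fact.get("contextRef"), fact.get("unitRef")
        if ctx is not None and ctx not in context_ids:
            problems.append(f"{name}: contextRef {ctx!r} has no matching <context>")
        if unit is not None and unit not in unit_ids:
            problems.append(f"{name}: unitRef {unit!r} has no matching <unit>")
        if not (fact.text or "").strip():
            problems.append(f"{name}: empty value")
    return problems
```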

🔍 Exploring the EBA Taxonomy with Arelle’s Python API

5. Loading and Exploring the Taxonomy with Arelle

Before generating facts you need to confirm the exact concept QName (qualified name) for each data point. This walkthrough uses the prefix ei: for EBA concepts, but a prefix is only a local alias: what the taxonomy fixes is the namespace URI, and that URI can change between taxonomy versions, so verify it programmatically.

# xbrl/explore_taxonomy.py — run this once to verify concept names

from arelle import Cntlr
from arelle.ModelValue import qname
import os

TAXONOMY_ENTRY = os.environ.get(
    "EBA_TAXONOMY_ENTRY",
    "data/taxonomy/eba_3.3/www.eba.europa.eu/eu/fr/xbrl/crr/fws/corep/cor/2024-01-31/mod/corep-full-entry-point.xsd",
)

# ── 1. Initialise Arelle controller ──────────────────────────────
# Passing logFileName starts logging, and the controller creates its own
# ModelManager, so neither needs to be set up a second time.
cntlr = Cntlr.Cntlr(logFileName="arelle_explore.log")
modelMgr = cntlr.modelManager

# ── 2. Load the taxonomy (this parses all XSD + linkbase files) ──
print("Loading EBA taxonomy — this takes 20-60 seconds on first load...")
modelXbrl = modelMgr.load(TAXONOMY_ENTRY)
print(f"Taxonomy loaded. Concepts: {len(modelXbrl.qnameConcepts)}")

# ── 3. Look up specific concepts by local name ───────────────────
EBA_NAMESPACE = "http://www.eba.europa.eu/xbrl/crr/dict/con"

CONCEPTS_TO_CHECK = [
    "c0010",  # Own Funds
    "c0020",  # CET1
    "c0030",  # AT1
    "c0040",  # T2
    "c0050",  # CET1 ratio (in C 03.00)
    "c0060",  # Total RWA
    "c0090",  # LCR ratio (in C 47.00)
]

print("\n── Concept Inspection ──────────────────────────────────────")
for local_name in CONCEPTS_TO_CHECK:
    qn = qname(EBA_NAMESPACE, local_name)
    concept = modelXbrl.qnameConcepts.get(qn)
    if concept:
        print(f"\n  {local_name}:")
        print(f"    QName     : {concept.qname}")
        print(f"    TypeName  : {concept.typeQname}")
        print(f"    PeriodType: {concept.periodType}")   # instant or duration
        print(f"    Balance   : {concept.balance}")       # debit / credit
        print(f"    Abstract  : {concept.isAbstract}")    # abstract = not a fact
        # Get the English label from the label linkbase
        labels = concept.label(lang="en", fallbackToQname=True)
        print(f"    Label (EN): {labels}")
    else:
        print(f"  {local_name}: NOT FOUND in taxonomy — check namespace or version")

# ── 4. Inspect the calculation linkbase for C 01.00 ─────────────
# This shows the summation tree: c0010 = c0020 + c0030 + c0040
print("\n── Calculation Linkbase (C 01.00 summations) ───────────────")
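for rel in modelXbrl.relationshipSet("http://www.xbrl.org/2003/arcrole/summation-item").modelRelationships:
    parent = rel.fromModelObject
    child  = rel.toModelObject
    # rel.linkrole is the calculation network's role URI (rel.linkQname is only
    # link:calculationLink). The "c_01" filter assumes the role URI embeds the
    # template code; inspect the actual role URIs and adjust if needed.
    if parent is not None and child is not None and "c_01" in (rel.linkrole or ""):
        print(f"  {parent.qname.localName} → {child.qname.localName}  weight={rel.weight}")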

modelMgr.close()
cntlr.close()

# Expected output from explore_taxonomy.py:

Loading EBA taxonomy — this takes 20-60 seconds on first load...
Taxonomy loaded. Concepts: 18420

── Concept Inspection ──────────────────────────────────────

  c0010:
    QName     : {http://www.eba.europa.eu/xbrl/crr/dict/con}c0010
    TypeName  : xbrli:monetaryItemType
    PeriodType: instant
    Balance   : credit
    Abstract  : False
    Label (EN): Own funds

  c0020:
    QName     : {http://www.eba.europa.eu/xbrl/crr/dict/con}c0020
    TypeName  : xbrli:monetaryItemType
    PeriodType: instant
    Balance   : credit
    Abstract  : False
    Label (EN): Common Equity Tier 1 capital

  c0050:
    QName     : {http://www.eba.europa.eu/xbrl/crr/dict/con}c0050
    TypeName  : xbrli:pureItemType
    PeriodType: instant
    Balance   : None
    Abstract  : False
    Label (EN): CET1 capital ratio

── Calculation Linkbase (C 01.00 summations) ───────────────
  c0010 → c0020  weight=1.0
  c0010 → c0030  weight=1.0
  c0010 → c0040  weight=1.0

The calculation linkbase output confirms: c0010 = c0020 + c0030 + c0040. Your mart.corep_c0100 must produce numbers that satisfy this identity or Arelle’s calculation validation will fail on Day 12.
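To see why rounding matters, here is a simplified sketch of a summation check: each value is rounded to the shared decimals before the items are summed and compared with the rounded total, roughly in the spirit of the XBRL 2.1 calculation rule. It assumes all facts carry the same decimals and a weight of 1.0; Arelle's real implementation handles mixed precision and the newer Calculations 1.1 tolerance model, so treat this only as a pre-check on mart output.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Sequence


def _round(value: Decimal, decimals: int) -> Decimal:
    """Round to the accuracy implied by an XBRL decimals value (half-up)."""
    return value.quantize(Decimal(1).scaleb(-decimals), rounding=ROUND_HALF_UP)


def summation_consistent(total: Decimal, items: Sequence[Decimal], decimals: int) -> bool:
    """Compare a total with the sum of its items after rounding each value."""
    rounded_sum = sum((_round(i, decimals) for i in items), Decimal(0))
    return _round(total, decimals) == _round(rounded_sum, decimals)


own_funds = Decimal("900000000")
cet1, at1, t2 = Decimal("450000000"), Decimal("250000000"), Decimal("200000000")
print(summation_consistent(own_funds, [cet1, at1, t2], -3))                     # True
print(summation_consistent(own_funds, [cet1, at1, t2 - Decimal("10000")], -3))  # False
```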

⚙ The xbrl_gen.py Module — Full Implementation

6. Reading the Mart Data from Trino

The XBRL generator reads from Trino — not directly from PostgreSQL. This enforces the architectural rule that all consumers go through Trino, and it means Ranger’s column masking policies are applied even during XBRL generation. The corep_reporting role has the ALLOW policy on PII columns where needed, so this is not a problem — it is the correct behaviour.

# xbrl/read_mart.py — read mart tables via Trino for XBRL generation

from trino.dbapi import connect as trino_connect
import os, logging

log = logging.getLogger(__name__)

TRINO_HOST = os.environ.get("TRINO_HOST", "localhost")
TRINO_PORT = int(os.environ.get("TRINO_PORT", "8080"))
TRINO_USER = os.environ.get("TRINO_XBRL_USER", "corep_reporting_svc")


def fetch_mart_data() -> dict:
    """
    Query all four COREP mart tables via Trino.
    Returns a dict keyed by template code.
    """
    conn = trino_connect(
        host=TRINO_HOST,
        port=TRINO_PORT,
        user=TRINO_USER,
        catalog="postgresql",
        schema="mart",
    )
    cur = conn.cursor()
    mart_data = {}

    # ── C 01.00 — Own Funds ──────────────────────────────────────
    cur.execute("""
        SELECT own_funds, cet1_capital, at1_capital, t2_capital,
               total_rwa, reporting_date
        FROM mart.corep_c0100
        LIMIT 1
    """)
    row = cur.fetchone()
    cols = [d[0] for d in cur.description]
    mart_data["c0100"] = dict(zip(cols, row)) if row else {}

    # ── C 02.00 — RWA by exposure class ──────────────────────────
    cur.execute("""
        SELECT exposure_class, ead, rwa, reporting_date
        FROM mart.corep_c0200
        ORDER BY CASE exposure_class
            WHEN 'central_governments' THEN 1
            WHEN 'institutions'        THEN 2
            WHEN 'corporates'          THEN 3
            WHEN 'retail'              THEN 4
            WHEN 'real_estate'         THEN 5
            WHEN 'equity'              THEN 6
            WHEN 'other'               THEN 7
            ELSE 99
        END
    """)
    rows = cur.fetchall()
    cols = [d[0] for d in cur.description]
    mart_data["c0200"] = [dict(zip(cols, r)) for r in rows]

    # ── C 03.00 — Capital Ratios ──────────────────────────────────
    cur.execute("""
        SELECT cet1_ratio, tier1_ratio, total_capital_ratio,
               leverage_ratio, total_rwa, reporting_date
        FROM mart.corep_c0300
        LIMIT 1
    """)
    row = cur.fetchone()
    cols = [d[0] for d in cur.description]
    mart_data["c0300"] = dict(zip(cols, row)) if row else {}

    # ── C 47.00 — LCR ────────────────────────────────────────────
    cur.execute("""
        SELECT hqla_level1, hqla_level2a, hqla_level2b, hqla_buffer,
               net_outflows, lcr_ratio, reporting_date
        FROM mart.corep_c4700
        LIMIT 1
    """)
    row = cur.fetchone()
    cols = [d[0] for d in cur.description]
    mart_data["c4700"] = dict(zip(cols, row)) if row else {}

    cur.close()
    conn.close()

    # Validate all four templates have data
    empty = [k for k, v in mart_data.items() if not v]
    if empty:
        raise RuntimeError(f"[xbrl_gen] Empty mart tables for templates: {empty}")

    log.info("[xbrl_gen] Mart data fetched for templates: %s", list(mart_data.keys()))
    return mart_data
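fetch_mart_data() takes the first row of each single-row template without filtering on reporting_date, so a mart that holds more than one period could silently mix quarters. A hypothetical guard (not part of the pipeline as written) that you might run on its return value:

```python
from datetime import date


def check_single_reporting_date(mart_data: dict):
    """Return the one reporting_date shared by all templates, or raise ValueError.

    mart_data has the shape returned by fetch_mart_data(): single-row
    templates map to a dict, multi-row templates (C 02.00) to a list of dicts.
    """
    dates = set()
    for payload in mart_data.values():
        rows = payload if isinstance(payload, list) else [payload]
        dates.update(row.get("reporting_date") for row in rows)
    if len(dates) != 1 or None in dates:
        found = sorted(str(d) for d in dates)
        raise ValueError(f"Expected one reporting_date across templates, found {found}")
    return dates.pop()


sample = {
    "c0100": {"reporting_date": date(2026, 3, 31)},
    "c0200": [{"reporting_date": date(2026, 3, 31)}, {"reporting_date": date(2026, 3, 31)}],
}
print(check_single_reporting_date(sample))  # 2026-03-31
```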

7. Generating the XBRL Instance Document

# modules/xbrl_gen.py — full implementation
"""
modules/xbrl_gen.py — Generate EBA COREP XBRL instance document from mart tables.

Reads mart data via Trino, loads mart_to_xbrl_mapping.yaml,
and produces a valid XBRL instance document for submission.
"""

import logging, os, yaml
from datetime import date, datetime, timezone
from decimal import Decimal, ROUND_HALF_UP
from pathlib import Path
from xml.etree import ElementTree as ET

from modules.base import BaseModule
from xbrl.read_mart import fetch_mart_data

log = logging.getLogger(__name__)

# ── Namespace map ─────────────────────────────────────────────────
NS = {
    "xbrli":    "http://www.xbrl.org/2003/instance",
    "link":     "http://www.xbrl.org/2003/linkbase",
    "xlink":    "http://www.w3.org/1999/xlink",
    "iso4217":  "http://www.xbrl.org/2003/iso4217",
    "ei":       "http://www.eba.europa.eu/xbrl/crr/dict/con",
    "find":     "http://www.eurofiling.info/xbrl/ext/filing-indicators",
}

TAXONOMY_HREF = (
    "http://www.eba.europa.eu/eu/fr/xbrl/crr/fws/corep/cor/2024-01-31/mod/corep-full-entry-point.xsd"
)

MAPPING_FILE = Path(os.environ.get("XBRL_MAPPING_FILE", "mart_to_xbrl_mapping.yaml"))
OUTPUT_DIR   = Path(os.environ.get("XBRL_OUTPUT_DIR",   "output/xbrl"))


class XbrlGenModule(BaseModule):
    MODULE_NAME = "xbrl_gen"

    def input_check(self) -> None:
        if not MAPPING_FILE.exists():
            raise RuntimeError(f"[xbrl_gen] Mapping file not found: {MAPPING_FILE}")
        OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
        log.info("[xbrl_gen] Mapping file present. Output dir: %s", OUTPUT_DIR)

    def _execute(self) -> None:
        mapping  = yaml.safe_load(MAPPING_FILE.read_text())
        mart     = fetch_mart_data()

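        # Determine reporting date from mart data; fall back to the end of
        # the most recently completed calendar quarter
        reporting_date = mart["c0100"].get("reporting_date")
        if not reporting_date:
            today = date.today()
            quarter = (today.month - 1) // 3        # 0 = Q1 ... 3 = Q4
            year, month = (today.year - 1, 12) if quarter == 0 else (today.year, 3 * quarter)
            reporting_date = date(year, month, {3: 31, 6: 30, 9: 30, 12: 31}[month])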
        entity_lei = os.environ.get("ENTITY_LEI", "529900T8BM49AURSDO55")

        # Register all namespaces (prevents ET from generating ns0: prefixes)
        for prefix, uri in NS.items():
            ET.register_namespace(prefix, uri)

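        # ── Build the XBRL root element ───────────────────────────────
        xbrl = ET.Element("{http://www.xbrl.org/2003/instance}xbrl")
        # ElementTree declares only namespaces used in tag or attribute names.
        # "iso4217:EUR" appears only as element text (the EUR unit measure),
        # so the iso4217 prefix must be declared explicitly.
        xbrl.set("xmlns:iso4217", NS["iso4217"])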

        # Schema reference
        schema_ref = ET.SubElement(
            xbrl, "{http://www.xbrl.org/2003/linkbase}schemaRef"
        )
        schema_ref.set("{http://www.w3.org/1999/xlink}type", "simple")
        schema_ref.set("{http://www.w3.org/1999/xlink}href", TAXONOMY_HREF)

        # ── Context ──────────────────────────────────────────────────
        ctx_id = f"C_{reporting_date}_instant"
        ctx    = ET.SubElement(xbrl, "{http://www.xbrl.org/2003/instance}context", id=ctx_id)
        entity = ET.SubElement(ctx,  "{http://www.xbrl.org/2003/instance}entity")
        ident  = ET.SubElement(entity, "{http://www.xbrl.org/2003/instance}identifier")
        ident.set("scheme", "http://standards.iso.org/iso/17442")
        ident.text = entity_lei
        period   = ET.SubElement(ctx, "{http://www.xbrl.org/2003/instance}period")
        instant  = ET.SubElement(period, "{http://www.xbrl.org/2003/instance}instant")
        instant.text = str(reporting_date)

        # ── Units ─────────────────────────────────────────────────────
        unit_eur  = ET.SubElement(xbrl, "{http://www.xbrl.org/2003/instance}unit", id="EUR")
        measure_e = ET.SubElement(unit_eur, "{http://www.xbrl.org/2003/instance}measure")
        measure_e.text = "iso4217:EUR"

        unit_pure  = ET.SubElement(xbrl, "{http://www.xbrl.org/2003/instance}unit", id="pure")
        measure_p  = ET.SubElement(unit_pure, "{http://www.xbrl.org/2003/instance}measure")
        measure_p.text = "xbrli:pure"

        # ── Filing indicators ─────────────────────────────────────────
        find_ns = NS["find"]
        fi_root = ET.SubElement(xbrl, f"{{{find_ns}}}fIndicators")
        for tmpl in ["C 01.00", "C 02.00", "C 03.00", "C 47.00"]:
            fi = ET.SubElement(fi_root, f"{{{find_ns}}}filingIndicator")
            fi.set("contextRef", ctx_id)
            fi.text = tmpl

        # ── Generate facts from mapping ───────────────────────────────
        facts_written = 0
        for template_code, template_mapping in mapping.items():
            mart_key = template_code.lower().replace(" ", "").replace(".", "")
            data     = mart.get(mart_key, {})

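            if isinstance(data, list):
                # C 02.00: multi-row template. Caution: every row reuses the
                # same concepts in the same context, which XBRL treats as
                # duplicate facts; in practice each row needs its own
                # dimensional context.
                for row in data:
                    facts_written += self._write_facts(
                        xbrl, template_mapping, row, ctx_id
                    )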
            else:
                facts_written += self._write_facts(
                    xbrl, template_mapping, data, ctx_id
                )

        # ── Write output file ─────────────────────────────────────────
        ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
        out_path = OUTPUT_DIR / f"COREP_{reporting_date}_{ts}.xbrl"
        tree = ET.ElementTree(xbrl)
        ET.indent(tree, space="  ")   # Python 3.9+ pretty-print
        tree.write(str(out_path), xml_declaration=True, encoding="utf-8")
        log.info("[xbrl_gen] Instance written: %s (%d facts)", out_path, facts_written)

        self._upload_to_minio(out_path)
        self._instance_path = out_path   # stored for output_check

    def _write_facts(
        self,
        xbrl: ET.Element,
        template_mapping: list,
        data: dict,
        ctx_id: str,
    ) -> int:
        """
        Write one set of facts from a mapping list + a data dict.
        Returns count of facts written.
        """
        written = 0
        for mapping_entry in template_mapping:
            mart_col    = mapping_entry["mart_column"]
            concept_id  = mapping_entry["xbrl_concept"]   # e.g. "c0020"
            unit        = mapping_entry["unit"]            # "EUR" or "pure"
            decimals    = mapping_entry["decimals"]        # -3 or 4

            value = data.get(mart_col)
            if value is None:
                log.warning("[xbrl_gen] Missing mart value: %s → skipping concept %s", mart_col, concept_id)
                continue

            # Apply decimal rounding per EBA DPM precision rules
            value_str = self._format_value(value, decimals, unit)

            fact = ET.SubElement(
                xbrl,
                f"{{{NS['ei']}}}{concept_id}"
            )
            fact.set("contextRef", ctx_id)
            fact.set("unitRef",    unit)
            fact.set("decimals",   str(decimals))
            fact.text = value_str
            written += 1

        return written

    @staticmethod
    def _format_value(value, decimals: int, unit: str) -> str:
        """
        Format a mart value for XBRL output.

        XBRL values are never scaled; `decimals` only declares accuracy:
          - Monetary (EUR): report the full amount in euros. decimals=-3
            states the figure is accurate to the nearest thousand, so
            450000000 (not 450000) represents €450 million.
          - Ratios (pure): report the raw fraction (e.g. 0.112583);
            decimals=4 states it is accurate to 4 decimal places.
        """
        d = Decimal(str(value))
        if unit == "EUR":
            # Whole euros; no division by 1000
            return str(d.quantize(Decimal("1"), rounding=ROUND_HALF_UP))
        else:
            # Pure ratio: keep 6 decimal places as produced by dbt ROUND(6)
            return str(d.quantize(Decimal("0.000001"), rounding=ROUND_HALF_UP))

    def _upload_to_minio(self, out_path: Path) -> None:
        try:
            from minio import Minio
            client = Minio(
                os.environ.get("MINIO_ENDPOINT", "minio:9000"),
                access_key=os.environ.get("MINIO_ROOT_USER", "minioadmin"),
                secret_key=os.environ.get("MINIO_ROOT_PASSWORD", "minioadmin"),
                secure=False,
            )
            bucket = "corep-xbrl-output"
            if not client.bucket_exists(bucket):
                client.make_bucket(bucket)
            client.fput_object(
                bucket, out_path.name, str(out_path),
                content_type="application/xbrl+xml",
            )
            log.info("[xbrl_gen] Uploaded → minio://%s/%s", bucket, out_path.name)
        except Exception as exc:
            log.warning("[xbrl_gen] MinIO upload failed (non-fatal): %s", exc)

    def emit_lineage(self) -> None:
        from openlineage.client.run import RunEvent, RunState, Run, Job, Dataset
        from datetime import datetime, timezone

        event = RunEvent(
            eventType=RunState.COMPLETE,
            eventTime=datetime.now(timezone.utc).isoformat(),
            run=Run(runId=self._run_id),
            job=Job(namespace=self._namespace, name="xbrl_gen"),
            inputs=[
                Dataset(namespace="trino://corep", name="mart.corep_c0100"),
                Dataset(namespace="trino://corep", name="mart.corep_c0200"),
                Dataset(namespace="trino://corep", name="mart.corep_c0300"),
                Dataset(namespace="trino://corep", name="mart.corep_c4700"),
            ],
            outputs=[
                Dataset(
                    namespace="minio://corep-xbrl-output",
                    # object name in the bucket, not the local output path
                    name=Path(getattr(self, "_instance_path", "COREP_unknown.xbrl")).name,
                )
            ],
            producer="https://github.com/your-org/corep-governance-pipeline/modules/xbrl_gen.py",
        )
        try:
            self._ol_client.emit(event)
            log.info("[xbrl_gen] OpenLineage COMPLETE event emitted.")
        except Exception as exc:
            log.warning("[xbrl_gen] OpenLineage emit failed (non-fatal): %s", exc)

    def output_check(self) -> None:
        """Verify the XBRL file exists and is well-formed XML."""
        out_path = getattr(self, "_instance_path", None)
        if not out_path or not Path(out_path).exists():
            raise RuntimeError("[xbrl_gen] Output XBRL file not found.")
        try:
            ET.parse(str(out_path))
            log.info("[xbrl_gen] Output XBRL is well-formed XML: %s", out_path)
        except ET.ParseError as exc:
            raise RuntimeError(f"[xbrl_gen] Output XBRL is malformed XML: {exc}")
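The multi-row branch in _execute writes the same concepts once per C 02.00 row into a single context, and XBRL treats facts with the same concept, context and unit as duplicates. In the EBA data point model, rows are told apart with dimensional contexts: each row gets its own context whose scenario carries an xbrldi:explicitMember, which is where EBA filings place dimensions. A sketch under clearly stated assumptions: the dimension and member QNames below (dim:ExposureClass, mem:Corporates, ...) are hypothetical placeholders, so look up the real ones in your taxonomy. Because those prefixes appear only in attribute values and text, ElementTree will not declare them for you; set the matching xmlns:... attributes on the root element explicitly.

```python
import xml.etree.ElementTree as ET

XBRLI = "http://www.xbrl.org/2003/instance"
XBRLDI = "http://xbrl.org/2006/xbrldi"

# Hypothetical placeholders: replace with the real dimension and member QNames
# for C 02.00 exposure classes from your taxonomy.
EXPOSURE_DIMENSION = "dim:ExposureClass"
EXPOSURE_MEMBERS = {
    "central_governments": "mem:CentralGovernments",
    "institutions": "mem:Institutions",
    "corporates": "mem:Corporates",
}


def add_dimensional_context(
    parent: ET.Element, base_id: str, lei: str, instant: str,
    dimension: str, member: str,
) -> str:
    """Append a context qualified by one explicit dimension member; return its id."""
    ctx_id = f"{base_id}_{member.split(':', 1)[1]}"
    ctx = ET.SubElement(parent, f"{{{XBRLI}}}context", id=ctx_id)
    entity = ET.SubElement(ctx, f"{{{XBRLI}}}entity")
    ident = ET.SubElement(
        entity, f"{{{XBRLI}}}identifier", scheme="http://standards.iso.org/iso/17442"
    )
    ident.text = lei
    period = ET.SubElement(ctx, f"{{{XBRLI}}}period")
    ET.SubElement(period, f"{{{XBRLI}}}instant").text = instant
    # Dimensions go in the scenario, which follows the period in a context
    scenario = ET.SubElement(ctx, f"{{{XBRLI}}}scenario")
    member_el = ET.SubElement(
        scenario, f"{{{XBRLDI}}}explicitMember", dimension=dimension
    )
    member_el.text = member
    return ctx_id


root = ET.Element(f"{{{XBRLI}}}xbrl")
for exposure_class, member in EXPOSURE_MEMBERS.items():
    ctx_id = add_dimensional_context(
        root, "C_2026-03-31", "529900T8BM49AURSDO55", "2026-03-31",
        EXPOSURE_DIMENSION, member,
    )
    print(exposure_class, "->", ctx_id)
```

Each C 02.00 row's facts would then use the context id returned for its exposure class instead of the shared ctx_id.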

8. Common XBRL Generation Errors and How to Fix Them

Error	Cause	Fix
`Unknown concept: ei:c0010`	Wrong namespace URI or wrong taxonomy version	Run `explore_taxonomy.py` to confirm the EBA namespace for your taxonomy version. It changes between framework releases.
`Calculation inconsistency: c0010 ≠ c0020 + c0030 + c0040`	Rounding in dbt models causes 1–2 unit difference	Use `ROUND(..., 0)` consistently across all mart models. Ensure the parent concept is the sum, not independently computed.
`decimals attribute required`	Missing `decimals` on a fact element	Every fact must have `decimals`. Check `mart_to_xbrl_mapping.yaml` — every entry must have a `decimals` key.
`Value out of allowed range`	Ratio reported as a percentage (e.g. 11.25 instead of 0.1125)	EBA ratios are decimal fractions, not percentages: 0.1125 for 11.25%, and 1.45 for an LCR of 145%. Check your dbt mart model for `ROUND(ratio * 100, ...)` errors.
`ns0: prefix in generated XML`	Namespace not registered before creating elements	Call `ET.register_namespace(prefix, uri)` for all namespaces before creating any element.
`contextRef not found`	Context ID in fact does not match any `<context id="...">`	Generate the context ID string once and reuse it. Do not use f-strings inline on each fact.
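Rejected by NCA: `Invalid LEI format`	Entity LEI in context is malformed	An LEI is exactly 20 alphanumeric characters, and the last two are ISO 7064 MOD 97-10 check digits. `assert len(lei) == 20 and lei.isalnum()` catches truncation but not typos; validate the check digits as well.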
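A minimal check-digit validator for the LEI row above: ISO 17442 LEIs end in two ISO 7064 MOD 97-10 check digits, so after mapping letters to numbers (A=10 ... Z=35) the whole code, read as one integer, must leave remainder 1 when divided by 97. The function name is hypothetical; it accepts upper-case LEIs only.

```python
import re

# 18 alphanumeric characters followed by 2 numeric check digits (ISO 17442)
LEI_PATTERN = re.compile(r"[0-9A-Z]{18}[0-9]{2}")


def is_valid_lei(lei: str) -> bool:
    """Return True if lei is well-formed and its MOD 97-10 check digits verify."""
    if not LEI_PATTERN.fullmatch(lei):
        return False
    # int(ch, 36) maps '0'-'9' to 0-9 and 'A'-'Z' to 10-35
    numeric = "".join(str(int(ch, 36)) for ch in lei)
    return int(numeric) % 97 == 1


print(is_valid_lei("529900T8BM49AURSDO55"))  # True
print(is_valid_lei("529900T8BM49AURSDO56"))  # False
```

Running this on ENTITY_LEI before building the context turns an NCA rejection into a local error.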

9. Running the Generator

# Run just the XBRL generator
python pipeline.py --module xbrl_gen

# Expected output
INFO [xbrl_gen] Mapping file present. Output dir: output/xbrl
INFO [xbrl_gen] Mart data fetched for templates: ['c0100', 'c0200', 'c0300', 'c4700']
INFO [xbrl_gen] Instance written: output/xbrl/COREP_2026-03-31_20260507T081433Z.xbrl (26 facts)
INFO [xbrl_gen] Uploaded → minio://corep-xbrl-output/COREP_2026-03-31_20260507T081433Z.xbrl
INFO [xbrl_gen] OpenLineage COMPLETE event emitted.
INFO [xbrl_gen] Output XBRL is well-formed XML: output/xbrl/COREP_2026-03-31_20260507T081433Z.xbrl

# Inspect the generated file
cat output/xbrl/COREP_2026-03-31_*.xbrl | head -60

# Quick fact count check
grep -c "<ei:" output/xbrl/COREP_2026-03-31_*.xbrl
# Expected: 26 (matches the 26 data points in mart_to_xbrl_mapping.yaml)
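grep -c counts lines, which works only because ET.indent puts one fact per line, and it cannot tell you whether two facts collide. A slightly sturdier sketch parses the file and also reports duplicate (concept, contextRef, unitRef) keys, which XBRL treats as duplicate facts. It assumes the ei: namespace URI used throughout this walkthrough; the script name count_facts.py is hypothetical (run it as `python count_facts.py output/xbrl/COREP_2026-03-31_*.xbrl`).

```python
import sys
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path

EI = "http://www.eba.europa.eu/xbrl/crr/dict/con"


def fact_summary(xml_bytes) -> tuple:
    """Return (fact_count, duplicates) for the ei: facts in an instance document.

    duplicates maps (concept, contextRef, unitRef) to its number of occurrences,
    for keys that occur more than once.
    """
    root = ET.fromstring(xml_bytes)
    keys = Counter(
        (el.tag.split("}", 1)[1], el.get("contextRef"), el.get("unitRef"))
        for el in root
        if el.tag.startswith(f"{{{EI}}}")
    )
    duplicates = {key: n for key, n in keys.items() if n > 1}
    return sum(keys.values()), duplicates


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        count, dups = fact_summary(Path(arg).read_bytes())
        print(f"{arg}: {count} facts, {len(dups)} duplicate key(s)")
        for (concept, ctx, unit), n in dups.items():
            print(f"  {concept} in {ctx} ({unit}) appears {n} times")
```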

10. Full Pipeline Flow to This Point

  Day 5          Day 6–7           Day 8      Day 9–10      Day 11
  ──────────     ─────────────     ─────────  ───────────   ──────────────
  CSV files      dbt transform     GX quality  Catalog +     XBRL instance
  ↓              ↓                 gates       Security      ↓
  ingest.py  →   raw.*         →   Layer 1 →  OpenMetadata  xbrl_gen.py
                 staging.*         raw check   + Ranger      ↓
                 intermediate.*    ↓           masking      COREP_*.xbrl
                 mart.*        →   Layer 2 →               ↓
                 corep_c0100        mart check              minio://
                 corep_c0200                                corep-xbrl-output/
                 corep_c0300
                 corep_c4700
                     │
                     └─── mart_to_xbrl_mapping.yaml ────────────►
                          (26 concept IDs, units, decimals)

📚 Day 11 Key Takeaways

XBRL is not just XML — it is typed, constrained XML governed by a taxonomy that defines every valid concept, unit, and calculation relationship. A well-formed XML file can still be an invalid XBRL instance.
Three mandatory attributes plus a value on every numeric fact: contextRef, unitRef, decimals, and the value itself. Miss any one and the NCA filing system rejects the document before validation starts.
decimals="-3" means "accurate to the nearest thousand", not "reported in thousands". XBRL values are always in full euros: €450 million is 450000000 with decimals="-3". Dividing by 1000 before tagging is a classic source of magnitude errors in COREP submissions.
Ratios are decimals, not percentages: 0.1125, not 11.25. Ratio concepts are typed xbrli:pureItemType with the xbrli:pure unit, so the type check alone will not catch a percentage; the consistency checks that relate a ratio to its numerator and denominator are what expose it.
The calculation linkbase is your friend — explore it with Arelle before generating facts. If c0010 ≠ c0020 + c0030 + c0040 the validator will fail on Day 12 and you will hunt a rounding bug in your dbt models.
XBRL generation reads from Trino, not directly from PostgreSQL. Ranger masking policies apply. This is correct behaviour — it proves the submission data went through the governed query layer.
Next: Day 12 — XBRL validation with Arelle: running the EBA validation rules engine, interpreting error codes, and handling the two classes of validation failure — structural errors and business rule violations.
