Data Quality Gates for Regulatory Reporting — What Great Expectations Catches That dbt Tests Miss

dbt tests catch transformation bugs. Great Expectations catches bad source data before dbt runs. Build two quality layers for COREP regulatory reporting with full Python code.

Bhanu

May 7, 2026

12 min read

📅 Day 8 of 18 · COREP Governance Pipeline Series · Data Quality

You finished Day 7 with four dbt mart models that map your bank data to EBA DPM 4.0 templates. dbt tests verify that cet1_ratio >= 0.045 and lcr_ratio >= 1.0 — and they run inside the transformation step. That sounds thorough. It isn’t.

A Basel III capital ratio test failing inside dbt means the problem surfaces only after you’ve already transformed potentially tainted data into your mart. The raw data has already been loaded, staged, and joined. The bad number is already in the lineage graph.

Regulators don’t accept “we caught it after transformation.” BCBS 239 Principle 3 (accuracy and integrity) and Principle 4 (completeness) apply to risk data from the point it is captured, not only to the finished report. You need a quality gate before dbt runs, and a separate one after the mart is built.

This post builds both layers using Great Expectations (GX) and wires them into your pipeline via quality.py.

1. Why You Need Two Quality Layers

Think of data quality in a regulatory pipeline the same way a manufacturer thinks about quality control on a production line: you inspect raw materials before they enter the machine, and you inspect finished goods before they ship. Inspecting only at the end doesn’t tell you which raw material batch was the problem.

Layer	When it runs	What it validates	Framework	Failure effect
Layer 1 — Raw gate	After `ingest.py` loads CSVs into raw.*	Column presence, nulls, domain values, numeric ranges on source data	Great Expectations	Pipeline halts before dbt runs
Layer 2 — Mart gate	After `dbt run` completes	Regulatory floors (Basel III ratios), cross-table consistency, XBRL decimal precision	Great Expectations + dbt tests	Pipeline halts before XBRL generation

⚠ What Happens Without Layer 1

Without a raw gate, a single CSV row where risk_weight_pct = 1500 (a data-entry typo) does not raise an error anywhere downstream. The staging model stg_rwa_exposures.sql casts the string “1500” to a decimal and then applies its BETWEEN 0 AND 12.5 filter. The row fails the filter and is silently dropped, so your RWA total is understated. The mart model still produces a number. dbt tests pass. You file an incorrect COREP report.
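The failure mode above can be reproduced in a few lines of plain Python. This is a minimal sketch, not the pipeline's code: the three rows, the exposure ids and the helper names are made up for illustration, and the filter only mimics the BETWEEN 0 AND 12.5 clause described in the callout.

```python
from decimal import Decimal

# Hypothetical raw rows: (exposure_id, ead, risk_weight_pct as read from the CSV)
RAW_ROWS = [
    ("E1", Decimal("1000000"), "0.20"),
    ("E2", Decimal("2000000"), "1.00"),
    ("E3", Decimal("5000000"), "1500"),  # data-entry typo
]

MAX_RISK_WEIGHT = Decimal("12.5")  # 1250% expressed as a decimal


def in_range(weight: str) -> bool:
    """True when the weight, cast to decimal, lies in [0, 12.5]."""
    return Decimal("0") <= Decimal(weight) <= MAX_RISK_WEIGHT


def staging_filter(rows):
    """Mimic the staging WHERE clause: out-of-range rows simply disappear."""
    return [row for row in rows if in_range(row[2])]


def total_rwa(rows):
    """RWA = sum of ead * risk weight over the rows that survived."""
    return sum((ead * Decimal(w) for _, ead, w in rows), Decimal("0"))


def raw_gate(rows):
    """What a raw-layer range check reports instead of dropping the row."""
    return [row[0] for row in rows if not in_range(row[2])]


kept = staging_filter(RAW_ROWS)
print("rows kept by staging:", len(kept))
print("RWA total after silent drop:", total_rwa(kept))
print("rows flagged by raw gate:", raw_gate(RAW_ROWS))
```

The staging path returns a plausible total with one exposure missing, while the raw gate names the offending row before any transformation happens.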

2. What dbt Tests Do — and Where They Stop

dbt ships four built-in generic tests plus a rich ecosystem of packages (dbt_utils, dbt_expectations). They are excellent for structural invariants on transformed models. Here is what they do well:

dbt test	What it checks	Where it runs
`not_null`	Column has no NULL rows	On any model, any layer
`unique`	No duplicate values	On any model
`accepted_values`	Value in a fixed list	On staging / mart
`relationships`	FK referential integrity	Between models
`dbt_utils.expression_is_true`	Arbitrary SQL expression	On any model

Here is what dbt tests cannot do:

Requirement	dbt can do it?	Why not
Statistical distribution check (mean ± 3σ on capital ratios)	No	No built-in stats; `dbt_utils` has no distribution tests
Row count within expected band (e.g. 50–5000 rows)	Partial	`dbt_utils.expression_is_true` can do it but is awkward
Column-level completeness % (≥ 95% populated)	No	Built-in `not_null` is all-or-nothing
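Cross-column conditional logic (if tier=CET1 then amount must be positive)	Partial	Possible with `dbt_utils.expression_is_true` or a singular test, but the rule is hand-written SQL rather than a declarative, reusable expectation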
Profiling: min, max, mean, p5, p95 stored as metadata	No	dbt tests are pass/fail, no metrics storage
HTML data docs for human audit review	No	dbt docs don’t include per-column statistics
Expectation suites versioned separately from SQL models	No	dbt tests live in `schema.yml` tied to model versions
Re-run quality check without re-running transformation	Partial	`dbt test` can be re-run on its own, but it keeps no validation history or audit report between runs
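To illustrate the first row of the table, here is a rough sketch of a mean ± 3σ check on a capital ratio, using only the standard library. The historical ratios and the function name are invented for the example; a real suite would take the history from stored validation metrics.

```python
from statistics import mean, stdev


def within_three_sigma(history, candidate):
    """True when candidate lies within mean ± 3 sample standard deviations."""
    mu = mean(history)
    sigma = stdev(history)
    return abs(candidate - mu) <= 3 * sigma


# Hypothetical CET1 ratios from six previous reporting periods
HISTORY = [0.142, 0.145, 0.139, 0.147, 0.144, 0.141]

print(within_three_sigma(HISTORY, 0.146))  # a ratio in line with history
print(within_three_sigma(HISTORY, 0.214))  # transposed digits, still above the 4.5% floor
```

The second value would pass a floor test such as cet1_ratio >= 0.045, which is why a distribution check adds information that a pass/fail threshold does not.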

✓ The Right Mental Model

dbt tests = transformation correctness. They confirm your SQL logic is right. Great Expectations = data correctness. It confirms your data — independent of your SQL — meets regulatory business rules. Both are mandatory. Neither replaces the other.

3. Great Expectations Architecture in the COREP Pipeline

  CSV files                            PostgreSQL
  /data/source/                        corep database
  ┌──────────────────┐                 ┌────────────────────────────────────┐
  │ capital_          │   ingest.py     │  raw.*          staging.*           │
  │ instruments.csv   │ ──────────────► │  capital_       stg_capital_        │
  │ rwa_exposures.csv │                 │  instruments    instruments         │
  │ ...               │                 │  rwa_exposures  stg_rwa_exposures   │
  └──────────────────┘                 │  ...            ...                 │
                                        │                                    │
  ◄── GX LAYER 1 (raw gate) runs here  │  intermediate.* mart.*             │
      Suite: raw_capital_suite           │  int_capital_   corep_c0100         │
      Suite: raw_rwa_suite               │  by_tier        corep_c0200         │
      Suite: raw_liquidity_suite         │  ...            ...                 │
                                        │                                    │
      PASS → dbt run proceeds          │  ◄── GX LAYER 2 (mart gate)        │
      FAIL → pipeline halts             │      Suite: mart_corep_suite       │
             audit log written           │                                    │
             Airflow branch: skip_xbrl  └────────────────────────────────────┘
                                                        │
                                          PASS → xbrl_gen.py runs
                                          FAIL → BranchPythonOperator
                                                 routes to quarantine

GX uses three concepts you need to understand:

GX concept	What it is	Analogy
Expectation	A single assertion: “column X has values between A and B”	A single test case
Expectation Suite	A named collection of expectations stored as JSON	A test class / spec file
Checkpoint	Binds a suite to a data source and runs it, producing a ValidationResult	A test runner invocation
Data Docs	HTML report generated from ValidationResults — human-readable audit evidence	Test coverage report
Data Context	Root configuration object — knows where your suites, results, and docs live	pytest configuration + fixtures
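If the vocabulary feels abstract, the following toy model may help. It is deliberately not the GX API: the three classes are hypothetical stand-ins that only show how an expectation, a suite and a checkpoint relate to each other.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Expectation:
    """A single named assertion over a batch of rows."""
    name: str
    check: Callable[[List[Row]], bool]


@dataclass
class ExpectationSuite:
    """A named collection of expectations."""
    name: str
    expectations: List[Expectation] = field(default_factory=list)


@dataclass
class Checkpoint:
    """Binds a suite to a way of loading data, then runs every expectation."""
    suite: ExpectationSuite
    load_batch: Callable[[], List[Row]]

    def run(self) -> dict:
        rows = self.load_batch()
        results = {e.name: e.check(rows) for e in self.suite.expectations}
        return {"success": all(results.values()), "results": results}


suite = ExpectationSuite("raw_capital_suite", [
    Expectation("tier_in_set",
                lambda rows: all(r["tier"] in {"CET1", "AT1", "T2"} for r in rows)),
    Expectation("amount_not_null",
                lambda rows: all(r["amount"] is not None for r in rows)),
])
checkpoint = Checkpoint(
    suite,
    load_batch=lambda: [{"tier": "CET1", "amount": 100.0},
                        {"tier": "T3", "amount": 50.0}],  # "T3" is not a valid tier
)
result = checkpoint.run()
print(result)
```

The real Checkpoint additionally stores the result and can trigger actions such as rebuilding Data Docs; the toy version only returns the outcome.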

4. Install Great Expectations

# Inside your Python virtual environment
pip install great-expectations==0.18.19 sqlalchemy psycopg2-binary

⚠ Version Pin Is Not Optional

GX made breaking API changes between 0.17, 0.18, and 1.x, and the 1.x rewrite renamed most classes. Pin to one specific 0.18 release (0.18.19 here) so every environment resolves the same API. This post uses the 0.18 API throughout.

Add it to requirements.txt:

great-expectations==0.18.19   # Day 8 — data quality gates

5. GX Project Structure

corep-governance-pipeline/
└── gx/
    ├── great_expectations.yml     # Data Context config
    ├── expectations/
    │   ├── raw_capital_suite.json
    │   ├── raw_rwa_suite.json
    │   ├── raw_liquidity_suite.json
    │   └── mart_corep_suite.json
    ├── checkpoints/
    │   ├── raw_capital_checkpoint.yml
    │   ├── raw_rwa_checkpoint.yml
    │   ├── raw_liquidity_checkpoint.yml
    │   └── mart_corep_checkpoint.yml
    └── uncommitted/
        └── data_docs/
            └── local_site/        # HTML output → also uploaded to MinIO

6. `great_expectations.yml` — Data Context

# gx/great_expectations.yml
config_version: 3.0

datasources:
  corep_postgres:
    class_name: Datasource
    execution_engine:
      class_name: SqlAlchemyExecutionEngine
      connection_string: ${COREP_GX_DB_URL}         # injected from .env
    data_connectors:
      default_inferred_data_connector_name:
        class_name: InferredAssetSqlDataConnector
        include_schema_name: true

stores:
  expectations_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: expectations/

  validations_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/validations/

  evaluation_parameter_store:
    class_name: EvaluationParameterStore

  checkpoint_store:
    class_name: CheckpointStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      suppress_store_backend_id: true
      base_directory: checkpoints/

data_docs_sites:
  local_site:
    class_name: SiteBuilder
    show_how_to_buttons: false
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/data_docs/local_site/
    site_index_builder:
      class_name: DefaultSiteIndexBuilder

expectations_store_name: expectations_store
validations_store_name: validations_store
evaluation_parameter_store_name: evaluation_parameter_store
checkpoint_store_name: checkpoint_store

🔒 Security Note — Connection String

The ${COREP_GX_DB_URL} substitution tells GX to read from the environment variable. Never hardcode credentials in great_expectations.yml. Your .env file already contains COREP_GX_DB_URL=postgresql+psycopg2://corep_admin:${POSTGRES_PASSWORD}@localhost:5432/corep.
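The idea behind the ${...} placeholder can be sketched with the standard library. This is an analogy under stated assumptions, not GX's own substitution code: resolve_config_value is a hypothetical helper, and the password in the example is a dummy value.

```python
import os
from string import Template


def resolve_config_value(raw, env=None):
    """Replace ${NAME} placeholders with values from the environment.

    Raises KeyError when a referenced variable is missing, so a
    misconfigured deployment fails loudly instead of connecting with a
    half-filled URL.
    """
    env = os.environ if env is None else env
    return Template(raw).substitute(env)


# Stand-in for the process environment (dummy credentials)
fake_env = {
    "COREP_GX_DB_URL": "postgresql+psycopg2://corep_admin:secret@localhost:5432/corep"
}
print(resolve_config_value("${COREP_GX_DB_URL}", fake_env))
```

The config file on disk only ever contains the placeholder, so committing great_expectations.yml to version control does not leak the connection string.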

7. Layer 1 — Raw Expectation Suites

7.1 Capital Instruments Suite

# Builds gx/expectations/raw_capital_suite.json programmatically:
# define the expectations in Python, then save the suite to JSON

import great_expectations as gx

context = gx.get_context(context_root_dir="gx")
# add_or_update keeps this script re-runnable; add_expectation_suite
# raises if the suite already exists
suite = context.add_or_update_expectation_suite("raw_capital_suite")

validator = context.get_validator(
    datasource_name="corep_postgres",
    data_connector_name="default_inferred_data_connector_name",
    data_asset_name="raw.capital_instruments",
    expectation_suite_name="raw_capital_suite",
)

# ── Table-level ──────────────────────────────────────────
validator.expect_table_row_count_to_be_between(min_value=10, max_value=50_000)
validator.expect_table_columns_to_match_set(
    column_set=[
        "instrument_id", "name", "tier", "amount",
        "currency", "issuance_date", "maturity_date"
    ],
    exact_match=False   # allow extra columns — forward compatible
)

# ── Column: instrument_id ─────────────────────────────────
validator.expect_column_values_to_not_be_null("instrument_id")
validator.expect_column_values_to_be_unique("instrument_id")

# ── Column: tier ─────────────────────────────────────────
validator.expect_column_values_to_not_be_null("tier")
validator.expect_column_values_to_be_in_set(
    "tier", value_set=["CET1", "AT1", "T2"]
)

# ── Column: amount ───────────────────────────────────────
validator.expect_column_values_to_not_be_null("amount")
validator.expect_column_values_to_be_between(
    "amount", min_value=0, max_value=1_000_000_000_000,
    mostly=0.99   # allow 1% outliers — GX's "mostly" parameter
)
validator.expect_column_mean_to_be_between(
    "amount", min_value=1_000, max_value=100_000_000
)

# ── Column: currency ─────────────────────────────────────
validator.expect_column_values_to_match_regex(
    "currency", regex=r"^[A-Z]{3}$"   # ISO 4217 format
)

# ── Column: issuance_date ────────────────────────────────
validator.expect_column_values_to_match_regex(
    "issuance_date", regex=r"^\d{4}-\d{2}-\d{2}$"
)

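# ── Conditional: CET1 rows must carry a strictly positive amount ─────
# row_condition restricts the check to a subset of rows. With a SQL
# datasource, GX 0.18 expects the "great_expectations__experimental__"
# parser and col() syntax; the "pandas" parser only applies to DataFrames.
validator.expect_column_values_to_be_between(
    "amount",
    min_value=0,
    strict_min=True,
    row_condition='col("tier")=="CET1"',
    condition_parser="great_expectations__experimental__"
)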

validator.save_expectation_suite(discard_failed_expectations=False)
print("raw_capital_suite saved.")

7.2 What “mostly” Unlocks

The mostly parameter is GX’s most powerful feature for regulatory data. It lets you express completeness requirements as a percentage threshold rather than an all-or-nothing assertion.

Field	dbt `not_null` test	GX with `mostly`	Regulatory interpretation
`amount`	Fails on first NULL	`mostly=0.95`: passes if ≥ 95% populated	Some instruments may legitimately have no amount at reporting date (e.g., contingent instruments)
`maturity_date`	Fails on first NULL	`mostly=0.80`: passes if ≥ 80% populated	Perpetual instruments (AT1) have no maturity, so NULL is valid
`lei`	Fails on first NULL	`mostly=0.90`: passes if ≥ 90% populated	LEI registration may be in progress for new counterparties
`instrument_id`	Correct: 100% required	Correct: `mostly=1.0`	Primary key, zero tolerance
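The arithmetic behind mostly is simple enough to show directly. The helper below is a simplified sketch, not GX's implementation (GX computes the fraction slightly differently depending on the expectation type), and the sample columns are invented.

```python
def passes_mostly(values, predicate, mostly=1.0):
    """True when at least `mostly` of the values satisfy the predicate.

    Assumption for this sketch: an empty column passes.
    """
    if not values:
        return True
    satisfied = sum(1 for v in values if predicate(v))
    return satisfied / len(values) >= mostly


def is_populated(value):
    return value is not None


# Hypothetical maturity_date column: 8 dated instruments, 2 perpetual (NULL)
maturity_dates = ["2030-06-30"] * 8 + [None, None]
# Hypothetical instrument_id column with one missing key
instrument_ids = ["I1", "I2", None, "I4"]

print(passes_mostly(maturity_dates, is_populated, mostly=0.80))  # 80% threshold
print(passes_mostly(maturity_dates, is_populated))               # all-or-nothing
print(passes_mostly(instrument_ids, is_populated, mostly=1.0))   # primary key
```

The same column passes or fails depending only on the threshold, which is what lets one mechanism cover both tolerant fields and zero-tolerance keys.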

7.3 RWA Exposures Suite

# Key expectations for raw.rwa_exposures

validator.expect_table_row_count_to_be_between(min_value=5, max_value=100_000)

validator.expect_column_values_to_be_in_set(
    "exposure_class",
    value_set=[
        "central_governments", "institutions", "corporates",
        "retail", "real_estate", "equity", "other"
    ]
)

# Convention used here: risk_weight_pct is stored as a decimal multiplier
# (0 to 12.5 = 0% to 1250%), not as a whole-number percentage.
# This catches the common ETL error of loading 0.75 as 75 (×100 shift)
validator.expect_column_values_to_be_between(
    "risk_weight_pct", min_value=0.0, max_value=12.5,  # 1250% = 12.5 in decimal
    mostly=0.999
)

validator.expect_column_mean_to_be_between(
    "risk_weight_pct", min_value=0.1, max_value=3.0
)
# If mean > 3.0 something has gone catastrophically wrong in the source system

validator.expect_column_values_to_be_between(
    "ead", min_value=0, max_value=500_000_000_000, mostly=0.99
)

# Cross-column: rwa must not exceed ead × 12.5 (maximum risk weight).
# The built-in pair expectations compare two columns directly and cannot
# apply the 12.5 multiplier, so this rule needs a custom expectation or a
# derived column (e.g. ead * 12.5) exposed by a view. It is not part of
# this suite.

# Conditional expectation: every corporate exposure must carry an rwa value.
# SQL datasources need the experimental parser and col() syntax.
validator.expect_column_values_to_not_be_null(
    "rwa",
    row_condition='col("exposure_class")=="corporates"',
    condition_parser="great_expectations__experimental__"
)

7.4 Liquidity Assets Suite

# Key expectations for raw.liquidity_assets

validator.expect_column_values_to_be_in_set(
    "hqla_level", value_set=["1", "2A", "2B"]
)

# haircut_rate must be a decimal fraction, not a whole number percent
validator.expect_column_values_to_be_between(
    "haircut_rate", min_value=0.0, max_value=1.0   # 0% to 100% as fraction
)
# Commission Delegated Regulation (EU) 2015/61 sets the haircut levels, e.g.
# Level 1: 0%, Level 2A: 15%, Level 2B: 25-50%
# This check ensures haircut is sane before LCR calculation

validator.expect_column_quantile_values_to_be_between(
    "market_value",
    quantile_ranges={
        "quantiles": [0.05, 0.50, 0.95],
        "value_ranges": [[1_000, None], [100_000, None], [None, 50_000_000_000]]
    }
)
# Quantile expectations — entirely impossible in dbt built-in tests
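To make the quantile expectation more concrete, here is a standard-library sketch of the same kind of check. It is an approximation: check_quantile_ranges is a hypothetical helper, the interpolation method may differ from the one GX uses, and the market values are synthetic.

```python
from statistics import quantiles


def check_quantile_ranges(values, ranges):
    """Return the quantiles (0.05, 0.50, 0.95) whose value falls outside its range.

    `ranges` maps a quantile to (low, high); None means unbounded.
    """
    cuts = quantiles(values, n=20, method="inclusive")  # cut points at 5%, 10%, ..., 95%
    observed = {0.05: cuts[0], 0.50: cuts[9], 0.95: cuts[18]}
    failures = []
    for q, (low, high) in ranges.items():
        value = observed[q]
        if (low is not None and value < low) or (high is not None and value > high):
            failures.append(q)
    return failures


RANGES = {
    0.05: (1_000, None),
    0.50: (100_000, None),
    0.95: (None, 50_000_000_000),
}

# Synthetic market values: 1m, 2m, ..., 100m
market_values = [i * 1_000_000 for i in range(1, 101)]
# The same data loaded in thousands instead of units (a scale error)
scaled_down = [v // 1_000 for v in market_values]

print(check_quantile_ranges(market_values, RANGES))
print(check_quantile_ranges(scaled_down, RANGES))
```

Every individual value in the scaled-down batch is still a plausible positive number, so a per-row range check would not object; only the median bound exposes the unit error.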

8. Layer 2 — Mart Expectation Suite

After dbt runs, the mart tables contain the numbers that will go into your XBRL instance document. These are your last line of defence before regulatory submission.

# gx/expectations/mart_corep_suite.json — built programmatically

validator = context.get_validator(
    datasource_name="corep_postgres",
    data_connector_name="default_inferred_data_connector_name",
    data_asset_name="mart.corep_c0300",  # Capital Ratios template
    expectation_suite_name="mart_corep_suite",
)

# ── Basel III regulatory minimums ────────────────────────────────────────
# CRR Article 92(1)(a): CET1 ≥ 4.5%
validator.expect_column_values_to_be_between(
    "cet1_ratio", min_value=0.045, max_value=1.0
)

# CRR Article 92(1)(b): Tier 1 ≥ 6.0%
validator.expect_column_values_to_be_between(
    "tier1_ratio", min_value=0.06, max_value=1.0
)

# CRR Article 92(1)(c): Total Capital ≥ 8.0%
validator.expect_column_values_to_be_between(
    "total_capital_ratio", min_value=0.08, max_value=1.0
)

# CRR Article 92(1)(d), added by CRR2: Leverage ratio ≥ 3.0%
validator.expect_column_values_to_be_between(
    "leverage_ratio", min_value=0.03, max_value=1.0
)

# ── Cross-ratio consistency ───────────────────────────────────────────────
# tier1_ratio ≥ cet1_ratio always (CET1 ⊂ Tier1)
# total_capital_ratio ≥ tier1_ratio always (T1 ⊂ Total Capital)
# In dbt these need dbt_utils.expression_is_true; in GX they are declarative
# pair expectations (or_equal=True turns ">" into "≥")
validator.expect_column_pair_values_a_to_be_greater_than_b(
    column_A="tier1_ratio", column_B="cet1_ratio", or_equal=True
)
validator.expect_column_pair_values_a_to_be_greater_than_b(
    column_A="total_capital_ratio", column_B="tier1_ratio", or_equal=True
)

# ── XBRL decimal precision check ─────────────────────────────────────────
# XBRL's decimals attribute states accuracy in decimal places:
# decimals=-3 means accurate to the nearest thousand (monetary values),
# decimals=4 means accurate to four decimal places (ratios).
# This pipeline keeps ratios at a fixed 6 decimal places, which covers decimals=4
# and ensures no floating-point garbage makes it into the XBRL document.
# The regex match assumes cet1_ratio is exposed as text, not as a numeric type.
validator.expect_column_values_to_match_regex(
    "cet1_ratio", regex=r"^\d+\.\d{6}$",
    meta={"notes": "Ratios stored with 6dp; XBRL decimals=4 needs at least 4dp"}
)

# ── Table completeness ───────────────────────────────────────────────────
validator.expect_table_row_count_to_equal(1)
# C 03.00 always produces exactly one row — the reporting period totals

validator.save_expectation_suite(discard_failed_expectations=False)
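For the precision check to pass, the ratio has to be rendered with exactly six decimal places somewhere upstream. One possible way to do that in Python is sketched below; the function name and the ROUND_HALF_UP rounding mode are assumptions for illustration, not something the mart models are known to use.

```python
import re
from decimal import Decimal, ROUND_HALF_UP

SIX_DP = re.compile(r"^\d+\.\d{6}$")  # same pattern as the expectation


def format_ratio(value):
    """Render a ratio as a fixed 6-decimal string."""
    # Going through str() avoids carrying binary float noise into Decimal
    quantized = Decimal(str(value)).quantize(Decimal("0.000001"), rounding=ROUND_HALF_UP)
    return str(quantized)


noisy = 0.1 + 0.2 - 0.15  # floating-point arithmetic leaves trailing noise
print(str(noisy), bool(SIX_DP.match(str(noisy))))
print(format_ratio(noisy), bool(SIX_DP.match(format_ratio(noisy))))
```

The raw float string fails the pattern while the quantized string matches it, which is the "floating-point garbage" case the expectation is meant to stop.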

📋 The Cross-Ratio Consistency Check Is Your Most Important Gate

The check tier1_ratio ≥ cet1_ratio is a mathematical identity that must hold because CET1 is a subset of Tier 1 capital. If it fails, it means one of three things: a bug in your dbt aggregation logic, a sign error in the source data, or a schema mismatch between two mart models. Any of these would produce an incorrect COREP report. dbt can express the rule with dbt_utils.expression_is_true, but GX records it as a named pair expectation whose result appears in the Data Docs alongside every other check.
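A short worked example shows why the ordering must hold and how a sign error breaks it. The capital amounts are invented and the function names are hypothetical; only the relationships (Tier 1 = CET1 + AT1, Total = Tier 1 + T2) come from the capital structure itself.

```python
from decimal import Decimal


def capital_ratios(cet1, at1, t2, rwa):
    """Compute the three capital ratios from component amounts."""
    tier1 = cet1 + at1
    total = tier1 + t2
    return {
        "cet1_ratio": cet1 / rwa,
        "tier1_ratio": tier1 / rwa,
        "total_capital_ratio": total / rwa,
    }


def ordering_violations(ratios):
    """List every breach of cet1 <= tier1 <= total."""
    violations = []
    if ratios["tier1_ratio"] < ratios["cet1_ratio"]:
        violations.append("tier1_ratio < cet1_ratio")
    if ratios["total_capital_ratio"] < ratios["tier1_ratio"]:
        violations.append("total_capital_ratio < tier1_ratio")
    return violations


rwa = Decimal("1000")
good = capital_ratios(Decimal("120"), Decimal("20"), Decimal("30"), rwa)
# Same data, but AT1 arrives with a flipped sign from the source system
bad = capital_ratios(Decimal("120"), Decimal("-20"), Decimal("30"), rwa)

print(good, ordering_violations(good))
print(bad, ordering_violations(bad))
```

In the broken case every ratio is still above its regulatory floor (10% Tier 1 against a 6% minimum), so only the pair check notices that something is wrong.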

9. `quality.py` — The Module Implementation

"""
modules/quality.py — Run GX expectation suites as pipeline quality gates.

Layer 1 (raw gate)  : runs after ingest, before dbt
Layer 2 (mart gate) : runs after dbt, before xbrl_gen
"""

import logging
import os
from pathlib import Path

import great_expectations as gx
from great_expectations.core.batch import BatchRequest

from modules.base import BaseModule

log = logging.getLogger(__name__)

GX_DIR = Path(os.environ.get("GX_DIR", "gx"))


class QualityGateError(RuntimeError):
    """Raised when a GX checkpoint produces any failed expectations."""
    pass


class QualityModule(BaseModule):
    MODULE_NAME = "quality"

    # Ordered list of (checkpoint_name, suite_name, asset_name)
    # Layer 1 = raw tables, Layer 2 = mart tables
    # Every suite listed here must exist in gx/expectations/, otherwise
    # input_check() raises before any checkpoint runs.
    _CHECKPOINTS = [
        # Layer 1 — run before dbt
        ("raw_capital_checkpoint",   "raw_capital_suite",   "raw.capital_instruments"),
        ("raw_rwa_checkpoint",        "raw_rwa_suite",        "raw.rwa_exposures"),
        ("raw_liquidity_checkpoint",  "raw_liquidity_suite",  "raw.liquidity_assets"),
        # Layer 2 — run after dbt mart build
        ("mart_corep_checkpoint",     "mart_corep_suite",     "mart.corep_c0300"),
    ]

    def input_check(self) -> None:
        """Verify GX context directory and expectation suite JSON files exist."""
        if not GX_DIR.exists():
            raise RuntimeError(
                f"[quality] GX directory not found: {GX_DIR}. "
                "Run: great_expectations init"
            )
        suites_dir = GX_DIR / "expectations"
        missing = [
            suite
            for _, suite, _ in self._CHECKPOINTS
            if not (suites_dir / f"{suite}.json").exists()
        ]
        if missing:
            raise RuntimeError(
                "[quality] Missing expectation suite JSON files: "
                + ", ".join(missing)
            )
        log.info("[quality] All %d expectation suites present.", len(self._CHECKPOINTS))

    def _execute(self) -> None:
        """Run all GX checkpoints. Raise QualityGateError on any failure."""
        context = gx.get_context(context_root_dir=str(GX_DIR))
        failed_suites = []
        results_summary = []

        for checkpoint_name, suite_name, asset_name in self._CHECKPOINTS:
            log.info(
                "[quality] Running checkpoint: %s → asset: %s",
                checkpoint_name, asset_name
            )
            result = context.run_checkpoint(
                checkpoint_name=checkpoint_name,
                batch_request=BatchRequest(
                    datasource_name="corep_postgres",
                    data_connector_name="default_inferred_data_connector_name",
                    data_asset_name=asset_name,
                ),
            )

            passed = result.success
            stats  = result.get_statistics()
            evaluated  = stats.get("evaluated_expectations", 0)
            successful = stats.get("successful_expectations", 0)
            pct        = stats.get("success_percent", 0.0)

            results_summary.append({
                "suite":      suite_name,
                "asset":      asset_name,
                "evaluated":  evaluated,
                "passed":     successful,
                "pct":        pct,
                "status":     "PASS" if passed else "FAIL",
            })

            log.info(
                "[quality] %s: %s (%d/%d expectations, %.1f%%)",
                suite_name, "PASS" if passed else "FAIL",
                successful, evaluated, pct
            )

            if not passed:
                failed_suites.append(suite_name)

        # Build data docs (HTML report)
        context.build_data_docs()
        self._upload_data_docs_to_minio(context)
        self._write_audit(results_summary)

        if failed_suites:
            raise QualityGateError(
                f"[quality] {len(failed_suites)} suite(s) failed: "
                + ", ".join(failed_suites)
                + ". Pipeline halted — see GX data docs for details."
            )

    def _upload_data_docs_to_minio(self, context) -> None:
        """Upload HTML data docs to MinIO for persistent audit evidence."""
        try:
            from minio import Minio
            docs_dir = GX_DIR / "uncommitted" / "data_docs" / "local_site"
            client = Minio(
                os.environ.get("MINIO_ENDPOINT", "minio:9000"),
                access_key=os.environ.get("MINIO_ROOT_USER", "minioadmin"),
                secret_key=os.environ.get("MINIO_ROOT_PASSWORD", "minioadmin"),
                secure=False,
            )
            bucket = "corep-gx-reports"
            if not client.bucket_exists(bucket):
                client.make_bucket(bucket)

            for html_file in docs_dir.rglob("*.html"):
                object_name = str(html_file.relative_to(docs_dir))
                client.fput_object(bucket, object_name, str(html_file), content_type="text/html")
                log.info("[quality] Uploaded data doc → minio://%s/%s", bucket, object_name)
        except Exception as exc:
            log.warning("[quality] MinIO data docs upload failed (non-fatal): %s", exc)

    def _write_audit(self, results: list) -> None:
        """Persist quality run summary to audit.pipeline_run_log."""
        import json
        from modules.base import _pg_conn
        conn = _pg_conn()
        try:
            cur = conn.cursor()
            cur.execute(
                """
                INSERT INTO audit.pipeline_run_log
                    (run_id, module_name, status, metadata, ran_at)
                VALUES (%s, 'quality', %s, %s, now())
                """,
                (
                    self._run_id,
                    "FAIL" if any(r["status"] == "FAIL" for r in results) else "PASS",
                    json.dumps(results),
                ),
            )
            conn.commit()
        finally:
            conn.close()

    def emit_lineage(self) -> None:
        # Quality runs are validation-only — no data written, no lineage event needed
        log.info("[quality] No lineage event emitted (read-only validation step).")

    def output_check(self) -> None:
        # If _execute completed without raising, all checkpoints passed
        log.info("[quality] output_check: all suites passed (no QualityGateError raised).")
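As written, _execute walks the whole checkpoint list in one pass, while the architecture calls for the raw suites to run before dbt and the mart suite after it. One way to split the list by layer is sketched below. The helper and the "raw"/"mart" layer names are assumptions for illustration and are not part of the module above.

```python
CHECKPOINTS = [
    ("raw_capital_checkpoint", "raw_capital_suite", "raw.capital_instruments"),
    ("raw_rwa_checkpoint", "raw_rwa_suite", "raw.rwa_exposures"),
    ("raw_liquidity_checkpoint", "raw_liquidity_suite", "raw.liquidity_assets"),
    ("mart_corep_checkpoint", "mart_corep_suite", "mart.corep_c0300"),
]

# Hypothetical mapping from layer name to the schema prefix of its assets
LAYER_PREFIXES = {"raw": "raw.", "mart": "mart."}


def checkpoints_for_layer(layer):
    """Return the checkpoints whose asset lives in the given layer's schema."""
    if layer not in LAYER_PREFIXES:
        raise ValueError(f"unknown layer: {layer!r}")
    prefix = LAYER_PREFIXES[layer]
    return [cp for cp in CHECKPOINTS if cp[2].startswith(prefix)]


print([name for name, _, _ in checkpoints_for_layer("raw")])
print([name for name, _, _ in checkpoints_for_layer("mart")])
```

A module could accept the layer as a constructor argument and iterate over checkpoints_for_layer(layer) instead of the full list, so the first gate never queries mart tables that dbt has not built yet.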

10. Airflow Branching on Quality Failure

The Airflow DAG (covered on Day 14) uses BranchPythonOperator. Here is how the quality gate hooks into the branch logic:

# dags/corep_pipeline_dag.py — quality gate branch logic
from airflow.operators.python import BranchPythonOperator, PythonOperator

def _quality_branch(**context) -> str:
    """Return next task ID based on quality gate result stored in XCom."""
    quality_status = context["task_instance"].xcom_pull(
        task_ids="run_quality_layer2", key="quality_status"
    )
    if quality_status == "PASS":
        return "run_xbrl_generation"
    return "quarantine_failed_run"    # writes to audit, alerts ops team

# _run_quality_module (not shown) must push "PASS" or "FAIL" to XCom under
# the key "quality_status" and return normally on failure. If it lets
# QualityGateError propagate, the task fails and the branch never runs.
run_quality_layer2 = PythonOperator(
    task_id="run_quality_layer2",
    python_callable=_run_quality_module,
)

branch_on_quality = BranchPythonOperator(
    task_id="branch_on_quality",
    python_callable=_quality_branch,
)

# DAG flow
(
    run_ingest
    >> run_quality_layer1
    >> run_dbt_staging
    >> run_dbt_intermediate
    >> run_dbt_mart
    >> run_quality_layer2      # second gate after mart build
    >> branch_on_quality
)
branch_on_quality >> run_xbrl_generation
branch_on_quality >> quarantine_failed_run
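For the branch to receive a status at all, the task that runs the gate has to record the outcome rather than crash. The sketch below shows one way such a wrapper could look, with a plain dict standing in for XCom so that it runs without Airflow. The function names are hypothetical; inside a real callable, push would typically wrap the task instance's xcom_push.

```python
class QualityGateError(RuntimeError):
    """Stand-in for the exception raised by the quality module."""


def run_gate_and_report(run_gate, push):
    """Run the gate and record PASS/FAIL instead of propagating the error."""
    try:
        run_gate()
    except QualityGateError as exc:
        push("quality_status", "FAIL")
        push("quality_error", str(exc))
        return "FAIL"
    push("quality_status", "PASS")
    return "PASS"


def choose_branch(status):
    """Same decision as the branch callable, isolated for testing."""
    return "run_xbrl_generation" if status == "PASS" else "quarantine_failed_run"


def failing_gate():
    raise QualityGateError("1 suite(s) failed: raw_rwa_suite")


xcom = {}  # dict used in place of Airflow's XCom store
status = run_gate_and_report(failing_gate, xcom.__setitem__)
print(status, choose_branch(xcom["quality_status"]), xcom["quality_error"])
```

The trade-off is that the gate task itself shows as successful in the Airflow UI, so the quarantine branch becomes the place where the failure is made visible to operators.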

11. dbt Tests vs Great Expectations — Full Comparison

Capability	dbt built-in tests	dbt_expectations package	Great Expectations 0.18
Not null	Yes	Yes	Yes
Unique values	Yes	Yes	Yes
Accepted value set	Yes	Yes	Yes
Value range (between)	Via expression_is_true	Yes	Yes
Regex pattern match	No	Yes	Yes
Completeness % (mostly)	No	No	Yes — native
Column mean / std-dev range	No	Yes	Yes
Quantile value ranges	No	Yes	Yes
Cross-column pair ordering	Via expression_is_true	Yes	Yes
Row-conditional check (IF tier=CET1 THEN…)	Via `where` config	Yes — row_condition	Yes — row_condition
Table row count band	Awkward	Yes	Yes
HTML audit report output	No	No	Yes — Data Docs
Runs independently of transformation	No	No	Yes — checkpoint
Suites versioned in JSON	No (schema.yml)	No	Yes
Runs on raw layer before dbt	No	No	Yes

12. Data Docs — Your Audit Evidence

Every time GX runs a checkpoint, it updates an HTML report in gx/uncommitted/data_docs/local_site/. This report is your regulatory audit evidence that quality was checked before submission.

📋 BCBS 239 Principle 4 — Completeness

“A bank should be able to capture and aggregate all material risk data across the banking group. Data should be available by business line, legal entity, asset type, industry, region and other groupings, as relevant for the risk in question.”

Your GX data docs prove that completeness was measured per column, per table, per pipeline run. The HTML file timestamped before submission is your evidence. Upload it to MinIO so it persists independently of the pipeline container.

To view the data docs locally after a pipeline run:

# From WSL / inside the pipeline container
great_expectations docs build --directory gx

# Open in browser (WSL path → Windows path)
explorer.exe "$(wslpath -w gx/uncommitted/data_docs/local_site/index.html)"

13. Mapping Quality Gates to BCBS 239 Principles

BCBS 239 Principle	Requirement	How GX satisfies it
P3 — Accuracy and integrity	Data reflects actual risk positions	Cross-ratio pair checks, statistical distribution bounds, regex format checks
P4 — Completeness	Capture all material risk data	`mostly=0.95` completeness checks on all raw tables
P5 — Timeliness	Data available on time for risk decisions	Checkpoint runtime logged to `audit.pipeline_run_log`
P6 — Adaptability	Risk data adaptable to varying scenarios	Separate suites per table, so one suite can be updated without touching the others
P7 — Accuracy (reporting)	Reports reconciled and validated	Cross-column pair checks: `tier1_ratio ≥ cet1_ratio` always holds
P8 — Comprehensiveness	All material risk positions in reports	Row count check on mart tables confirms all data points present

14. Run the Quality Gates

# Full pipeline run (includes quality layer 1 after ingest)
python pipeline.py --module quality

# Run only the raw gate (useful during development)
python pipeline.py --from ingest --to quality

# Run only the mart gate after dbt is already complete
python pipeline.py --from quality --to quality

# Check exit code — quality gate failures raise exit code 1
echo $?

# Expected output on PASS
INFO [quality] Running checkpoint: raw_capital_checkpoint → asset: raw.capital_instruments
INFO [quality] raw_capital_suite: PASS (12/12 expectations, 100.0%)
INFO [quality] Running checkpoint: raw_rwa_checkpoint → asset: raw.rwa_exposures
INFO [quality] raw_rwa_suite: PASS (9/9 expectations, 100.0%)
INFO [quality] Running checkpoint: raw_liquidity_checkpoint → asset: raw.liquidity_assets
INFO [quality] raw_liquidity_suite: PASS (11/11 expectations, 100.0%)
INFO [quality] Running checkpoint: mart_corep_checkpoint → asset: mart.corep_c0300
INFO [quality] mart_corep_suite: PASS (8/8 expectations, 100.0%)
INFO [quality] Data docs built at gx/uncommitted/data_docs/local_site/index.html
INFO [quality] output_check: all suites passed (no QualityGateError raised).

# Expected output on FAIL (e.g. risk weight typo in source data)
INFO [quality] raw_rwa_suite: FAIL (7/9 expectations, 77.8%)
ERROR [quality] 1 suite(s) failed: raw_rwa_suite. Pipeline halted — see GX data docs for details.
Traceback: QualityGateError: [quality] 1 suite(s) failed...

📚 Day 8 Key Takeaways

Two layers are mandatory — raw gate before dbt, mart gate before XBRL. dbt tests can only run on transformed tables.
mostly is the killer feature — it lets you express regulatory completeness requirements as a percentage rather than all-or-nothing, matching real-world data realities.
Cross-column pair checks (tier1_ratio ≥ cet1_ratio) are declarative one-liners in GX and essential for Basel III consistency verification; in dbt they need dbt_utils.expression_is_true or an add-on package.
Statistical distribution expectations (mean, quantiles) catch scale/magnitude errors — the most common ETL bug with financial data (e.g. basis points vs decimal).
Data Docs are audit evidence — upload to MinIO, timestamp before submission. BCBS 239 auditors expect proof that quality was checked.
Quality module is read-only — no lineage event needed. OpenLineage lineage events are for data-writing steps only.
Next: Day 9 — OpenMetadata catalog: auto-discovery of all tables built so far, EBA glossary terms, PII tags, and lineage federation with Marquez.

Published: May 07, 2026

Updated: May 07, 2026
