You finished Day 7 with four dbt mart models that map your bank data to EBA DPM 4.0 templates.
dbt tests verify that cet1_ratio >= 0.045 and lcr_ratio >= 1.0 — and they run inside the transformation step.
That sounds thorough. It isn’t.
A Basel III capital ratio test failing inside dbt means the problem surfaces only after you’ve already transformed potentially tainted data into your mart. The raw data has already been loaded, staged, and joined. The bad number is already in the lineage graph.
Regulators don’t accept “we caught it after transformation.” BCBS 239 Principle 3 (Accuracy and Integrity) expects risk data to be accurate and reliable, aggregated on a largely automated basis so as to minimise the probability of errors. You need a quality gate before dbt runs, and a separate one after the mart is built.
This post builds both layers using Great Expectations (GX) and wires them into your pipeline via quality.py.
1. Why You Need Two Quality Layers
Think of data quality in a regulatory pipeline the same way a manufacturer thinks about quality control on a production line: you inspect raw materials before they enter the machine, and you inspect finished goods before they ship. Inspecting only at the end doesn’t tell you which raw material batch was the problem.
| Layer | When it runs | What it validates | Framework | Failure effect |
|---|---|---|---|---|
| Layer 1 — Raw gate | After ingest.py loads CSVs into raw.* | Column presence, nulls, domain values, numeric ranges on source data | Great Expectations | Pipeline halts before dbt runs |
| Layer 2 — Mart gate | After dbt run completes | Regulatory floors (Basel III ratios), cross-table consistency, XBRL decimal precision | Great Expectations + dbt tests | Pipeline halts before XBRL generation |
Without a raw gate, consider a single CSV row where risk_weight_pct = 1500 (a data-entry typo). The staging filter in stg_rwa_exposures.sql casts the value to decimal and keeps only rows BETWEEN 0 AND 12.5. The string “1500” casts without error, fails the filter, and the row is silently dropped, so your RWA total is understated. The mart model still produces a number. dbt tests pass. You file an incorrect COREP report.
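To make the failure mode concrete, here is a plain-Python simulation of the two paths. It is a sketch, not the staging model: the row values are made up, and staging_filter only mimics the cast-then-BETWEEN logic described above.

```python
from decimal import Decimal

# Hypothetical raw rows as loaded from CSV: every value is still a string.
raw_rows = [
    {"exposure_id": "E1", "ead": "1000000", "risk_weight_pct": "0.75"},
    {"exposure_id": "E2", "ead": "500000", "risk_weight_pct": "1500"},  # data-entry typo
    {"exposure_id": "E3", "ead": "200000", "risk_weight_pct": "1.0"},
]

LOW, HIGH = Decimal("0"), Decimal("12.5")


def staging_filter(rows):
    """Mimic a cast-to-decimal followed by BETWEEN 0 AND 12.5: bad rows just disappear."""
    return [r for r in rows if LOW <= Decimal(r["risk_weight_pct"]) <= HIGH]


def raw_gate(rows):
    """Report out-of-range rows by id instead of dropping them."""
    return [r["exposure_id"] for r in rows
            if not LOW <= Decimal(r["risk_weight_pct"]) <= HIGH]


def rwa_total(rows):
    """Sum of EAD x risk weight over the rows that survived."""
    return sum(Decimal(r["ead"]) * Decimal(r["risk_weight_pct"]) for r in rows)


print(rwa_total(staging_filter(raw_rows)))  # 950000.00 (E2 silently missing)
print(raw_gate(raw_rows))                   # ['E2'] (the gate names the bad row)
```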
2. What dbt Tests Do — and Where They Stop
dbt ships four built-in generic tests plus a rich ecosystem of packages (dbt_utils, dbt_expectations).
They are excellent for structural invariants on transformed models. Here is what they do well:
| dbt test | What it checks | Where it runs |
|---|---|---|
| not_null | Column has no NULL rows | On any model, any layer |
| unique | No duplicate values | On any model |
| accepted_values | Value in a fixed list | On staging / mart |
| relationships | FK referential integrity | Between models |
| dbt_utils.expression_is_true | Arbitrary SQL expression | On any model |
Here is where dbt tests fall short:
| Requirement | dbt can do it? | Why not |
|---|---|---|
| Statistical distribution check (mean ± 3σ on capital ratios) | No | No built-in stats; dbt_utils has no distribution tests |
| Row count within expected band (e.g. 50–5000 rows) | Partial | dbt_utils.expression_is_true can do it but is awkward |
| Column-level completeness % (≥ 95% populated) | No | Built-in not_null is all-or-nothing |
| Cross-column conditional logic (if tier=CET1 then amount must be positive) | Partial | The where test config can filter rows, but a positivity rule still needs dbt_utils.expression_is_true or a custom generic test |
| Profiling: min, max, mean, p5, p95 stored as metadata | No | dbt tests are pass/fail, no metrics storage |
| HTML data docs for human audit review | No | dbt docs don’t include per-column statistics |
| Expectation suites versioned separately from SQL models | No | dbt tests live in schema.yml tied to model versions |
| Re-run quality check without re-running transformation | Yes, with limits | dbt test runs separately from dbt run, but only against models and sources declared in the dbt project |
dbt tests = transformation correctness. They confirm your SQL logic is right. Great Expectations = data correctness. It confirms your data — independent of your SQL — meets regulatory business rules. Both are mandatory. Neither replaces the other.
3. Great Expectations Architecture in the COREP Pipeline
```text
CSV files (/data/source/)
  capital_instruments.csv
  rwa_exposures.csv
  ...
        │
        │ ingest.py
        ▼
PostgreSQL (corep database)
  raw.*            capital_instruments, rwa_exposures, ...
        │
        ├──► GX LAYER 1 (raw gate) runs here
        │      Suites: raw_capital_suite, raw_rwa_suite, raw_liquidity_suite
        │      PASS → dbt run proceeds
        │      FAIL → pipeline halts, audit log written
        │             (Airflow branch: skip_xbrl)
        ▼
  staging.*        stg_capital_instruments, stg_rwa_exposures, ...
  intermediate.*   int_capital_by_tier, ...
  mart.*           corep_c0100, corep_c0200, ...
        │
        └──► GX LAYER 2 (mart gate)
               Suite: mart_corep_suite
               PASS → xbrl_gen.py runs
               FAIL → BranchPythonOperator routes to quarantine
```
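The control flow in the diagram fits in a few lines. A minimal sketch, assuming each stage is a zero-argument callable that returns True on success; run_corep_pipeline and the stage names are illustrative, not functions from the pipeline code.

```python
from typing import Callable, List

Stage = Callable[[], bool]


def run_corep_pipeline(ingest: Stage, raw_gate: Stage, dbt_run: Stage,
                       mart_gate: Stage, xbrl_gen: Stage, quarantine: Stage) -> List[str]:
    """Run the stages in diagram order and return the names of those that ran."""
    ran: List[str] = []
    # ingest -> raw gate -> dbt: any failure halts the run before later stages
    for name, stage in (("ingest", ingest), ("raw_gate", raw_gate), ("dbt_run", dbt_run)):
        ran.append(name)
        if not stage():
            return ran
    # mart gate: PASS continues to XBRL generation, FAIL is routed to quarantine
    ran.append("mart_gate")
    if mart_gate():
        ran.append("xbrl_gen")
        xbrl_gen()
    else:
        ran.append("quarantine")
        quarantine()
    return ran


ok, fail = (lambda: True), (lambda: False)
print(run_corep_pipeline(ok, fail, ok, ok, ok, ok))  # ['ingest', 'raw_gate']
```

The design point is the asymmetry: a raw-gate failure stops everything downstream, while a mart-gate failure still completes the run so the quarantine path can record what happened.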
GX uses five concepts you need to understand:
| GX concept | What it is | Analogy |
|---|---|---|
| Expectation | A single assertion: “column X has values between A and B” | A single test case |
| Expectation Suite | A named collection of expectations stored as JSON | A test class / spec file |
| Checkpoint | Binds a suite to a data source and runs it, producing a CheckpointResult that wraps one ValidationResult per validation | A test runner invocation |
| Data Docs | HTML report generated from ValidationResults — human-readable audit evidence | Test coverage report |
| Data Context | Root configuration object — knows where your suites, results, and docs live | pytest configuration + fixtures |
4. Install Great Expectations
```bash
# Inside your Python virtual environment
pip install great-expectations==0.18.19 sqlalchemy psycopg2-binary
```
GX made breaking API changes between 0.17, 0.18, and 1.x, and the 1.x rewrite renamed most classes. This post pins 0.18.19, a stable 0.18 release, and uses the 0.18 API throughout.
Add it to requirements.txt:
```text
great-expectations==0.18.19  # Day 8 — data quality gates
```
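Because the 0.18 and 1.x APIs differ so much, it can help to fail fast when the wrong release is installed. A small guard, assuming you call it at import time in your own code; check_gx_version and installed_gx_version are hypothetical helpers built only on importlib.metadata.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def check_gx_version(installed: str, expected_minor: str = "0.18") -> None:
    """Raise RuntimeError unless `installed` is on the expected major.minor line."""
    if installed.split(".")[:2] != expected_minor.split("."):
        raise RuntimeError(
            f"great-expectations {installed} is installed; "
            f"this pipeline is written for the {expected_minor}.x API"
        )


def installed_gx_version() -> Optional[str]:
    """Return the installed great-expectations version, or None if it is missing."""
    try:
        return version("great-expectations")
    except PackageNotFoundError:
        return None


# Usage, e.g. near the top of modules/quality.py:
#     v = installed_gx_version()
#     if v is not None:
#         check_gx_version(v)
```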
5. GX Project Structure
```text
corep-governance-pipeline/
└── gx/
    ├── great_expectations.yml        # Data Context config
    ├── expectations/
    │   ├── raw_capital_suite.json
    │   ├── raw_rwa_suite.json
    │   ├── raw_liquidity_suite.json
    │   └── mart_corep_suite.json
    ├── checkpoints/
    │   ├── raw_capital_checkpoint.yml
    │   ├── raw_rwa_checkpoint.yml
    │   ├── raw_liquidity_checkpoint.yml
    │   └── mart_corep_checkpoint.yml
    └── uncommitted/
        └── data_docs/
            └── local_site/           # HTML output → also uploaded to MinIO
```
6. great_expectations.yml — Data Context
```yaml
# gx/great_expectations.yml
config_version: 3.0

datasources:
  corep_postgres:
    class_name: Datasource
    execution_engine:
      class_name: SqlAlchemyExecutionEngine
      connection_string: ${COREP_GX_DB_URL}  # injected from .env
    data_connectors:
      default_inferred_data_connector_name:
        class_name: InferredAssetSqlDataConnector
        include_schema_name: true

stores:
  expectations_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: expectations/
  validations_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/validations/
  evaluation_parameter_store:
    class_name: EvaluationParameterStore
  checkpoint_store:
    class_name: CheckpointStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      suppress_store_backend_id: true
      base_directory: checkpoints/

data_docs_sites:
  local_site:
    class_name: SiteBuilder
    show_how_to_buttons: false
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/data_docs/local_site/
    site_index_builder:
      class_name: DefaultSiteIndexPageRenderer

expectations_store_name: expectations_store
validations_store_name: validations_store
evaluation_parameter_store_name: evaluation_parameter_store
checkpoint_store_name: checkpoint_store
```
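The ${COREP_GX_DB_URL} substitution tells GX to read the value from an environment variable, so never hardcode credentials in great_expectations.yml. Your .env file already contains COREP_GX_DB_URL=postgresql+psycopg2://corep_admin:${POSTGRES_PASSWORD}@localhost:5432/corep. Make sure that variable is exported into the environment of the process that runs GX, with ${POSTGRES_PASSWORD} already expanded.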
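Whether the nested ${POSTGRES_PASSWORD} inside that .env value gets expanded depends on the tool that loads the file. One defensive option is to resolve the template in Python before creating the GX context, and to mask the password whenever the URL is logged. A sketch using string.Template; resolve_db_url and mask_password are hypothetical names.

```python
import string
from typing import Mapping


def resolve_db_url(template: str, env: Mapping[str, str]) -> str:
    """Expand ${VAR} references; raises KeyError if any variable is missing."""
    return string.Template(template).substitute(env)


def mask_password(db_url: str) -> str:
    """Hide the password in a SQLAlchemy-style URL before it reaches a log line."""
    scheme, rest = db_url.split("://", 1)
    creds, host = rest.rsplit("@", 1)
    user = creds.split(":", 1)[0]
    return f"{scheme}://{user}:***@{host}"


TEMPLATE = "postgresql+psycopg2://corep_admin:${POSTGRES_PASSWORD}@localhost:5432/corep"

# Hypothetical password; in the pipeline, pass os.environ instead of a literal dict.
url = resolve_db_url(TEMPLATE, {"POSTGRES_PASSWORD": "s3cret"})
print(mask_password(url))  # postgresql+psycopg2://corep_admin:***@localhost:5432/corep
```

In the pipeline you would store resolve_db_url(TEMPLATE, os.environ) in os.environ["COREP_GX_DB_URL"] before calling gx.get_context(). string.Template raises KeyError rather than leaving a literal ${POSTGRES_PASSWORD} in the connection string.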
7. Layer 1 — Raw Expectation Suites
7.1 Capital Instruments Suite
```python
# gx/expectations/raw_capital_suite.json is generated by this script (abbreviated).
# Build the suite programmatically in Python, then save it to JSON.
import great_expectations as gx

context = gx.get_context(context_root_dir="gx")
# add_or_update_* makes the script safe to re-run
context.add_or_update_expectation_suite(expectation_suite_name="raw_capital_suite")

validator = context.get_validator(
    datasource_name="corep_postgres",
    data_connector_name="default_inferred_data_connector_name",
    data_asset_name="raw.capital_instruments",
    expectation_suite_name="raw_capital_suite",
)

# ── Table-level ──────────────────────────────────────────
validator.expect_table_row_count_to_be_between(min_value=10, max_value=50_000)
validator.expect_table_columns_to_match_set(
    column_set=[
        "instrument_id", "name", "tier", "amount",
        "currency", "issuance_date", "maturity_date",
    ],
    exact_match=False,  # allow extra columns — forward compatible
)

# ── Column: instrument_id ─────────────────────────────────
validator.expect_column_values_to_not_be_null("instrument_id")
validator.expect_column_values_to_be_unique("instrument_id")

# ── Column: tier ─────────────────────────────────────────
validator.expect_column_values_to_not_be_null("tier")
validator.expect_column_values_to_be_in_set("tier", value_set=["CET1", "AT1", "T2"])

# ── Column: amount ───────────────────────────────────────
validator.expect_column_values_to_not_be_null("amount")
validator.expect_column_values_to_be_between(
    "amount",
    min_value=0,
    max_value=1_000_000_000_000,
    mostly=0.99,  # allow 1% outliers — GX's "mostly" parameter
)
validator.expect_column_mean_to_be_between("amount", min_value=1_000, max_value=100_000_000)

# ── Column: currency ─────────────────────────────────────
validator.expect_column_values_to_match_regex("currency", regex=r"^[A-Z]{3}$")  # ISO 4217 format

# ── Column: issuance_date ────────────────────────────────
# Assumes the raw layer keeps dates as TEXT straight from the CSV;
# PostgreSQL's regex operator does not accept a DATE column.
validator.expect_column_values_to_match_regex("issuance_date", regex=r"^\d{4}-\d{2}-\d{2}$")

# ── Conditional: CET1 amounts must be strictly positive ──
# On SQL backends, row_condition uses the GX expression parser, not "pandas".
validator.expect_column_values_to_be_between(
    "amount",
    min_value=0,
    strict_min=True,
    row_condition='col("tier")=="CET1"',
    condition_parser="great_expectations__experimental__",
)

validator.save_expectation_suite(discard_failed_expectations=False)
print("raw_capital_suite saved.")
```
7.2 What “mostly” Unlocks
The mostly parameter is GX’s most powerful feature for regulatory data. It lets you express completeness requirements as a percentage threshold rather than an all-or-nothing assertion.
| Field | dbt not_null test | GX with mostly | Regulatory interpretation |
|---|---|---|---|
| amount | Fails on first NULL | mostly=0.95: passes if ≥ 95% populated | Some instruments may legitimately have no amount at reporting date (e.g., contingent instruments) |
| maturity_date | Fails on first NULL | mostly=0.80: passes if ≥ 80% populated | Perpetual instruments (AT1) have no maturity, so NULL is valid |
| lei | Fails on first NULL | mostly=0.90: passes if ≥ 90% populated | LEI registration may be in progress for new counterparties |
| instrument_id | Fails on first NULL (correct here: 100% required) | mostly=1.0 (the default) | Primary key: zero tolerance |
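The arithmetic behind mostly is a ratio compared against a threshold. The function below is a plain-Python illustration of that rule for a not-null check, not GX's implementation. For not-null the denominator is every row; GX's other column-value expectations by default skip NULLs and apply mostly to the non-null values.

```python
from typing import Optional, Sequence


def passes_not_null(values: Sequence[Optional[object]], mostly: float = 1.0) -> bool:
    """Pass if the share of non-NULL values is at least `mostly` (empty input passes)."""
    if not values:
        return True
    non_null = sum(v is not None for v in values)
    return non_null / len(values) >= mostly


# Hypothetical maturity_date column: 8 dated instruments, 2 perpetual AT1 (NULL)
maturity_dates = ["2030-06-30"] * 8 + [None, None]
print(passes_not_null(maturity_dates))               # False: all-or-nothing, like dbt not_null
print(passes_not_null(maturity_dates, mostly=0.80))  # True: 80% populated meets the bar
```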
7.3 RWA Exposures Suite
```python
# Key expectations for raw.rwa_exposures
# (validator bound to raw.rwa_exposures / raw_rwa_suite, created as in 7.1)
validator.expect_table_row_count_to_be_between(min_value=5, max_value=100_000)
validator.expect_column_values_to_be_in_set(
    "exposure_class",
    value_set=[
        "central_governments", "institutions", "corporates",
        "retail", "real_estate", "equity", "other",
    ],
)

# risk_weight_pct is stored as a decimal (0 to 12.5, i.e. 0% to 1250%), not a whole-number percent.
# This catches the common ×100 scale error, e.g. a 75% weight loaded as 75.0 instead of 0.75.
# No mostly: a weight outside 0-1250% is never valid, and a single typo row must fail the gate.
validator.expect_column_values_to_be_between(
    "risk_weight_pct",
    min_value=0.0,
    max_value=12.5,  # 1250% = 12.5 in decimal
)
# If the mean exceeds 3.0 (300%), something has gone catastrophically wrong in the source system
validator.expect_column_mean_to_be_between("risk_weight_pct", min_value=0.1, max_value=3.0)

validator.expect_column_values_to_be_between(
    "ead", min_value=0, max_value=500_000_000_000, mostly=0.99
)

# Cross-column rule rwa <= ead × 12.5 (maximum risk weight): GX 0.18 has no built-in
# column-pair expectation that scales one column, so validate it through a derived
# column (e.g. a view exposing rwa / ead) or a custom expectation instead.

# Row-conditional check: corporate exposures must carry an RWA value
validator.expect_column_values_to_not_be_null(
    "rwa",
    row_condition='col("exposure_class")=="corporates"',
    condition_parser="great_expectations__experimental__",
)
```
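Stated outside GX, the rwa ≤ ead × 12.5 rule is a one-line row filter. A plain-Python sketch of the logic a derived view or custom expectation would have to implement; rwa_cap_violations and the sample rows are hypothetical, and Decimal avoids float rounding at the boundary.

```python
from decimal import Decimal

MAX_RISK_WEIGHT = Decimal("12.5")  # 1250% expressed as a decimal


def rwa_cap_violations(rows):
    """Return exposure ids whose rwa exceeds ead x 12.5 (rows with NULLs are skipped)."""
    return [
        r["exposure_id"]
        for r in rows
        if r["rwa"] is not None and r["ead"] is not None
        and Decimal(str(r["rwa"])) > Decimal(str(r["ead"])) * MAX_RISK_WEIGHT
    ]


rows = [  # hypothetical sample rows
    {"exposure_id": "E1", "ead": 1_000, "rwa": 750},
    {"exposure_id": "E2", "ead": 1_000, "rwa": 12_500},  # exactly at the cap: allowed
    {"exposure_id": "E3", "ead": 1_000, "rwa": 75_000},  # ×100 scale error: flagged
]
print(rwa_cap_violations(rows))  # ['E3']
```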
7.4 Liquidity Assets Suite
```python
# Key expectations for raw.liquidity_assets
validator.expect_column_values_to_be_in_set("hqla_level", value_set=["1", "2A", "2B"])

# Commission Delegated Regulation (EU) 2015/61 specifies the HQLA haircut levels:
# Level 1: 0%, Level 2A: 15%, Level 2B: 25-50%.
# haircut_rate must therefore be a decimal fraction, not a whole-number percent;
# this check ensures haircuts are sane before the LCR calculation.
validator.expect_column_values_to_be_between(
    "haircut_rate",
    min_value=0.0,
    max_value=1.0,  # 0% to 100% as fraction
)

# Distribution check on market_value: p5 >= 1,000, median >= 100,000,
# p95 <= 50 bn (None leaves that side of the range open)
validator.expect_column_quantile_values_to_be_between(
    "market_value",
    quantile_ranges={
        "quantiles": [0.05, 0.50, 0.95],
        "value_ranges": [[1_000, None], [100_000, None], [None, 50_000_000_000]],
    },
)
# Quantile expectations are not available among dbt's built-in tests
```
8. Layer 2 — Mart Expectation Suite
After dbt runs, the mart tables contain the numbers that will go into your XBRL instance document. These are your last line of defence before regulatory submission.
```python
# gx/expectations/mart_corep_suite.json — built programmatically
import great_expectations as gx

context = gx.get_context(context_root_dir="gx")
context.add_or_update_expectation_suite(expectation_suite_name="mart_corep_suite")

validator = context.get_validator(
    datasource_name="corep_postgres",
    data_connector_name="default_inferred_data_connector_name",
    data_asset_name="mart.corep_c0300",  # Capital Ratios template
    expectation_suite_name="mart_corep_suite",
)

# ── Basel III regulatory minimums ────────────────────────
# CRR Article 92(1)(a): CET1 ≥ 4.5%
validator.expect_column_values_to_be_between("cet1_ratio", min_value=0.045, max_value=1.0)
# CRR Article 92(1)(b): Tier 1 ≥ 6.0%
validator.expect_column_values_to_be_between("tier1_ratio", min_value=0.06, max_value=1.0)
# CRR Article 92(1)(c): Total Capital ≥ 8.0%
validator.expect_column_values_to_be_between("total_capital_ratio", min_value=0.08, max_value=1.0)
# CRR Article 92(1)(d), introduced by CRR2: Leverage ratio ≥ 3.0%
validator.expect_column_values_to_be_between("leverage_ratio", min_value=0.03, max_value=1.0)

# ── Cross-ratio consistency ──────────────────────────────
# tier1_ratio ≥ cet1_ratio always (CET1 ⊂ Tier 1)
# total_capital_ratio ≥ tier1_ratio always (Tier 1 ⊂ Total Capital)
validator.expect_column_pair_values_a_to_be_greater_than_b(
    column_A="tier1_ratio", column_B="cet1_ratio", or_equal=True
)
validator.expect_column_pair_values_a_to_be_greater_than_b(
    column_A="total_capital_ratio", column_B="tier1_ratio", or_equal=True
)

# ── XBRL decimal precision check ─────────────────────────
# EBA DPM requires monetary values in thousands (decimals=-3).
# For ratios, XBRL decimals="4" asserts accuracy to 4 decimal places (not 4 significant
# figures); fixing ratios at 6 decimal places keeps floating-point noise out of the XBRL document.
# Regex expectations need a text column (PostgreSQL's ~ operator rejects NUMERIC), so this
# targets cet1_ratio_xbrl, a hypothetical text rendering of cet1_ratio in the mart model.
validator.expect_column_values_to_match_regex(
    "cet1_ratio_xbrl",
    regex=r"^\d+\.\d{6}$",
    meta={"notes": "Ratios fixed at 6 dp, above the 4 dp of accuracy asserted by XBRL decimals=4"},
)

# ── Table completeness ───────────────────────────────────
# C 03.00 always produces exactly one row: the reporting period totals
validator.expect_table_row_count_to_equal(value=1)

validator.save_expectation_suite(discard_failed_expectations=False)
```
The check tier1_ratio ≥ cet1_ratio is a mathematical identity: CET1 is a subset of Tier 1 capital, and all three ratios share the same denominator (the total risk exposure amount), so the ordering must hold. If it fails, it means one of three things: a bug in your dbt aggregation logic, a sign error in the source data, or a schema mismatch between two mart models. Any of these would produce an incorrect COREP report. dbt's built-in generic tests are single-column assertions, so in dbt this needs dbt_utils.expression_is_true or a custom test; in GX it is a native column-pair expectation.
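A worked example shows why the ordering is an identity and how a sign error breaks it. The figures are invented, and the sketch assumes all three ratios divide by the same total risk exposure amount.

```python
from dataclasses import dataclass


@dataclass
class CapitalPosition:
    cet1: float  # Common Equity Tier 1 capital
    at1: float   # Additional Tier 1 capital
    t2: float    # Tier 2 capital
    trea: float  # total risk exposure amount (shared denominator)

    def ratios(self) -> dict:
        tier1 = self.cet1 + self.at1
        total = tier1 + self.t2
        return {
            "cet1_ratio": self.cet1 / self.trea,
            "tier1_ratio": tier1 / self.trea,
            "total_capital_ratio": total / self.trea,
        }


def ordering_holds(r: dict) -> bool:
    """The two column-pair expectations from the mart suite, applied to one row."""
    return r["total_capital_ratio"] >= r["tier1_ratio"] >= r["cet1_ratio"]


ok = CapitalPosition(cet1=900, at1=150, t2=200, trea=10_000).ratios()
bad = CapitalPosition(cet1=900, at1=-150, t2=200, trea=10_000).ratios()  # sign error in AT1
print(ordering_holds(ok), ordering_holds(bad))  # True False
```

With AT1 entered as -150, Tier 1 capital drops to 750 and tier1_ratio (0.075) falls below cet1_ratio (0.09), which is exactly the inconsistency the column-pair expectation flags.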
9. quality.py — The Module Implementation
```python
"""
modules/quality.py — Run GX expectation suites as pipeline quality gates.

Layer 1 (raw gate)  : runs after ingest, before dbt
Layer 2 (mart gate) : runs after dbt, before xbrl_gen
"""
import json
import logging
import mimetypes
import os
from pathlib import Path

import great_expectations as gx
from great_expectations.core.batch import BatchRequest

from modules.base import BaseModule

log = logging.getLogger(__name__)

GX_DIR = Path(os.environ.get("GX_DIR", "gx"))
DOCS_DIR = GX_DIR / "uncommitted" / "data_docs" / "local_site"


class QualityGateError(RuntimeError):
    """Raised when a GX checkpoint produces any failed expectations."""


class QualityModule(BaseModule):
    MODULE_NAME = "quality"

    # Ordered list of (layer, checkpoint_name, suite_name, asset_name)
    _CHECKPOINTS = [
        # Layer 1 — run before dbt
        ("raw", "raw_capital_checkpoint", "raw_capital_suite", "raw.capital_instruments"),
        ("raw", "raw_rwa_checkpoint", "raw_rwa_suite", "raw.rwa_exposures"),
        ("raw", "raw_liquidity_checkpoint", "raw_liquidity_suite", "raw.liquidity_assets"),
        # raw_outflows_suite is not built in this post; uncomment once its JSON exists,
        # otherwise input_check() fails on the missing file.
        # ("raw", "raw_outflows_checkpoint", "raw_outflows_suite", "raw.liquidity_outflows"),
        # Layer 2 — run after dbt mart build
        ("mart", "mart_corep_checkpoint", "mart_corep_suite", "mart.corep_c0300"),
    ]

    def __init__(self, *args, layer: str = "all", **kwargs) -> None:
        # layer ("raw", "mart" or "all") is an addition to the BaseModule interface:
        # the raw gate must not validate mart tables that dbt has not rebuilt yet.
        super().__init__(*args, **kwargs)
        if layer not in ("raw", "mart", "all"):
            raise ValueError(f"[quality] Unknown layer: {layer!r}")
        self.layer = layer

    def _selected(self) -> list:
        """Checkpoints that belong to the requested layer."""
        return [c for c in self._CHECKPOINTS if self.layer in ("all", c[0])]

    def input_check(self) -> None:
        """Verify the GX context directory and the needed suite JSON files exist."""
        if not GX_DIR.exists():
            raise RuntimeError(
                f"[quality] GX directory not found: {GX_DIR}. Run: great_expectations init"
            )
        suites_dir = GX_DIR / "expectations"
        missing = [
            suite for _, _, suite, _ in self._selected()
            if not (suites_dir / f"{suite}.json").exists()
        ]
        if missing:
            raise RuntimeError(
                "[quality] Missing expectation suite JSON files: " + ", ".join(missing)
            )
        log.info("[quality] All %d expectation suites present.", len(self._selected()))

    def _execute(self) -> None:
        """Run the selected GX checkpoints. Raise QualityGateError on any failure."""
        context = gx.get_context(context_root_dir=str(GX_DIR))
        failed_suites = []
        results_summary = []

        for _, checkpoint_name, suite_name, asset_name in self._selected():
            log.info("[quality] Running checkpoint: %s → asset: %s", checkpoint_name, asset_name)
            result = context.run_checkpoint(
                checkpoint_name=checkpoint_name,
                batch_request=BatchRequest(
                    datasource_name="corep_postgres",
                    data_connector_name="default_inferred_data_connector_name",
                    data_asset_name=asset_name,
                ),
            )
            # A CheckpointResult wraps one validation result per validation; sum their stats
            evaluated = successful = 0
            for validation in result.list_validation_results():
                evaluated += validation.statistics.get("evaluated_expectations", 0)
                successful += validation.statistics.get("successful_expectations", 0)
            pct = 100.0 * successful / evaluated if evaluated else 0.0
            status = "PASS" if result.success else "FAIL"

            results_summary.append({
                "suite": suite_name,
                "asset": asset_name,
                "evaluated": evaluated,
                "passed": successful,
                "pct": pct,
                "status": status,
            })
            log.info(
                "[quality] %s: %s (%d/%d expectations, %.1f%%)",
                suite_name, status, successful, evaluated, pct,
            )
            if not result.success:
                failed_suites.append(suite_name)

        # Build data docs (HTML report) and persist evidence before deciding pass/fail
        context.build_data_docs()
        log.info("[quality] Data docs built at %s", DOCS_DIR / "index.html")
        self._upload_data_docs_to_minio()
        self._write_audit(results_summary)

        if failed_suites:
            raise QualityGateError(
                f"[quality] {len(failed_suites)} suite(s) failed: "
                + ", ".join(failed_suites)
                + ". Pipeline halted — see GX data docs for details."
            )

    def _upload_data_docs_to_minio(self) -> None:
        """Upload the whole data docs site (HTML, CSS, JS, images) to MinIO."""
        try:
            from minio import Minio

            client = Minio(
                os.environ.get("MINIO_ENDPOINT", "minio:9000"),
                access_key=os.environ.get("MINIO_ROOT_USER", "minioadmin"),
                secret_key=os.environ.get("MINIO_ROOT_PASSWORD", "minioadmin"),
                secure=False,
            )
            bucket = "corep-gx-reports"
            if not client.bucket_exists(bucket):
                client.make_bucket(bucket)
            for path in DOCS_DIR.rglob("*"):
                if not path.is_file():
                    continue
                object_name = path.relative_to(DOCS_DIR).as_posix()  # "/" keys on any OS
                content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
                client.fput_object(bucket, object_name, str(path), content_type=content_type)
                log.info("[quality] Uploaded data doc → minio://%s/%s", bucket, object_name)
        except Exception as exc:  # evidence upload must not mask the gate result
            log.warning("[quality] MinIO data docs upload failed (non-fatal): %s", exc)

    def _write_audit(self, results: list) -> None:
        """Persist the quality run summary to audit.pipeline_run_log."""
        from modules.base import _pg_conn

        conn = _pg_conn()
        try:
            with conn.cursor() as cur:
                cur.execute(
                    """
                    INSERT INTO audit.pipeline_run_log
                        (run_id, module_name, status, metadata, ran_at)
                    VALUES (%s, 'quality', %s, %s, now())
                    """,
                    (
                        self._run_id,  # provided by BaseModule
                        "FAIL" if any(r["status"] == "FAIL" for r in results) else "PASS",
                        json.dumps(results),
                    ),
                )
            conn.commit()
        finally:
            conn.close()

    def emit_lineage(self) -> None:
        # Quality runs are validation-only — no data written, no lineage event needed
        log.info("[quality] No lineage event emitted (read-only validation step).")

    def output_check(self) -> None:
        # If _execute completed without raising, all checkpoints passed
        log.info("[quality] output_check: all suites passed (no QualityGateError raised).")
```
10. Airflow Branching on Quality Failure
The Airflow DAG you build on Day 14 uses BranchPythonOperator. Here is how the quality gates hook into the branch logic:
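```python
# dags/corep_pipeline_dag.py — quality gate branch logic
from airflow.operators.python import BranchPythonOperator, PythonOperator

from modules.quality import QualityGateError, QualityModule


def _run_quality_layer(layer: str, **context) -> None:
    """Run one GX layer and record PASS/FAIL in XCom for the branch task."""
    ti = context["task_instance"]
    try:
        # Assumes QualityModule accepts layer= ("raw" / "mart") and BaseModule exposes run()
        QualityModule(layer=layer).run()
    except QualityGateError:
        ti.xcom_push(key="quality_status", value="FAIL")
        if layer == "raw":
            raise  # raw gate: fail the task so dbt never starts
        return  # mart gate: let branch_on_quality route the run to quarantine
    ti.xcom_push(key="quality_status", value="PASS")


def _quality_branch(**context) -> str:
    """Return next task ID based on the mart gate result stored in XCom."""
    quality_status = context["task_instance"].xcom_pull(
        task_ids="run_quality_layer2", key="quality_status"
    )
    if quality_status == "PASS":
        return "run_xbrl_generation"
    return "quarantine_failed_run"  # writes to audit, alerts ops team


# Inside the `with DAG(...)` block; run_ingest, run_dbt_*, run_xbrl_generation
# and quarantine_failed_run are defined elsewhere in the DAG file.
run_quality_layer1 = PythonOperator(
    task_id="run_quality_layer1",
    python_callable=_run_quality_layer,
    op_kwargs={"layer": "raw"},
)
run_quality_layer2 = PythonOperator(
    task_id="run_quality_layer2",
    python_callable=_run_quality_layer,
    op_kwargs={"layer": "mart"},
)
branch_on_quality = BranchPythonOperator(
    task_id="branch_on_quality",
    python_callable=_quality_branch,
)

# DAG flow
(
    run_ingest
    >> run_quality_layer1
    >> run_dbt_staging
    >> run_dbt_intermediate
    >> run_dbt_mart
    >> run_quality_layer2  # second gate after mart build
    >> branch_on_quality
)
branch_on_quality >> [run_xbrl_generation, quarantine_failed_run]
```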
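Because the branch callable only reads XCom, its routing rule can be unit-tested without a running Airflow. A sketch with a hypothetical FakeTaskInstance; quality_branch mirrors the rule in _quality_branch, and the task ID and XCom key are illustrative, so match them to your DAG.

```python
from typing import Any, Optional


class FakeTaskInstance:
    """Hypothetical stand-in for Airflow's TaskInstance: serves canned XCom values."""

    def __init__(self, xcoms: dict) -> None:
        self._xcoms = xcoms

    def xcom_pull(self, task_ids: str, key: str) -> Optional[Any]:
        return self._xcoms.get((task_ids, key))


def quality_branch(**context) -> str:
    """Routing rule of the quality branch: only an explicit PASS reaches XBRL generation."""
    status = context["task_instance"].xcom_pull(
        task_ids="run_quality_layer2", key="quality_status"
    )
    return "run_xbrl_generation" if status == "PASS" else "quarantine_failed_run"


passed = FakeTaskInstance({("run_quality_layer2", "quality_status"): "PASS"})
print(quality_branch(task_instance=passed))                # run_xbrl_generation
print(quality_branch(task_instance=FakeTaskInstance({})))  # quarantine_failed_run
```

A missing XCom value (None) also routes to quarantine, so a gate that never reported cannot let a run reach XBRL generation.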
11. dbt Tests vs Great Expectations — Full Comparison
| Capability | dbt built-in tests | dbt_expectations package | Great Expectations 0.18 |
|---|---|---|---|
| Not null | Yes | Yes | Yes |
| Unique values | Yes | Yes | Yes |
| Accepted value set | Yes | Yes | Yes |
| Value range (between) | Via dbt_utils.expression_is_true | Yes | Yes |
| Regex pattern match | No | Yes | Yes |
| Completeness % (mostly) | No | No | Yes, native |
| Column mean / std-dev range | No | Yes | Yes |
| Quantile value ranges | No | Yes | Yes |
| Cross-column pair ordering | Via dbt_utils.expression_is_true | Yes | Yes |
| Row-conditional check (IF tier=CET1 THEN…) | Via the where test config | Yes, row_condition | Yes, row_condition |
| Table row count band | Awkward | Yes | Yes |
| HTML audit report output | No | No | Yes, Data Docs |
| Runs independently of transformation | Partial: dbt test is separate from dbt run, but lives in the dbt project | Partial (same as dbt) | Yes, checkpoint |
| Suites versioned in JSON | No (schema.yml) | No | Yes |
| Runs on raw layer before dbt | Yes, as source tests | Yes, as source tests | Yes |
12. Data Docs — Your Audit Evidence
Every time GX runs a checkpoint, it updates an HTML report in gx/uncommitted/data_docs/local_site/. This report is your regulatory audit evidence that quality was checked before submission.
“A bank should be able to capture and aggregate all material risk data across the banking group. Data should be available by business line, legal entity, asset type, industry, region and other groupings, as relevant for the risk in question.” (BCBS 239, Principle 4: Completeness)
Your GX data docs prove that completeness was measured per column, per table, per pipeline run. The HTML file timestamped before submission is your evidence. Upload it to MinIO so it persists independently of the pipeline container.
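Some data docs pages keep the same path from run to run (index.html and the expectation-suite pages, for example), so uploading them under their site-relative paths overwrites the previous copy. One way to keep a complete snapshot per submission is a per-run prefix. A sketch; evidence_object_name and the runs/<timestamp>_<run_id>/ layout are assumptions, not part of the quality module in this post.

```python
from datetime import datetime, timezone
from pathlib import Path, PurePosixPath


def evidence_object_name(docs_dir: Path, file_path: Path, run_id: str, ran_at: datetime) -> str:
    """Build a MinIO object key under a per-run prefix: runs/<UTC timestamp>_<run_id>/<path>."""
    stamp = ran_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    relative = PurePosixPath(file_path.relative_to(docs_dir).as_posix())
    return str(PurePosixPath("runs", f"{stamp}_{run_id}") / relative)


docs = Path("gx/uncommitted/data_docs/local_site")
key = evidence_object_name(
    docs, docs / "index.html", "run-42", datetime(2025, 3, 31, 18, 5, tzinfo=timezone.utc)
)
print(key)  # runs/20250331T180500Z_run-42/index.html
```

Passing the module's run ID and the run start time gives each submission its own timestamped copy of the site in the bucket.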
To view the data docs locally after a pipeline run:
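```bash
# From WSL / inside the pipeline container: rebuild the docs from stored validation results
python -c "import great_expectations as gx; gx.get_context(context_root_dir='gx').build_data_docs()"

# From WSL: open in a Windows browser (WSL path → Windows path)
explorer.exe "$(wslpath -w gx/uncommitted/data_docs/local_site/index.html)"
```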
13. Mapping Quality Gates to BCBS 239 Principles
| BCBS 239 Principle | Requirement | How GX satisfies it |
|---|---|---|
| P3 (Accuracy and integrity) | Data reflects actual risk positions; aggregation largely automated to minimise errors | Cross-ratio pair checks, statistical distribution bounds, regex format checks |
| P4 (Completeness) | Capture all material risk data | mostly=0.95 completeness checks on all raw tables |
| P5 (Timeliness) | Data available on time for risk decisions | Checkpoint runtime logged to audit.pipeline_run_log |
| P6 (Adaptability) | Risk data adaptable to varying scenarios | Separate suites per table: update one suite without touching others |
| P7 (Accuracy of risk reports) | Reports reconciled and validated | Cross-column pair checks: tier1_ratio ≥ cet1_ratio always holds |
| P8 (Comprehensiveness) | Reports cover all material risk areas | Row count check on mart tables confirms all data points present |
14. Run the Quality Gates
```bash
# Run the quality module (all suites listed in _CHECKPOINTS)
python pipeline.py --module quality

# Run ingest, then the quality gate (useful during development)
python pipeline.py --from ingest --to quality

# Re-run the quality gate on its own, e.g. after dbt has already completed
python pipeline.py --from quality --to quality

# Check exit code — quality gate failures raise exit code 1
echo $?
```

```text
# Expected output on PASS
INFO  [quality] Running checkpoint: raw_capital_checkpoint → asset: raw.capital_instruments
INFO  [quality] raw_capital_suite: PASS (12/12 expectations, 100.0%)
INFO  [quality] Running checkpoint: raw_rwa_checkpoint → asset: raw.rwa_exposures
INFO  [quality] raw_rwa_suite: PASS (9/9 expectations, 100.0%)
INFO  [quality] Running checkpoint: raw_liquidity_checkpoint → asset: raw.liquidity_assets
INFO  [quality] raw_liquidity_suite: PASS (11/11 expectations, 100.0%)
INFO  [quality] Running checkpoint: mart_corep_checkpoint → asset: mart.corep_c0300
INFO  [quality] mart_corep_suite: PASS (8/8 expectations, 100.0%)
INFO  [quality] Data docs built at gx/uncommitted/data_docs/local_site/index.html
INFO  [quality] output_check: all suites passed (no QualityGateError raised).

# Expected output on FAIL (e.g. risk weight typo in source data)
INFO  [quality] raw_rwa_suite: FAIL (7/9 expectations, 77.8%)
ERROR [quality] 1 suite(s) failed: raw_rwa_suite. Pipeline halted — see GX data docs for details.
Traceback: QualityGateError: [quality] 1 suite(s) failed...
```
📚 Day 8 Key Takeaways
- Two layers are mandatory: a raw gate before dbt and a mart gate before XBRL. The dbt tests from Day 7 only see transformed tables.
- mostly is the killer feature: it lets you express regulatory completeness requirements as a percentage rather than all-or-nothing, matching real-world data realities.
- Cross-column pair checks (tier1_ratio ≥ cet1_ratio) need dbt_utils.expression_is_true or a custom test in dbt but are a native expectation in GX, and they are essential for Basel III consistency verification.
- Statistical distribution expectations (mean, quantiles) catch scale and magnitude errors, the most common ETL bug with financial data (e.g. basis points vs decimal).
- Data Docs are audit evidence — upload to MinIO, timestamp before submission. BCBS 239 auditors expect proof that quality was checked.
- The quality module writes no pipeline data (only its audit row), so it emits no lineage event. In this pipeline, OpenLineage events are reserved for data-writing steps.
- Next: Day 9 — OpenMetadata catalog: auto-discovery of all tables built so far, EBA glossary terms, PII tags, and lineage federation with Marquez.


