Catalog Checks: Catalog Seeds
Note
The checks below require both catalog.json and manifest.json to be present.
Functions:
| Name | Description |
|---|---|
| check_seed_columns_are_all_documented | All columns in a seed CSV file should be included in the seed's properties file, i.e. .yml file. |
| check_seed_max_bytes | Each seed must not exceed the given size in bytes. |
| check_seed_max_row_count | Each seed must not contain more than the given number of rows. |
check_seed_columns_are_all_documented
All columns in a seed CSV file should be included in the seed's properties file, i.e. .yml file.
Warning
This check is only supported for dbt 1.9.0 and above.
Rationale
Seed CSV files often serve as reference data (e.g. country codes, product categories) that are queried directly by downstream models. When a column exists in the CSV but not in the properties file, it is invisible to documentation tools, data catalogues, and column-level tests. This check ensures that every column in a seed is explicitly declared, making it easier for consumers to understand the seed's schema and for teams to apply descriptions and tests uniformly.
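The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not dbt-bouncer's implementation: the helper name undocumented_columns and its plain-list arguments are hypothetical, and the case-insensitive comparison is an assumption (some warehouses report upper-cased column names in catalog.json).

```python
def undocumented_columns(catalog_columns, documented_columns):
    """Return catalog columns that are missing from the properties file.

    Hypothetical helper: both arguments are plain iterables of column names.
    Names are compared case-insensitively (an assumption, not confirmed
    behaviour of dbt-bouncer).
    """
    documented = {name.lower() for name in documented_columns}
    return [name for name in catalog_columns if name.lower() not in documented]


# Columns found in the seed's catalog entry vs. columns declared in the .yml file.
missing = undocumented_columns(
    catalog_columns=["COUNTRY_CODE", "COUNTRY_NAME", "REGION"],
    documented_columns=["country_code", "country_name"],
)
print(missing)  # ['REGION']
```

A seed would pass under this sketch only when the returned list is empty.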
Receives at execution time:
| Name | Type | Description |
|---|---|---|
| catalog_node | CatalogNodeEntry | The CatalogNodeEntry object to check. |
| manifest_obj | ManifestObject | The ManifestObject object parsed from |
| seeds | list[SeedNode] | List of SeedNode objects parsed from |
Other Parameters (passed via config file):
| Name | Type | Description |
|---|---|---|
| description | str \| None | Description of what the check does and why it is implemented. |
| exclude | str \| None | Regex pattern to match the seed path. Seed paths that match the pattern will not be checked. |
| include | str \| None | Regex pattern to match the seed path. Only seed paths that match the pattern will be checked. |
| severity | Literal['error', 'warn'] \| None | Severity level of the check. Default: |
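How include and exclude narrow the set of checked seeds can be illustrated with a short sketch. The helper should_check is hypothetical, and the choice of re.match (anchored at the start of the path) is an assumption; dbt-bouncer's own matching may differ.

```python
import re


def should_check(seed_path, include=None, exclude=None):
    """Decide whether a seed path is in scope (hypothetical helper).

    Assumes patterns are applied with re.match, i.e. anchored at the
    start of the path; this is an illustration only.
    """
    # With an include pattern, only matching paths are checked.
    if include is not None and re.match(include, seed_path) is None:
        return False
    # With an exclude pattern, matching paths are skipped.
    if exclude is not None and re.match(exclude, seed_path) is not None:
        return False
    return True


print(should_check("seeds/finance/rates.csv", include=r"^seeds/finance"))  # True
print(should_check("seeds/legacy/old.csv", exclude=r"^seeds/legacy"))      # False
```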
Example(s):
Source code in src/dbt_bouncer/checks/catalog/check_catalog_seeds.py
check_seed_max_bytes
Each seed must not exceed the given size in bytes.
Rationale
Seeds are checked into version control and reloaded on every dbt seed or dbt build invocation, so very large CSV files inflate repository size, slow seed runs, and signal that the data probably belongs in a source table instead. This check enforces an upper bound on seed size so that the project does not accumulate oversized reference data.
Note
Seed size is read from the catalog's per-relation stats, which are populated by the warehouse adapter. The default byte_stat_keys cover the adapters whose source has been verified: dbt-snowflake (bytes), dbt-bigquery (num_bytes), dbt-redshift (size), and dbt-databricks/dbt-spark (bytes, parsed from DESCRIBE TABLE EXTENDED). The first key found on the catalog node wins.
For any other adapter (e.g. dbt-athena, dbt-fabric, dbt-trino, dbt-clickhouse, dbt-synapse, dbt-singlestore, dbt-vertica), inspect the relevant entry in catalog.json to find which key (if any) holds the byte count, then supply it via byte_stat_keys. If the adapter emits no byte stats at all, the check raises a RuntimeError.
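The "first key found wins" rule and the RuntimeError fallback can be sketched as follows. This is an assumption-laden illustration rather than the library's code: first_stat_value and seed_exceeds_max_bytes are hypothetical names, and stats is assumed to be a plain dict shaped like the stats mapping in catalog.json, where each entry carries a value field.

```python
DEFAULT_BYTE_STAT_KEYS = ("bytes", "num_bytes", "size")


def first_stat_value(stats, stat_keys):
    """Return the value of the first key in stat_keys that exists in stats.

    Hypothetical helper. Raises RuntimeError when none of the keys is present.
    """
    for key in stat_keys:
        if key in stats:
            return float(stats[key]["value"])
    raise RuntimeError(f"None of the stat keys {list(stat_keys)} were found")


def seed_exceeds_max_bytes(stats, max_bytes, byte_stat_keys=DEFAULT_BYTE_STAT_KEYS):
    """True when the seed's reported size is above max_bytes."""
    return first_stat_value(stats, byte_stat_keys) > max_bytes


# Example stats entry, as a BigQuery-style catalog node might expose it.
stats = {"num_bytes": {"id": "num_bytes", "value": 2048, "include": True}}
print(seed_exceeds_max_bytes(stats, max_bytes=1048576))  # False
```

Keeping the keys in an ordered sequence means a custom key can be tried first simply by placing it ahead of the defaults.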
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| max_bytes | int | The maximum number of bytes permitted for a seed. | required |
| byte_stat_keys | list[str] | Ordered list of stat keys under | ['bytes', 'num_bytes', 'size'] |
Receives at execution time:
| Name | Type | Description |
|---|---|---|
| catalog_node | CatalogNodeEntry | The CatalogNodeEntry object to check. |
Other Parameters (passed via config file):
| Name | Type | Description |
|---|---|---|
| description | str \| None | Description of what the check does and why it is implemented. |
| exclude | str \| None | Regex pattern to match the seed path. Seed paths that match the pattern will not be checked. |
| include | str \| None | Regex pattern to match the seed path. Only seed paths that match the pattern will be checked. |
| severity | Literal['error', 'warn'] \| None | Severity level of the check. Default: |
Example(s):
# Overriding the keys for a hypothetical adapter that exposes ``size_bytes``.
catalog_checks:
  - name: check_seed_max_bytes
    max_bytes: 1048576
    byte_stat_keys:
      - size_bytes
Source code in src/dbt_bouncer/checks/catalog/check_catalog_seeds.py
check_seed_max_row_count
Each seed must not contain more than the given number of rows.
Rationale
Seeds are intended for small reference datasets such as lookup tables, country codes, or feature flag values. As row counts grow, seeds become slow to load, awkward to review in pull requests, and prone to drift from authoritative upstream systems. This check enforces an upper bound on row count so that large datasets are surfaced as a source or external table instead of a seed.
Note
Row count is read from the catalog's per-relation stats, which are populated by the warehouse adapter. The default row_stat_keys cover the adapters whose source has been verified: dbt-snowflake (row_count), dbt-bigquery (num_rows), dbt-redshift (rows), and dbt-databricks/dbt-spark (rows, parsed from DESCRIBE TABLE EXTENDED). The first key found on the catalog node wins.
For any other adapter (e.g. dbt-athena, dbt-fabric, dbt-trino, dbt-clickhouse, dbt-synapse, dbt-singlestore, dbt-vertica), inspect the relevant entry in catalog.json to find which key (if any) holds the row count, then supply it via row_stat_keys. If the adapter emits no row stats at all, the check raises a RuntimeError.
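One way to inspect catalog.json for candidate keys is to list the stat keys present on each seed entry. The sketch below assumes the standard catalog layout (a top-level nodes mapping keyed by unique ID, each node carrying a stats mapping) and the default target/catalog.json location; the helper name seed_stat_keys is hypothetical.

```python
import json
from pathlib import Path


def seed_stat_keys(catalog):
    """Map each seed's unique ID to the sorted stat keys on its catalog entry."""
    return {
        unique_id: sorted(node.get("stats", {}))
        for unique_id, node in catalog.get("nodes", {}).items()
        if unique_id.startswith("seed.")  # seed unique IDs look like seed.<project>.<name>
    }


if __name__ == "__main__":
    catalog_path = Path("target/catalog.json")  # assumed default dbt output location
    if catalog_path.exists():
        catalog = json.loads(catalog_path.read_text())
        for unique_id, keys in seed_stat_keys(catalog).items():
            print(unique_id, keys)
```

Any key in the printed list that holds a row count can then be passed through row_stat_keys.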
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| max_row_count | int | The maximum number of rows permitted for a seed. | required |
| row_stat_keys | list[str] | Ordered list of stat keys under | ['row_count', 'num_rows', 'rows'] |
Receives at execution time:
| Name | Type | Description |
|---|---|---|
| catalog_node | CatalogNodeEntry | The CatalogNodeEntry object to check. |
Other Parameters (passed via config file):
| Name | Type | Description |
|---|---|---|
| description | str \| None | Description of what the check does and why it is implemented. |
| exclude | str \| None | Regex pattern to match the seed path. Seed paths that match the pattern will not be checked. |
| include | str \| None | Regex pattern to match the seed path. Only seed paths that match the pattern will be checked. |
| severity | Literal['error', 'warn'] \| None | Severity level of the check. Default: |
Example(s):
# Overriding the keys for a hypothetical adapter that exposes ``record_count``.
catalog_checks:
  - name: check_seed_max_row_count
    max_row_count: 1000
    row_stat_keys:
      - record_count