Skip to content

docchex¤

Python core engine for document QA/QC. Parse documents, apply rule sets, run AI-assisted evaluation, and generate structured reports.

from docchex import run_qaqc

result = run_qaqc(document, rules)

Status¤

Requirements¤

Installation¤

pip install docchex

Usage¤

Run QA/QC on a PDF¤

from docchex import run_qaqc

result = run_qaqc("report.pdf", "rules.yaml")

print(result["passed"])    # True / False
print(result["summary"])   # {"error": 1, "warning": 2, "info": 0}
print(result["findings"])  # list of individual rule violations

Define rules in YAML¤

# rules.yaml
rules:
  - id: check_introduction
    type: required_section
    match: Introduction
    severity: error

  - id: minimum_length
    type: word_count
    min: 500
    severity: warning

Rules can also be passed as a list of dicts directly:

result = run_qaqc("report.pdf", [
    {"id": "check_intro", "type": "required_section", "match": "Introduction", "severity": "error"},
    {"id": "min_length",  "type": "word_count", "min": 500, "severity": "warning"},
])

Result structure¤

{
    "document": "report.pdf",
    "passed": False,
    "summary": {"error": 1, "warning": 0, "info": 0},
    "findings": [
        {
            "rule_id": "check_intro",
            "severity": "error",
            "message": "Required section not found: 'Introduction'",
            "location": None,
        }
    ],
}

Built-in rule types¤

Type Parameters Description
required_section match (str) Fails if the text is not found anywhere in the document
word_count min (int), max (int) Fails if the word count is outside the given bounds

Both support a severity field: error (default for required_section), warning (default for word_count), or info.

Documentation¤

uv run python scripts/make docs

Serves the docs locally at http://127.0.0.1:8000 with live reload.

Evaluation¤

docchex ships with a versioned benchmark suite that measures end-to-end rule accuracy against committed document fixtures.

# first-time setup
uv run python scripts/make setup

# run benchmark suite
uv run python scripts/make eval

If python is not available in your shell, prefer the uv run python ... form above instead of make eval.

See the Evaluation page in the docs for current results, how to add new cases, and the versioning workflow.

License¤