Templates, Filters & Workflows
Purpose
Use this guide to decide how to model documents before calling the extraction API or building workflows.
The core model is:
- a template defines what to extract from one document type
- a filter decides which template should be used for a document
- a workflow automates where files come from, how they are extracted, and where results go
- document review holds low-confidence jobs until a user approves them
Recommended Setup Order
- Create one template for each document type.
- Test each template with representative sample files.
- Create filters only after the templates are stable.
- Create API keys or workflows after templates and filters are ready.
- Add webhooks or document review for production control.
This order keeps each layer understandable. A workflow should not be doing template design, and an API integration should not depend on a template that has not been tested.
Designing A Good Template
Start From The Business Output
Before adding fields, decide what the receiving system needs. For an invoice, that may be supplier name, invoice number, invoice date, total, VAT, and line items. For a bank statement, that may be account number, statement period, and transactions.
Good templates are narrow. Prefer one template per document type over one broad template that tries to handle unrelated forms.
Generate A Starter Template
The Generate template button is the easiest way to get started. You upload a sample document, and the system identifies its document type and creates a default template for that type, so you do not have to add every field by hand. You can then change, rename, and trim the result.
Treat a generated template as a starting point, not a final contract. Review it before using it in production.
Recommended flow:
- Open Studio > Templates and create a template.
- Upload a representative sample document and run Generate template. The system detects the document type and produces a default set of fields.
- Rename fields into stable integration names.
- Remove fields the business does not need.
- Add or refine line-item fields.
- Save and test with Quick Run.
Use Clear Field Names
Field names should be stable, specific, and safe for downstream systems.
Prefer:
- invoiceNumber
- invoiceDate
- supplierVatNumber
- totalAmount
- lineItems.description
Avoid:
- number
- date
- value
- field1
- misc
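The naming guidance above can be checked mechanically before a template is saved. The following is a minimal sketch, assuming a lowerCamelCase convention (inferred from the preferred examples) with dots separating line-item columns. The list of generic names and the function name are illustrative, not part of the product.

```python
import re

# Illustrative list of names too generic to be stable integration keys.
GENERIC_NAMES = {"number", "date", "value", "misc"}

# Assumed convention: lowerCamelCase segments, joined by dots for
# line-item columns such as lineItems.description.
CAMEL_SEGMENT = re.compile(r"^[a-z][a-zA-Z0-9]*$")
PLACEHOLDER = re.compile(r"^field\d+$")


def field_name_problems(name: str) -> list:
    """Return the reasons a field name is a poor choice (empty if fine)."""
    problems = []
    for segment in name.split("."):
        if not CAMEL_SEGMENT.match(segment):
            problems.append(f"'{segment}' is not lowerCamelCase")
        if segment in GENERIC_NAMES:
            problems.append(f"'{segment}' is too generic")
        if PLACEHOLDER.match(segment):
            problems.append(f"'{segment}' looks like a placeholder")
    return problems
```

A name such as `supplierVatNumber` returns no problems, while `date` or `field1` is flagged.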
Choose The Right Field Shape
Use scalar fields for one value, such as an invoice number. Use line-item or array/table fields for repeated rows, such as invoice lines or statement transactions.
Do not model repeated rows as separate fields like line1, line2, and
line3. That makes integrations brittle because the number of rows changes per
document.
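To illustrate why the array shape is easier to integrate with, here is a sketch of a downstream consumer. The result dictionary is a made-up example, and the real response shape may differ.

```python
# Hypothetical extraction result; the actual response format is an assumption.
result = {
    "invoiceNumber": "INV-1042",  # scalar field: one value per document
    "lineItems": [                # array field: one entry per row
        {"description": "Widget", "quantity": 2, "amount": 19.98},
        {"description": "Shipping", "quantity": 1, "amount": 5.00},
    ],
}


def line_total(extracted: dict) -> float:
    """Sum line amounts without knowing the row count in advance."""
    rows = extracted.get("lineItems", [])
    return round(sum(row["amount"] for row in rows), 2)
```

The same function works for two rows or two hundred. With fields like line1, line2, and line3, the consumer would need to know in advance how many rows to look for.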
Guide The Model With Descriptions, Then Rules
There are two ways to tell the model how to handle a field:
- Field description and data type: start here. A clear description plus the correct data type, for example date, number, or string, is enough for most extraction. Get this right before doing anything more advanced.
- Rules field: use this for specific instructions the model must follow, such as transforming or normalizing a value rather than simply reading it off the page. Examples include reformatting a date, deriving a value, or enforcing a particular output format.
Model each field with a plain description and the right data type first, confirm extraction works, and only move to the rules field when you need a transformation or behavior that a description and data type cannot express.
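The progression from description to rules might look like the sketch below. `FieldDefinition` is a hypothetical structure used only for illustration. It is not the platform's template schema, and the attribute names are assumptions.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class FieldDefinition:
    """Hypothetical model of one template field."""
    name: str
    data_type: str               # for example "date", "number", or "string"
    description: str
    rules: Optional[str] = None  # leave empty until a description is not enough


# Step 1: a plain description and the correct data type.
invoice_date = FieldDefinition(
    name="invoiceDate",
    data_type="date",
    description="The date the invoice was issued.",
)

# Step 2: once extraction works, add a rule only for the transformation.
invoice_date_with_rule = replace(
    invoice_date,
    rules="Return the date in YYYY-MM-DD format.",
)
```

Keeping the rule separate from the description makes it easy to see which fields rely on a transformation and which are read directly from the page.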
Keep Templates Focused
A template can define up to 50 fields, including line-item fields. This limit counts the fields you define, not how much data you can extract. Line-item fields repeat for every row, so a few columns across many rows can return hundreds of values while still counting as a single field definition. You can capture everything on the document; you are only capped at 50 defined fields.
Treat the 50-field limit as a design signal, not just a technical maximum. If a template is approaching it, check whether it is trying to cover multiple document types or downstream use cases at once.
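The difference between defined fields and returned values can be shown with a little arithmetic. This sketch assumes each line-item column counts as one defined field. How the platform actually counts line-item columns toward the limit should be confirmed in Studio.

```python
MAX_FIELD_DEFINITIONS = 50  # limit stated in this guide

# Hypothetical template contents.
scalar_fields = ["invoiceNumber", "invoiceDate", "totalAmount"]
line_item_columns = ["description", "quantity", "amount"]


def definition_count(scalars: list, columns: list) -> int:
    """Fields defined in the template (what the limit applies to)."""
    return len(scalars) + len(columns)


def returned_value_count(scalars: list, columns: list, row_count: int) -> int:
    """Values a single document can return for a given number of rows."""
    return len(scalars) + len(columns) * row_count
```

Under these assumptions, a 120-row document yields 363 values from only 6 field definitions, well inside the limit.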
Testing A Template
Use Quick Run or Playground before using a template in production.
Test with:
- a clean expected document
- a low-quality scan
- a multi-page document
- a document with missing optional fields
- a document with extra sections or notes
Check:
- field values match the source document
- repeated rows are grouped correctly
- missing values are represented consistently
- document confidence is good enough for automation
- field names match what the receiving system expects
If the same uploaded file could be multiple document types, test filters before using API extraction.
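Some of the checks above can be automated once you have a test result in hand. The sketch below compares a result against the field names the receiving system expects and flags placeholder strings used for missing values. The result is treated as a flat dictionary, and the placeholder list is an assumption.

```python
# Strings that suggest a missing value was filled with a placeholder
# instead of being left empty in a consistent way (illustrative list).
PLACEHOLDERS = ("", "N/A", "-")


def check_result(extracted: dict, expected_fields: set) -> dict:
    """Report field-name mismatches and inconsistent missing values."""
    present = set(extracted)
    return {
        "missing": sorted(expected_fields - present),
        "unexpected": sorted(present - expected_fields),
        "inconsistent_empty": sorted(
            name
            for name, value in extracted.items()
            if isinstance(value, str) and value.strip() in PLACEHOLDERS
        ),
    }
```

Running this over each sample file in the test set gives a quick signal on whether the template is stable enough to integrate against. It does not replace comparing values with the source document.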
Creating Filters
Use filters when one request may contain different document types.
A filter should answer:
- what document types can arrive?
- which template owns each type?
- what should happen to documents that do not match any template?
Recommended flow:
- Create and test the templates first.
- Create a filter in Studio > Filters.
- Add routing rules that map document categories to templates.
- Test representative mixed-document files in Playground.
- Use filterName in API extraction requests when routing is required.
When using filters with webhooks, make sure all extractable templates in the
filter are covered by active webhook configuration before requesting
deliveryMode=webhook.
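Conceptually, a filter is a mapping from document category to template, plus a decision about unmatched documents. The sketch below models that mapping and the webhook coverage check locally. The category names, template names, and functions are hypothetical and do not call the platform.

```python
from typing import Optional

# Hypothetical routing rules: document category -> template name.
ROUTING_RULES = {
    "invoice": "invoice-v1",
    "bank_statement": "bank-statement-v1",
}


def route(category: str, rules: dict, fallback: Optional[str] = None) -> Optional[str]:
    """Return the template that owns a category, or the fallback if none does."""
    return rules.get(category, fallback)


def templates_missing_webhooks(rules: dict, active_webhook_templates: set) -> list:
    """List templates in the filter that have no active webhook configuration."""
    return sorted(set(rules.values()) - set(active_webhook_templates))
```

If `templates_missing_webhooks` returns anything, resolve it before sending requests with deliveryMode=webhook.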
Building Workflows
Use workflows when you want the platform to monitor an input location and write outputs without building a custom backend upload loop.
A workflow needs:
- input source
- extraction selection, either a template or a filter
- output destination
- optional document review threshold
- file lifecycle settings when supported by the source
Recommended flow:
- Create templates.
- Create filters if routing is needed.
- Connect input and output providers.
- Create the workflow.
- Run controlled test files.
- Enable document review if uncertain results need approval.
- Monitor runtime attention messages and Document Review.
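Before creating a workflow, it can help to write down its parts and confirm nothing is missing. The sketch below captures the required pieces listed above in a hypothetical `WorkflowConfig` structure. The attribute names and example values are assumptions, not the platform's configuration format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkflowConfig:
    """Hypothetical planning model of a workflow."""
    input_source: str
    output_destination: str
    template_name: Optional[str] = None
    filter_name: Optional[str] = None
    review_threshold: Optional[float] = None  # optional document review gate

    def problems(self) -> list:
        """Return what is missing or ambiguous in this plan."""
        found = []
        if not self.input_source:
            found.append("input source is required")
        if not self.output_destination:
            found.append("output destination is required")
        # Extraction selection is either a template or a filter, not both.
        if bool(self.template_name) == bool(self.filter_name):
            found.append("choose exactly one of template or filter")
        return found
```

For example, `WorkflowConfig("incoming-folder", "results-folder", filter_name="ap-documents").problems()` returns an empty list.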
Document Review
Document review is a quality gate. When enabled, extracted results below the configured threshold are held before final delivery.
Use it when:
- low-confidence values should not be sent directly to downstream systems
- a user must approve output before webhook completion or workflow delivery
- the document type has high business impact
For API integrations, the async job status becomes review_required and the
result endpoint returns REVIEW_REQUIRED until approval. For workflows, review
cases appear in Dashboard > Document Review.
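An API client that polls async jobs needs to treat a held job as a waiting state, not as a failure. The sketch below shows one way to structure that decision. Only review_required comes from this guide. The other status values and the action names are placeholders to be replaced with the values in the API reference.

```python
def next_action(status: str) -> str:
    """Map an async job status to what the client should do next."""
    if status == "review_required":
        # Held by document review: not an error, the result is not final yet.
        return "wait_for_approval"
    if status in ("queued", "processing"):  # hypothetical status names
        return "poll_again"
    if status == "completed":               # hypothetical status name
        return "fetch_result"
    return "report_error"
```

Separating the decision from the polling loop keeps the review case explicit, which makes it harder to retry or discard a job that is simply waiting for a reviewer.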
API Readiness Checklist
Before you put an API integration into production:
- at least one template has been created and tested
- any needed filters have been tested with mixed documents
- an API key exists for the correct environment
- you have the correct $EXTRACTION_BASE_URL
- the integration uses templateName or filterName values that exist in the target environment
- webhook routes are configured if deliveryMode=webhook will be used
- you have decided how document review should behave before using review thresholds
- you know whether to use sync extraction, async polling, or async webhooks
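Part of this checklist can be enforced in code before any request is sent. The sketch below reads EXTRACTION_BASE_URL from the environment and assembles request parameters from templateName, filterName, and deliveryMode. It assumes a request uses exactly one of templateName or filterName. The function name and return shape are illustrative, and no endpoint path is implied.

```python
import os
from typing import Mapping, Optional


def build_extraction_request(
    template_name: Optional[str] = None,
    filter_name: Optional[str] = None,
    delivery_mode: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> dict:
    """Validate local configuration and return the request parameters."""
    base_url = env.get("EXTRACTION_BASE_URL")
    if not base_url:
        raise ValueError("EXTRACTION_BASE_URL is not set")
    # Assumption: a request names either a template or a filter, not both.
    if bool(template_name) == bool(filter_name):
        raise ValueError("provide exactly one of templateName or filterName")

    if template_name:
        payload = {"templateName": template_name}
    else:
        payload = {"filterName": filter_name}
    if delivery_mode:
        payload["deliveryMode"] = delivery_mode
    return {"baseUrl": base_url.rstrip("/"), "payload": payload}
```

This only validates local configuration. Whether the named template or filter exists in the target environment still has to be confirmed against that environment.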