Templates, Filters & Workflows

Use this guide to decide how to model documents before calling the extraction API or building workflows.

The core model is:

  1. a template defines what to extract from one document type
  2. a filter decides which template should be used for a document
  3. a workflow automates where files come from, how they are extracted, and where results go
  4. document review holds low-confidence jobs until a user approves them

Build in this order:

  1. Create one template for each document type.
  2. Test each template with representative sample files.
  3. Create filters only after the templates are stable.
  4. Create API keys or workflows after templates and filters are ready.
  5. Add webhooks or document review for production control.

This order keeps each layer understandable. A workflow is not the place to design templates, and an API integration should not depend on a template that has not been tested.

Before adding fields, decide what the receiving system needs. For an invoice, that may be supplier name, invoice number, invoice date, total, VAT, and line items. For a bank statement, that may be account number, statement period, and transactions.

Good templates are narrow. Prefer one template per document type over one broad template that tries to handle unrelated forms.

The Generate template button is the easiest way to get started. You upload a sample document, and the system identifies its document type and creates a default template for that type, so you do not have to add every field by hand. You can then change, rename, and trim the result.

Treat a generated template as a starting point, not a final contract. Review it before using it in production.

Recommended flow:

  1. Open Studio > Templates and create a template.
  2. Upload a representative sample document and run Generate template. The system detects the document type and produces a default set of fields.
  3. Rename fields into stable integration names.
  4. Remove fields the business does not need.
  5. Add or refine line-item fields.
  6. Save and test with Quick Run.

Field names should be stable, specific, and safe for downstream systems.

Prefer:

  1. invoiceNumber
  2. invoiceDate
  3. supplierVatNumber
  4. totalAmount
  5. lineItems.description

Avoid:

  1. number
  2. date
  3. value
  4. field1
  5. misc
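
A naming convention like this can be checked automatically before a template goes into use. The sketch below is illustrative only: the `GENERIC_NAMES` set and the camelCase pattern are assumptions drawn from the examples above, not rules the platform is known to enforce.

```python
import re

# Assumption: single words the guide lists as too generic to integrate against.
GENERIC_NAMES = {"number", "date", "value", "misc"}

# Assumption: camelCase segments, optionally dotted for line-item columns
# (for example "lineItems.description").
SEGMENT = re.compile(r"^[a-z][a-zA-Z0-9]*$")
PLACEHOLDER = re.compile(r"^field\d+$")


def is_good_field_name(name: str) -> bool:
    """Return True if the name looks stable and specific enough for downstream use."""
    segments = name.split(".")
    if not all(SEGMENT.match(segment) for segment in segments):
        return False
    # A lone generic word or a numbered placeholder says nothing about meaning.
    if len(segments) == 1 and (name in GENERIC_NAMES or PLACEHOLDER.match(name)):
        return False
    return True


for candidate in ["invoiceNumber", "lineItems.description", "date", "field1"]:
    print(candidate, is_good_field_name(candidate))
```

A check of this kind is cheapest to run while the template is still being designed, before any integration depends on the names.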

Use scalar fields for one value, such as an invoice number. Use line-item or array/table fields for repeated rows, such as invoice lines or statement transactions.

Do not model repeated rows as separate fields like line1, line2, and line3. That makes integrations brittle because the number of rows changes per document.
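
To see why repeated rows belong in a list, consider how a consumer reads them. The result shape below is a hypothetical illustration, not the documented response format; the keys `lineItems`, `description`, and `amount` are assumed names.

```python
from decimal import Decimal

# Hypothetical extraction result: scalar fields hold one value each,
# and the line-item field holds one dict per row on the document.
result = {
    "invoiceNumber": "INV-1001",
    "totalAmount": "150.00",
    "lineItems": [
        {"description": "Widget", "amount": "100.00"},
        {"description": "Shipping", "amount": "50.00"},
    ],
}


def sum_line_items(extraction: dict) -> Decimal:
    """Add up line-item amounts, however many rows the document had."""
    return sum(
        (Decimal(row["amount"]) for row in extraction["lineItems"]),
        Decimal("0"),
    )


line_total = sum_line_items(result)
```

The same loop handles two rows or two hundred. With separate `line1`, `line2`, `line3` fields, the consumer would have to guess how many fields exist for each document.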

Guide The Model With Descriptions, Then Rules

There are two ways to tell the model how to handle a field:

  1. Field description and data type: start here. A clear description plus the correct data type, for example date, number, or string, is enough for most extraction. Get this right before doing anything more advanced.
  2. Rules field: use this for specific instructions the model must follow, such as transforming or normalizing a value rather than simply reading it off the page. Examples include reformatting a date, deriving a value, or enforcing a particular output format.

Model each field with a plain description and the right data type first, confirm extraction works, and only move to the rules field when you need a transformation or behavior that a description and data type cannot express.
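
The progression can be pictured as data. Templates are built in Studio, so the `FieldDefinition` class below is a hypothetical stand-in used only to show the order of work; its attribute names and the example rule text are assumptions.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class FieldDefinition:
    """Hypothetical in-code picture of one template field."""
    name: str
    data_type: str            # for example "date", "number", or "string"
    description: str
    rules: Optional[str] = None  # left empty until a transformation is needed


# Step 1: a plain description and the correct data type.
invoice_date = FieldDefinition(
    name="invoiceDate",
    data_type="date",
    description="The date the invoice was issued.",
)

# Step 2: only after extraction works, add a rule for the transformation.
invoice_date_iso = replace(
    invoice_date,
    rules="Return the date in YYYY-MM-DD format.",
)
```

Keeping the description unchanged when the rule is added makes it easy to tell whether a later change in output came from the rule or from the description.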

A template can define up to 50 fields, including line-item fields. This limit counts the fields you define, not how much data you can extract. Line-item fields repeat for every row, so a few columns across many rows can return hundreds of values while still counting as a single field definition. You can capture everything on the document; you are only capped at 50 defined fields.

Treat the 50-field limit as a design signal, not just a technical maximum. If a template is approaching it, check whether it is trying to cover multiple document types or downstream use cases at once.

Use Quick Run or Playground before using a template in production.

Test with:

  1. a clean expected document
  2. a low-quality scan
  3. a multi-page document
  4. a document with missing optional fields
  5. a document with extra sections or notes

Check:

  1. field values match the source document
  2. repeated rows are grouped correctly
  3. missing values are represented consistently
  4. document confidence is good enough for automation
  5. field names match what the receiving system expects
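
Some of these checks can be scripted against saved test results. The following is a minimal sketch, assuming a result arrives as a flat dictionary and that `None` is the agreed representation of a missing value; both are assumptions, and `EMPTY_MARKERS` is a hypothetical list of inconsistent placeholders.

```python
from typing import List, Set

# Assumption: strings that would signal an inconsistently represented missing value.
EMPTY_MARKERS = {"", "N/A", "n/a", "-"}


def check_result(result: dict, expected_fields: Set[str]) -> List[str]:
    """Compare one extraction result with what the receiving system expects."""
    problems = []
    for name in sorted(expected_fields - set(result)):
        problems.append(f"missing field: {name}")
    for name in sorted(set(result) - expected_fields):
        problems.append(f"unexpected field: {name}")
    for name, value in result.items():
        if isinstance(value, str) and value.strip() in EMPTY_MARKERS:
            problems.append(f"inconsistent empty value in {name}: {value!r}")
    return problems


sample = {"invoiceNumber": "INV-1", "invoiceDate": None, "vat": "N/A"}
for problem in check_result(sample, {"invoiceNumber", "invoiceDate", "totalAmount"}):
    print(problem)
```

Whether field values match the source document and whether confidence is high enough still need a person or a labeled test set; this sketch covers only naming and missing-value consistency.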

If the same uploaded file could be multiple document types, test filters before using API extraction.

Use filters when one request may contain different document types.

A filter should answer:

  1. what document types can arrive?
  2. which template owns each type?
  3. what should happen to documents that do not match any template?

Recommended flow:

  1. Create and test the templates first.
  2. Create a filter in Studio > Filters.
  3. Add routing rules that map document categories to templates.
  4. Test representative mixed-document files in Playground.
  5. Use filterName in API extraction requests when routing is required.

When using filters with webhooks, make sure every extractable template in the filter is covered by an active webhook configuration before requesting deliveryMode=webhook.
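
On the client side, the choice between a single template and a filter can be made explicit when the request is assembled. This is a sketch under stated assumptions: `templateName`, `filterName`, and `deliveryMode` are the names used in this guide, but how they are sent (query string, form field, or JSON body) and whether the two selectors are mutually exclusive are assumptions here, and `build_extraction_params` is a hypothetical helper.

```python
from typing import Dict, Optional


def build_extraction_params(
    template_name: Optional[str] = None,
    filter_name: Optional[str] = None,
    delivery_mode: Optional[str] = None,
) -> Dict[str, str]:
    """Assemble request parameters, selecting either a template or a filter."""
    # Assumption: a request names exactly one of the two selectors.
    if (template_name is None) == (filter_name is None):
        raise ValueError("Pass exactly one of template_name or filter_name")

    params: Dict[str, str] = {}
    if template_name is not None:
        params["templateName"] = template_name
    else:
        params["filterName"] = filter_name
    if delivery_mode is not None:
        params["deliveryMode"] = delivery_mode
    return params


# Mixed-document upload routed by a filter, delivered by webhook.
params = build_extraction_params(filter_name="inbound-docs", delivery_mode="webhook")
```

The filter name `inbound-docs` is a placeholder; use a name that exists in the target environment.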

Use workflows when you want the platform to monitor an input location and write outputs without building a custom backend upload loop.

A workflow needs:

  1. input source
  2. extraction selection, either a template or a filter
  3. output destination
  4. optional document review threshold
  5. file lifecycle settings when supported by the source

Recommended flow:

  1. Create templates.
  2. Create filters if routing is needed.
  3. Connect input and output providers.
  4. Create the workflow.
  5. Run controlled test files.
  6. Enable document review if uncertain results need approval.
  7. Monitor runtime attention messages and Document Review.

Document review is a quality gate. When it is enabled, extracted results whose confidence falls below the configured threshold are held before final delivery.

Use it when:

  1. low-confidence values should not be sent directly to downstream systems
  2. a user must approve output before webhook completion or workflow delivery
  3. the document type has high business impact

For API integrations, the async job status becomes review_required and the result endpoint returns REVIEW_REQUIRED until approval. For workflows, review cases appear in Dashboard > Document Review.
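
A polling client therefore needs to treat review_required as a distinct outcome, not as a failure or as a reason to keep polling quickly. The sketch below assumes a caller-supplied `get_status` function that returns the job status string. Only `review_required` comes from this guide; the other status names (`completed`, `failed`) and the `timed_out` return value are assumptions.

```python
import time
from typing import Callable

# Assumption: status names that mean the job is finished.
FINISHED_STATUSES = {"completed", "failed"}


def wait_for_job(
    get_status: Callable[[], str],
    poll_seconds: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job finishes, is parked for review, or the budget runs out."""
    for _ in range(max_polls):
        status = get_status()
        # A job held for review will not progress until a person approves it,
        # so hand control back to the caller.
        if status == "review_required" or status in FINISHED_STATUSES:
            return status
        sleep(poll_seconds)
    return "timed_out"


# Simulated status sequence standing in for real API calls.
statuses = iter(["processing", "processing", "review_required"])
outcome = wait_for_job(lambda: next(statuses), sleep=lambda _seconds: None)
```

Returning on review_required lets the caller decide what to do next, for example recording the job as pending and checking again later at a much slower interval.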

Before you put an API integration into production, confirm that:

  1. at least one template has been created and tested
  2. any needed filters have been tested with mixed documents
  3. an API key exists for the correct environment
  4. you have the correct $EXTRACTION_BASE_URL
  5. the integration uses templateName or filterName values that exist in the target environment
  6. webhook routes are configured if deliveryMode=webhook will be used
  7. you have decided how document review should behave before using review thresholds
  8. you know whether to use sync extraction, async polling, or async webhooks
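
Part of this checklist can be verified at startup. The sketch below is a minimal preflight check, assuming configuration is supplied through environment variables. `EXTRACTION_BASE_URL` is named in this guide; `EXTRACTION_API_KEY`, `EXTRACTION_TEMPLATE_NAME`, and `EXTRACTION_FILTER_NAME` are hypothetical variable names, and the https requirement is an assumption.

```python
from typing import List, Mapping
from urllib.parse import urlparse


def preflight_problems(env: Mapping[str, str]) -> List[str]:
    """Return configuration problems found before any request is sent."""
    problems = []

    parsed = urlparse(env.get("EXTRACTION_BASE_URL", ""))
    if parsed.scheme != "https" or not parsed.netloc:
        problems.append("EXTRACTION_BASE_URL must be an https URL")

    if not env.get("EXTRACTION_API_KEY"):
        problems.append("EXTRACTION_API_KEY is not set")

    if not (env.get("EXTRACTION_TEMPLATE_NAME") or env.get("EXTRACTION_FILTER_NAME")):
        problems.append("set EXTRACTION_TEMPLATE_NAME or EXTRACTION_FILTER_NAME")

    return problems


# Placeholder values for illustration; a real service would pass os.environ.
example_env = {
    "EXTRACTION_BASE_URL": "https://extraction.example.invalid",
    "EXTRACTION_API_KEY": "placeholder-key",
    "EXTRACTION_FILTER_NAME": "inbound-docs",
}
problems = preflight_problems(example_env)
```

A check like this cannot confirm that the template or filter exists in the target environment or that webhook routes are configured; those still need to be verified against the environment itself.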