Overview

We use JSON Schema to define the structure of extracted data.

Rules

  • Root must be an object type
  • Allowed types: string, number, integer, boolean, object, array
  • Primitive fields must be nullable, for example "type": ["string", "null"]
  • Maximum nesting level: 3
  • Array items: objects or primitives (string, number, integer, boolean)
  • Enums: strings only, must include null
  • Use description fields to provide context
  • Date values should use "custom:type": "date" on string fields
  • Validation rules should be modeled as a boolean field carrying a "beltic:validation" object whose field_ref names a sibling field
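
Several of the rules above are mechanical enough to check before submitting a schema. The following is a minimal Python sketch, not an official validator: check_schema, MAX_DEPTH, and the error messages are hypothetical names, and two readings are assumptions of this sketch (the root object counts as nesting level 1, and the nullability rule applies to object properties rather than to array items).

```python
ALLOWED_TYPES = {"string", "number", "integer", "boolean", "object", "array"}
PRIMITIVES = {"string", "number", "integer", "boolean"}
MAX_DEPTH = 3  # assumption: the root object counts as level 1


def type_names(node):
    """Return the non-null type names declared on a schema node."""
    declared = node.get("type", [])
    if isinstance(declared, str):
        declared = [declared]
    return [name for name in declared if name != "null"]


def check_schema(schema):
    """Collect rule violations; an empty list means none were found."""
    errors = []
    if "object" not in type_names(schema):
        errors.append("root: must be an object type")
    _check_node(schema, "root", 1, False, errors)
    return errors


def _check_node(node, path, depth, is_field, errors):
    names = type_names(node)
    for name in names:
        if name not in ALLOWED_TYPES:
            errors.append(f"{path}: type '{name}' is not allowed")
    if "enum" in node:
        values = node["enum"]
        if None not in values:
            errors.append(f"{path}: enum must include null")
        if any(v is not None and not isinstance(v, str) for v in values):
            errors.append(f"{path}: enum values must be strings")
    # Assumption: nullability is enforced on properties, not on array items.
    if is_field and any(name in PRIMITIVES for name in names):
        declared = node.get("type")
        if not (isinstance(declared, list) and "null" in declared):
            errors.append(f"{path}: primitive field must be nullable")
    if "object" in names:
        if depth > MAX_DEPTH:
            errors.append(f"{path}: nesting deeper than {MAX_DEPTH} levels")
        for key, child in node.get("properties", {}).items():
            _check_node(child, f"{path}.{key}", depth + 1, True, errors)
    items = node.get("items")
    if "array" in names and isinstance(items, dict):
        # Array items are checked at the same depth as the array itself.
        _check_node(items, f"{path}[]", depth, False, errors)


candidate = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},      # not nullable
        "status": {"enum": ["a", "b"]},  # enum without null
    },
}
for problem in check_schema(candidate):
    print(problem)
```

If nesting is counted differently on your side (for example, with arrays adding a level), adjust MAX_DEPTH and the depth arguments accordingly.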

Unsupported Features

  • Schema composition (anyOf, oneOf, allOf)
  • Regular expressions
  • Conditional validation
  • Constant values
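
A schema can be scanned for these features before use. The sketch below is illustrative only: find_unsupported is a hypothetical helper, and the mapping from each feature to standard JSON Schema keywords (pattern and patternProperties for regular expressions, if/then/else for conditional validation, const for constant values) is an assumption of this sketch rather than a list confirmed by this page.

```python
# Assumed mapping from the unsupported features to JSON Schema keywords.
UNSUPPORTED_KEYWORDS = {
    "anyOf": "schema composition",
    "oneOf": "schema composition",
    "allOf": "schema composition",
    "pattern": "regular expressions",
    "patternProperties": "regular expressions",
    "if": "conditional validation",
    "then": "conditional validation",
    "else": "conditional validation",
    "const": "constant values",
}


def find_unsupported(node, path="root"):
    """Return (path, keyword, feature) tuples for unsupported keywords."""
    found = []
    if not isinstance(node, dict):
        return found
    for key, value in node.items():
        if key in UNSUPPORTED_KEYWORDS:
            found.append((path, key, UNSUPPORTED_KEYWORDS[key]))
        if key == "properties" and isinstance(value, dict):
            # Property names are user data, so only their schemas are scanned.
            for name, child in value.items():
                found.extend(find_unsupported(child, f"{path}.{name}"))
        elif key == "items":
            found.extend(find_unsupported(value, f"{path}[]"))
    return found


sample = {
    "type": "object",
    "properties": {
        "code": {"type": ["string", "null"], "pattern": "^[A-Z]+$"},
        "pattern": {"type": ["string", "null"]},  # a field merely named "pattern"
        "kind": {"const": "invoice"},
    },
}
for hit in find_unsupported(sample):
    print(hit)
```

Scanning only under properties and items keeps a field that happens to be named "pattern" or "const" from being reported by mistake.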

Examples

Basic Schema

{
  "type": "object",
  "properties": {
    "invoice_number": {
      "type": ["string", "null"],
      "description": "Invoice identifier"
    },
    "amount": {
      "type": ["number", "null"],
      "description": "Invoice amount"
    }
  },
  "required": ["invoice_number", "amount"]
}

With Nested Objects

{
  "type": "object",
  "properties": {
    "address": {
      "type": "object",
      "properties": {
        "street": { "type": ["string", "null"] },
        "city": { "type": ["string", "null"] }
      },
      "required": ["street", "city"]
    }
  }
}

With Arrays

{
  "type": "object",
  "properties": {
    "items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": ["string", "null"] },
          "quantity": { "type": ["number", "null"] }
        }
      }
    }
  }
}

With Enums

{
  "type": "object",
  "properties": {
    "status": {
      "enum": ["pending", "approved", "rejected", null],
      "description": "Document status"
    }
  }
}

Validation Fields (Date Reference)

Date validations are configured on a boolean field, not on the date field itself. Use this pattern:

  • Source date field: "type": ["string", "null"] plus "custom:type": "date"
  • Validation field: "type": ["boolean", "null"] plus a "beltic:validation" object
  • field_ref must name a sibling date field in the same scope

{
  "type": "object",
  "properties": {
    "issue_date": {
      "type": ["string", "null"],
      "custom:type": "date",
      "description": "Document issue date"
    },
    "is_recent": {
      "type": ["boolean", "null"],
      "description": "Whether issue_date is recent enough",
      "beltic:validation": {
        "kind": "date_recency",
        "field_ref": "issue_date",
        "max_age_days": 90
      }
    }
  },
  "required": ["issue_date", "is_recent"]
}

Validation Fields in Nested Objects/Arrays

field_ref is scope-relative:

  • Inside an object, it references a field in that same object
  • Inside an array item object, it references a field in that same item

{
  "type": "object",
  "properties": {
    "documents": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "expiry_date": {
            "type": ["string", "null"],
            "custom:type": "date"
          },
          "is_valid": {
            "type": ["boolean", "null"],
            "beltic:validation": {
              "kind": "date_expiry",
              "field_ref": "expiry_date",
              "min_valid_days": 30
            }
          }
        },
        "required": ["expiry_date", "is_valid"]
      }
    }
  },
  "required": ["documents"]
}
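
The scope rule can be checked by resolving each field_ref only against the properties of the object that contains the validation field. This is a rough Python sketch under that reading: check_field_refs and its messages are hypothetical, and it assumes every validation kind targets a field marked "custom:type": "date", as the two date kinds shown on this page do.

```python
def check_field_refs(node, path="root"):
    """Return errors for field_ref values that miss a sibling date field."""
    errors = []
    if not isinstance(node, dict):
        return errors
    properties = node.get("properties", {})
    for name, child in properties.items():
        child_path = f"{path}.{name}"
        validation = child.get("beltic:validation")
        if validation is not None:
            ref = validation.get("field_ref")
            # Look the reference up among siblings only, never in outer scopes.
            target = properties.get(ref)
            if target is None:
                errors.append(f"{child_path}: field_ref '{ref}' is not a sibling field")
            elif target.get("custom:type") != "date":
                errors.append(f"{child_path}: field_ref '{ref}' is not a date field")
        errors.extend(check_field_refs(child, child_path))
    if isinstance(node.get("items"), dict):
        # Each array item object forms its own scope.
        errors.extend(check_field_refs(node["items"], f"{path}[]"))
    return errors


# A validation field placed at the root cannot see a date inside array items.
misplaced = {
    "type": "object",
    "properties": {
        "documents": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "expiry_date": {"type": ["string", "null"], "custom:type": "date"}
                },
            },
        },
        "is_valid": {
            "type": ["boolean", "null"],
            "beltic:validation": {
                "kind": "date_expiry",
                "field_ref": "expiry_date",
                "min_valid_days": 30,
            },
        },
    },
}
for problem in check_field_refs(misplaced):
    print(problem)
```

Moving is_valid into the item object, next to expiry_date, makes the reference resolve and the check pass.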