
Overview

We use JSON Schema to define the structure of extracted data.

Rules

  • Root must be an object type
  • Allowed types: string, number, integer, boolean, object, array
  • Primitive fields must be nullable, for example "type": ["string", "null"]
  • Maximum nesting level: 3
  • Array items: objects or primitives (string, number, integer, boolean)
  • Enums: strings only, must include null
  • Use description fields to provide context
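
The rules above can be checked mechanically before a schema is submitted. The following is a minimal Python sketch, not an official validator: the function names (check_schema and its helpers) are hypothetical, and the nesting check assumes the root object counts as level 1, which the rules do not spell out. It also assumes array items of primitive type may be written with or without "null".

```python
PRIMITIVES = {"string", "number", "integer", "boolean"}
MAX_DEPTH = 3  # Assumption: the root object counts as nesting level 1.


def _primitive_name(t):
    """Return the primitive name for 'string' or ['string', 'null'], else None."""
    if isinstance(t, str):
        return t if t in PRIMITIVES else None
    if isinstance(t, list):
        non_null = [x for x in t if x != "null"]
        if len(non_null) == 1 and non_null[0] in PRIMITIVES:
            return non_null[0]
    return None


def _check_object(node, path, depth, problems):
    """Check every property of an object schema at the given nesting depth."""
    if depth > MAX_DEPTH:
        problems.append(f"{path}: nesting deeper than {MAX_DEPTH} levels")
    for name, field in node.get("properties", {}).items():
        _check_field(field, f"{path}.{name}", depth, problems)


def _check_field(field, path, depth, problems):
    """Check a single field schema against the rules listed above."""
    if "enum" in field:
        values = field["enum"]
        if None not in values:
            problems.append(f"{path}: enum must include null")
        if any(v is not None and not isinstance(v, str) for v in values):
            problems.append(f"{path}: enum values must be strings")
        return
    ftype = field.get("type")
    if ftype == "object":
        _check_object(field, path, depth + 1, problems)
    elif ftype == "array":
        items = field.get("items", {})
        if items.get("type") == "object":
            _check_object(items, f"{path}[]", depth + 1, problems)
        elif _primitive_name(items.get("type")) is None:
            problems.append(f"{path}: array items must be objects or primitives")
    elif isinstance(ftype, str) and ftype in PRIMITIVES:
        problems.append(f"{path}: primitive type must be nullable")
    elif _primitive_name(ftype) is None:
        problems.append(f"{path}: unsupported type {ftype!r}")
    elif "null" not in ftype:
        problems.append(f"{path}: primitive type must be nullable")


def check_schema(schema):
    """Return a list of rule violations; an empty list means none were found."""
    if schema.get("type") != "object":
        return ["root: type must be 'object'"]
    problems = []
    _check_object(schema, "root", 1, problems)
    return problems


if __name__ == "__main__":
    bad = {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "status": {"enum": ["pending", "approved"]},
        },
    }
    for problem in check_schema(bad):
        print(problem)
```

Returning a list of messages rather than raising on the first violation lets all problems in a schema be reported in one pass.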

Unsupported Features

  • Schema composition (anyOf, oneOf, allOf)
  • Regular expressions (pattern)
  • Conditional validation (if, then, else)
  • Constant values (const)
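
A schema can be scanned for these features before use. The sketch below is illustrative only: the function name find_unsupported is hypothetical, and the mapping from each listed feature to standard JSON Schema keywords (for example, "Regular expressions" to pattern and patternProperties) is an assumption, since the list above names the features rather than every keyword.

```python
# Assumed mapping from JSON Schema keywords to the unsupported features listed above.
UNSUPPORTED = {
    "anyOf": "schema composition",
    "oneOf": "schema composition",
    "allOf": "schema composition",
    "pattern": "regular expressions",
    "patternProperties": "regular expressions",
    "if": "conditional validation",
    "then": "conditional validation",
    "else": "conditional validation",
    "const": "constant values",
}


def find_unsupported(node, path="root"):
    """Return (location, feature) pairs for unsupported keywords in a schema."""
    found = []
    if not isinstance(node, dict):
        return found
    for key, value in node.items():
        if key in UNSUPPORTED:
            found.append((f"{path}.{key}", UNSUPPORTED[key]))
        if key == "properties" and isinstance(value, dict):
            # Keys under "properties" are field names, not keywords,
            # so a field called "pattern" is not flagged; only its schema is scanned.
            for name, sub in value.items():
                found.extend(find_unsupported(sub, f"{path}.{name}"))
        elif key == "items":
            found.extend(find_unsupported(value, f"{path}[]"))
    return found


if __name__ == "__main__":
    schema = {
        "type": "object",
        "properties": {
            "code": {"type": ["string", "null"], "pattern": "^INV-"},
            "kind": {"const": "invoice"},
        },
    }
    for location, feature in find_unsupported(schema):
        print(f"{location}: {feature} is not supported")
```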

Examples

Basic Schema

{
  "type": "object",
  "properties": {
    "invoice_number": {
      "type": ["string", "null"],
      "description": "Invoice identifier"
    },
    "amount": {
      "type": ["number", "null"],
      "description": "Invoice amount"
    }
  },
  "required": ["invoice_number", "amount"]
}

With Nested Objects

{
  "type": "object",
  "properties": {
    "address": {
      "type": "object",
      "properties": {
        "street": { "type": ["string", "null"] },
        "city": { "type": ["string", "null"] }
      },
      "required": ["street", "city"]
    }
  }
}

With Arrays

{
  "type": "object",
  "properties": {
    "items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": ["string", "null"] },
          "quantity": { "type": ["number", "null"] }
        }
      }
    }
  }
}

With Enums

{
  "type": "object",
  "properties": {
    "status": {
      "enum": ["pending", "approved", "rejected", null],
      "description": "Document status"
    }
  }
}