Claim Assistant
Document AI, OCR, LLM, NLP

AI-powered claim intake automation that extracts, validates, and analyzes insurance claim forms using document intelligence and large language models.

Overview

The Problem

Insurance claim processing is manual, error-prone, and slow. Adjusters spend hours extracting data from PDF forms, cross-referencing policy records, and writing coverage analyses. This repetitive work delays claim resolution.

Our Solution

Claim Assistant automates the entire intake pipeline: Azure Document Intelligence extracts key-value pairs from scanned PDFs, GPT-5 maps extractions to structured form fields with evidence tracking, and an LLM-powered validation engine performs coverage analysis against policy records.

Key Outcomes

  • Automated field extraction with confidence scoring (flags fields below 80% for human review)
  • Evidence-tracked form filling: every answer links back to source document extractions
  • Policy matching with weighted name similarity (Levenshtein) and date validation
  • LLM-generated coverage analysis with reasoning and confidence scores
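
The confidence-based review flag in the first outcome can be illustrated with a minimal sketch. The `Extraction` class, the `split_for_review` helper, and the sample values are hypothetical and are not taken from the project's code; only the 80% threshold comes from the description above.

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.80  # the 80% threshold stated in the outcomes list


@dataclass(frozen=True)
class Extraction:
    """One extracted key-value pair (hypothetical shape, not the project's schema)."""
    key: str
    value: str
    confidence: float  # 0.0 to 1.0


def split_for_review(extractions, threshold=REVIEW_THRESHOLD):
    """Return (accepted, needs_review); fields strictly below the threshold go to review."""
    accepted = [e for e in extractions if e.confidence >= threshold]
    needs_review = [e for e in extractions if e.confidence < threshold]
    return accepted, needs_review


# Made-up sample values, for illustration only.
sample = [
    Extraction("Policy Number", "FL-104233", 0.97),
    Extraction("Date of Loss", "03/14/2025", 0.62),
    Extraction("Claimant Last Name", "Rivera", 0.80),
]
accepted, needs_review = split_for_review(sample)
print([e.key for e in needs_review])  # ['Date of Loss']
```

A field at exactly 0.80 is accepted here because the description flags only fields below 80%; a stricter deployment could flag the boundary case as well.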

Models & Tech Stack

AI/ML Models

Azure Document Intelligence
OCR and key-value extraction from scanned/filled PDF claim forms

Uses the prebuilt-document model to extract key-value pairs with confidence scores and bounding regions from insurance claim PDFs across 8+ US states.

OpenAI GPT-5 Family
Structured form filling and coverage analysis

GPT-5.2 performs form filling by mapping DI extractions to form fields using structured output parsing. GPT-5-nano handles coverage analysis, reasoning about whether claims fall within policy terms.
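
One way to picture evidence-tracked form filling is as a data structure in which each filled field carries the IDs of the extractions that support it. The sketch below assumes the model's structured output has already been parsed into such objects; `SourceExtraction`, `FilledField`, `unresolved_evidence`, and the ID format are hypothetical names for illustration, not the project's schema, and no OpenAI call is shown.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceExtraction:
    """A key-value pair from the document-extraction step (hypothetical shape)."""
    extraction_id: str
    key: str
    value: str


@dataclass
class FilledField:
    """A form field answer plus the extractions cited as evidence (one-to-many)."""
    field_name: str
    value: str
    evidence_ids: list[str] = field(default_factory=list)


def unresolved_evidence(filled, extractions):
    """Map field name -> cited evidence IDs that match no known extraction."""
    known = {e.extraction_id for e in extractions}
    problems = {}
    for f in filled:
        missing = [eid for eid in f.evidence_ids if eid not in known]
        if missing:
            problems[f.field_name] = missing
    return problems


# Made-up sample values, for illustration only.
extractions = [
    SourceExtraction("kv-1", "First Name", "Ana"),
    SourceExtraction("kv-2", "Last Name", "Rivera"),
]
filled = [
    FilledField("insured_name", "Ana Rivera", ["kv-1", "kv-2"]),
    FilledField("loss_date", "2025-03-14", ["kv-9"]),  # cites an unknown ID
]
print(unresolved_evidence(filled, extractions))  # {'loss_date': ['kv-9']}
```

Checking that every cited ID resolves to a real extraction is a cheap guard against answers the model asserts without support in the source document.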

Tech Stack

Backend
Python 3.13, FastAPI, Uvicorn, Pydantic
Frontend
Next.js 16, React 19, TypeScript, Tailwind CSS
ML/AI
Azure Document Intelligence SDK, OpenAI API, Levenshtein similarity
Data Processing
FillPDF, ReportLab, PyPDF2

Data & Methodology

Data Sources

State-specific insurance claim form templates (FL, NH, MN, IA, KS, NY, OH, WI) with structured field schemas, plus a policy database with coverage terms, holder information, and validity periods.

Methodology

A 5-stage modular pipeline:

  (1) Data Preparation: normalize scanned PDFs.
  (2) Key Extraction: the Azure DI prebuilt-document model extracts key-value pairs with confidence scores.
  (3) Form Filling: GPT-5.2 maps extractions to form fields with one-to-many evidence tracking.
  (4) Policy Matching: lookup by policy ID with weighted Levenshtein name verification.
  (5) Validation: LLM-based coverage analysis with date and claimant verification.
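
The date check in the validation stage can be sketched as a comparison of the loss date against the policy's validity period. This is a minimal illustration, assuming the period is stored as two dates and that both endpoints count as covered; `PolicyPeriod` and `loss_date_covered` are hypothetical names, and the dates are made up.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PolicyPeriod:
    """Validity period of a policy (hypothetical shape)."""
    effective: date
    expires: date


def loss_date_covered(loss_date: date, period: PolicyPeriod) -> bool:
    """True when the loss date falls inside the period (both ends inclusive, by assumption)."""
    return period.effective <= loss_date <= period.expires


period = PolicyPeriod(effective=date(2025, 1, 1), expires=date(2025, 12, 31))
print(loss_date_covered(date(2025, 3, 14), period))  # True
print(loss_date_covered(date(2026, 1, 2), period))   # False
```

A deterministic check like this can run before the LLM-based coverage analysis, so that clearly out-of-period claims are caught without relying on model reasoning.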

Evaluation Metrics

Fields with an extraction confidence below 80% are flagged for human review. Policy matching uses a weighted similarity score: 60% policy ID, 15% first name, 25% last name.
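
The weighted score can be sketched as follows. The 60/15/25 weights come from the text above; everything else is an assumption: that each component is a Levenshtein distance normalized to the range 0 to 1, that strings are compared case-insensitively, and that the policy ID is scored the same way as the names. The function names and sample records are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalize edit distance to [0, 1]; 1.0 means identical after case folding."""
    a, b = a.strip().casefold(), b.strip().casefold()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# Weights as stated in the text: 60% policy ID, 15% first name, 25% last name.
WEIGHTS = {"policy_id": 0.60, "first_name": 0.15, "last_name": 0.25}


def match_score(claim: dict, policy: dict) -> float:
    """Weighted sum of per-field similarities between a claim and a policy record."""
    return sum(w * similarity(claim[k], policy[k]) for k, w in WEIGHTS.items())


# Made-up sample records, for illustration only.
claim = {"policy_id": "FL-104233", "first_name": "Jon", "last_name": "Rivera"}
policy = {"policy_id": "FL-104233", "first_name": "John", "last_name": "Rivera"}
print(round(match_score(claim, policy), 4))  # 0.9625
```

In this example the only difference is "Jon" versus "John": one edit over four characters gives a first-name similarity of 0.75, so the score is 0.60 + 0.15 × 0.75 + 0.25 = 0.9625.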
