Architecture
The PolicyEngine API v2 is a distributed system for running tax-benefit microsimulations with persistence and async processing.
Components
API server
FastAPI application exposing RESTful endpoints for creating and managing datasets, defining policy reforms, queueing simulations, and computing aggregates. The server validates requests, persists records to PostgreSQL, and queues background tasks.
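The create path described above (validate, persist, return without waiting) can be sketched framework-free as below. This is a minimal illustration, not the server's code: the request fields `dataset_id` and `policy_id`, the in-memory `SIMULATIONS` and `REPORTS` dicts standing in for PostgreSQL tables, and the handler name are all hypothetical. In FastAPI, the validation step would normally be handled by the Pydantic request model.

```python
from dataclasses import dataclass
from typing import Dict
from uuid import UUID, uuid4


@dataclass
class EconomicImpactRequest:
    # Hypothetical request fields; the real schema is not shown in this document.
    dataset_id: str
    policy_id: str

    def validate(self) -> None:
        # Reject bad input before anything is persisted.
        if not self.dataset_id or not self.policy_id:
            raise ValueError("dataset_id and policy_id are required")


# In-memory stand-ins for the simulations and reports tables (hypothetical).
SIMULATIONS: Dict[UUID, dict] = {}
REPORTS: Dict[UUID, dict] = {}


def create_economic_impact(req: EconomicImpactRequest) -> dict:
    """Validate, persist pending records, and return without running anything."""
    req.validate()
    sim_id, report_id = uuid4(), uuid4()
    SIMULATIONS[sim_id] = {
        "dataset_id": req.dataset_id,
        "policy_id": req.policy_id,
        "status": "pending",
    }
    REPORTS[report_id] = {"simulation_id": sim_id, "status": "pending"}
    # A background worker picks these records up later; the caller gets
    # a pending status immediately.
    return {"report_id": str(report_id), "status": "pending"}
```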
Database
PostgreSQL (via Supabase) stores all persistent data. Models are defined with SQLModel, which provides a type-safe ORM with Pydantic integration.
Worker
Background workers poll for pending simulations and reports. They load datasets from storage, run PolicyEngine simulations, compute aggregates and impact statistics, then store the results in the database.
Storage
Dataset files (HDF5 format) are stored in Supabase Storage with local caching for performance. The storage layer handles downloads and caching transparently.
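A download-once cache along these lines might look like the sketch below. It assumes a hypothetical `download(remote_path, dest)` callable that wraps the Supabase Storage client, and it keys cache files by a hash of the storage path. Writing to a temporary file and renaming it means a reader never sees a half-downloaded file. It does not stop two processes from downloading the same file concurrently.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_dataset_path(
    remote_path: str,
    cache_dir: Path,
    download: Callable[[str, Path], None],
) -> Path:
    """Return a local copy of remote_path, downloading only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the storage key so any remote path maps to a safe local filename.
    key = hashlib.sha256(remote_path.encode("utf-8")).hexdigest()
    local = cache_dir / f"{key}.h5"
    if not local.exists():
        tmp = local.with_suffix(".part")
        download(remote_path, tmp)  # hypothetical storage download callable
        tmp.replace(local)  # rename into place once the download is complete
    return local
```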
Request flow
1. Client creates a simulation via POST /analysis/economic-impact
2. API validates the request and persists simulation and report records
3. API returns a pending status immediately
4. Worker picks up the pending simulation from the queue
5. Worker loads the dataset and runs the PolicyEngine simulation
6. Worker updates the simulation status to completed
7. Worker picks up the pending report
8. Worker computes decile impacts and program statistics
9. Client polls GET /analysis/economic-impact/{id} to check status
10. Once the report is complete, the response includes the full analysis results
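On the client side, the polling step can be wrapped in a small helper with exponential backoff. This is a sketch under assumptions. `fetch` is any callable returning the parsed JSON body of GET /analysis/economic-impact/{id} (for example, `lambda: httpx.get(url).json()`). The `failed` status and the interval and timeout defaults are hypothetical, because this document only mentions pending and completed.

```python
import time
from typing import Callable


def wait_for_result(
    fetch: Callable[[], dict],
    interval: float = 1.0,
    max_interval: float = 30.0,
    timeout: float = 600.0,
) -> dict:
    """Poll until the analysis reports 'completed', backing off between requests."""
    deadline = time.monotonic() + timeout
    while True:
        body = fetch()
        status = body.get("status")
        if status == "completed":
            return body
        if status == "failed":  # hypothetical terminal error status
            raise RuntimeError(f"analysis failed: {body}")
        if time.monotonic() >= deadline:
            raise TimeoutError("analysis did not complete in time")
        time.sleep(interval)
        interval = min(interval * 2, max_interval)  # exponential backoff
```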
Data models
All models follow Pydantic/SQLModel patterns for type safety across the API, database, and business logic:
Base: shared fields across models
Table: database model with ID and timestamps
Create: request schema (no ID)
Read: response schema (with ID and timestamps)
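The split between the four model roles can be illustrated with standard-library dataclasses standing in for SQLModel classes. The `Dataset*` names and the `name`/`filepath` fields are hypothetical. In SQLModel itself, the classes would subclass `SQLModel`, and the table variant would be declared with `table=True`.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import UUID, uuid4


@dataclass
class DatasetBase:
    """Base: fields shared by every variant (hypothetical examples)."""
    name: str
    filepath: str


@dataclass
class DatasetCreate(DatasetBase):
    """Create: what a client sends; no ID or timestamps."""


@dataclass
class Dataset(DatasetBase):
    """Table: adds the server-generated ID and timestamp."""
    id: UUID = field(default_factory=uuid4)
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class DatasetRead(DatasetBase):
    """Read: what the API returns, including ID and timestamp."""
    id: UUID
    created_at: datetime


def create_dataset(payload: DatasetCreate) -> DatasetRead:
    # Build the table row from the request, then expose it as a response.
    row = Dataset(name=payload.name, filepath=payload.filepath)
    return DatasetRead(name=row.name, filepath=row.filepath,
                       id=row.id, created_at=row.created_at)
```

Keeping the ID out of the Create schema means clients cannot choose their own primary keys. The Read schema guarantees that every response carries one.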
Scaling
API scaling
Run multiple uvicorn workers behind a load balancer to scale horizontally.
Worker scaling
Increase the worker count to process more simulations in parallel.
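Adding workers only helps if each job is processed exactly once. The sketch below shows that property with threads pulling from a shared queue. It is illustrative only. Real simulation workers would be separate processes or machines, both because the work is CPU-bound and because they poll the database. Each would therefore need an atomic claim in PostgreSQL, such as a conditional update or `SELECT ... FOR UPDATE SKIP LOCKED`. This document does not say which mechanism the project uses.

```python
import queue
import threading
from typing import Dict, List


def run_workers(job_ids: List[int], worker_count: int) -> Dict[int, str]:
    """Process jobs with worker_count threads; each job is taken by exactly one."""
    jobs: "queue.Queue[int]" = queue.Queue()
    for job_id in job_ids:
        jobs.put(job_id)
    results: Dict[int, str] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                job_id = jobs.get_nowait()  # Queue hands each item to one caller
            except queue.Empty:
                return  # drained; a real worker would sleep and poll again
            outcome = f"simulated-{job_id}"  # stand-in for running a simulation
            with lock:
                results[job_id] = outcome
            jobs.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```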
Database
PostgreSQL supports read replicas for high read throughput.
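Using replicas means the application must decide where each query goes. A minimal router, with hypothetical connection strings, sends writes to the primary and rotates reads across replicas.

```python
import itertools
from typing import Iterable


class ConnectionRouter:
    """Send writes to the primary and spread reads across replicas round-robin."""

    def __init__(self, primary_dsn: str, replica_dsns: Iterable[str]) -> None:
        self.primary_dsn = primary_dsn
        replicas = list(replica_dsns)
        # Fall back to the primary when no replicas are configured.
        self._reads = itertools.cycle(replicas or [primary_dsn])

    def dsn_for(self, readonly: bool) -> str:
        return next(self._reads) if readonly else self.primary_dsn
```

Replicas apply changes with some lag. A status poll routed to a replica just after a write may therefore briefly see stale data. Reads that must reflect the latest write are safer on the primary.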
Caching
Deterministic UUIDs ensure that identical requests map to the same ID and therefore reuse cached results.
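One way to derive such IDs is a name-based version 5 UUID over a canonical serialization of the request. This is a sketch, not necessarily the project's scheme: the namespace value, the JSON canonicalization, and the in-memory `_results` cache are assumptions. Sorting keys means that parameter order does not change the ID. Note that values such as `1` and `1.0` still serialize differently.

```python
import json
import uuid
from typing import Any, Callable, Dict

# Hypothetical fixed namespace; changing it would change every derived ID.
SIMULATION_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "simulations.example.org")


def simulation_id(params: Dict[str, Any]) -> uuid.UUID:
    """Derive a name-based (version 5) UUID from the request parameters."""
    # Sorted keys (recursively) and fixed separators make the JSON canonical.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return uuid.uuid5(SIMULATION_NAMESPACE, canonical)


_results: Dict[uuid.UUID, Any] = {}  # stand-in for stored results


def get_or_compute(params: Dict[str, Any],
                   compute: Callable[[Dict[str, Any]], Any]) -> Any:
    """Return the stored result for these params, computing it only once."""
    key = simulation_id(params)
    if key not in _results:
        _results[key] = compute(params)
    return _results[key]
```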