Architecture

The PolicyEngine API v2 is a distributed system for running tax-benefit microsimulations with persistence and async processing.

Components

API server

FastAPI application exposing RESTful endpoints for creating and managing datasets, defining policy reforms, queueing simulations, and computing aggregates. The server validates requests, persists to PostgreSQL, and queues background tasks.

Database

PostgreSQL (via Supabase) stores all persistent data using SQLModel for type-safe ORM with Pydantic integration.

Tables: datasets, policies, simulations, aggregates, reports, decile_impacts, program_statistics, parameters

Worker

Background workers poll for pending simulations and reports. They load datasets from storage, run PolicyEngine simulations, compute aggregates and impact statistics, then store results to the database.
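The worker's poll-and-process loop can be sketched as follows. An in-memory list stands in for the database queue, and the simulation callable is an illustrative placeholder, not the real PolicyEngine entry point.

```python
# Sketch of the worker loop: drain pending jobs, run each simulation,
# and record results. The queue and callable are illustrative stand-ins.
from typing import Callable


def run_worker(queue: list[dict], run_simulation: Callable[[dict], dict]) -> None:
    """Process every pending job in the queue and mark it completed."""
    for job in queue:
        if job["status"] != "pending":
            continue
        job["status"] = "running"
        # In the real worker this loads the dataset from storage and
        # runs a PolicyEngine simulation.
        job["results"] = run_simulation(job)
        job["status"] = "completed"
```

A real worker would loop forever with a sleep between polls and claim jobs atomically so multiple workers do not process the same simulation.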

Storage

Dataset files (HDF5 format) are stored in Supabase Storage with local caching for performance. The storage layer handles downloads and caching transparently.

Request flow

1. Client creates simulation via POST /analysis/economic-impact
2. API validates request and persists simulation + report records
3. API returns pending status immediately
4. Worker picks up pending simulation from queue
5. Worker loads dataset and runs PolicyEngine simulation
6. Worker updates simulation status to completed
7. Worker picks up pending report
8. Worker computes decile impacts and program statistics
9. Client polls GET /analysis/economic-impact/{id} to check status
10. Once complete, the response includes full analysis results
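The client side of this flow is a polling loop. A minimal sketch, where `get_status` stands in for a GET /analysis/economic-impact/{id} call (the helper name and parameters are illustrative):

```python
# Sketch of client-side polling until the analysis completes.
# `get_status` is a hypothetical stand-in for the GET endpoint call.
import time
from typing import Callable


def poll_until_complete(get_status: Callable[[], dict],
                        interval: float = 0.0,
                        max_tries: int = 100) -> dict:
    """Poll until the response reports completed status, then return it."""
    for _ in range(max_tries):
        response = get_status()
        if response["status"] == "completed":
            return response  # includes the full analysis results
        time.sleep(interval)
    raise TimeoutError("analysis did not complete in time")
```

In practice the interval would be a second or more, possibly with backoff, to avoid hammering the API.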

Data models

All models follow Pydantic/SQLModel patterns for type safety across API, database, and business logic:

Base

Shared fields across models

Table

Database model with ID and timestamps

Create

Request schema (no ID)

Read

Response schema (with ID and timestamps)
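The Base/Table/Create/Read split can be illustrated with stdlib dataclasses. The real models use SQLModel, and the `Dataset` field names here are hypothetical; only the inheritance pattern is the point.

```python
# Illustrative sketch of the Base/Table/Create/Read pattern using
# stdlib dataclasses (real code uses SQLModel; fields are hypothetical).
from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import uuid4


@dataclass
class DatasetBase:
    # Base: fields shared across all variants
    name: str
    year: int


@dataclass
class DatasetCreate(DatasetBase):
    # Create: request schema, no ID or timestamps
    pass


@dataclass
class Dataset(DatasetBase):
    # Table: database model, adds ID and timestamps
    id: str = field(default_factory=lambda: str(uuid4()))
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class DatasetRead(Dataset):
    # Read: response schema, includes ID and timestamps
    pass
```

Sharing `DatasetBase` keeps field definitions in one place, so API schemas and database models cannot drift apart.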

Scaling

API scaling

Run multiple uvicorn workers behind a load balancer for horizontal scaling.

Worker scaling

Increase worker count for parallel simulation processing.

Database

PostgreSQL supports read replicas for high read throughput.

Caching

Deterministic UUIDs ensure that identical requests reuse cached results.
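One way to derive such IDs is name-based UUIDs (uuid5) over a canonical serialization of the request, so the same payload always maps to the same record. The namespace string below is hypothetical:

```python
# Sketch of deterministic request IDs via uuid5: identical payloads
# (regardless of key order) hash to the same UUID, enabling cache reuse.
import json
import uuid

# Hypothetical namespace; the real API defines its own.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "policyengine-api-v2")


def request_id(payload: dict) -> uuid.UUID:
    """Map a request payload to a stable, content-derived UUID."""
    canonical = json.dumps(payload, sort_keys=True)  # key-order independent
    return uuid.uuid5(NAMESPACE, canonical)
```

Before running a simulation, the API can check whether a record with this ID already exists and return the cached result instead.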