The trusted data layer for AI agents

We turn scattered, multimodal data into enriched, structured datasets that power better decisions for AI agents.

The quality of every decision depends on the quality of the data behind it. We build the ground truth datasets that make those decisions trustworthy.

Every decision your AI agent makes starts with data

Prefetch builds one clear, trusted dataset for each domain, drawn from data often buried in multimodal content that no API currently serves.

01

Structured

Raw records from government portals, annual accounts, and public filings. Transformed into clean, queryable schemas your agents can reason over instantly.

02

Enriched

Cross-referenced across sources, extracted from PDFs and images, tagged with domain context. Every record carries more intelligence than the original source.

03

Fresh

Continuous ingestion pipelines keep every dataset current. Planning data refreshes daily. Funder profiles re-enrich weekly. Every record is timestamped so your agents know exactly how current it is.

Trusted data, ready for your agents

Query structured, enriched datasets via REST API or MCP. One integration, access to every vertical we serve.
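To give a feel for what "structured, enriched datasets" means in practice, here is a minimal sketch of how an agent might handle a response. The payload shape mirrors the examples on this page (confidence scores, source citations), but every field name is an assumption, not the actual Prefetch schema.

```python
import json

# Hypothetical response payload — field names mirror the examples on this
# page (confidence scores, source citations); the real schema may differ.
sample_response = json.loads("""
{
  "results": [
    {"reference": "DM/24/1847", "outcome": "approved", "confidence": 0.94,
     "source": "officer report, para 4.3"},
    {"reference": "DM/23/3201", "outcome": "refused", "confidence": 0.91,
     "source": "decision notice"}
  ]
}
""")

def high_confidence(results, threshold=0.9):
    """Keep only records the agent can rely on, with their citations intact."""
    return [r for r in results if r["confidence"] >= threshold]

trusted = high_confidence(sample_response["results"])
for r in trusted:
    print(f'{r["reference"]}: {r["outcome"]} ({r["source"]})')
```

Because the response is plain JSON, any agent that can make an HTTP request can apply this kind of filtering without an SDK.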

Request API access

How we build ground truth datasets

Every data product goes through the same pipeline. The pattern is repeatable across any domain where valuable data is buried in multimodal content.

01

Ingest

Public records, licensed content, filings. PDFs, images, HTML. Collected and connected at scale.

02

Clean

Deduplicate, normalise, validate. One consistent foundation from any format, any source.

03

Enrich

Cross-reference records, extract from documents and images, add domain context. Where the real value is created.

04

Structure

Relational and embedding formats. Optimised for how agents query and reason over information.

05

Deliver

REST API and MCP endpoints. Structured JSON, confidence scores, source citations. Ready for decisions.
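The five steps above can be sketched as one pass over toy records. Every function name and field here is illustrative, assuming a simple dict-based record shape, not the actual Prefetch internals.

```python
# Illustrative sketch of the five-step pipeline over toy records.
# All names and fields are assumptions, not the actual Prefetch pipeline.

def ingest():
    # Step 1: collect raw records from mixed sources (duplicates included).
    return [
        {"id": "2024/1847", "source": "pdf", "text": " Roof Height Objection "},
        {"id": "2024/1847", "source": "html", "text": "roof height objection"},
    ]

def clean(records):
    # Step 2: deduplicate by id, normalise whitespace and case.
    seen = {}
    for r in records:
        r = {**r, "text": r["text"].strip().lower()}
        seen.setdefault(r["id"], r)
    return list(seen.values())

def enrich(records):
    # Step 3: add domain context (here, a crude topic tag).
    return [{**r, "topic": "design" if "roof" in r["text"] else "other"}
            for r in records]

def structure(records):
    # Step 4: shape into the schema agents query.
    return {r["id"]: {"text": r["text"], "topic": r["topic"]} for r in records}

def deliver(dataset):
    # Step 5: serve as structured JSON with a confidence score attached.
    return {"data": dataset, "confidence": 0.9}

payload = deliver(structure(enrich(clean(ingest()))))
```

The point of the sketch is the ordering: cleaning before enrichment means context is only added once per real-world record, not once per duplicate.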

Valuable data is scattered across public and private sources — buried in PDFs, images, filings, and web pages. We consolidate, harmonise, and enrich multimodal content into structured datasets your agents can trust.

Built for AI agents and the teams behind them

Every design choice optimises for one thing: trusted, structured data that powers better decisions — whether it's your AI agent reasoning autonomously or your team digging into the details.

Multimodal enrichment

We read what's inside the documents, not just the metadata

Planning reports are 47-page PDFs. Funder data is buried in annual accounts. EPC certificates are scanned images. We extract, cross-reference, and structure content from all of it.

This is the enrichment layer that turns raw public records into trusted intelligence. It deepens with every document we process.

enrichment pipeline
// Input: officer report (PDF, 47 pages)
enrich("officer_report_2024_1847.pdf")

// Output: structured, enriched
{
  "extracted_conditions": 12,
  "heritage_references": 3,
  "policy_matches": 7,
  "officer_recommendation": "approve",
  "objections_parsed": 2,
  "enriched_at": "2026-03-26",
  "confidence": 0.91
}
Deep within each vertical

Every record enriched beyond the original source

We don't just collect data. We make it more useful. A planning application gets its full document content parsed: conditions extracted, policy references linked, heritage status flagged, officer reasoning indexed.

A funder profile gets income trends, giving history, focus area classification, and application requirements. All structured from scattered sources into one queryable record.

enriched planning record
// One planning application — deeply enriched

Application 2024/1847
  ├── documents     14 parsed (officer report, notices)
  ├── conditions    12 extracted, 3 pre-commencement
  ├── policies      7 local plan refs matched
  ├── heritage      conservation area, Grade II adjacent
  ├── officer_view  recommended approval, 2 objections
  └── confidence    0.94
Data that comes with skills

Your agents don't just get data — they get capabilities

Every Prefetch dataset ships with ready-made skills: search precedents, match funders, check compliance. Your agent connects once and immediately knows what it can do.

Skills work across any agent interface — ChatGPT, Claude, your own tools. REST API and MCP endpoints, structured JSON, confidence scores, source citations. No parsing, no SDK, no lock-in.

agent discovers skills
// Connect to Prefetch — skills included
prefetch.connect("planning")

// Available skills:
├── search_precedents    full-text across officer reports
├── check_compliance     match against local plan policies
├── extract_conditions   parse decision notice conditions
└── assess_risk          heritage, flood, conservation

// Your agent just uses them
search_precedents("roof height objections")
→ 14 results, confidence 0.94
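One way to picture "skills included" is a registry the agent can list before calling anything. The skill names below come from the snippet above; the bodies are stand-ins under that assumption, not real Prefetch logic.

```python
# Sketch of skills as a discoverable registry. Skill names come from the
# snippet above; the bodies here are stand-ins, not real Prefetch logic.

SKILLS = {}

def skill(fn):
    """Register a function so a connected agent can discover it by name."""
    SKILLS[fn.__name__] = fn
    return fn

@skill
def search_precedents(query):
    # Stand-in: a real skill would run full-text search over officer reports.
    return {"query": query, "results": 14, "confidence": 0.94}

@skill
def extract_conditions(reference):
    # Stand-in: a real skill would parse a decision notice's conditions.
    return {"reference": reference, "conditions": 12}

# An agent first lists what it can do, then calls a skill by name.
available = sorted(SKILLS)
hit = SKILLS["search_precedents"]("roof height objections")
```

Discovery-then-invocation is the same shape MCP tool listing takes: the agent never hard-codes capabilities, it asks what is available.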

Trusted datasets, live in production

Each vertical is a complete, enriched dataset for its domain. Use it through the products we build on top, or query the Prefetch API directly from your own agents and tools.

Live

Planning

Every UK planning document parsed and searchable. Officer reports, decision notices, conditions, policy references. Find precedents, check compliance, and build stronger applications.

Precedent search Policy compliance Risk assessment Due diligence
// Search inside officer reports
search("roof height objections, Lindfield")

14 results across 9 applications

DM/24/1847 para 4.3
  approved  confidence 0.94

DM/23/3201 decision notice
  refused   confidence 0.91
Live

Funders

3,500+ UK funders enriched from annual accounts, Charity Commission records, and award histories. Giving patterns, focus areas, success rates, application requirements. Matched and ranked.

Funder matching Grant research Giving analysis Application prep
// Match funders to your project
match("community mental health, SE, £50k")

23 matches found

96% National Lottery £10k-£500k
91% Henry Smith      avg £75k/3yr
87% Lloyds Bank Fdn  unrestricted

Every dataset is also available directly via the Prefetch API. PlanningBot and FunderMatcher are applications we build on top of Prefetch data. The same enriched datasets power your agents, your products, your decisions. Query via REST API or MCP.

Request API access

Data without context is just noise. Every Prefetch dataset ships with agentic skills. Ready-made capabilities your agents discover and use instantly.

It started with education

We were building tools to help students learn grammar. Mapping curriculum content, scraping educational material, tagging it with grammar structures and difficulty levels. The AI was the easy part. Getting the data ready was the hard part.

Then we did it for funders. 3,500 profiles enriched from annual accounts, Charity Commission records, and scattered public filings. Then for planning. Indexing the full content of officer reports, decision notices, and policy documents across UK councils.

Every time, the same pattern: valuable data, buried in multimodal content, impossible for agents to query. We build one clear, trusted dataset for each domain. Structured, enriched, fresh, and ready for any agent that needs it.

Prefetch team

Common questions

Straight answers to what you need to know.

"How fresh is the data?"

Continuous ingestion. Planning data refreshes daily, funder profiles re-enrich weekly. Every record carries a timestamp so your agents know exactly how current it is. Freshness is a first-class feature, not an afterthought.

"What does 'enriched' actually mean?"

We cross-reference records across multiple public sources, extract content from PDFs and images, and tag everything with domain context. A planning record becomes linked to flood risk, EPC data, company records, and more. Automatically.
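Mechanically, that cross-referencing can be pictured as a key-based join across datasets. The records, keys, and field names below are invented for illustration (a shared property identifier is assumed); they are not real Prefetch data.

```python
# Toy cross-reference: link a planning record to other public datasets
# by a shared key. All records and keys are invented for illustration.

planning = {"uprn": "100060", "reference": "2024/1847", "status": "approved"}
flood_risk = {"100060": "low"}           # keyed by the same property id
epc = {"100060": {"rating": "C"}}

def cross_reference(record, *datasets):
    """Attach matching entries from each named dataset to the record."""
    key = record["uprn"]
    enriched = dict(record)
    for name, data in datasets:
        if key in data:
            enriched[name] = data[key]
    return enriched

linked = cross_reference(planning, ("flood_risk", flood_risk), ("epc", epc))
```

The original record is untouched; enrichment only ever adds fields, so the source data stays auditable.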

"Can my agents query this directly?"

Yes. REST API and MCP endpoints return structured JSON with confidence scores and source citations. If your agent can make an HTTP request, it can query Prefetch. No SDK required, no vendor lock-in.

"What verticals do you cover?"

Planning and funders are live in production. Education is in development. The enrichment pipeline works for any domain with messy multimodal public data. We're expanding based on demand.

"How do you handle multimodal content?"

PDFs, scanned images, HTML filings, annual accounts. We extract and structure content from all of them. This is the core of what we do. The enrichment layer reads what's inside documents, not just the metadata around them.

"Do I need to use PlanningBot or FunderMatcher?"

No. Those are applications we built on top of Prefetch data. Every dataset is available directly via the Prefetch API. Use our apps, build your own, or feed the data straight into your agents. Your call.

Better decisions start with trusted data

Your agents need ground truth datasets. Structured, enriched, and always fresh. That's what we build.

Request API access Start a conversation