We turn scattered, multimodal data into enriched, structured datasets that power better decisions for AI agents.
The quality of every decision depends on the quality of the data behind it. We build the ground truth datasets that make those decisions trustworthy.
Prefetch builds one clear, trusted dataset for each domain, from data that is often buried in multimodal content no API currently serves.
Raw records from government portals, annual accounts, and public filings. Transformed into clean, queryable schemas your agents can reason over instantly.
Cross-referenced across sources, extracted from PDFs and images, tagged with domain context. Every record carries more intelligence than the original source.
Continuous ingestion pipelines keep every dataset current. Planning data refreshes daily. Funder profiles re-enrich weekly. Every record is timestamped so your agents know exactly how current it is.
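As a minimal sketch of how an agent might act on those timestamps: the `enriched_at` field mirrors the per-record timestamp described above, but the field name, record shape, and freshness threshold here are illustrative assumptions, not a documented Prefetch schema.

```python
from datetime import date

def is_fresh(record: dict, max_age_days: int, today: date) -> bool:
    # Compare the record's enrichment date against a caller-chosen policy.
    enriched = date.fromisoformat(record["enriched_at"])
    return (today - enriched).days <= max_age_days

record = {"id": "2024/1847", "enriched_at": "2026-03-26"}
print(is_fresh(record, 7, date(2026, 3, 28)))  # True: two days old
print(is_fresh(record, 7, date(2026, 6, 1)))   # False: stale under this policy
```

The agent, not the dataset, decides what "current enough" means; the timestamp just makes that decision possible.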
Query structured, enriched datasets via REST API or MCP. One integration, access to every vertical we serve.
Every data product goes through the same pipeline. The pattern is repeatable across any domain where valuable data is buried in multimodal content.
Public records, licensed content, filings. PDFs, images, HTML. Connected and collected at scale.
Deduplicate, normalise, validate. One consistent foundation from any format, any source.
Cross-reference records, extract from documents and images, add domain context. Where the real value is created.
Relational and embedding formats. Optimised for how agents query and reason over information.
REST API and MCP endpoints. Structured JSON, confidence scores, source citations. Ready for decisions.
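The consolidation step above (deduplicate, normalise, validate) can be sketched as follows. The field names and the dedupe key are illustrative assumptions; real sources would need per-format parsers upstream of this step.

```python
def consolidate(raw_records: list[dict]) -> list[dict]:
    seen, clean = set(), []
    for rec in raw_records:
        # Normalise: lower-case keys, strip stray whitespace from values.
        rec = {k.lower(): v.strip() if isinstance(v, str) else v
               for k, v in rec.items()}
        # Validate: drop records missing fields every source must carry.
        if not rec.get("id") or not rec.get("source"):
            continue
        # Deduplicate on the record id.
        if rec["id"] in seen:
            continue
        seen.add(rec["id"])
        clean.append(rec)
    return clean

raw = [
    {"ID": "2024/1847", "Source": " portal "},
    {"id": "2024/1847", "source": "filing"},  # duplicate id, dropped
    {"id": "", "source": "pdf"},              # invalid, dropped
]
print(consolidate(raw))  # one clean record survives
```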
Valuable data is scattered across public and private sources — buried in PDFs, images, filings, and web pages. We consolidate, harmonise, and enrich multimodal content into structured datasets your agents can trust.
Every design choice optimises for one thing: trusted, structured data that powers better decisions — whether it's your AI agent reasoning autonomously or your team digging into the details.
Planning reports are 47-page PDFs. Funder data is buried in annual accounts. EPC certificates are scanned images. We extract, cross-reference, and structure content from all of it.
This is the enrichment layer that turns raw public records into trusted intelligence. It deepens with every document we process.
// Input: officer report (PDF, 47 pages)
enrich("officer_report_2024_1847.pdf")

// Output: structured, enriched
{
  "extracted_conditions": 12,
  "heritage_references": 3,
  "policy_matches": 7,
  "officer_recommendation": "approve",
  "objections_parsed": 2,
  "enriched_at": "2026-03-26",
  "confidence": 0.91
}
We don't just collect data. We make it more useful. A planning application gets its full document content parsed: conditions extracted, policy references linked, heritage status flagged, officer reasoning indexed.
A funder profile gets income trends, giving history, focus area classification, and application requirements. All structured from scattered sources into one queryable record.
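Folding scattered sources into one queryable funder record might look like the sketch below. The input shapes, the trend rule, and "Example Trust" itself are assumptions for illustration, not the production enrichment logic.

```python
def build_profile(name: str, accounts: list[dict], focus_tags: list[str]) -> dict:
    # Order annual accounts by year, then derive a simple income trend.
    by_year = sorted(accounts, key=lambda a: a["year"])
    incomes = [a["income"] for a in by_year]
    trend = "rising" if incomes[-1] > incomes[0] else "flat_or_falling"
    return {
        "funder": name,
        "income_by_year": {a["year"]: a["income"] for a in by_year},
        "income_trend": trend,
        "focus_areas": sorted(set(focus_tags)),  # deduplicated classification
    }

profile = build_profile(
    "Example Trust",  # hypothetical funder
    [{"year": 2023, "income": 1_200_000},
     {"year": 2024, "income": 1_450_000}],
    ["mental health", "community", "community"],
)
print(profile["income_trend"])  # rising
```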
// One planning application — deeply enriched
Application 2024/1847
├── documents     14 parsed (officer report, notices)
├── conditions    12 extracted, 3 pre-commencement
├── policies      7 local plan refs matched
├── heritage      conservation area, Grade II adjacent
├── officer_view  recommended approval, 2 objections
└── confidence    0.94
Every Prefetch dataset ships with ready-made skills: search precedents, match funders, check compliance. Your agent connects once and immediately knows what it can do.
Skills work across any agent interface — ChatGPT, Claude, your own tools. REST API and MCP endpoints, structured JSON, confidence scores, source citations. No parsing, no SDK, no lock-in.
// Connect to Prefetch — skills included
prefetch.connect("planning")

// Available skills:
├── search_precedents   full-text across officer reports
├── check_compliance    match against local plan policies
├── extract_conditions  parse decision notice conditions
└── assess_risk         heritage, flood, conservation

// Your agent just uses them
search_precedents("roof height objections")
→ 14 results, confidence 0.94
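On the consuming side, an agent works with plain JSON. The response shape below (results carrying `confidence` and `source` fields) mirrors the fields described above, but the exact schema and the references in it are assumptions for illustration.

```python
import json

# A canned skill response, standing in for a live HTTP reply.
response = json.loads("""
{
  "skill": "search_precedents",
  "results": [
    {"ref": "DM/22/0001", "confidence": 0.94, "source": "officer report para 4.3"},
    {"ref": "DM/21/0042", "confidence": 0.61, "source": "decision notice"}
  ]
}
""")

# Keep only results the agent should rely on, retaining citations.
trusted = [r for r in response["results"] if r["confidence"] >= 0.9]
for r in trusted:
    print(r["ref"], "-", r["source"])
```

No SDK is involved: any stack that can make an HTTP request and parse JSON can apply its own confidence threshold and carry the source citation forward into its reasoning.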
Each vertical is a complete, enriched dataset for its domain. Use it through the products we build on top, or query the Prefetch API directly from your own agents and tools.
Every UK planning document parsed and searchable. Officer reports, decision notices, conditions, policy references. Find precedents, check compliance, and build stronger applications.
// Search inside officer reports
search("roof height objections, Lindfield")

14 results across 9 applications
DM/24/1847  para 4.3         approved  confidence 0.94
DM/23/3201  decision notice  refused   confidence 0.91
3,500+ UK funders enriched from annual accounts, Charity Commission records, and award histories. Giving patterns, focus areas, success rates, application requirements. Matched and ranked.
// Match funders to your project
match("community mental health, SE, £50k")

23 matches found
96%  National Lottery  £10k-£500k
91%  Henry Smith       avg £75k/3yr
87%  Lloyds Bank Fdn   unrestricted
Every dataset is also available directly via the Prefetch API. PlanningBot and FunderMatcher are applications we build on top of Prefetch data. The same enriched datasets power your agents, your products, your decisions. Query via REST API or MCP.
Data without context is just noise. Every Prefetch dataset ships with agentic skills. Ready-made capabilities your agents discover and use instantly.
We were building tools to help students learn grammar. Mapping curriculum content, scraping educational material, tagging it with grammar structures and difficulty levels. The AI was the easy part. Getting the data ready was the hard part.
Then we did it for funders. 3,500 profiles enriched from annual accounts, Charity Commission records, and scattered public filings. Then for planning. Indexing the full content of officer reports, decision notices, and policy documents across UK councils.
Every time, the same pattern: valuable data, buried in multimodal content, impossible for agents to query. We build one clear, trusted dataset for each domain. Structured, enriched, fresh, and ready for any agent that needs it.
Straight answers to what you need to know.
How fresh is the data?
Continuous ingestion. Planning data refreshes daily, funder profiles re-enrich weekly. Every record carries a timestamp so your agents know exactly how current it is. Freshness is a first-class feature, not an afterthought.
What does enrichment actually add?
We cross-reference records across multiple public sources, extract content from PDFs and images, and tag everything with domain context. A planning record becomes linked to flood risk, EPC data, company records, and more. Automatically.
Can my agents query it directly?
Yes. REST API and MCP endpoints return structured JSON with confidence scores and source citations. If your agent can make an HTTP request, it can query Prefetch. No SDK required, no vendor lock-in.
Which domains do you cover?
Planning and funders are live in production. Education is in development. The enrichment pipeline works for any domain with messy multimodal public data. We're expanding based on demand.
What source formats can you handle?
PDFs, scanned images, HTML filings, annual accounts. We extract and structure content from all of them. This is the core of what we do. The enrichment layer reads what's inside documents, not just the metadata around them.
Do I have to use PlanningBot or FunderMatcher?
No. Those are applications we built on top of Prefetch data. Every dataset is available directly via the Prefetch API. Use our apps, build your own, or feed the data straight into your agents. Your call.
Your agents need ground truth datasets. Structured, enriched, and always fresh. That's what we build.