Extend Raises $17 Million to Build the Document Processing Cloud

Kushal Byatnal

4 min read

Jun 17, 2025

Blog Post

Documents are often the system of record for mission-critical business data, but reliably parsing, extracting, and using that data has always been a challenge. We built Extend to change that — delivering state-of-the-art accuracy out of the box, and enabling teams to go from raw PDFs to production-ready data in days (instead of months).

Today, we’re excited to announce $17M in seed and series A funding, led by Innovation Endeavors with participation from YCombinator, Homebrew, Character, and angels including Scott Belsky (former CSO of Adobe) and Guillermo Rauch (CEO of Vercel).

Extend is reimagining document intelligence with a full-stack approach that combines cutting-edge LLMs with all the other developer primitives needed to process complex documents reliably. The product is so powerful that many of Extend’s customers are not only able to automate existing workflows, but launch entirely novel features that drive competitive differentiation.

— Davis Treybig, Partner at Innovation Endeavors

Production-ready document pipelines

For companies processing documents in domains where precision and reliability are non-negotiable — including healthcare, finance, logistics, and more — the cost of an error is so high that even 99% accuracy might not be good enough!

In speaking with companies with these mission-critical document pipelines, they often have to invest engineering-years worth of effort in building their document ingestion. This often includes:

VLM parsing engines to handle complex edge cases across images, tables, handwriting, signatures, and more
LLM context management techniques, such as semantic chunking or table header continuation
Data labeling and evaluation tooling, to measure performance and improve it
Pipelines that orchestrate classification, splitting, and extraction to better achieve accuracy
Annotation tooling for their domain experts
Reinforcement learning & feedback loops to improve the system with more data
Human-in-the-loop tooling to flag and escalate low confidence edge cases

Ultimately, high quality document processing is actually a data and systems engineering problem, and raw OCR or foundation models don’t fully solve the problem.

That’s why Extend isn’t just another document API. It’s a unified set of infrastructure and tooling that enables cutting edge teams to handle all their messy documents in one place. With modern APIs for developers and intuitive tooling for operators, we help companies unlock the full value of their documents, faster and more accurate than ever.

Extend outperformed every solution we tested — including other vendors, open source and even foundation models. It now powers key document workflows across our 30,000 customers, helping us build the most intelligent and modern financial platform out there.

— Pedro Franceschi, CEO of Brex

Today, we’re excited to release two key improvements that make Extend even better.

Automated config generation

One of the biggest bottlenecks in document processing is the manual time teams spend tuning schemas, crafting prompts, and debugging edge cases to improve accuracy. We're excited to release a beta of our new automated config generation to reduce that burden.

Just upload a few sample docs, and Extend generates a tailored schema optimized for the structure of your documents.

Soon, Extend will integrate this experience with your evaluation sets and deploy an agent that continuously runs optimization loops in the background, so your accuracy improves even while you sleep.

Instant access with sandbox mode

Starting today, teams can try Extend instantly with our new sandbox mode. Just sign up and start running your own documents through the platform. If you ever need help, we’re just a quick message away in the product.

Whether you're exploring use cases or validating accuracy on real examples, the sandbox gives you a fast, frictionless way to experience the full power of Extend.

The modern document processing cloud

Unstructured data trapped in documents is one of the last great frontiers of untapped data — and one of the most painful. Our mission at Extend is to make that data accessible, accurate, and actionable. With this new capital, we’re thrilled to double down on that mission.

We’ve built the first and only full-stack document processing cloud on the market today. And we’ve been fortunate to partner with some of the most ambitious teams out there, including Brex, Square, Checkr, Flatiron Health, and multiple Fortune 500s — all using Extend to process millions of documents with unmatched precision and reliability.

In the months ahead, Extend will become even smarter, autonomous, and agentic — continuously optimizing document flows for accuracy, speed, and reliability. Our north star is simple: to eliminate every bottleneck standing between teams and their unstructured data, so they can focus on what makes their business unique (not wrangling PDFs).

The world has cloud platforms for storage, compute, and collaboration. But until now, no one has built a true cloud for document processing — a full-stack system purpose-built to handle the complexity, messiness, and nuance of real-world documents at scale. That’s what we’re building at Extend: the document processing cloud.

And we’re just getting started.

See other articles

Customers

How Checkr cut manual review time by 60% to near 100% across millions of documents

Checkr uses Extend to process millions of documents with 95–100% accuracy and reduce human document review time by 60–100%.

Jing Reyhan

Product

Workflows upgrade: build document workflows in natural language, manage as code

Build complete document workflows with Composer, validate them with code or semantic rules, and manage deployment through GitHub.