How Column Tax Benchmarked Every OCR Option and Chose Extend

“We’ve done over a million tax filings, so accuracy is our P0 as a company. We evaluated 16 different vendors on real W-2 and 1099 documents, and Extend was by far the most accurate and consistent across every form type.”
— Gavin Nachbar, CEO, Column Tax

Column Tax powers modern tax filing for top companies like MoneyLion, Found, Varo, and more. They collect every detail about a person’s finances (income, wages, investments) and translate that data into a return that’s accurate, fast, and effortless.

Here’s how their engineering team rebuilt their entire document-processing pipeline, benchmarked every major OCR and extraction option, and selected Extend as the long-term foundation.

TL;DR

Document ingestion is unpredictable.

New document types and unexpected layouts surface every month. A static pipeline won’t keep up; yours has to adapt to changing inputs.

Public benchmarks aren’t enough.

What matters is how a system performs on your documents. Build a ground-truth evaluation set and score every vendor against the same criteria.

Accuracy, latency, and cost require trade-offs.

Hard-coding those choices into your own pipeline forces rebuilds when priorities shift. Use a platform that lets you evaluate and adjust these trade-offs without re-engineering.

Future-proof your pipeline.

The OCR/model landscape changes monthly. Choose a solution that continuously benchmarks new models and routes tasks to the best performer automatically.

Moving upmarket required handling messier docs

Heading into the next tax season, the team wanted to expand coverage across more document types, handle more complexity, and improve reliability for the messiest uploads.

“The first version we shipped with a single vendor who we had evaluated and had used for other products on document extraction. It went okay, especially for people who were uploading clean documents, but we’re moving up-market as a company and need to be able to handle more.”

Evaluating 16 vendors

“Accuracy is our P0 as a company.”

To give users the most seamless experience, Column Tax ran a comprehensive evaluation of the market. They chose 16 vendors spanning the full range of extraction options, including:

foundation-model APIs (OpenAI, Gemini, Claude, etc.)
legacy OCR products (AWS Textract, Azure Form Recognizer)
early-stage startups

To test effectively, Column Tax’s engineering team built a standardized evaluation set of 50 labeled documents, ranging from crumpled PDFs to blurry photos of 1099s. The goal was to evaluate accuracy across three dimensions:

Classification: identifying the correct type of tax document
Extraction: correctly pulling structured field data
Consistency: producing the same output when the same document is run multiple times
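
The three dimensions above can be scored with a small harness. The following is a minimal sketch, not Column Tax’s actual evaluation code: the record keys (expected_type, predicted_type, expected_fields, runs), the exact-match scoring rule, and the choice to score extraction on the first run are all assumptions made for illustration.

```python
from collections import Counter


def field_accuracy(predicted: dict, expected: dict) -> float:
    """Fraction of expected fields whose predicted value matches exactly."""
    if not expected:
        return 1.0
    correct = sum(1 for key, value in expected.items() if predicted.get(key) == value)
    return correct / len(expected)


def consistency(runs: list) -> float:
    """Share of repeated runs that agree with the most common output."""
    if not runs:
        return 1.0
    # Turn each run's field dict into a hashable, order-independent key.
    keys = [tuple(sorted(run.items())) for run in runs]
    most_common_count = Counter(keys).most_common(1)[0][1]
    return most_common_count / len(runs)


def score_vendor(results: list) -> dict:
    """Average the three scores over every labeled document.

    Each entry in `results` is a dict with hypothetical keys:
    'expected_type', 'predicted_type', 'expected_fields', and
    'runs' (a list of extracted field dicts, one per repeated run).
    """
    n = len(results)
    classification = sum(r["predicted_type"] == r["expected_type"] for r in results) / n
    # Extraction is scored on the first run only (an assumption of this sketch).
    extraction = sum(field_accuracy(r["runs"][0], r["expected_fields"]) for r in results) / n
    repeatability = sum(consistency(r["runs"]) for r in results) / n
    return {
        "classification": classification,
        "extraction": extraction,
        "consistency": repeatability,
    }
```

Running every vendor through the same score_vendor call on the same labeled set is what makes the results comparable. A real harness would likely normalize values (currency formatting, whitespace) before comparing fields.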

Out of the 16, only three made it past the accuracy floor: Extend and two foundation models. Other startups and legacy solutions could not handle the most complex document layouts. Extend had the highest performance regardless of document type, layout, or quality, and it stayed consistent across repeated runs (something multimodal foundation models struggled with).

“Extend was over 35% more accurate across all the documents.”

Choosing Extend

Higher accuracy across every document type was only part of the picture. Extend also stood out in three other ways:

Per-element model routing

Extend doesn’t rely on a single OCR model or vision-language model (VLM). For every task, a layout model classifies each element of a document by type: table, figure, text, handwriting, checkbox, and more. Each element is then routed to the model that benchmarks best for that element type, with OCR for deterministic text and VLMs for more complex visual elements. Extend integrates more accurate models automatically, so teams don’t have to chase new releases.

“What pushed us over the edge was we didn’t feel locked in. Others were model providers, but we didn’t want to bet on one being the best PDF extractor. We wanted the best of all worlds. When Mistral comes out with a new OCR model, we want that benchmarked immediately and integrated if it improves performance.”
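
The per-element routing idea described above can be pictured as a dispatch table keyed by element type. This is only an illustrative sketch and does not reflect Extend’s internal implementation or API: the Element class, the run_ocr and run_vlm stand-ins, the ROUTES table, and the fallback rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Element:
    kind: str      # label assigned by a layout model, e.g. "text" or "table"
    region: bytes  # cropped page region (placeholder payload)


def run_ocr(element: Element) -> str:
    # Stand-in for a call to an OCR engine.
    return f"ocr:{element.kind}"


def run_vlm(element: Element) -> str:
    # Stand-in for a call to a vision-language model.
    return f"vlm:{element.kind}"


# Hypothetical routing table. Which handler wins for each element type
# would be decided by benchmark results, not hard-coded like this.
ROUTES: Dict[str, Callable[[Element], str]] = {
    "text": run_ocr,
    "table": run_vlm,
    "figure": run_vlm,
    "handwriting": run_vlm,
    "checkbox": run_vlm,
}


def route(element: Element) -> str:
    # Unknown element kinds fall back to the VLM handler (an assumption).
    handler = ROUTES.get(element.kind, run_vlm)
    return handler(element)
```

With this shape, adopting a newly released model means changing one entry in the table after it wins a benchmark, rather than rebuilding the pipeline around a single provider.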

User-facing latency optimization

Though accuracy is the team’s P0, speed is still a core element of their customer experience. Using their existing evaluation sets, they could spin up tests immediately, inspect outputs, and optimize for latency without a full rebuild.

“We loved the internal dashboard. It’s by far the most friendly interface to spin up tests on. In the same tests we were able to see accuracy and latency metrics right away. Made it super easy to get better from there.”
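
Seeing accuracy and latency side by side in the same test run is straightforward to approximate in a home-grown harness as well. The sketch below is an assumption-laden illustration, not the Extend dashboard: extract is any hypothetical callable that processes one document, and the nearest-rank p95 is one of several reasonable percentile definitions.

```python
import math
import statistics
import time


def timed_runs(extract, documents):
    """Run `extract` on each document and record wall-clock latency in seconds."""
    outputs = []
    latencies = []
    for document in documents:
        start = time.perf_counter()
        outputs.append(extract(document))
        latencies.append(time.perf_counter() - start)
    return outputs, latencies


def latency_summary(latencies):
    """Median and nearest-rank 95th percentile of the recorded latencies."""
    ordered = sorted(latencies)
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {"p50": statistics.median(ordered), "p95": ordered[p95_index]}
```

The outputs returned by timed_runs can be scored for accuracy while the latencies feed the summary, so one pass over the evaluation set yields both metrics.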

Reduced engineering effort

Before Extend, adding a new OCR vendor meant weeks of integration work: new APIs, new schemas, more ongoing maintenance. Extend’s platform eliminated that overhead, freeing the team to focus on their core tax product.

Powering millions of tax filings

As Column Tax scales, they’re confident they can achieve a world-class customer experience. Two things stood out once they went live on Extend:

Higher user conversion: faster, more accurate document processing led to smoother onboarding and fewer drop-offs.
Lower engineering overhead: one unified platform replaced multiple brittle integrations, saving time and complexity.

“Integrating a new OCR vendor is significant engineering work. We’re confident Extend is the most accurate and will only continue to get better. We’d rather focus on building the best tax product in the U.S.”
— Gavin Nachbar, CEO, Column Tax
