Why Layout Matters

Jing Reyhan, Eli Badgio


Jun 1, 2026

Engineering

Our team recently shipped Parse 2.0, the next evolution of our document parsing API, reaching SOTA for end-to-end agent success on document Q&A. One of the biggest step changes was rebuilding the underlying layout model from the ground up.

We did it because layout is critical to three things: improving parsing accuracy, closing the gap with deterministic document pipelines, and routing each block to the appropriate downstream model.

Here's how we built our new layout model and what it enables for AI and ML teams.

What is layout detection?

Layout detection is how a system visually segments the regions, or "blocks," of a document: identifying, classifying, and drawing bounding boxes for each region.

A layout model provides the canonical definition of a document's structure and its reading order: how blocks should be sequenced to match the way a human would read the document. Both matter because documents are visual interfaces, which is to say meaning lives in proximity, grouping, and hierarchy. For example, the string "$4,820" means nothing by itself. But it becomes useful when you know it represents the total premium field inside a specific section of an insurance form.
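To make the idea of blocks concrete, here is a minimal Python sketch of what a layout result could look like as data. The `Block` schema, the `heading` and `key_value` labels, the coordinates, and the sample text are all assumptions made for illustration; they are not Parse 2.0's actual output format or block types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Block:
    """One detected region. Hypothetical schema, not Parse 2.0's output format."""
    block_type: str  # illustrative labels such as "heading" or "key_value"
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1), y grows downward
    reading_order: int  # position in the human reading sequence
    text: str


def in_reading_order(blocks: list[Block]) -> list[Block]:
    """Sequence blocks by their assigned reading-order index."""
    return sorted(blocks, key=lambda b: b.reading_order)


def block_at(blocks: list[Block], x: float, y: float) -> Optional[Block]:
    """Return the first block whose bounding box contains the point (x, y)."""
    for block in blocks:
        x0, y0, x1, y1 = block.bbox
        if x0 <= x <= x1 and y0 <= y <= y1:
            return block
    return None


# Made-up sample page, deliberately listed out of reading order.
page = [
    Block("key_value", (40, 120, 300, 150), 1, "Total premium: $4,820"),
    Block("heading", (40, 80, 300, 110), 0, "Section 3: Premium summary"),
]

# The location of "$4,820" resolves to a typed block, not a bare string.
hit = block_at(page, 250, 135)
```

With a structure like this, a downstream consumer can ask which block a string belongs to and what comes before it, rather than working from raw text alone.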

Our layout model detects the following block types:

Figure: Layout model block types

What does a strong layout model make possible for AI teams?

If you build agents or workflows on top of documents, it is tempting to spend most of your time tuning prompts, swapping models, or adding retrieval logic. Unfortunately, none of that helps if the parsed output has already lost its semantic structure.

No prompt can reliably reconstruct a flattened table. No extraction schema can safely bind a repeated label to the right value if the parser flattened the grid. No agent can reason over 12 months of forecasts if the parser only gave it 10.
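The repeated-label problem is easy to see in a small sketch. Once a grid is flattened to text, two "Deductible" labels are indistinguishable; with positions preserved, each value can be bound to the nearest label above it. The `Word` type, the coordinates, and the nearest-label-above rule below are illustrative assumptions, not a description of how Parse 2.0 binds fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    x: float  # horizontal center
    y: float  # vertical center, growing downward


def bind_values(labels, values):
    """Pair each value with the closest label positioned above it.

    Simplifying assumption: every value sits below its own label.
    Values with no label above them are skipped.
    """
    pairs = []
    for value in values:
        above = [lab for lab in labels if lab.y < value.y]
        if not above:
            continue
        nearest = min(
            above,
            key=lambda lab: (value.x - lab.x) ** 2 + (value.y - lab.y) ** 2,
        )
        pairs.append((nearest, value))
    return pairs


# Made-up form grid: the same two labels repeat in two rows.
labels = [
    Word("Deductible", 10, 10), Word("Limit", 60, 10),
    Word("Deductible", 10, 40), Word("Limit", 60, 40),
]
values = [
    Word("$500", 10, 20), Word("$10,000", 60, 20),
    Word("$1,000", 10, 50), Word("$25,000", 60, 50),
]

pairs = bind_values(labels, values)
# Each entry: (label text, label row position, value text)
bound = [(lab.text, lab.y, val.text) for lab, val in pairs]
```

Without the `x` and `y` fields, nothing in the text distinguishes the first "Deductible" from the second, which is why the binding has to happen before the structure is lost.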

Aside from saving you time by avoiding those tempting pitfalls, here's what a strong layout model enables for builders.

Parsing accuracy

Most parsing failures can be traced back to layout failures. The reverse is also true: if you get layout and reading order right, most parsing failures can be mitigated.

Some common examples of layout decisions that impact accuracy:

Column interweaving. With poor reading order, a two-column insurance policy gets read left to right across both columns instead of down the first column and then down the second. Every word is correct, but consecutive lines alternate between two unrelated sections.
Form-field scrambling. A dense medical or insurance form repeats labels like "Deductible," "Limit," "Premium," or "Authorization" across a grid. Without a layout model that specifically recognizes key-value regions, the parsed output loses the spatial relationship between value and label.
Table and form confusion. In a CMS 1500 claim, the vertical lines literally line up across different document sections, but treating a key-value region (labeled 9, in orange) as a table would be a mistake because it has no row-and-column relationships. Even when two regions look similar, failing to distinguish a table from a form with key-value pairs leads to poorly parsed output.

Figure: CMS 1500 table and key-value layout example
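The column interweaving failure described above can be reproduced in a few lines of Python. This is a minimal sketch with made-up text and coordinates; the `gutter_x` boundary is a hypothetical hand-supplied parameter, whereas a real layout model would infer column regions itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Line:
    text: str
    x: float  # left edge of the line
    y: float  # top edge of the line, growing downward


def naive_order(lines):
    """Sort top to bottom, then left to right, ignoring columns."""
    return [ln.text for ln in sorted(lines, key=lambda ln: (ln.y, ln.x))]


def column_aware_order(lines, gutter_x):
    """Read the left column top to bottom, then the right column.

    gutter_x is a hypothetical column boundary supplied by the caller.
    """
    left = sorted((ln for ln in lines if ln.x < gutter_x), key=lambda ln: ln.y)
    right = sorted((ln for ln in lines if ln.x >= gutter_x), key=lambda ln: ln.y)
    return [ln.text for ln in left + right]


# Made-up two-column page: left column at x=0, right column at x=300.
lines = [
    Line("Exclusions apply to", 300, 0),
    Line("Coverage begins on", 0, 0),
    Line("flood damage.", 300, 20),
    Line("the effective date.", 0, 20),
]

interleaved = naive_order(lines)
correct = column_aware_order(lines, gutter_x=150)
```

Here `interleaved` alternates between the two columns ("Coverage begins on", "Exclusions apply to", "the effective date.", "flood damage."), while `correct` finishes the left column before starting the right.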

When you get layout and reading order right, parsing accuracy improves. You can see the impact of our new layout model in RealDoc-Bench, which evaluates layout accuracy and document Q&A success rates.

More flexible configuration tuning for accuracy, cost, and latency

As we mentioned above, strong layout detection enables block-level routing to specialized downstream models. In addition to delivering higher performance, a multi-model approach also allows AI teams to better control the three dimensions of accuracy, cost, and latency.

A straightforward text-only paragraph should not incur the same cost as messy handwriting. A figure that doesn't need to be parsed for a specific use case does not need to be routed to a specialized VLM.

Once the layout model segments the pages into blocks, it also unlocks choice for our customers. Instead of a monolithic parsing step, AI teams can make granular decisions around which block types require higher accuracy and which ones they'd rather optimize for lower latency or cost.
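As a rough illustration of block-level routing, the sketch below maps block types to model tiers and lets a caller skip block types a use case does not need. The block type names, model names, and relative cost figures are invented for the example and do not reflect Parse 2.0's configuration options or pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str  # hypothetical model tier name
    cost_per_block: float  # made-up relative cost units


# Illustrative defaults: cheap path for plain text, expensive path for hard blocks.
DEFAULT_ROUTES = {
    "text": Route("fast-ocr", 1.0),
    "table": Route("table-model", 4.0),
    "handwriting": Route("specialized-vlm", 8.0),
    "figure": Route("specialized-vlm", 8.0),
}


def plan(block_types, routes, skip=frozenset()):
    """Assign a model to each block and total the cost.

    Block types listed in skip get no model (None) and add no cost.
    """
    assignments = []
    total_cost = 0.0
    for block_type in block_types:
        if block_type in skip:
            assignments.append((block_type, None))
            continue
        route = routes[block_type]
        assignments.append((block_type, route.model))
        total_cost += route.cost_per_block
    return assignments, total_cost


page_blocks = ["text", "figure", "handwriting"]

# Route everything, then compare against a plan that skips figures.
full_plan, full_cost = plan(page_blocks, DEFAULT_ROUTES)
lean_plan, lean_cost = plan(page_blocks, DEFAULT_ROUTES, skip={"figure"})
```

The design point is that the trade-off is made per block type rather than per document, so a team can pay for accuracy only where a block type warrants it.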

Conclusion

Our new layout model is only one part of the foundational updates that went into Parse 2.0. Parse 2.0 delivers an adjusted F1 score of 0.847 for layout accuracy and 95.7% for Q&A accuracy.

For more information on layout model performance, check out our open-source parsing benchmark, RealDoc-Bench.

If you're trying to improve the performance of your agents, search, and automations, try Parse 2.0 today: https://dashboard.extend.ai/
