Overview
Documents are central to how knowledge is created, communicated, and acted upon in domains such as science, healthcare, law, finance, and government. Yet real-world documents are rarely plain text. They combine natural language with structured and semi-structured content such as tables, forms, charts, figures, lists, and layout cues. Understanding these documents requires models that can reason not only over text, but also over structure, visual organization, and cross-element relationships.
Recent advances in document foundation models, multimodal language models, and structure-aware NLP have significantly improved document understanding. However, major challenges remain: reasoning across text and tables, grounding model outputs in document evidence, handling long and multi-page documents, supporting multilingual and domain-specific documents, and evaluating systems under OCR, layout, and extraction noise.
The workshop focuses on moving beyond plain text toward methods that deeply understand the structure, content, and purpose of complex documents. We especially encourage work that highlights real-world document challenges and proposes methods for trustworthy, scalable, and practical document understanding.
Important Dates
All deadlines are at 11:59 PM UTC-12:00 (“Anywhere on Earth”).
| Event | Date |
|---|---|
| Direct Submission Deadline | August 2, 2026 |
| ARR Commitment Deadline | August 30, 2026 |
| Acceptance Notifications | September 13, 2026 |
| Camera-ready Deadline | September 27, 2026 |
| Workshop Date | During EMNLP 2026, October 24–29 |
News
- April 2026 – Website launched. Stay tuned for updates on invited speakers and the program schedule.
- The Call for Papers is now open! See the Call for Papers page for details.