Awesome Key Information Extraction

A curated list of papers about key information extraction.

Papers with Code links are preferred.

Contributions are welcome!

Table of Contents

Awesome Key Information Extraction

Datasets

Name	Title	Links
DUE	DUE: End-to-End Document Understanding Benchmark	[link]
RVL-CDIP	Evaluation of Deep Convolutional Nets for Document Image Classification and Retrieval	[link][download]
SROIE	ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction	[link][download]
FUNSD	FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents	[link][download]
XFUND	XFUND: A Multilingual Form Understanding Benchmark	[link]
CORD	CORD: A Consolidated Receipt Dataset for Post-OCR Parsing	[link]
EPHOIE	Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution	[link]
EATEN	EATEN: Entity-aware Attention for Single Shot Visual Text Extraction	[link]
Train Ticket	PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks	[link][download]
POIE	Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution	[link][download]

Survey

Year	Title	Links
2023	On the Hidden Mystery of OCR in Large Multimodal Models	[link]
2021	Document AI: Benchmarks, Models and Applications	[link]

Toolkits

Year	Title	Links
2024	ANLS* -- A Universal Document Processing Metric for Generative Large Language Models	[paper][code]
2022	DavarOCR: A Toolbox for OCR and Multi-Modal Document Understanding	[paper][code]
2021	MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding	[paper][code]
2020	PP-OCR: A Practical Ultra Lightweight OCR System	[paper][code]

Models

⭐LLM-Based

Pub.	Year	Title	Links
Arxiv	2024	mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding	[link]
Arxiv	2024	mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models	[link]
Arxiv	2024	A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding	[link]
ICML	2023	BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models	[link]
Arxiv	2023	InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning	[link]
Arxiv	2023	MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models	[link]
Arxiv	2023	Visual Instruction Tuning	[link]
Arxiv	2023	Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond	[link]
Arxiv	2023	mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality	[link]
Arxiv	2023	mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding	[link]
Arxiv	2023	mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration	[link]
Arxiv	2023	Otter: A Multi-Modal Model with In-Context Instruction Tuning	[link]
Arxiv	2023	UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model	[link]
Blog	2023	Fuyu-8B: A Multimodal Architecture for AI Agents	[blog][model]

Graph-Based

Pub.	Year	Title	Links
ICDAR	2023	LayoutGCN: A Lightweight Architecture for Visually Rich Document Understanding	[paper]
ACL-Findings	2021	Spatial Dependency Parsing for Semi-Structured Document Information Extraction	[link]
Arxiv	2021	Spatial Dual-Modality Graph Reasoning for Key Information Extraction	[link]
ICPR	2020	PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks	[link]

Transformer-Based

Pub.	Year	Title	Links
ACL	2022	LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding	[link]
ACL	2022	FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction	[link]
CVPR	2022	XYLayoutLM: Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding	[link]
Arxiv	2022	LoPE: Learnable Sinusoidal Positional Encoding for Improving Document Transformer Model	[link]
Arxiv	2022	LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking	[link]
Arxiv	2022	ERNIE-Layout: Layout-Knowledge Enhanced Multi-modal Pre-training for Document Understanding	[link]
AAAI	2022	BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents	[link]
ICDAR	2021	ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents	[link][code]
Arxiv	2021	TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models	[link]
ACM-MM	2021	StrucTexT: Structured Text Understanding with Multi-Modal Transformers	[link]
ACL	2021	LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding	[link]
KDD	2020	LayoutLM: Pre-training of Text and Layout for Document Image Understanding	[link]

Grid-Based

Pub.	Year	Title	Links
ICDAR	2021	ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents	[link]
ICDAR	2021	VisualWordGrid: Information Extraction From Scanned Documents Using A Multimodal Approach	[link]
NeurIPS	2019	BERTgrid: Contextualized Embedding for 2D Document Representation and Understanding	[link]
EMNLP	2018	Chargrid: Towards Understanding 2D Documents	[link]

End-to-end

Pub.	Year	Title	Links
ICDAR	2023	Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution	[link]
ICML	2023	Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding	[link]
ECCV	2022	OCR-free Document Understanding Transformer	[link]
Arxiv	2022	TRIE++: Towards End-to-End Information Extraction from Visually Rich Documents	[link]
ICCV	2021	DocFormer: End-to-End Transformer for Document Understanding	[link]
ACM-MM	2020	TRIE: End-to-End Text Reading and Information Extraction for Document Understanding	[link]
ICDAR	2019	EATEN: Entity-aware Attention for Single Shot Visual Text Extraction	[link]

Others

Pub.	Year	Title	Links
ICDAR	2023	Information Extraction from Documents: Question Answering vs Token Classification in real-world setups	[link]

Related Repositories

https://paperswithcode.com/task/key-information-extraction
https://github.com/tstanislawek/awesome-document-understanding/blob/main/topics/kie/README.md
⭐https://github.com/SCUT-DLVCLab/Document-AI-Recommendations#vie
