Awesome Key Information Extraction

A curated list of papers about key information extraction.

Papers with Code links are preferred.

Contributions are welcome!

Table of Contents

Awesome Key Information Extraction

Datasets

Name	Title	Links
DUE	DUE: End-to-End Document Understanding Benchmark	[link]
RVL-CDIP	Evaluation of Deep Convolutional Nets for Document Image Classification and Retrieval	[link][download]
SROIE	ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction	[link][download]
FUNSD	FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents	[link][download]
XFUND	XFUND: A Multilingual Form Understanding Benchmark	[link]
CORD	CORD: A Consolidated Receipt Dataset for Post-OCR Parsing	[link]
EPHOIE	Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution	[link]
EATEN	EATEN: Entity-aware Attention for Single Shot Visual Text Extraction	[link]
Train Ticket	PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks	[link][download]
POIE	Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution	[link][download]

Survey

Year	Title	Links
2023	On the Hidden Mystery of OCR in Large Multimodal Models	[link]
2021	Document AI: Benchmarks, Models and Applications	[link]

Toolkits

Year	Title	Links
2022	DavarOCR: A Toolbox for OCR and Multi-Modal Document Understanding	[paper][code]
2021	MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding	[paper][code]
2020	PP-OCR: A Practical Ultra Lightweight OCR System	[paper][code]
2024	ANLS* -- A Universal Document Processing Metric for Generative Large Language Models	[paper][code]

Models

⭐LLM-Based

Pub.	Year	Title	Links
Arxiv	2024	mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding	[link]
Arxiv	2024	mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models	[link]
Arxiv	2024	A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding	[link]
ICML	2023	BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models	[link]
Arxiv	2023	InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning	[link]
Arxiv	2023	MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models	[link]
Arxiv	2023	Visual Instruction Tuning	[link]
Arxiv	2023	Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond	[link]
Arxiv	2023	mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality	[link]
Arxiv	2023	mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding	[link]
Arxiv	2023	mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration	[link]
Arxiv	2023	Otter: A Multi-Modal Model with In-Context Instruction Tuning	[link]
Arxiv	2023	UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model	[link]
Blog	2023	Fuyu-8B: A Multimodal Architecture for AI Agents	[blog][model]

Graph-Based

Pub.	Year	Title	Links
ICDAR	2023	LayoutGCN: A Lightweight Architecture for Visually Rich Document Understanding	[paper]
ACL-Findings	2021	Spatial Dependency Parsing for Semi-Structured Document Information Extraction	[link]
Arxiv	2021	Spatial Dual-Modality Graph Reasoning for Key Information Extraction	[link]
ICPR	2020	PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks	[link]

Transformer-Based

Pub.	Year	Title	Links
ACL	2022	LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding	[link]
ACL	2022	FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction	[link]
CVPR	2022	XYLayoutLM: Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding	[link]
Arxiv	2022	LoPE: Learnable Sinusoidal Positional Encoding for Improving Document Transformer Model	[link]
Arxiv	2022	LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking	[link]
Arxiv	2022	ERNIE-Layout: Layout-Knowledge Enhanced Multi-modal Pre-training for Document Understanding	[link]
AAAI	2022	BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents	[link]
ICDAR	2021	ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents	[link][code]
Arxiv	2021	TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models	[link]
ACM-MM	2021	StrucTexT: Structured Text Understanding with Multi-Modal Transformers	[link]
ACL	2021	LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding	[link]
KDD	2020	LayoutLM: Pre-training of Text and Layout for Document Image Understanding	[link]

Grid-Based

Pub.	Year	Title	Links
ICDAR	2021	ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents	[link]
ICDAR	2021	VisualWordGrid: Information Extraction From Scanned Documents Using A Multimodal Approach	[link]
NeurIPS	2019	BERTgrid: Contextualized Embedding for 2D Document Representation and Understanding	[link]
EMNLP	2018	Chargrid: Towards Understanding 2D Documents	[link]

End-to-end

Pub.	Year	Title	Links
ICDAR	2023	Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution	[link]
ICML	2023	Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding	[link]
ECCV	2022	OCR-free Document Understanding Transformer	[link]
Arxiv	2022	TRIE++: Towards End-to-End Information Extraction from Visually Rich Documents	[link]
ICCV	2021	DocFormer: End-to-End Transformer for Document Understanding	[link]
ACM-MM	2020	TRIE: End-to-End Text Reading and Information Extraction for Document Understanding	[link]
ICDAR	2019	EATEN: Entity-aware Attention for Single Shot Visual Text Extraction	[link]

Others

Pub.	Year	Title	Links
ICDAR	2023	Information Extraction from Documents: Question Answering vs Token Classification in real-world setups	[link]
