lbox-kr/kbl
Korean Benchmark for Legal Language Understanding (KBL)

  • This is an official repository for the KBL dataset from LBox.
  • The work will be presented at EMNLP 2024 Findings and the NLLP Workshop.
  • The paper is available at https://aclanthology.org/2024.findings-emnlp.319.

To Do

Datasets

Benchmarks

How to load examples

from pprint import pprint

import datasets

# Path of a benchmark file inside the lbox/kbl dataset repository
FILE_PATH = "knowledge/kbl_legal_concept_qa_v0.1.json"

# load_dataset returns a DatasetDict, so select the "test" split before indexing
data = datasets.load_dataset("lbox/kbl", data_files={"test": [FILE_PATH]})["test"]
pprint(data[0])
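
When several benchmark files are needed at once, the loading call above can be wrapped in a small helper. The following is a minimal sketch, not part of the official repository: the function names `kbl_data_files` and `load_kbl` are hypothetical, and the only library call used is `datasets.load_dataset` with the same `data_files` argument shown above.

```python
from typing import Dict, List, Sequence, Union


def kbl_data_files(
    file_paths: Union[str, Sequence[str]], split: str = "test"
) -> Dict[str, List[str]]:
    """Build the data_files mapping passed to datasets.load_dataset.

    Accepts a single path or a sequence of paths (hypothetical helper).
    """
    if isinstance(file_paths, str):
        file_paths = [file_paths]
    return {split: list(file_paths)}


def load_kbl(
    file_paths: Union[str, Sequence[str]],
    repo_id: str = "lbox/kbl",
    split: str = "test",
):
    """Load one or more KBL files as a single split (requires network access)."""
    import datasets  # imported lazily so kbl_data_files works without the library

    data_files = kbl_data_files(file_paths, split)
    return datasets.load_dataset(repo_id, data_files=data_files)[split]


# Usage (downloads from the Hugging Face Hub):
# data = load_kbl("knowledge/kbl_legal_concept_qa_v0.1.json")
# print(len(data))
```

Passing a list of paths places all listed files in the same split, so this is only appropriate for files that share a schema.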

Corpus

  • Korean statutes (220,160 articles, dumped in November 2024)
  • Korean precedents (From LBox-Open)

How to load corpus

from pprint import pprint
import datasets

# Load statutes corpus
data = datasets.load_dataset('lbox/kbl-rag', data_files={"train": "corpus/statutes.jsonl"})["train"]

# Load precedents corpus
# data = datasets.load_dataset('lbox/kbl', data_files={"train": "corpus/precedents.jsonl"})["train"]

# Load precedents and statutes corpus
# data = datasets.load_dataset('lbox/kbl', data_files={"train": "corpus/precedents_and_statutes.jsonl"})["train"]
pprint(data[0])
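
Because the corpus files are large, it may be convenient to inspect a few records without downloading a whole file. The sketch below assumes the `streaming=True` option of `datasets.load_dataset` is suitable for these files. The helper names `peek` and `stream_corpus` are hypothetical, and no record field names are assumed.

```python
from itertools import islice
from typing import Any, Dict, Iterable, List


def peek(records: Iterable[Dict[str, Any]], n: int = 3) -> List[Dict[str, Any]]:
    """Return the first n records without consuming the rest of the iterable."""
    return list(islice(records, n))


def stream_corpus(repo_id: str, file_path: str):
    """Open a corpus file lazily (requires network access to the Hugging Face Hub)."""
    import datasets  # imported lazily so peek() is usable without the library

    return datasets.load_dataset(
        repo_id,
        data_files={"train": file_path},
        split="train",
        streaming=True,  # yields records on demand instead of downloading everything
    )


# Usage (fetches records on demand):
# for record in peek(stream_corpus("lbox/kbl-rag", "corpus/statutes.jsonl")):
#     print(sorted(record.keys()))
```

Printing only the keys of the first few records is a quick way to learn the schema before writing any retrieval code against the corpus.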

Citation

@inproceedings{kim2024kbl,
    title = "Developing a Pragmatic Benchmark for Assessing {K}orean Legal Language Understanding in Large Language Models",
    author = {Yeeun Kim and Young Rok Choi and Eunkyung Choi and Jinhwan Choi and Hai Jin Park and Wonseok Hwang},
    editor = "Al-Onaizan, Yaser  and
      Bansal, Mohit  and
      Chen, Yun-Nung",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-emnlp.319",
    pages = "5573--5595",
}
