santipongth/luke

LUKE (Lightweight Unsupervised Keyword Extraction)

LUKE is a lightweight keyword extraction algorithm based on a term weighting scheme that combines multiple word features: term frequency, inverse sentence frequency, term difference sentence, term position, and term length. The goal is to automatically extract important words and phrases from unstructured text without training data or domain-specific knowledge.
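
To make the five word features more concrete, here is a minimal Python sketch that computes one plausible version of each feature per word and combines them into a score. It is not the LUKE implementation: the function names (`split_sentences`, `tokenize`, `word_features`, `rank_keywords`), the exact feature definitions, and the product used to combine them are all assumptions chosen for illustration. In particular, the sentence-spread feature is assumed here to be the fraction of sentences that contain the word.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive sentence splitter on ., ! and ? (an assumption, not LUKE's own)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Lowercase ASCII alphabetic tokens only."""
    return re.findall(r"[a-z]+", sentence.lower())


def word_features(text):
    """Return {word: feature dict} for one document (illustrative definitions)."""
    sentences = [toks for toks in map(tokenize, split_sentences(text)) if toks]
    n_sent = len(sentences)
    tf = Counter(w for s in sentences for w in s)               # occurrences in the document
    sent_count = Counter(w for s in sentences for w in set(s))  # sentences containing the word
    first_sent = {}
    for idx, s in enumerate(sentences):
        for w in s:
            first_sent.setdefault(w, idx)                       # index of first sentence with the word
    return {
        w: {
            "tf": tf[w],
            "isf": math.log(1 + n_sent / sent_count[w]),        # assumed smoothing
            "tds": sent_count[w] / n_sent,                      # assumed: spread across sentences
            "pos": 1.0 / (1 + first_sent[w]),                   # earlier words score higher
            "len": len(w),
        }
        for w in tf
    }


def rank_keywords(text, top_k=5):
    """Rank words by a hypothetical product of the five features."""
    scored = {
        w: f["tf"] * f["isf"] * f["tds"] * f["pos"] * math.log(1 + f["len"])
        for w, f in word_features(text).items()
    }
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


if __name__ == "__main__":
    doc = "Keyword extraction finds keyword candidates. Extraction is fast. Cats sleep."
    for word, score in rank_keywords(doc, top_k=3):
        print(f"{word}\t{score:.3f}")
```

The sketch scores single words only and applies no stopword filtering. A full pipeline would typically also remove stopwords and assemble multi-word phrases before ranking.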

Motivation

LUKE aims to provide a lightweight and robust keyword extraction tool that relies only on statistical information about the words in the input text. Such a tool could be applied on mobile devices.

Benchmark Datasets

The four common benchmark datasets are described below:

  • SemEval-2010: This dataset consists of 243 long scientific articles from conferences and workshops in the ACM Digital Library, with author- and reader-assigned keyphrase annotations.
  • NUS: This contains 211 long scientific conference papers, ranging from 4 to 12 pages.
  • Inspec: This consists of 2,000 short documents drawn from scientific journal abstracts in the areas of computer science and information technology.
  • DUC-2001: This is a collection of 308 medium-length news articles from TREC-9.

Supported language

Currently, the algorithm is available only for English. However, it provides a keyword extraction pipeline that is easy to customize for other languages.
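
As a rough illustration of what customizing a pipeline for another language can involve, the sketch below isolates the language-specific parts (tokenizer and stopword list) in a small configuration object. The names `LanguageConfig`, `default_tokenizer`, `extract`, and `GERMAN` are hypothetical and are not part of the LUKE package. A plain frequency count stands in for the term weighting scheme, so only the plug-in points are shown.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple


def default_tokenizer(text: str) -> List[str]:
    """Lowercase runs of Unicode letters, so accented characters are kept."""
    return re.findall(r"[^\W\d_]+", text.lower())


@dataclass(frozen=True)
class LanguageConfig:
    """Hypothetical bundle of the language-dependent pipeline steps."""
    tokenizer: Callable[[str], List[str]] = default_tokenizer
    stopwords: FrozenSet[str] = frozenset()
    min_length: int = 2


def extract(text: str, config: LanguageConfig, top_k: int = 5) -> List[Tuple[str, int]]:
    """Tokenize, filter, and rank by raw frequency (a stand-in for real weighting)."""
    tokens = [
        t for t in config.tokenizer(text)
        if t not in config.stopwords and len(t) >= config.min_length
    ]
    return Counter(tokens).most_common(top_k)


# Example: a tiny, incomplete German stopword list, for illustration only.
GERMAN = LanguageConfig(
    stopwords=frozenset({"der", "die", "das", "und", "ist", "ein", "eine"})
)

if __name__ == "__main__":
    print(extract("Die Katze und der Hund. Die Katze ist ein Tier.", GERMAN, top_k=3))
```

For languages written without spaces between words, the `tokenizer` callable would need to wrap a word segmenter instead of a regular expression.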

Citation

@inproceedings{10.1145/3587716.3587972,
author = {Thaiprayoon, Santipong and Unger, Herwig},
title = {A Lightweight Keyword Extraction Algorithm Using a Term Weighting Scheme with Word Features},
year = {2023},
isbn = {9781450398411},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3587716.3587972},
doi = {10.1145/3587716.3587972},
abstract = {The rapid growth of numerous collections of unstructured text increases the need to extract meaningful information. This paper proposes a new lightweight keyword extraction algorithm based on a term weighting scheme with multiple word features, including term frequency, inverse sentence frequency, term difference sentence, term position, and term length. The goal is to automatically extract important words and phrases from unstructured text without training data or domain-specific knowledge. The experimental results on several benchmark datasets show that the proposed algorithm significantly outperforms baseline and state-of-the-art approaches in terms of F1 scores.},
booktitle = {Proceedings of the 2023 15th International Conference on Machine Learning and Computing},
pages = {602–606},
numpages = {5},
keywords = {keyword extraction, feature extraction, term weighting, statistical model, unsupervised learning},
location = {Zhuhai, China},
series = {ICMLC '23}
}

Contact

For any questions, feel free to create an issue, and we will do our best to resolve it.

Name: Santipong Thaiprayoon
E-mail: [email protected]
