
Undergraduate graduation project: Abstract text summarization


MaryAhn/graduationProject_nlp


graduationProject_nlp

Explanation (English)

  1. File Description
    dataloader: Base dataloader class
    summary-reward-no-reference: Library for KR-WordRank (an NLP library for the Korean language; no longer exists)
    textrank: Library for TextRank

AbstractToTitle_krwordrank.py: Generate a title from the abstract using KR-WordRank (only for Korean data)
AbstractToTitle_textrank.py: Generate a title from the abstract using TextRank
CleanText.py: Preprocess Korean data with Mecab
CsvToTxt.py: Convert the title and abstract of a paper from a .csv file to a .txt file and save it
DataPreProcessing_okt.py: Preprocess Korean data with Okt
pdf_crawler.py: Convert a .pdf file to a .csv file and save it
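The CSV-to-text step performed by CsvToTxt.py could look roughly like the sketch below. It is a minimal illustration, not the project's script: the column names `title` and `abstract`, the output layout (title line, abstract line, blank line between papers), and the helper names `csv_to_txt` and `convert_file` are all assumptions.

```python
import csv
import io


def csv_to_txt(csv_text, title_col="title", abstract_col="abstract"):
    """Return one block per paper: the title line, then the abstract, blocks separated by a blank line."""
    reader = csv.DictReader(io.StringIO(csv_text))
    blocks = []
    for row in reader:
        # Column names are assumptions; adjust them to the crawler's CSV header.
        title = (row.get(title_col) or "").strip()
        abstract = (row.get(abstract_col) or "").strip()
        if title or abstract:
            blocks.append(f"{title}\n{abstract}")
    return "\n\n".join(blocks) + "\n" if blocks else ""


def convert_file(input_path, save_path, encoding="utf-8"):
    # newline="" lets the csv module handle quoted line breaks inside fields.
    with open(input_path, newline="", encoding=encoding) as src:
        text = src.read()
    with open(save_path, "w", encoding=encoding) as dst:
        dst.write(csv_to_txt(text))
```

Using `csv.DictReader` rather than splitting on commas keeps titles such as `"Graphs, Ranks"` intact, since the csv module understands quoted fields.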

  2. How to Run

python AbstractToTitle_krwordrank.py --data_input_path=your_input_path --data_save_path=your_save_path
python AbstractToTitle_textrank.py --data_input_path=your_input_path --data_save_path=your_save_path
python CleanText_mecab.py --data_input_path=your_input_path --data_save_path=your_save_path
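All three scripts take the same two flags. A minimal sketch of how such a command-line interface can be declared with Python's standard `argparse` module follows; only the flag names come from the commands above, while `build_parser` and `main` are hypothetical names and the actual processing step is left out.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Flag names match the commands above; the help texts are illustrative.
    parser = argparse.ArgumentParser(description="Generate titles from paper abstracts.")
    parser.add_argument("--data_input_path", required=True,
                        help="input file or directory containing the abstracts")
    parser.add_argument("--data_save_path", required=True,
                        help="where the generated titles are written")
    return parser


def main(argv=None) -> argparse.Namespace:
    # argv=None makes argparse read sys.argv[1:]; pass a list to call it from code.
    args = build_parser().parse_args(argv)
    # Hypothetical pipeline step: load data from args.data_input_path,
    # summarize it, and save the results to args.data_save_path.
    return args
```

Calling `main(["--data_input_path=in.txt", "--data_save_path=out.txt"])` returns a namespace whose `data_input_path` is `"in.txt"`; argparse accepts both the `--flag=value` form used above and `--flag value`.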

  3. Supplementary Material (video)
    YouTube link: https://www.youtube.com/channel/UCvKmqttq--wUqcg53IRIbMw

  4. Project Summary

An abstract text summarization model that generates a title from a paper abstract.
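One extractive way to produce such a title is to rank the abstract's sentences with TextRank and keep the top one. The sketch below follows the sentence variant of TextRank (word-overlap similarity normalized by log sentence length, then weighted PageRank with damping 0.85); the regex sentence splitter, the `\w+` tokenizer, and returning the single best sentence as the title are simplifying assumptions, not the behavior of AbstractToTitle_textrank.py.

```python
import math
import re


def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # In Python 3, \w matches Hangul as well as Latin letters and digits.
    return re.findall(r"\w+", sentence.lower())


def similarity(a, b):
    """TextRank sentence similarity: shared words / (log|a| + log|b|)."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    if denom <= 0:  # both sentences have a single token
        return 0.0
    return len(set(a) & set(b)) / denom


def textrank(sentences, d=0.85, iterations=50):
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    w = [[similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out_weight = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iterations):
        # Weighted PageRank update; the similarity matrix is symmetric.
        scores = [
            (1 - d) + d * sum(w[j][i] / out_weight[j] * scores[j]
                              for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    return scores


def generate_title(abstract):
    sentences = split_sentences(abstract)
    if not sentences:
        return ""
    scores = textrank(sentences)
    best = max(range(len(sentences)), key=lambda i: scores[i])
    return sentences[best]
```

For the abstract `"Graph ranking selects sentences. Graph ranking scores sentences by similarity. Similarity scores build the graph."`, the middle sentence shares the most words with the other two and is returned as the title.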

  5. References

TextRank: https://lovit.github.io/nlp/2019/04/30/textrank/
KR-WordRank: https://lovit.github.io/nlp/2019/05/01/krwordrank_sentence/
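KR-WordRank itself ranks candidate Korean substrings without a part-of-speech tagger, which is too involved for a short example. As a simpler stand-in, the sketch below shows the keyword variant of TextRank: words become nodes, words that appear next to each other (window of 2) are linked, and PageRank scores pick the keywords. The window size, the absence of a stopword list or POS filter, and the function name `keyword_textrank` are assumptions.

```python
import re
from collections import defaultdict


def keyword_textrank(text, window=2, d=0.85, iterations=50, top_k=5):
    """Rank words by PageRank over an undirected co-occurrence graph."""
    words = re.findall(r"\w+", text.lower())
    graph = defaultdict(set)
    for i, word in enumerate(words):
        # Link each word to the following words inside the window (window=2: next word only).
        for other in words[i + 1:i + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    scores = {word: 1.0 for word in graph}
    for _ in range(iterations):
        # Unweighted PageRank: each neighbour splits its score equally among its links.
        scores = {
            word: (1 - d) + d * sum(scores[n] / len(graph[n]) for n in graph[word])
            for word in graph
        }
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda word: (-scores[word], word))[:top_k]
```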

  6. Engineering Blog

Keonhee Ahn - PDF crawling, text preprocessing including normalization, text summarization (TextRank, KR-WordRank)
link: https://blog.naver.com/aws_lik

Jihwan Kim - Theoretical explanation of natural language processing and of extractive and abstractive summarization
link: https://deli-ce.tistory.com/2
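The text preprocessing and normalization listed above is done in CleanText.py and DataPreProcessing_okt.py with the Mecab and Okt morphological analyzers, which are not shown here. The sketch below covers only a normalization pass that could run before tokenization; the kept character set (Hangul syllables, Latin letters, digits, basic sentence punctuation) and the function name `normalize` are assumptions, not the project's rules.

```python
import re
import unicodedata

# Assumed character set: Hangul syllables, Latin letters, digits, whitespace, and . ? !
_DISALLOWED = re.compile(r"[^가-힣A-Za-z0-9.?!\s]")
_SPACES = re.compile(r"\s+")


def normalize(text):
    # NFC first, so decomposed jamo sequences become precomposed syllables (가-힣).
    text = unicodedata.normalize("NFC", text)
    # Replace every other character (brackets, symbols, other scripts) with a space.
    text = _DISALLOWED.sub(" ", text)
    # Collapse runs of whitespace and trim the ends.
    return _SPACES.sub(" ", text).strip()
```

For example, `normalize("자연어 처리(NLP)는   재미있다!")` returns `"자연어 처리 NLP 는 재미있다!"`.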

Explanation (Korean)

  1. File and Folder Description
    dataloader: Base module of the dataloader class, which loads all data from the given input path at once
    summary-reward-no-reference: Library for using KR-WordRank
    textrank: Library for using TextRank

AbstractToTitle_krwordrank.py: Generates a title from the abstract using KR-WordRank
AbstractToTitle_textrank.py: Generates a title from the abstract using TextRank
CleanText.py: Korean text preprocessing script (with Mecab)
CsvToTxt.py: Converts the title and abstract in a CSV file to a .txt file and saves it
DataPreProcessing_okt.py: Korean text preprocessing script (with Okt)
pdf_crawler.py: Converts a PDF to a CSV file and saves it
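The dataloader folder is described as a base module that loads all data under the input path at once. A minimal sketch of that idea with `pathlib` follows; the class name `TextDataLoader`, the restriction to `.txt` files, and returning a filename-to-text dictionary are assumptions for illustration.

```python
from pathlib import Path


class TextDataLoader:
    """Hypothetical loader that reads every matching file in one directory in a single call."""

    def __init__(self, data_input_path, suffix=".txt", encoding="utf-8"):
        self.root = Path(data_input_path)
        self.suffix = suffix
        self.encoding = encoding

    def load_all(self):
        if not self.root.is_dir():
            raise NotADirectoryError(f"not a directory: {self.root}")
        # Sort so the result order does not depend on the file system.
        files = sorted(p for p in self.root.iterdir()
                       if p.is_file() and p.suffix == self.suffix)
        return {p.name: p.read_text(encoding=self.encoding) for p in files}
```

Returning a dictionary keyed by filename keeps each generated title traceable to the abstract it came from.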

  2. How to Run

python AbstractToTitle_krwordrank.py --data_input_path=your_input_path --data_save_path=your_save_path
python AbstractToTitle_textrank.py --data_input_path=your_input_path --data_save_path=your_save_path
python CleanText_mecab.py --data_input_path=your_input_path --data_save_path=your_save_path

  3. Demo Video
    YouTube link: https://www.youtube.com/channel/UCvKmqttq--wUqcg53IRIbMw

  4. Project Introduction

An abstract text summarization model that uses natural language processing to generate a title from a paper abstract.

  5. References

TextRank: https://lovit.github.io/nlp/2019/04/30/textrank/
KR-WordRank: https://lovit.github.io/nlp/2019/05/01/krwordrank_sentence/

  6. Engineering Blog

안건희 (Keonhee Ahn) - PDF crawling, text preprocessing and normalization, document summarization (TextRank, KR-WordRank)
link: https://blog.naver.com/aws_lik

김지환 (Jihwan Kim) - Theoretical explanation of natural language processing and of extractive and abstractive summarization
link: https://deli-ce.tistory.com/2
