# KoreaNewsCrawler

This crawler collects news articles from media outlets posted on the NAVER portal.
Crawlable article categories are politics, economy, living culture, IT/science, society, world, and opinion.
Crawlable sports categories are world baseball, world soccer, Korean baseball, Korean soccer, basketball, volleyball, golf, general sports, and e-sports.

## How to install
```
pip install KoreaNewsCrawler
```

* **set_category(category_name)**

This method sets the categories you want to collect.
Multiple categories can be passed, using either Korean or English names.
category_name: 정치, 경제, 사회, 생활문화, IT과학, 세계, 오피니언 or politics, economy, society, living_culture, IT_science, world, opinion

* **set_date_range(startyear, startmonth, endyear, endmonth)**

This method sets the period of news to collect. By default, data is collected from startmonth through endmonth.

* **start()**

This method starts the crawl.

## Article News Crawler Example
```
from korea_news_crawler.articlecrawler import ArticleCrawler
Crawler = ArticleCrawler()
Crawler.set_category("정치", "IT과학", "economy")
Crawler.set_date_range("2017-01", "2018-04-20")
Crawler.start()
```
Crawls politics, IT/science, and economy category news in parallel using multiprocessing, from January 2017 to April 20, 2018.

## Sports News Crawler Example
The methods are similar to those of ArticleCrawler().
```
from korea_news_crawler.sportcrawler import SportCrawler
Spt_crawler = SportCrawler()
Spt_crawler.set_category('한국야구','한국축구')
Spt_crawler.set_date_range("2017-01", "2018-04-20")
Spt_crawler.start()
```
Crawls Korean baseball and Korean soccer news in parallel using multiprocessing, from January 2017 to April 20, 2018.

## Results
![ex_screenshot](./img/article_result.PNG)
![ex_screenshot](./img/sport_resultimg.PNG)

Column A: Article date & time
Column B: Article category
Column C: Media company
Column D: Article title
Column E: Article body
Column F: Article URL

All collected data is saved as a CSV file.
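The saved rows can be read back with Python's standard csv module. A minimal sketch assuming the column layout above (`load_articles` is a hypothetical helper for illustration, not part of KoreaNewsCrawler):

```
import csv

def load_articles(path):
    # Hypothetical helper, not part of KoreaNewsCrawler.
    # Column layout per the table above: date/time, category,
    # media company, title, body, URL.
    fields = ["datetime", "category", "press", "title", "body", "url"]
    with open(path, encoding="utf-8", newline="") as f:
        return [dict(zip(fields, row)) for row in csv.reader(f)]
```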


# KoreaNewsCrawler (English version)

This crawler collects news articles posted on the NAVER portal.
Crawlable article categories are politics, economy, living culture, IT/science, society, world, and opinion.
Crawlable sports categories are Korean baseball, Korean soccer, world baseball, world soccer, basketball, volleyball, golf, general sports, and e-sports.

**Sports article crawling is currently unavailable because the HTML layout has changed. The sports crawler will be updated as soon as possible.**

## How to install
```
pip install KoreaNewsCrawler
```

## Method

* **set_category(category_name)**

This method sets the categories you want to crawl.
Valid categories are politics, economy, society, living_culture, IT_science, world, and opinion.
Multiple categories can be passed.

* **set_date_range(startyear, startmonth, endyear, endmonth)**

This method sets the period of news to collect.
Data is collected from startmonth through endmonth.

* **start()**

This method starts the crawl.

## Article News Crawler Example
```
from korea_news_crawler.articlecrawler import ArticleCrawler
Crawler = ArticleCrawler()
Crawler.set_category("politics", "IT_science", "economy")
Crawler.set_date_range("2017-01", "2018-04-20")
Crawler.start()
```
Crawls politics, IT/science, and economy category news in parallel using multiprocessing, from January 2017 to April 20, 2018.

## Sports News Crawler Example
The methods are similar to those of ArticleCrawler().
```
from korea_news_crawler.sportcrawler import SportCrawler
Spt_crawler = SportCrawler()
Spt_crawler.set_category('korea baseball','korea soccer')
Spt_crawler.set_date_range("2017-01", "2018-04-20")
Spt_crawler.start()
```
Crawls Korean baseball and Korean soccer news in parallel using multiprocessing, from January 2017 to April 20, 2018.

## Results
![ex_screenshot](./img/article_result.PNG)
![ex_screenshot](./img/sport_resultimg.PNG)

Column A: Article date & time
Column B: Article category
Column C: Media company
Column D: Article title
Column E: Article body
Column F: Article URL

All collected data is saved as a CSV file.
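When crawling overlapping categories, the same article may appear in the output more than once; one simple post-processing step is to deduplicate rows by URL. A sketch under the column layout above (`dedupe_by_url` is an illustrative helper, not part of the library):

```
def dedupe_by_url(rows):
    # Illustrative helper, not part of KoreaNewsCrawler.
    # Each row is (datetime, category, press, title, body, url),
    # matching the column layout described above.
    seen = set()
    unique = []
    for row in rows:
        url = row[5]
        if url not in seen:
            seen.add(url)
            unique.append(row)
    return unique
```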
