AIAML/Datasets_TextCategorization

Text Classification

Text classification is a machine learning technique used in:

  • natural language processing
  • sentiment analysis
  • spam and intent detection
  • search
  • organizing text (stories, papers)

Feature Selection in Text Categorization

The high dimensionality of the feature space degrades the performance of machine learning algorithms for document categorization. Feature selection is an important data preprocessing strategy for reducing that dimensionality.
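As an illustration of what feature selection does to a document-term matrix, here is a minimal sketch in Python using only the standard library. It scores each term with the chi-square statistic against the class labels and keeps the top `k` terms. Chi-square is just one common choice and is not necessarily the method this repository was built for. The function names `chi_square_scores` and `select_top_k`, and the layout of `X` (rows are documents, columns are term counts), are assumptions made for this example.

```python
from collections import Counter


def chi_square_scores(X, y):
    """Return one chi-square score per term (column) of X.

    X: list of rows, each a list of term counts (documents x terms).
    y: list of class labels, one per document.
    A term's score is its largest chi-square statistic over all classes,
    computed from the 2x2 table (term present/absent) x (in class/not in class).
    """
    n_docs = len(X)
    n_terms = len(X[0])
    class_sizes = Counter(y)
    scores = []
    for j in range(n_terms):
        present = [row[j] > 0 for row in X]
        n_present = sum(present)
        best = 0.0
        for label, n_label in class_sizes.items():
            a = sum(1 for p, lab in zip(present, y) if p and lab == label)
            b = n_present - a            # term present, other classes
            c = n_label - a              # term absent, this class
            d = n_docs - n_present - c   # term absent, other classes
            denom = (a + b) * (c + d) * (a + c) * (b + d)
            if denom > 0:  # a term present in all or no documents scores 0
                best = max(best, n_docs * (a * d - b * c) ** 2 / denom)
        scores.append(best)
    return scores


def select_top_k(X, y, k):
    """Keep the k highest-scoring columns; return (kept indices, reduced X)."""
    scores = chi_square_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])
    return keep, [[row[j] for j in keep] for row in X]


if __name__ == "__main__":
    # Toy data (hypothetical): 4 documents, 3 terms, 2 classes.
    X = [[2, 0, 1], [1, 0, 1], [0, 3, 1], [0, 1, 1]]
    y = ["sport", "sport", "tech", "tech"]
    keep, X_reduced = select_top_k(X, y, k=2)
    print(keep)
    print(X_reduced)
```

In the toy data the third term appears in every document, so it carries no information about the class and is the one dropped. On real corpora the same idea is applied to thousands of terms, usually with a sparse matrix library rather than nested lists.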

Best Datasets for Text Categorization

To test your text classification approach, try these datasets. This collection contains four datasets for text classification that have been used in various machine learning approaches. All of them are provided in MATLAB format and are straightforward to use.

Dataset Names

  • 20newsgroup
  • Reuter21578
  • RCV1_4
  • TDT2

All of the datasets can be used in MATLAB, and you can open them easily in your workspace. Each dataset is stored as a 2-dimensional array that your algorithm can use directly.
