Text classification is a machine learning technique that has been used in:
- natural language processing
- sentiment analysis
- spam & intent detection
- search
- organizing text (stories, papers)
Because the feature space is high-dimensional, the performance of machine learning algorithms for document categorization degrades. Feature selection is therefore an important data preprocessing strategy.
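As a rough illustration of feature selection on a document-term matrix, the sketch below scores each term with a chi-square statistic and keeps the top `k` terms. Chi-square is only one common choice and is an assumption here, not a method these datasets prescribe. The function names (`chi2_scores`, `select_top_k`), the toy matrix, and the layout (rows as documents, columns as terms) are all hypothetical.

```python
def chi2_scores(X, y):
    """Chi-square score of each term (column) against the class labels.

    X: list of rows (documents), each a list of term counts (assumed layout).
    y: list of class labels, one per document.
    Only term presence (count > 0) is used; each term receives its
    maximum score over all classes.
    """
    n_docs = len(X)
    n_terms = len(X[0])
    classes = sorted(set(y))
    scores = []
    for t in range(n_terms):
        best = 0.0
        for c in classes:
            # 2x2 contingency table: term present/absent vs. class c/other
            a = sum(1 for row, lab in zip(X, y) if row[t] > 0 and lab == c)
            b = sum(1 for row, lab in zip(X, y) if row[t] > 0 and lab != c)
            e = sum(1 for row, lab in zip(X, y) if row[t] == 0 and lab == c)
            d = n_docs - a - b - e
            denom = (a + b) * (e + d) * (a + e) * (b + d)
            if denom > 0:  # a degenerate table carries no information
                best = max(best, n_docs * (a * d - b * e) ** 2 / denom)
        scores.append(best)
    return scores


def select_top_k(X, y, k):
    """Return the indices of the k best-scoring terms and the reduced matrix."""
    scores = chi2_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda t: -scores[t])
    keep = sorted(ranked[:k])
    return keep, [[row[t] for t in keep] for row in X]


# Toy data (hypothetical): 4 documents, 3 terms, 2 classes.
X_toy = [
    [2, 0, 1],
    [1, 0, 1],
    [0, 3, 1],
    [0, 1, 1],
]
y_toy = [0, 0, 1, 1]

keep, X_reduced = select_top_k(X_toy, y_toy, k=2)
print(keep)       # indices of the retained terms
print(X_reduced)  # matrix restricted to those terms
```

The third term occurs in every document, so it cannot separate the classes and is the one dropped. On real data of this size a vectorized implementation would be preferable; the loops here are kept for readability.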
To test your approach to text classification, you can use these datasets. The collection contains four datasets for text classification that have been used in various machine learning approaches. All of them are provided in MATLAB format and are straightforward to use.
- 20newsgroup
- Reuter21578
- RCV1_4
- TDT2
All of the datasets can be used in MATLAB and open easily in your workspace. Each dataset is stored as a 2-dimensional array that your algorithm can use directly.