This project demonstrates text classification with neural networks. The task is to classify movie reviews from the IMDB dataset as positive or negative based on their text.
The IMDB dataset contains 50,000 movie reviews, each with two fields:
- review: The textual content of the review.
- sentiment: The label indicating the sentiment of the review (positive or negative).
You can download the dataset from Kaggle.
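As a rough illustration of the structure described above, the sketch below loads review/sentiment rows with Pandas and maps the sentiment strings to integer labels. The two sample rows are made up, and an in-memory CSV stands in for the downloaded file, since the file name and path are not given here. The `label` column and the 1/0 encoding are assumptions, not necessarily what the notebook does.

```python
import io

import pandas as pd

# Two invented rows standing in for the real Kaggle CSV (assumption:
# the file has a header row with "review" and "sentiment" columns).
sample_csv = io.StringIO(
    "review,sentiment\n"
    '"A touching, well-acted film.",positive\n'
    '"Dull plot and flat characters.",negative\n'
)

df = pd.read_csv(sample_csv)

# Encode the sentiment strings as integers (hypothetical "label" column):
# positive -> 1, negative -> 0.
df["label"] = df["sentiment"].map({"positive": 1, "negative": 0})
labels = df["label"].tolist()
```

With the real dataset, the `io.StringIO` object would be replaced by the path to the downloaded CSV file.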
- Data Preprocessing:
  - Remove special characters and HTML tags.
  - Convert text to lowercase.
  - Remove stop words using NLTK.
  - Tokenize and pad sequences.
- Model Building:
  - Utilize embedding layers for word representation.
  - Build a sequential neural network with layers like LSTM and Dense.
- Evaluation:
  - Split data into training and testing sets.
  - Train the model using the training set and evaluate performance on the test set.
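The preprocessing and splitting steps above can be sketched in plain Python. This is a minimal illustration and not the notebook's code: a tiny hard-coded stop-word set stands in for NLTK's English list, padding is done by hand (right-padding with zeros) rather than with Keras utilities, and all function names are hypothetical.

```python
import random
import re

# Tiny stand-in for NLTK's English stop-word list (assumption: keeps the
# sketch dependency-free; the project itself uses NLTK).
STOP_WORDS = {"a", "an", "the", "is", "was", "it", "this", "and", "of", "to", "in"}


def clean_text(text):
    """Strip HTML tags and special characters, lowercase, drop stop words."""
    text = re.sub(r"<[^>]+>", " ", text)      # remove HTML tags such as <br />
    text = re.sub(r"[^a-zA-Z\s]", " ", text)  # keep letters and whitespace only
    tokens = text.lower().split()
    return [tok for tok in tokens if tok not in STOP_WORDS]


def build_vocab(token_lists):
    """Map each word to an integer id; id 0 is reserved for padding."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab


def encode_and_pad(tokens, vocab, max_len):
    """Convert tokens to ids, then truncate or right-pad with 0 to max_len."""
    ids = [vocab[tok] for tok in tokens if tok in vocab][:max_len]
    return ids + [0] * (max_len - len(ids))


def train_test_split(items, test_ratio=0.2, seed=42):
    """Shuffle a copy of items and split it into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]


reviews = ["This movie was <br /> GREAT!!!", "It is a boring film."]
tokens = [clean_text(r) for r in reviews]
vocab = build_vocab(tokens)
padded = [encode_and_pad(t, vocab, max_len=4) for t in tokens]
train, test = train_test_split(range(10))
```

In this sketch the vocabulary is built from all reviews for brevity; in practice it is usually fitted on the training split only, so that no information from the test set leaks into preprocessing.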
- Python
- Keras (for deep learning models)
- NLTK (for text preprocessing)
- Pandas & NumPy (for data manipulation)
- Google Colab (for execution environment)
The model consists of the following components:
- Embedding Layer: To convert words into dense vectors.
- LSTM Layer: For capturing sequential dependencies in the text.
- Dense Layers: For classification output.
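One way these components might fit together in Keras is sketched below. The vocabulary size, sequence length, embedding dimension, and layer widths are illustrative assumptions, as is the `build_model` helper; the notebook's actual hyperparameters may differ.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_model(vocab_size=10000, max_len=200, embed_dim=64):
    """Embedding -> LSTM -> Dense stack for binary sentiment classification.

    All sizes here are assumed values chosen for illustration.
    """
    model = keras.Sequential([
        keras.Input(shape=(max_len,)),            # padded sequences of word ids
        layers.Embedding(vocab_size, embed_dim),  # word ids -> dense vectors
        layers.LSTM(64),                          # sequential dependencies
        layers.Dense(32, activation="relu"),
        layers.Dense(1, activation="sigmoid"),    # probability of "positive"
    ])
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model


model = build_model()
```

A single sigmoid output suits the two-class setup: the model returns one value between 0 and 1 per review, which can be thresholded at 0.5 to obtain a positive or negative prediction.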
- Clone this repository:
git clone https://github.com/thanghd1112/Text-Classification-with-Neural-Networks.git
- Open the 07_textClassification.ipynb notebook.
- Load the IMDB dataset into the specified path.
- Run the notebook cells sequentially.
According to the notebook, the model achieves high accuracy on the test set. The exact figures (accuracy and loss) appear in the notebook's output cells.