HamHD: Spam Text Detection using Hyperdimensional Computing
- The SMS ham/spam dataset can be obtained from Kaggle at: https://www.kaggle.com/uciml/sms-spam-collection-dataset
- The YouTube comment ham/spam dataset can be obtained from Kaggle at: https://www.kaggle.com/lakshmi25npathi/images
- You can train and test on the different subsets individually, or concatenate all the CSV files to build one large dataset.
- In the Python script, set the f_ variable to the path of the dataset CSV file you downloaded.
- This application runs under Python 3. Please have the latest version of Python installed.
- This application requires the following packages: pandas, numpy, scikit-learn, and matplotlib (only if you would like to output figures). The packages can be installed via:
  pip install pandas numpy scikit-learn matplotlib
- The application can be run by executing the Python script directly, e.g. "python3 HamHD-text.py"
- For details about the script (parameters, encoding schemes, etc.), please check the comments inside HamHD-text.py.
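
The option of concatenating several dataset CSV files into one large dataset, mentioned above, could be sketched roughly as follows. This is only an illustration, not part of HamHD-text.py: the helper name `merge_spam_csvs`, the output columns `label` and `text`, and the label mapping (0/1 treated as ham/spam) are all assumptions, and the column layout the script actually expects should be checked in its comments before use.

```python
import csv


def merge_spam_csvs(sources, out_path):
    """Merge labeled CSV files into one two-column CSV (label, text).

    sources: list of (path, label_column, text_column, encoding) tuples.
    Returns the number of data rows written.
    NOTE: hypothetical helper; the output layout is an assumption.
    """
    # Assumed mapping from dataset-specific labels onto ham/spam.
    label_map = {"ham": "ham", "spam": "spam", "0": "ham", "1": "spam"}
    rows_written = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        writer.writerow(["label", "text"])
        for path, label_col, text_col, encoding in sources:
            with open(path, newline="", encoding=encoding) as in_file:
                for row in csv.DictReader(in_file):
                    raw_label = (row.get(label_col) or "").strip().lower()
                    label = label_map.get(raw_label)
                    if label is None:
                        continue  # skip rows whose label is not recognized
                    writer.writerow([label, row.get(text_col) or ""])
                    rows_written += 1
    return rows_written
```

A call might look like `merge_spam_csvs([("sms.csv", "v1", "v2", "latin-1"), ("youtube.csv", "CLASS", "CONTENT", "utf-8")], "merged.csv")`, where the file names, column names, and encodings are placeholders to be replaced with those of the files you downloaded. The f_ variable can then point at the merged file, provided its columns match what the script reads.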