This repository contains the source code used in our paper "Explainable Authorship Verification in Social Media via Attention-based Similarity Learning", published at the 2019 IEEE International Conference on Big Data (IEEE BigData 2019).
Please feel free to send any comments or suggestions! (benedikt.boenninghoff[at]
We used Python 3.6 (Anaconda 3.6). The following libraries are required:
- TensorFlow 1.12.0
- spacy 2.1.8
- textacy 0.8.0
- fasttext 0.9.1
- numpy 1.15.4
- scipy 1.1.0
- pandas 0.23.4
- scikit-learn 0.20.0
- bs4 0.0.1
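Because the versions above are pinned fairly tightly, it can help to check the active environment before running anything. The following is a minimal sketch, not part of the repository: the names `REQUIRED`, `installed_version` and `check_requirements` are hypothetical, and the distribution names are assumed to match those on PyPI.

```python
# Pinned versions copied from the list above (assumed PyPI distribution names).
REQUIRED = {
    "tensorflow": "1.12.0",
    "spacy": "2.1.8",
    "textacy": "0.8.0",
    "fasttext": "0.9.1",
    "numpy": "1.15.4",
    "scipy": "1.1.0",
    "pandas": "0.23.4",
    "scikit-learn": "0.20.0",
    "bs4": "0.0.1",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        # Available from Python 3.8 onwards.
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        # Fallback for Python 3.6/3.7, where setuptools provides pkg_resources.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def check_requirements(required, lookup=installed_version):
    """Return {package: (expected, found)} for every missing or mismatched package."""
    problems = {}
    for name, expected in required.items():
        found = lookup(name)
        if found != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    for name, (expected, found) in sorted(check_requirements(REQUIRED).items()):
        print(f"{name}: expected {expected}, found {found}")
```

An empty result from `check_requirements(REQUIRED)` means the environment matches the list exactly; the check is deliberately strict and does not accept newer versions.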
The large-scale dataset of short Amazon reviews used in our paper will be published as soon as possible. Currently, this repository works with a small Amazon review dataset. You can download and uncompress the data as follows:
mkdir data
cd data
sudo apt-get install p7zip-full
7z x amazon.7z
We used pretrained word embeddings. You may prepare them as follows:
cd data
gunzip cc.en.300.bin.gz
cd preprocessing
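On systems without `gunzip`, the decompression step can be done with the Python standard library instead. This is only a sketch under the assumption that `cc.en.300.bin.gz` has already been downloaded into `data`; the helper name `gunzip` is hypothetical and not part of the repository.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src, dst=None, chunk_size=1024 * 1024):
    """Decompress a .gz file to dst (default: src without the .gz suffix)."""
    src = Path(src)
    if dst is None:
        if src.suffix != ".gz":
            raise ValueError("expected a .gz file, got %s" % src)
        dst = src.with_suffix("")
    dst = Path(dst)
    # Stream in chunks so a multi-gigabyte model never has to fit in memory.
    with gzip.open(str(src), "rb") as f_in, open(str(dst), "wb") as f_out:
        shutil.copyfileobj(f_in, f_out, chunk_size)
    return dst


if __name__ == "__main__":
    archive = Path("data") / "cc.en.300.bin.gz"
    if archive.exists():
        print("wrote", gunzip(archive))
```

Unlike the `gunzip` command, this sketch leaves the compressed archive in place. The resulting `cc.en.300.bin` is a fastText binary model, which the `fasttext` Python package can read with `fasttext.load_model`.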
You can choose between two Siamese network models, AdHominem and HRSN:
cd training
python --model_type "AdHominem"
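For readers unfamiliar with this kind of command-line switch, the sketch below shows one way a `--model_type` flag restricted to the two model names could be parsed with `argparse`. It is an illustration only: `build_parser` and `parse_model_type` are hypothetical names, the default value is an assumption, and the repository's training script may handle its arguments differently.

```python
import argparse


def build_parser():
    """Build a parser that accepts only the two model names mentioned above."""
    parser = argparse.ArgumentParser(description="Select a Siamese network model.")
    parser.add_argument(
        "--model_type",
        choices=["AdHominem", "HRSN"],
        default="AdHominem",  # assumed default, not confirmed by the repository
        help="which Siamese network model to train",
    )
    return parser


def parse_model_type(argv=None):
    """Return the selected model name; argv=None falls back to sys.argv."""
    return build_parser().parse_args(argv).model_type


if __name__ == "__main__":
    print("selected model:", parse_model_type())
```

Restricting the flag with `choices` makes a misspelled model name fail immediately with a usage message rather than later during training.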
If you use our code or data, please cite the papers using the following BibTeX entries:
author={Benedikt Boenninghoff and Steffen Hessler and Dorothea Kolossa and Robert M. Nickel},
title={Explainable Authorship Verification in Social Media via Attention-based Similarity Learning},
booktitle={IEEE International Conference on Big Data (IEEE Big Data 2019), Los Angeles, CA, USA, December 9-12, 2019},
author={Benedikt Boenninghoff and Robert M. Nickel and Steffen Zeiler and Dorothea Kolossa},
booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019), Brighton, UK, May 12-17, 2019},
title={Similarity Learning for Authorship Verification in Social Media},