Twish is a web application that allows you to host tweet classifiers (machine learning-based, rule-based, whatever-based). Once you have set up the app, your users can enter a search term; Twish will collect tweets matching it and classify them using the classifiers you set up.
- Tweet search box (uses the Twitter API).
- Classified tweets visualizer.
- Email notifications when classification jobs complete.
- Search history.
- Customizable app name, logo and about page.
- Install Docker Compose on your machine.
- Get the source code onto your machine via git.

  ```
  git clone https://github.com/xavierfigueroav/twish && cd twish
  ```

- Edit the file `dev.env` and set values for the variables `TWITTER_CONSUMER_KEY`, `TWITTER_CONSUMER_SECRET`, `TWITTER_ACCESS_TOKEN` and `TWITTER_ACCESS_TOKEN_SECRET` (see the example values after this list).
- Build and run the Docker containers.

  ```
  docker-compose up --build
  ```

- That's it. Open a web browser and hit the URL http://127.0.0.1:3000.
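
For reference, the relevant part of `dev.env` would look something like this; the values shown are placeholders for the credentials you get from your Twitter developer account:

```
TWITTER_CONSUMER_KEY=your-consumer-key
TWITTER_CONSUMER_SECRET=your-consumer-secret
TWITTER_ACCESS_TOKEN=your-access-token
TWITTER_ACCESS_TOKEN_SECRET=your-access-token-secret
```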
To add your own classifiers, you need to follow six steps.
Your classification logic surely has Python dependencies such as numpy, pandas, scikit-learn, etc. To install your dependencies, you need to add them to the `requirements.txt` file. You can do that manually, or you can install them in the container and freeze them:

```
docker exec -it twish_django_1 pip install <lib_1> [<lib_2>...]
docker exec -it twish_django_1 pip freeze > requirements.txt
```
Not much to say about this. Just create a folder for your model files in the directory `backend/classifier/models/` and put your files there. The `example` folder is the home for the model files of the classifier Twish ships with out of the box.
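
For example, assuming hypothetical folder and file names:

```
mkdir backend/classifier/models/my_model
cp model.pkl backend/classifier/models/my_model/
```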
Very often, tweets need to be preprocessed before they are passed to a classifier. If that is the case for you, place your preprocessing logic in `backend/classifier/preprocessors.py`. You are not required to, but you may follow the example preprocessor `LogisticRegressionPreprocessor`.
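
As an illustration, a preprocessor could be as simple as the sketch below. The class name, its method and the cleaning steps are hypothetical; only the module path comes from Twish.

```
# backend/classifier/preprocessors.py (sketch; names and cleaning steps are hypothetical)
import re


class MyPreprocessor:
    """Example preprocessor: lowercases tweets and strips URLs and user mentions."""

    def preprocess(self, tweets):
        # 'tweets' is assumed to be an iterable of raw tweet texts.
        cleaned = []
        for text in tweets:
            text = text.lower()
            text = re.sub(r'https?://\S+', '', text)  # remove URLs
            text = re.sub(r'@\w+', '', text)          # remove user mentions
            cleaned.append(text.strip())
        return cleaned
```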
All the logic needed for your model to make predictions must be placed in the module `backend/classifier/predictors.py`. Your prediction logic must be encapsulated in a class that subclasses `AbstractPredictor` and implements the `predict` method.

The `predict` method should return a collection of triples containing tweet and prediction information in the following order: tweet id, tweet date, tweet predicted label (an instance of `PredictionLabel` from `backend/classifier/models.py`).

Although subclassing `AbstractPredictor` forces you to implement the `predict` method, it does not force you to follow its parameter and return value format. However, you are strongly encouraged to follow it to avoid further changes in the codebase.

Your predictor class is where you should use your preprocessor from step 3.

Take a look at the class `LogisticRegression` for an example of how to implement your own predictor class.
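
Here is a minimal sketch of such a class. It assumes `predict` receives an iterable of tweet objects with `id`, `date` and `text` attributes, and that the model is a pickled scikit-learn pipeline; the names `MyPredictor`, `MyPreprocessor`, `model.pkl` and the `PredictionLabel` lookup field are hypothetical, while `AbstractPredictor` and `PredictionLabel` are the Twish classes mentioned above.

```
# backend/classifier/predictors.py (sketch; AbstractPredictor is already defined in this module)
import os
import pickle

from .models import PredictionLabel          # assumed import path within the classifier app
from .preprocessors import MyPreprocessor    # hypothetical preprocessor from step 3

MODELS_DIR = os.path.join(os.path.dirname(__file__), 'models', 'my_model')


class MyPredictor(AbstractPredictor):

    def __init__(self):
        # Load the (likely heavy) model files once; the instance is cached via get_predictor.
        with open(os.path.join(MODELS_DIR, 'model.pkl'), 'rb') as f:
            self.model = pickle.load(f)  # assumed to be a pipeline that accepts raw texts
        self.preprocessor = MyPreprocessor()

    def predict(self, tweets):
        # 'tweets' is assumed to be an iterable of objects with id, date and text attributes.
        texts = self.preprocessor.preprocess([tweet.text for tweet in tweets])
        predictions = self.model.predict(texts)
        results = []
        for tweet, prediction in zip(tweets, predictions):
            # Lookup field on PredictionLabel is an assumption; adapt to the actual model fields.
            label = PredictionLabel.objects.get(label=prediction)
            results.append((tweet.id, tweet.date, label))
        return results
```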
Note: instances of the classes in `predictors.py` are cached if you call `get_predictor` (in `backend/classifier/utils.py`; Twish already does so) instead of instantiating them directly. This is done to mitigate the cost of loading (likely) heavy model files on every prediction request. Yes, you should load your files from step 2 in your predictor class to take advantage of this caching.
You need to tell Twish about your predictor so that it takes it into account when making predictions.

a. Go to the Django Admin site. Log in using:

   user: `admin`
   password: `admin`

b. Add a new instance of the `Predictor` model. Your predictor's name MUST MATCH the name of the predictor class you created in step 4. Direct link.

c. Modify the existing instance of the `App` model and set your new predictor as the default predictor. Direct link.
Great! You are almost done. Now you need to restart the containers that host the backend logic.

```
docker restart twish_celery_1 twish_django_1
```
Congratulations! You have your own web application to collect, classify and visualize tweets.
Pending documentation.
Pending documentation.