Webpage similarity is a library to compare two webpages.
It uses a fuzzy approach. The library provides:
- fuzzy text classification
- similarity between two texts based on classifications
- fuzzy image comparison (RGBA, brightness and size)
- webpage crawler

Example usage is shown in the example directory.
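
To illustrate the idea of comparing two texts through their fuzzy classifications, here is a minimal sketch. It is not this library's API: the function name `fuzzy_jaccard`, the category names, and the membership values are all hypothetical, and the measure used (fuzzy Jaccard, the sum of minimum memberships divided by the sum of maximum memberships) is only one common choice that may differ from what the library actually implements.

```python
def fuzzy_jaccard(a, b):
    """Fuzzy Jaccard similarity between two membership dicts.

    Each dict maps a category name to a membership degree in [0, 1].
    Hypothetical helper for illustration; not part of this library.
    """
    categories = set(a) | set(b)
    # A category missing from a dict counts as membership 0.0.
    intersection = sum(min(a.get(c, 0.0), b.get(c, 0.0)) for c in categories)
    union = sum(max(a.get(c, 0.0), b.get(c, 0.0)) for c in categories)
    # Two texts with no memberships at all are treated as identical.
    return intersection / union if union else 1.0


# Assumed classification results for two pages (made-up values).
page_a = {"sport": 0.8, "news": 0.3}
page_b = {"sport": 0.6, "news": 0.5, "tech": 0.1}

score = fuzzy_jaccard(page_a, page_b)
print(round(score, 3))
```

The score lies between 0 (no shared membership) and 1 (identical memberships), so it can be thresholded or combined with other fuzzy measures such as an image comparison score.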
Before you can use this library, you must install the dependencies listed in requirements.txt
and antlr3.
Then train the system:
./create_db.py