- TextSplitter "cleans" and converts PDFs to documents with embedded sentences.
- SentenceMatcher computes TF-IDF weights and groups documents with K-means into manageable clusters. Sentences within the same cluster are then compared using the Sørensen–Dice coefficient.
After that, data can be mined ad hoc from the sentences via some other tool,...
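
As an illustration of the sentence comparison step, here is a minimal sketch of the Sørensen–Dice coefficient, 2|X ∩ Y| / (|X| + |Y|), in Scala. It is not taken from SentenceMatcher: the `DiceSketch` object, the `bigrams` helper, and the choice of lower-cased character bigrams with whitespace removed are all assumptions made for this example, and the real code may tokenize differently.

```scala
// Hypothetical sketch, not the project's actual implementation.
object DiceSketch {
  // Assumption: a sentence is represented by the set of its character
  // bigrams, after lower-casing and dropping whitespace.
  def bigrams(s: String): Set[String] =
    s.toLowerCase.filterNot(_.isWhitespace).sliding(2).filter(_.length == 2).toSet

  // Sørensen–Dice coefficient: twice the overlap divided by the total size.
  // Returns a value in [0, 1]; two empty bigram sets are treated as identical.
  def dice(a: String, b: String): Double = {
    val x = bigrams(a)
    val y = bigrams(b)
    if (x.isEmpty && y.isEmpty) 1.0
    else 2.0 * (x intersect y).size / (x.size + y.size)
  }
}
```

For example, `DiceSketch.dice("night", "nacht")` evaluates to `0.25`, because the two words share one bigram (`ht`) out of four each. Comparing only sentences within the same cluster keeps the number of such pairwise comparisons small.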
sbt -sbt-version 0.13.13 test