Document Classification Pipeline using Apache Spark - Scala

Requirements

  • Spark 1.5+
  • Scala 2.10+
  • Stanford Core NLP 3.6.0 Jar
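
One way to pull in these requirements is through sbt instead of adding the jars by hand. The fragment below is a minimal sketch, not the repository's actual build file: the project name, the exact Spark and Scala patch versions (any Spark 1.5+ and Scala 2.10+ release should satisfy the list above), and the `provided` scope are all assumptions.

```scala
// build.sbt (hypothetical; the repository does not ship this file)
name := "document-classification" // assumed project name

// Assumed patch version; the README only asks for Scala 2.10+
scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  // Assumed Spark version; "provided" assumes the job is launched with spark-submit
  "org.apache.spark" %% "spark-core"  % "1.5.2" % "provided",
  "org.apache.spark" %% "spark-mllib" % "1.5.2" % "provided",
  // Stanford CoreNLP 3.6.0, plus its separately published models artifact
  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" classifier "models"
)
```

Note that CoreNLP 3.6.0 needs Java 8 or later, and the models artifact is large, so the first dependency resolution can take a while.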

Files Description

  • TextUtilities/TextTools.scala - contains functions for annotating the text

  • TextUtilities/TextCleaner.scala - contains functions for cleaning and preprocessing the text documents

  • DocumentClassification/ModelArchitecture.scala - contains the complete classification architecture

Online Article

http://analyticsindiamag.com/document-classification-using-apache-spark-scala/