Split feature counting from feature extraction #10

thvitt · 2016-08-24T15:52:54Z

The FeatureGenerator currently implements three things at once: feature extraction (reading files, tokenization), postprocessing (case sensitivity, n-grams), and the bag-of-words model (feature counting, merging into a dataframe).

For some implementations (zeta, chunking, topic modelling?) we need the tokenized data that has not yet been merged into the bag of words model.
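A minimal sketch of what such a split could look like, using only the standard library. All names here (`tokenize`, `postprocess`, `count_features`, `chunk`) are hypothetical and are not the existing FeatureGenerator API; per-document `Counter` objects stand in for the dataframe that the real code builds.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List


def tokenize(text: str) -> List[str]:
    """Extraction: split raw text into word tokens (hypothetical helper)."""
    return re.findall(r"\w+", text)


def postprocess(tokens: Iterable[str], lower_case: bool = True,
                ngrams: int = 1) -> List[str]:
    """Postprocessing: optional case folding and n-gram building."""
    toks = [t.lower() for t in tokens] if lower_case else list(tokens)
    if ngrams <= 1:
        return toks
    return [" ".join(toks[i:i + ngrams])
            for i in range(len(toks) - ngrams + 1)]


def count_features(documents: Dict[str, List[str]]) -> Dict[str, Counter]:
    """Counting: build one bag of words per document.

    Assumption: a dict of Counters stands in for the merged dataframe.
    """
    return {name: Counter(tokens) for name, tokens in documents.items()}


def chunk(tokens: List[str], size: int) -> List[List[str]]:
    """A consumer that needs the token sequence, not the counts."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


tokens = postprocess(tokenize("The cat and the dog"))
counts = count_features({"sample": tokens})
chunks = chunk(tokens, 2)
```

With the steps separated like this, chunking (and presumably zeta or topic modelling) can consume the output of `postprocess` directly, while the bag-of-words path only adds `count_features` on top.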

thvitt removed the next branch label Aug 24, 2016

thvitt self-assigned this Aug 24, 2016
