Split feature counting from feature extraction #10

Open
thvitt opened this issue Aug 24, 2016 · 0 comments
thvitt commented Aug 24, 2016

The FeatureGenerator currently bundles three stages: feature extraction (reading files, tokenization), postprocessing (case sensitivity, n-grams), and the bag-of-words model (feature counting, merging the counts into a DataFrame).
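
A rough sketch of what the split could look like, with each stage as a separate function so that the output of one stage can be used without running the next. The names `tokenize`, `postprocess`, `count_features` and `merge_counts` are hypothetical stand-ins, not the current FeatureGenerator API, and the regex tokenizer is an assumption for illustration only.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical stand-ins for the three stages currently bundled in FeatureGenerator.
TOKEN_RE = re.compile(r"\w+")

def tokenize(text: str) -> List[str]:
    """Extraction stage: split raw text into word tokens (simple regex tokenizer)."""
    return TOKEN_RE.findall(text)

def postprocess(tokens: List[str], lower: bool = True, n: int = 1) -> List[str]:
    """Postprocessing stage: optional lower-casing, then joining tokens into n-grams."""
    if lower:
        tokens = [t.lower() for t in tokens]
    if n > 1:
        tokens = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return tokens

def count_features(docs: Dict[str, List[str]]) -> Dict[str, Counter]:
    """Bag-of-words stage: count each feature per document."""
    return {name: Counter(tokens) for name, tokens in docs.items()}

def merge_counts(counts: Dict[str, Counter]) -> Dict[str, Dict[str, int]]:
    """Merge per-document counts into a feature -> {document: count} table, filling zeros."""
    features = sorted(set().union(*counts.values()))
    return {f: {doc: c[f] for doc, c in counts.items()} for f in features}

raw = {"a.txt": "The cat saw the dog.", "b.txt": "A dog."}
# Tokenized, postprocessed data is available here, before any counting happens.
tokenized = {name: postprocess(tokenize(text)) for name, text in raw.items()}
table = merge_counts(count_features(tokenized))
```

Since `tokenized` is an ordinary value, other code could consume it directly instead of going through the counting step. `pandas.DataFrame(table)` would turn the merged table into a DataFrame with documents as rows and features as columns.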

Some implementations (zeta, chunking, perhaps topic modelling) need the tokenized data before it has been merged into the bag-of-words model.
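
To illustrate why, here is a sketch of chunking plus a simple proportion-difference variant of Zeta. It scores a word by the share of segments in group A that contain it minus the share of segments in group B that contain it. Segment-level presence cannot be recovered from per-document counts, so the code has to start from token lists. The chunk size, the handling of a trailing partial chunk, and the exact Zeta variant are assumptions here; the functions `chunks`, `segment_proportion` and `zeta` are hypothetical.

```python
from typing import Iterator, List, Sequence, Set

def chunks(tokens: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive, non-overlapping chunks of `size` tokens.

    A trailing chunk shorter than `size` is dropped (an assumption; keeping
    it would be an equally valid choice).
    """
    for start in range(0, len(tokens) - size + 1, size):
        yield list(tokens[start:start + size])

def segment_proportion(word: str, segments: List[Set[str]]) -> float:
    """Share of segments in which `word` occurs at least once."""
    if not segments:
        raise ValueError("need at least one segment")
    return sum(word in seg for seg in segments) / len(segments)

def zeta(word: str, tokens_a: Sequence[str], tokens_b: Sequence[str], size: int) -> float:
    """Proportion-difference variant of Zeta, in [-1, 1]."""
    segs_a = [set(c) for c in chunks(tokens_a, size)]
    segs_b = [set(c) for c in chunks(tokens_b, size)]
    return segment_proportion(word, segs_a) - segment_proportion(word, segs_b)

group_a = "x y z x y w".split()
group_b = "z z w w z w".split()
score_x = zeta("x", group_a, group_b, size=2)  # "x" in 2 of 3 A-segments, 0 of 3 B-segments
score_w = zeta("w", group_a, group_b, size=2)  # "w" in 1 of 3 A-segments, 2 of 3 B-segments
```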

@thvitt thvitt self-assigned this Aug 24, 2016