Create a token/embedding creation preprocessing pipeline using tf-transform #124

iislucas · 2018-07-02T21:08:28Z

Issue:
We currently depend on vocabularies, like GloVe embeddings, that:

Are weirdly biased (although once you backprop into the embeddings, their initial bias is no longer very relevant).
Must stay consistent with the tokenizer we use.
Don't necessarily contain the same words as our actual text.

Proposed solution project:
Use tf-transform (https://github.com/tensorflow/transform) to develop text preprocessing pipelines that, for example, select tokens that occur sufficiently frequently and create either random or smarter word embeddings for them.
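
To make the idea concrete, here is a rough, library-free sketch of the two steps: keep tokens that occur often enough, then give each kept token a random embedding. It is plain Python, not tf-transform code. The names `build_vocab`, `random_embeddings`, `encode`, and `min_count` are hypothetical, and whitespace splitting stands in for whatever tokenizer we settle on. In tf-transform, the vocabulary step would presumably map to something like `tft.compute_and_apply_vocabulary` with its `frequency_threshold` argument, but that mapping is an assumption and has not been tried here.

```python
import random
from collections import Counter


def build_vocab(tokenized_docs, min_count=2):
    """Keep tokens seen at least `min_count` times; id 0 is reserved for OOV."""
    counts = Counter(tok for doc in tokenized_docs for tok in doc)
    # Sort by descending count, then alphabetically, so ids are deterministic.
    kept = sorted(
        (tok for tok, count in counts.items() if count >= min_count),
        key=lambda tok: (-counts[tok], tok),
    )
    return {tok: i + 1 for i, tok in enumerate(kept)}


def random_embeddings(vocab, dim=4, seed=0):
    """One small random vector per vocabulary id, plus row 0 for OOV tokens."""
    rng = random.Random(seed)
    return [
        [rng.uniform(-0.05, 0.05) for _ in range(dim)]
        for _ in range(len(vocab) + 1)
    ]


def encode(doc, vocab):
    """Map tokens to ids, sending anything outside the vocabulary to id 0."""
    return [vocab.get(tok, 0) for tok in doc]


# Toy corpus (made up for illustration), tokenized by whitespace.
docs = [s.lower().split() for s in [
    "the movie was great",
    "the movie was terrible",
    "great acting",
]]

vocab = build_vocab(docs, min_count=2)
table = random_embeddings(vocab, dim=4)
ids = encode("the acting was great".split(), vocab)
```

Because the same tokenizer builds the vocabulary and encodes new text, the vocabulary cannot drift out of sync with the tokenizer, and every kept token is one that actually appears in our data.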

fprost · 2018-07-17T16:39:35Z

FYI: Not sure if this helps, but here is a basic example using tft: https://github.com/tensorflow/transform/blob/master/examples/sentiment_example.py

iislucas changed the title ~~Create a preprocessing pipeline using https://github.com/tensorflow/transform~~ Create a token/embedding creation preprocessing pipeline using https://github.com/tensorflow/transform Jul 2, 2018

iislucas changed the title ~~Create a token/embedding creation preprocessing pipeline using https://github.com/tensorflow/transform~~ Create a token/embedding creation preprocessing pipeline using tf-transform Jul 2, 2018

ipavlopoulos pushed a commit to ipavlopoulos/conversationai-models that referenced this issue Mar 2, 2019

Merge pull request conversationai#124 from conversationai/iislucas-tw…

5062e83

…eaks tweaking docs to be clearer and better formatted
