seqDenoAntoencoder

Here is some code for the topic analysis task: given a topic word and a collection of text files, find the files most relevant to that topic word.
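For the embedding-based approaches, one natural way to frame the task is to map the topic word and each file to vectors, then rank the files by similarity to the topic vector. Below is a minimal, stdlib-only sketch of the average-word-embedding variant. It assumes cosine similarity as the relevance score and already-tokenized files; the function names and the toy 2-d vectors are hypothetical and are not taken from averageEmbedding.py.

```python
import math


def average_embedding(tokens, word_vectors):
    """Mean vector of the in-vocabulary tokens, or None if none are known."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_documents(topic, docs, word_vectors, top_k=3):
    """Return up to top_k (file_name, score) pairs, most relevant first."""
    topic_vec = word_vectors[topic]  # raises KeyError for an unknown topic word
    scored = []
    for name, tokens in docs.items():
        doc_vec = average_embedding(tokens, word_vectors)
        score = cosine(topic_vec, doc_vec) if doc_vec is not None else 0.0
        scored.append((name, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 2-d vectors, invented for illustration; real ones would be pretrained.
word_vectors = {
    "sport": [1.0, 0.0], "football": [0.9, 0.1], "goal": [0.8, 0.2],
    "market": [0.0, 1.0], "stock": [0.1, 0.9],
}
docs = {
    "a.txt": ["football", "goal", "the"],  # "the" has no vector and is skipped
    "b.txt": ["stock", "market"],
}
print(rank_documents("sport", docs, word_vectors))
```

Here a.txt ranks above b.txt because the average of its word vectors points in nearly the same direction as the vector for "sport".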

I have tried the following methods:

tfidf
lda (latent Dirichlet allocation)
doc2vec (paragraph vector)
average word embedding
sequence denoising autoencoder
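As an illustration of the tfidf approach, here is a minimal sketch that scores each file by the tf-idf weight of the topic word alone. It assumes the files are already tokenized and uses the smoothed idf formula that scikit-learn's TfidfVectorizer applies by default; the function name and the toy documents are hypothetical, and this is not necessarily what tfidf.py does.

```python
import math


def tfidf_scores(topic, docs):
    """Score each document by the tf-idf weight of a single topic word.

    docs maps a file name to its list of tokens.
    tf  = occurrences of topic in the doc / number of tokens in the doc
    idf = ln((1 + N) / (1 + df)) + 1   (smoothed, so it never reaches zero)
    where N is the number of docs and df the number containing the topic.
    """
    n_docs = len(docs)
    df = sum(1 for tokens in docs.values() if topic in tokens)
    idf = math.log((1 + n_docs) / (1 + df)) + 1
    return {
        name: (tokens.count(topic) / len(tokens) if tokens else 0.0) * idf
        for name, tokens in docs.items()
    }


docs = {
    "a.txt": ["football", "goal", "football", "match"],
    "b.txt": ["stock", "market", "football"],
    "c.txt": ["stock", "price"],
}
scores = tfidf_scores("football", docs)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

The smoothing keeps a topic word that appears in every file from getting an idf of zero, which would otherwise make all scores equal.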

For the sequence denoising autoencoder, I implemented it with Keras. The Theano version (encoder_theano.py) and the TensorFlow version (encoder_tensorflow.py) differ slightly in how the penalty function is implemented but are otherwise the same. **Note that the model uses pretrained word embeddings, so you should either pretrain some word vectors or change the default setting in encoder_xxx.py.**
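This README does not say how inputs are corrupted for the denoising objective. As an assumption, here is a sketch of one common scheme for sequence denoising: randomly delete words, then randomly swap adjacent words; the autoencoder is then trained to reconstruct the original sequence from the noisy one. The name `corrupt` and the default probabilities are hypothetical and are not taken from encoder_theano.py or encoder_tensorflow.py.

```python
import random


def corrupt(tokens, p_drop=0.1, p_swap=0.1, rng=None):
    """Return a noisy copy of a token sequence (hypothetical noise scheme).

    1. Delete each token independently with probability p_drop
       (keeping at least one token so the input is never empty).
    2. Walk left to right and swap each adjacent pair with probability
       p_swap; a swapped pair is skipped so swaps never overlap.
    The input list is not modified.
    """
    rng = rng if rng is not None else random.Random()
    kept = [t for t in tokens if rng.random() >= p_drop]
    if not kept and tokens:
        kept = [rng.choice(tokens)]
    i = 0
    while i < len(kept) - 1:
        if rng.random() < p_swap:
            kept[i], kept[i + 1] = kept[i + 1], kept[i]
            i += 2
        else:
            i += 1
    return kept


sentence = "the team scored a late goal".split()
noisy = corrupt(sentence, p_drop=0.2, p_swap=0.2, rng=random.Random(0))
# Training pair for the autoencoder: input = noisy, target = sentence.
print(noisy, "->", sentence)
```

Passing a seeded `random.Random` makes the noise reproducible, which helps when comparing runs.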

To use them, just run **python xxx.py**, replacing xxx.py with the script for the method you want (e.g. tfidf.py).

It requires:

gensim
numpy
lda
keras
theano or tensorflow

Each script prints out its results.

Have fun.

Files:

README.md
averageEmbedding.py
docv2ec.py
encoder_tensorflow.py
encoder_theano.py
lda_method.py
parser.py
tfidf.py
wordEmbedding.py

Vincent717/seqDenoAntoencoder