Homework 4
Given a pair of words (e.g., king and male), your task is to find the most similar pair (e.g., queen and female) using word vectors and their cosine similarities.
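As a minimal sketch of the comparison involved, the snippet below computes the cosine similarity between two diff vectors. The 3-dimensional vectors are invented for illustration and do not come from w2v.bin; the helper name `cosine` is also an assumption, not something defined in w2v.py.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy vectors, made up for this example only (not taken from w2v.bin).
vec = {
    'king':   np.array([0.9, 0.8, 0.1]),
    'male':   np.array([0.1, 0.9, 0.0]),
    'queen':  np.array([0.8, 0.1, 0.9]),
    'female': np.array([0.1, 0.2, 0.7]),
}

d1 = vec['king'] - vec['male']      # diff vector for (king, male)
d2 = vec['queen'] - vec['female']   # diff vector for (queen, female)
sim = cosine(d1, d2)                # close to 1 when the two diffs point the same way
print(sim)
```

A value near 1 means the two pairs differ in roughly the same direction, which is what makes them a plausible analogy.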
-
Log in to your Azure account.
-
Install numpy:
sudo apt-get install python-numpy
-
Download the following word vectors:
wget http://www.mathcs.emory.edu/~choi/courses/cs329/dat/w2v.bin
-
Download the following vocabulary list:
wget https://raw.githubusercontent.com/emory-courses/cs329/master/src/distributional_semantics/vocab_100_verbs.txt
-
Create hw4.py by modifying w2v.py so that it does the following:
-
Construct a diff vector for each pair of words (e.g., v = v1 - v2). Do not create diff vectors from the same words (e.g., v = v1 - v1).
-
For each diff vector, find the top-k most similar diff vectors, where k = 5. All 4 words in the diff vectors must be different (e.g., w1 : w2 = w3 : w4, where none of w1, w2, w3, and w4 are the same).
-
Save your results to hw4.txt as follows:
word1 : word2 = word3 : word4 ...
-
There are about 10,000 combinations, which means your output file should contain 10,000 * 5 lines. You need to write fewer than 20 lines of code to complete this homework, although it will take a while to run. Plan ahead so that you finish on time; no extensions will be granted for this homework.
-
Create the cs329/hw4 directory and submit hw4.py, hw4.txt, and a report showing the top-20 most interesting analogy pairs.
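The steps above can be sketched roughly as follows. This is not the reference solution: it assumes the word vectors are already available as a plain dict mapping each word to a numpy array (the loader in w2v.py is not shown here, so `vectors`, `analogy_lines`, and the toy data are all hypothetical), and it emits only the four words per line, since the exact trailing content of each output line is not specified.

```python
import itertools
import numpy as np

def analogy_lines(vectors, k=5):
    """For each ordered word pair, list the k most similar pairs by diff-vector cosine."""
    words = sorted(vectors)
    # Ordered pairs of distinct words, so v1 - v1 is never built.
    pairs = list(itertools.permutations(words, 2))
    diffs = np.array([vectors[a] - vectors[b] for a, b in pairs])
    # Normalise once so that a dot product equals cosine similarity.
    norms = np.linalg.norm(diffs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # guard against identical vectors
    unit = diffs / norms
    lines = []
    for i, (w1, w2) in enumerate(pairs):
        sims = unit @ unit[i]
        picked = 0
        for j in np.argsort(-sims):  # most similar first
            w3, w4 = pairs[j]
            if len({w1, w2, w3, w4}) < 4:
                continue  # all four words must differ
            lines.append('%s : %s = %s : %s' % (w1, w2, w3, w4))
            picked += 1
            if picked == k:
                break
    return lines

# Toy data: random vectors for six made-up vocabulary entries.
rng = np.random.default_rng(0)
toy = {w: rng.normal(size=8) for w in ['run', 'walk', 'eat', 'drink', 'read', 'write']}
lines = analogy_lines(toy)
print(len(lines))
```

With n words there are n * (n - 1) ordered pairs, so a 100-word vocabulary (as the file name vocab_100_verbs.txt suggests) would give 9,900 diff vectors, in line with the "about 10,000" figure. Writing the returned lines to hw4.txt, one per line, would be the final step.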
Copyright © 2016 Emory University - All Rights Reserved.