Jinho D. Choi edited this page Apr 19, 2016 · 7 revisions

Word Analogy

Given a pair of words (e.g., king and male), your task is to find the most similar pair (e.g., queen and female) using word vectors and their cosine similarities.
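
As a minimal sketch of the idea, assuming made-up 2-D vectors rather than anything loaded from w2v.bin, the cosine similarity between two difference vectors can be computed as below. The helper name `cosine` and the toy values are illustrative only and are not part of w2v.py.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity of two 1-D vectors; returns 0.0 if either has zero length."""
    denom = np.sqrt(np.dot(u, u)) * np.sqrt(np.dot(v, v))
    return float(np.dot(u, v) / denom) if denom else 0.0

# Toy vectors invented for illustration (first axis: royalty, second axis: gender).
king, male = np.array([2.0, 1.0]), np.array([0.0, 1.0])
queen, female = np.array([2.0, -1.0]), np.array([0.0, -1.0])

# The two pairs are analogous when their difference vectors point the same way.
print(cosine(king - male, queen - female))
# An unrelated pairing points in a different direction and scores lower.
print(cosine(king - male, king - queen))
```

With real word vectors the similarity is rarely exactly 1, so pairs are ranked by this score rather than matched exactly.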

  • Log in to your Azure account.

  • Install numpy:

    sudo apt-get install python-numpy
    
  • Download the following word vectors:

    wget http://www.mathcs.emory.edu/~choi/courses/cs329/dat/w2v.bin
    
  • Download the following vocabulary list:

    wget https://raw.githubusercontent.com/emory-courses/cs329/master/src/distributional_semantics/vocab_100_verbs.txt
    
  • Create hw4.py by modifying w2v.py so that it does the following:

  • Construct a diff vector for each pair of words (e.g., v = v1 - v2). Do not create a diff vector from a word and itself (e.g., v = v1 - v1).

  • For each diff vector, find the top-k most similar diff vectors, where k = 5. All 4 words involved must be different (e.g., w1 : w2 = w3 : w4, where no two of w1, w2, w3, and w4 are the same).

  • Save your results to hw4.txt as follows:

    word1 : word2 = word3 : word4
    ...
    
  • There are about 10,000 combinations, which means your output file should contain about 10,000 * 5 = 50,000 lines. You need to write fewer than 20 lines of code to complete this homework, although it will take a while to run. Plan ahead so that you finish on time; no extensions are allowed for this homework.

  • Create the cs329/hw4 directory and submit hw4.py, hw4.txt, and a report showing the top-20 most interesting analogy pairs.
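
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `top_k_analogies` is hypothetical, the word vectors are assumed to be available as a dict from word to 1-D numpy array (how w2v.py actually exposes them may differ), and the toy vectors are made up rather than read from w2v.bin.

```python
import itertools
import numpy as np

def top_k_analogies(vectors, k=5):
    """Return (w1, w2, w3, w4) tuples: for each ordered pair (w1, w2), the k
    pairs (w3, w4) whose diff vectors are most cosine-similar to v1 - v2.
    `vectors` is assumed to map each word to a 1-D numpy array."""
    words = sorted(vectors)
    # Ordered pairs of distinct words, so v1 - v1 is never built.
    pairs = list(itertools.permutations(words, 2))
    diffs = np.array([vectors[a] - vectors[b] for a, b in pairs], dtype=float)
    # Normalize each row so that a dot product equals cosine similarity.
    norms = np.sqrt((diffs ** 2).sum(axis=1))[:, None]
    norms[norms == 0] = 1.0  # guard: two words with identical vectors
    unit = diffs / norms
    results = []
    for i, (w1, w2) in enumerate(pairs):
        sims = unit.dot(unit[i])  # one row at a time keeps memory use low
        found = 0
        for j in np.argsort(-sims):  # most similar first
            w3, w4 = pairs[j]
            if len({w1, w2, w3, w4}) < 4:  # all four words must differ
                continue
            results.append((w1, w2, w3, w4))
            found += 1
            if found == k:
                break
    return results

# Toy vectors invented for illustration only.
toy = {
    "king": np.array([1.0, 0.0]),
    "queen": np.array([1.0, 1.0]),
    "man": np.array([0.0, 0.0]),
    "woman": np.array([0.0, 1.0]),
}
for w1, w2, w3, w4 in top_k_analogies(toy, k=1):
    print("%s : %s = %s : %s" % (w1, w2, w3, w4))
```

The printed lines follow the hw4.txt format shown above, so writing them to a file instead of the console would give the required output. Computing similarities one row at a time avoids holding a full pairs-by-pairs matrix in memory, which matters once there are about 10,000 diff vectors.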

(CS|LING|QTM) 329
Computational Linguistics

Emory University
