Counts of pairs of places appearing in the same Wikipedia article
You can download the counts as a zipped TSV file from here. The TSV looks like this:
a | b | count |
---|---|---|
New York City | United States of America | 500 |
Russia | Soviet Union | 300 |
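If you would rather work from the TSV, the following is a minimal sketch of loading it into a Counter with the standard library. It assumes the unzipped file has a header row and three tab-separated columns (a, b, count); the function name `load_cooccurrences` and any path you pass to it are illustrative, not part of the dataset.

```python
import csv
from collections import Counter


def load_cooccurrences(path):
    """Read a tab-separated file with columns a, b, count into a Counter."""
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter="\t")
        next(reader, None)  # skip the header row (assumed to be present)
        for row in reader:
            if len(row) < 3:
                continue  # ignore blank or malformed lines
            a, b, count = row[0], row[1], row[2]
            # Store each pair in sorted order so lookups do not depend on argument order.
            counts[tuple(sorted((a, b)))] = int(count)
    return counts
```

Calling `load_cooccurrences` with the path of the unzipped file should give you a Counter keyed by sorted pairs of place names.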
You can also download the counts as a zipped pickled Python Counter from here. After unzipping it (the example below assumes the path /tmp/cooccurrences.pickle), you can use it like this:
import pickle

# Load the pickled Counter; adjust the path to wherever you unzipped the file.
with open("/tmp/cooccurrences.pickle", "rb") as f:
    cooccurrences = pickle.load(f)

count = cooccurrences[("New York City", "United States of America")]
print("New York City and United States of America appear together " + str(count) + " times.")
Each key is a tuple of two place names, ordered with Python's built-in `sorted` function. You must sort your pair the same way before looking it up. For example:
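# Any two place names, in any order.
two_random_keys = ["United States of America", "New York City"]
two_random_keys.sort()  # put the pair in the same order as the stored keys
a, b = two_random_keys
count = cooccurrences[tuple(two_random_keys)]
print(a + " and " + b + " appear together " + str(count) + " times.")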
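To avoid forgetting the sort, the lookup can be wrapped in a small helper. This is only a sketch: `cooccurrence_count` is a hypothetical name, and the tiny `example` Counter stands in for the real data loaded from the pickle.

```python
from collections import Counter


def cooccurrence_count(cooccurrences, place_a, place_b):
    """Return how often two places co-occur, regardless of argument order."""
    key = tuple(sorted((place_a, place_b)))
    # .get(..., 0) also works if the data is a plain dict rather than a Counter.
    return cooccurrences.get(key, 0)


# Stand-in data using a count from the table above.
example = Counter({("New York City", "United States of America"): 500})
print(cooccurrence_count(example, "United States of America", "New York City"))
```

Pairs that never appear together come back as 0 instead of raising a KeyError.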
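You can use this data to disambiguate place names: when a name could refer to several places, pick the candidate that co-occurs most often with a place you already know. For example, Azerbaijan appears far more often with Georgia the country than with Georgia the U.S. state: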
a | b | count |
---|---|---|
Azerbaijan | Georgia (country) | 2033 |
Azerbaijan | Georgia (U.S. state) | 101 |
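One way to turn this comparison into code is to take the candidate with the highest count. The sketch below assumes you already have a list of candidate names for the ambiguous place; `most_likely_place` is a hypothetical helper, and `example` holds only the two rows shown above.

```python
from collections import Counter


def most_likely_place(cooccurrences, context_place, candidates):
    """Pick the candidate that co-occurs most often with context_place."""
    def count_with_context(candidate):
        key = tuple(sorted((context_place, candidate)))
        return cooccurrences.get(key, 0)

    return max(candidates, key=count_with_context)


# Stand-in data taken from the two rows in the table above.
example = Counter({
    ("Azerbaijan", "Georgia (country)"): 2033,
    ("Azerbaijan", "Georgia (U.S. state)"): 101,
})
best = most_likely_place(
    example, "Azerbaijan", ["Georgia (U.S. state)", "Georgia (country)"]
)
print(best)  # Georgia (country)
```

With several known context places, you could sum the counts per candidate before taking the maximum.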