Reproducing Larger Simulation Experiments
The values for the graphs in Figure 2 in section 4.2 are produced via the Scala program tmeval.LargeSimulatedExperiment.
To produce the 50-topic results, use the largesim-exp target of bin/tmeval. Run without arguments, it prints a help message listing the arguments; all of them are required (there are no defaults).
$ bin/tmeval largesim-exp
Usage: bin/tmeval largesim-exp <number-of-topics> <vocabulary-size> <number-of-documents> <document-length> <number-of-repetitions> <output-file>
We obtained the results for 50 topics (the left plot in Figure 2) as follows:
$ bin/tmeval largesim-exp 50 10000 1000 500 10 largesim50.csv
The values we obtained for the plot in the paper were:
LargeSim50,Kalman,-4532642.271402556,-4531623.836971008,-4542048.316374431,-4532854.324401179,-4545570.899549234,-4505062.678433816,-4535314.340924142,-4543509.586094868,-4542065.197551466,-4519823.295160259,-4528550.238565171
LargeSim50,L2R(1),-4539375.764650745,-4536838.162946332,-4549802.825231814,-4540338.70209626,-4552581.468640241,-4510213.118115451,-4542860.757147909,-4551092.380149816,-4551799.353117689,-4524244.021547841,-4533986.857514105
LargeSim50,L2R(50),-4533001.115480695,-4531929.1814836925,-4542562.260136852,-4533261.949166238,-4545856.8409332475,-4505362.104331555,-4535689.890292934,-4543851.117856992,-4542632.603849108,-4520064.737553305,-4528800.469203028
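Each line has the format experiment-name,evaluation-method,average-likelihood,list-of-likelihoods-from-each-run. The third field is the average over all runs, and it is followed by one likelihood per repetition (10 here).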
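As a quick sanity check on this format, the following minimal Python sketch parses one result line and confirms that the reported average matches the mean of the per-run likelihoods. It is not part of tmeval; the names ResultRow and parse_row are hypothetical helpers introduced only for illustration, and the sample line is the Kalman row for 50 topics shown above.

```python
import math
from typing import List, NamedTuple


class ResultRow(NamedTuple):
    """One line of a largesim CSV file (hypothetical helper type)."""
    experiment: str
    method: str
    average: float
    runs: List[float]


def parse_row(line: str) -> ResultRow:
    """Split a line into experiment name, method, average, and per-run values."""
    fields = line.strip().split(",")
    if len(fields) < 4:
        raise ValueError("expected name, method, average and at least one run")
    experiment, method, average, *runs = fields
    return ResultRow(experiment, method, float(average), [float(x) for x in runs])


# The Kalman row for 50 topics, copied from the results listed above.
SAMPLE = (
    "LargeSim50,Kalman,-4532642.271402556,-4531623.836971008,"
    "-4542048.316374431,-4532854.324401179,-4545570.899549234,"
    "-4505062.678433816,-4535314.340924142,-4543509.586094868,"
    "-4542065.197551466,-4519823.295160259,-4528550.238565171"
)

row = parse_row(SAMPLE)
# Recompute the mean of the per-run likelihoods with a numerically stable sum.
recomputed = math.fsum(row.runs) / len(row.runs)

print(row.experiment, row.method, "runs:", len(row.runs))
print("average matches mean of runs:",
      math.isclose(row.average, recomputed, abs_tol=1e-3))
```

The same check can be applied to each line of your own output file to confirm that the number of per-run values equals the number of repetitions you requested.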
For 200 topics (the middle plot in Figure 2), change the first argument to 200. The command below also writes to a different output file.
$ bin/tmeval largesim-exp 200 10000 1000 500 10 largesim200.csv
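Since the arguments are purely positional and have no defaults, it can help to name them when scripting several runs. The following is a minimal Python sketch, not part of tmeval: LargeSimArgs and build_command are hypothetical names, and the field order simply mirrors the usage message above. It only builds and prints the two command lines; it does not run them.

```python
from typing import List, NamedTuple


class LargeSimArgs(NamedTuple):
    """Positional arguments of `bin/tmeval largesim-exp`, in usage order."""
    number_of_topics: int
    vocabulary_size: int
    number_of_documents: int
    document_length: int
    number_of_repetitions: int
    output_file: str


def build_command(args: LargeSimArgs) -> List[str]:
    """Return the argument list for one largesim-exp invocation."""
    return ["bin/tmeval", "largesim-exp", *[str(value) for value in args]]


# The 50-topic settings from the text; the 200-topic run changes only
# the topic count and the output file.
sim50 = LargeSimArgs(50, 10000, 1000, 500, 10, "largesim50.csv")
sim200 = sim50._replace(number_of_topics=200, output_file="largesim200.csv")

for settings in (sim50, sim200):
    print(" ".join(build_command(settings)))
```

To actually launch a run, the list returned by build_command could be passed to subprocess.run(..., check=True), assuming the working directory is the one containing bin/tmeval.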
The values we obtained for the plot in the paper were:
LargeSim200,Kalman,-4583213.16346041,-4583478.047113104,-4583708.427369431,-4572739.800123176,-4585708.140300082,-4583912.368151182,-4585459.778659023,-4582423.565804229,-4583949.922087632,-4586058.632124363,-4584692.9528718805
LargeSim200,L2R(1),-4584762.029026356,-4585092.322136145,-4585229.776368951,-4574092.573019889,-4587233.166732971,-4585301.909810677,-4587219.134465082,-4583887.575941862,-4585365.147593665,-4587766.231279148,-4586432.45291517
LargeSim200,L2R(50),-4583254.909956755,-4583506.896736844,-4583737.760591925,-4572772.117876212,-4585771.257728904,-4583944.766071291,-4585502.876098069,-4582474.694484035,-4583984.499804737,-4586105.904077231,-4584748.326098311
Our output files, largesim50.csv and largesim200.csv, sit in the topicmodel-eval/results directory. To produce the PDF files for the plots in Figure 2 (section 4.2) of the paper, run the following in the results directory:
R CMD BATCH makeplots.R
This creates the file largesims.pdf. You can obtain your own output as above, and substitute that for the files largesim50.csv and largesim200.csv in the results directory, and then run makeplots.R to get the plots for your output.
Note: this script will also produce the output for the real corpus experiments.