This is the (extended version) of the model presented in my paper "Have you Done Anything Like That? Predicting Performance Using Inter-category Reputation".
Run "Reputation.java" with "--help" to see all available options.
Available options:
-t Train: crete regression files.
-g Generate the necessary training files from the raw instances.
-e Run evaluation.
-w Print results to files.
-cv Run cross validation
-f Number of folds for cross validation.
-s Split files to -f number of folds. The spliting is by contractor.
-p Output predictions to "predictions.csv".
-a Show average accuracies for cross validation.
-n Evaluate on train set.
The raw input file should be a csv in the form:
contractor,category,score
All you have to change is the paths and the hierarchy levels in the config.properties file. The code supports only two levels at this point: r, and its children, rl and rr.
A simple run with "train.csv" and "test.csv" raw data files wiill be the following:
java Reputation -g -t -e
Sparse data - Hierarchies:
Create the raw files "syn_cluster_test.csv", "syn_cluster_train.csv" in the form contractor,category,score. In the config. properties file setup:
hierarchyStructure=r,rl,rr r=overall,non-technical,technical r-realMap=non-technical:rl,technical:rr rl=overall,writing,administrative,sales-and-marketing,one-more rr=overall,web-dev,soft-dev,des-mult,two-more category-mapping=0:overall,10:non-technical,20:technical,1:web-dev,2:soft-dev,5:writing,6:administrative,3:des-mult,7:sales-and-marketing,4:two-more,8:one-more category-to-root=1:20,2:20,3:20,4:20,5:10,6:10,7:10,8:10 basedon=r:_BasedOn_0_10_20,rl:_BasedOn_0_5_6_7_8,rr:_BasedOn_0_1_2_3_4
- Run Rputation -g -t -p -n -e -w -yc
- Run EMComputeLamabda
- Run Reputation -e -w -yc
No hierarchies: Create the raw files "syn_test_cat3","syn_train_cat3", where 3 is the number of ctagories. make Sure in the config.properties file you have: hierarchyStructure=r r=overall,non-technical,technical,one-more category-mapping=0:overall,1:non-technical,2:technical,3:extra basedon=r:_BasedOn_0_1_2_3
- Run Reputation -g -t -p -n -e -w -y
- Run EMComputeLambda -c 3 (the number of categories)
- Run Reputation -e -2 -y
In the config.properties have everything the same as in the hold out evaluation. Remember to check outOfScore
- Split the files to some folds: Run Reputation -cv -s -f 10
- Generate and Train. Run reputation: -g -t -cv -f 10
- Evaluate on Train. Run reputation -e -p -w -n -cv -f 10 (or combine steps 2 and 3)
- Run EMComputeLambdas -c (for cross validation)
Set up your config.properties. (including: outOfScore, hierarchy structure, priors, thresholds, mappings) 1.Create Train and evaluate on train: Reputation -g -t -p -w -n -e 2. Run EMComputeLambdas -r (option to denote real) 3. Run Reputation -e -w