Fake test:
ancestors_of_watched 2971 8431( 2.84) 288( 0.10) 55
children_of_watched_repos 3080 1078146(350.05) 17619( 5.72) 6021
coocs 3498 369004(105.49) 205980( 58.89) 162
in_cluster_repo 4788 736493(153.82) 651175(136.00) 200
in_cluster_user 4788 915321(191.17) 877177(183.20) 200
in_id_range 4788 43466( 9.08) 40461( 8.45) 57
parents_of_watched 2971 8143( 2.74) 8143( 2.74) 55
repos_by_watched_authors 4788 615732(128.60) 610637(127.53) 4948
repos_with_same_name 4273 1480456(346.47) 1451433(339.68) 6928
top_twenty 4788 95760( 20.00) 27101( 5.66) 20
Real test:
ancestors_of_watched 2833 9125( 3.22) 300( 0.11) 220
children_of_watched_repos 3490 1402558(401.88) 22617( 6.48) 7253
coocs 3920 420669(107.31) 226227( 57.71) 162
in_cluster_repo 4786 756122(157.99) 661863(138.29) 200
in_cluster_user 4788 912798(190.64) 864139(180.48) 200
in_id_range 4788 45004( 9.40) 42097( 8.79) 46
parents_of_watched 2833 8825( 3.12) 8825( 3.12) 219
repos_by_watched_authors 4786 806416(168.49) 800937(167.35) 7125
repos_with_same_name 4359 1805055(414.10) 1767657(405.52) 8784
top_twenty 4788 95760( 20.00) 24160( 5.05) 20
20000 (unbiased) users, looking at rule accuracy:
--------- absolute ---------- -------- incremental --------
fired corr rcl% nadded avg corr rcl% nadded avg maxsz
1 parents_of_watched 12464 6179(49.57%) 14890( 1.19) 6179(49.57%) 14890( 1.19) 32
2 ancestors_of_watched 12464 6184(49.61%) 15895( 1.28) 5( 0.04%) 1005( 0.08) 32
3 repos_by_watched_authors 19999 5174(25.87%) 2406984( 120.36) 4673(23.37%) 2405421( 120.28) 8173
4 repos_with_same_name 17823 6918(38.82%) 6147479( 344.92) 512( 2.87%) 6060832( 340.06) 12919
5 children_of_watched_repos 12917 822( 6.36%) 4513153( 349.40) 31( 0.24%) 73781( 5.71) 10178
6 in_cluster_user 20000 3847(19.23%) 3741561( 187.08) 2030(10.15%) 3654691( 182.73) 200
7 in_cluster_repo 20000 1690( 8.45%) 2955454( 147.77) 307( 1.53%) 2662277( 133.11) 200
8 in_id_range 20000 1406( 7.03%) 166920( 8.35) 423( 2.12%) 163830( 8.19) 106
9 coocs 14389 5434(37.76%) 1454873( 101.11) 1036( 7.20%) 863293( 60.00) 155
10 top_twenty 20000 1479( 7.39%) 387329( 19.37) 51( 0.26%) 127533( 6.38) 20
total 15247
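A rough sketch of how the absolute and incremental columns above relate, assuming rules are applied in the listed order and that "incremental" counts only candidates not already proposed by an earlier rule (this reading is consistent with the table: the first rule's two halves are identical, and the incremental corr column sums to the total of 15247). The function name, the data layout, and the choice to treat a rule as "fired" whenever it appears for a user are all hypothetical, not the code that produced these numbers.

```python
from collections import defaultdict

def rule_stats(users, rule_order):
    """Tally per-rule absolute and incremental statistics.

    users: list of (correct_repo, {rule_name: set_of_candidate_repos}).
    rule_order: rule names in the order they are applied.
    """
    stats = {name: defaultdict(int) for name in rule_order}
    for correct, by_rule in users:
        seen = set()  # candidates already proposed by earlier rules
        for name in rule_order:
            if name not in by_rule:
                continue  # assumption: an absent rule did not fire for this user
            cands = by_rule[name]
            new = cands - seen
            s = stats[name]
            s["fired"] += 1
            s["abs_corr"] += correct in cands   # absolute: rule alone contains the answer
            s["abs_added"] += len(cands)
            s["inc_corr"] += correct in new     # incremental: answer first found by this rule
            s["inc_added"] += len(new)
            s["maxsz"] = max(s["maxsz"], len(cands))
            seen |= cands
    return stats

# Tiny made-up example: three users, two rules.
example_users = [
    (1, {"a": {1, 2}, "b": {1, 3}}),
    (5, {"a": {2}, "b": {5}}),
    (9, {"b": {7}}),
]
example_stats = rule_stats(example_users, ["a", "b"])
total = sum(s["inc_corr"] for s in example_stats.values())
print(dict(example_stats["b"]), "total", total)
```

The averages in parentheses in the table would then be abs_added / fired and inc_added / fired.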
generator fired corr
1 generator.parents_of_watched 12464 6179(49.57%) 14890( 1.19) 6179(49.57%) 14890( 1.19) 32
2 generator.ancestors_of_watched 12464 6184(49.61%) 15895( 1.28) 5( 0.04%) 1005( 0.08) 32
3 generator.by_watched_authors 19999 4278(21.39%) 683331( 34.17) 3874(19.37%) 682607( 34.13) 100
4 generator.same_name 17823 6808(38.20%) 1010006( 56.67) 517( 2.90%) 991585( 55.64) 100
5 generator.children_of_watched 12917 745( 5.77%) 759004( 58.76) 56( 0.43%) 330648( 25.60) 100
6 generator.in_cluster_user 20000 2987(14.94%) 1934443( 96.72) 1756( 8.78%) 1908692( 95.43) 100
7 generator.in_cluster_repo 20000 1563( 7.82%) 1936234( 96.81) 373( 1.86%) 1769798( 88.49) 100
8 generator.in_id_range 20000 1406( 7.03%) 166913( 8.35) 435( 2.17%) 164468( 8.22) 99
9 generator.coocs 14389 4638(32.23%) 1154169( 80.21) 1094( 7.60%) 775630( 53.90) 100
total 14289
Most losses occur in the by_watched_authors generator.
After same_name fixes (for ranked FV generation):
generator fired corr
ancestors_of_watched 6229 2997(48.11%) 8051( 1.29) 3( 0.05%) 500( 0.08) 30
by_watched_authors 10000 2389(23.89%) 537885( 53.79) 2167(21.67%) 537412( 53.74) 199
children_of_watched 6516 382( 5.86%) 383960( 58.93) 17( 0.26%) 160870( 24.69) 100
coocs 7213 2373(32.90%) 579690( 80.37) 552( 7.65%) 386728( 53.62) 100
in_cluster_repo 10000 777( 7.77%) 967541( 96.75) 175( 1.75%) 882519( 88.25) 100
in_cluster_user 10000 1524(15.24%) 966655( 96.67) 860( 8.60%) 950748( 95.07) 100
in_id_range 10000 662( 6.62%) 83359( 8.34) 207( 2.07%) 82125( 8.21) 99
most_watched 10000 1708(17.08%) 983412( 98.34) 178( 1.78%) 541768( 54.18) 100
parents_of_watched 6229 2994(48.07%) 7551( 1.21) 2994(48.07%) 7551( 1.21) 29
same_name 8864 3332(37.59%) 521225( 58.80) 252( 2.84%) 510648( 57.61) 100
Stats after by_watched_authors improvement:
generator fired corr
ancestors_of_watched 2971 1461(49.18%) 3712( 1.25) 3( 0.10%) 264( 0.09) 17
by_watched_authors 3426 1233(35.99%) 265847( 77.60) 1096(31.99%) 265512( 77.50) 200
children_of_watched 3080 164( 5.32%) 182070( 59.11) 8( 0.26%) 73265( 23.79) 100
coocs 3498 1196(34.19%) 276337( 79.00) 165( 4.72%) 180246( 51.53) 100
in_cluster_repo 4788 787(16.44%) 463105( 96.72) 103( 2.15%) 420836( 87.89) 100
in_cluster_user 4788 1271(26.55%) 464027( 96.91) 613(12.80%) 453457( 94.71) 100
in_id_range 4788 340( 7.10%) 41247( 8.61) 105( 2.19%) 40604( 8.48) 57
most_watched 4788 795(16.60%) 471046( 98.38) 30( 0.63%) 240660( 50.26) 100
parents_of_watched 2971 1458(49.07%) 3448( 1.16) 1458(49.07%) 3448( 1.16) 16
same_name 4242 1601(37.74%) 246959( 58.22) 118( 2.78%) 241886( 57.02) 100
Stats after coocs and coocs2 improvement:
ancestors_of_watched 246 3( 1.22%) 264( 1.07) 3( 1.22%) 264( 1.07) 4
by_watched_authors 3426 1233(35.99%) 265847( 77.60) 1096(31.99%) 265512( 77.50) 200
children_of_watched 3080 164( 5.32%) 182070( 59.11) 8( 0.26%) 73265( 23.79) 100
coocs 3722 1301(34.95%) 254636( 68.41) 157( 4.22%) 153944( 41.36) 100
coocs2 3754 1413(37.64%) 274851( 73.22) 56( 1.49%) 70739( 18.84) 100
in_cluster_repo 4788 787(16.44%) 463105( 96.72) 103( 2.15%) 420836( 87.89) 100
in_cluster_user 4788 1271(26.55%) 464027( 96.91) 613(12.80%) 453457( 94.71) 100
in_id_range 4788 340( 7.10%) 41247( 8.61) 105( 2.19%) 40604( 8.48) 57
most_watched 4788 795(16.60%) 471046( 98.38) 23( 0.48%) 232982( 48.66) 100
parents_of_watched 2971 1458(49.07%) 3448( 1.16) 1458(49.07%) 3448( 1.16) 16
same_name 4242 1601(37.74%) 246959( 58.22) 118( 2.78%) 241886( 57.02) 100
ancestors_of_watched 246 3( 1.22%) 264( 1.07) 3( 1.22%) 264( 1.07) 4
by_watched_authors 3426 1233(35.99%) 265847( 77.60) 1096(31.99%) 265512( 77.50) 200
children_of_watched 3080 164( 5.32%) 182070( 59.11) 8( 0.26%) 73256( 23.78) 100
coocs 3722 1296(34.82%) 255004( 68.51) 155( 4.16%) 154871( 41.61) 100
coocs2 3754 1386(36.92%) 275257( 73.32) 61( 1.62%) 75459( 20.10) 100
in_cluster_repo 4788 787(16.44%) 463105( 96.72) 103( 2.15%) 420836( 87.89) 100
in_cluster_user 4788 1271(26.55%) 464027( 96.91) 613(12.80%) 453457( 94.71) 100
in_id_range 4788 340( 7.10%) 41247( 8.61) 105( 2.19%) 40604( 8.48) 57
most_watched 4788 795(16.60%) 471046( 98.38) 26( 0.54%) 233535( 48.78) 100
parents_of_watched 2971 1458(49.07%) 3448( 1.16) 1458(49.07%) 3448( 1.16) 16
same_name 4242 1601(37.74%) 246959( 58.22) 118( 2.78%) 241887( 57.02) 100
in_cluster_user fixed up:
half way through:
generator fired corr
ancestors_of_watched 454 3( 0.66%) 500( 1.10) 3( 0.66%) 500( 1.10) 6
by_watched_authors 7181 2611(36.36%) 565230( 78.71) 2368(32.98%) 564561( 78.62) 200
children_of_watched 6516 382( 5.86%) 383960( 58.93) 16( 0.25%) 161234( 24.74) 100
coocs 7782 2600(33.41%) 531182( 68.26) 658( 8.46%) 368661( 47.37) 100
coocs2 7837 2763(35.26%) 576999( 73.62) 141( 1.80%) 175759( 22.43) 100
in_cluster_repo 10000 777( 7.77%) 967541( 96.75) 252( 2.52%) 906615( 90.66) 100
in_cluster_user 10000 904( 9.04%) 983522( 98.35) 499( 4.99%) 957028( 95.70) 100
in_id_range 10000 662( 6.62%) 83359( 8.34) 199( 1.99%) 81685( 8.17) 99
most_watched 10000 1708(17.08%) 983412( 98.34) 204( 2.04%) 616996( 61.70) 100
parents_of_watched 6229 2994(48.07%) 7551( 1.21) 2994(48.07%) 7551( 1.21) 29
same_name 8864 3332(37.59%) 521225( 58.80) 246( 2.78%) 510211( 57.56) 100
second lot of fixes, half way through:
----
Final, user and repo fixed
ancestors_of_watched 246 3( 1.22%) 264( 1.07) 3( 1.22%) 264( 1.07) 4
by_watched_authors 3426 1233(35.99%) 265847( 77.60) 1096(31.99%) 265512( 77.50) 200
children_of_watched 3080 164( 5.32%) 182070( 59.11) 7( 0.23%) 69443( 22.55) 100
coocs 3722 1296(34.82%) 255004( 68.51) 151( 4.06%) 148715( 39.96) 100
coocs2 3754 1386(36.92%) 275257( 73.32) 68( 1.81%) 72964( 19.44) 100
in_cluster_repo 4788 311( 6.50%) 477770( 99.78) 29( 0.61%) 467288( 97.60) 100
in_cluster_user 4788 1344(28.07%) 478800( 100.00) 645(13.47%) 465371( 97.20) 100
in_id_range 4788 340( 7.10%) 41247( 8.61) 102( 2.13%) 40658( 8.49) 57
most_watched 4788 795(16.60%) 471046( 98.38) 24( 0.50%) 213613( 44.61) 100
parents_of_watched 2971 1458(49.07%) 3448( 1.16) 1458(49.07%) 3448( 1.16) 16
same_name 4242 1606(37.86%) 246959( 58.22) 120( 2.83%) 241763( 56.99) 100
----
Repo fixed again:
generator fired corr
ancestors_of_watched 246 3( 1.22%) 264( 1.07) 3( 1.22%) 264( 1.07) 4
by_watched_authors 3426 1233(35.99%) 265847( 77.60) 1096(31.99%) 265512( 77.50) 200
children_of_watched 3080 164( 5.32%) 182070( 59.11) 7( 0.23%) 69443( 22.55) 100
coocs 3722 1296(34.82%) 255004( 68.51) 130( 3.49%) 143509( 38.56) 100
coocs2 3754 1386(36.92%) 275257( 73.32) 59( 1.57%) 69956( 18.64) 100
in_cluster_repo 4788 679(14.18%) 477770( 99.78) 104( 2.17%) 435446( 90.95) 100
in_cluster_user 4788 1344(28.07%) 478800( 100.00) 645(13.47%) 465371( 97.20) 100
in_id_range 4788 340( 7.10%) 41247( 8.61) 104( 2.17%) 40617( 8.48) 57
most_watched 4788 795(16.60%) 471046( 98.38) 24( 0.50%) 212581( 44.40) 100
parents_of_watched 2971 1458(49.07%) 3448( 1.16) 1458(49.07%) 3448( 1.16) 16
same_name 4242 1606(37.86%) 246959( 58.22) 120( 2.83%) 241763( 56.99) 100
candidate source nfired correct already nentries maxsz
ancestors_of_watched 246 3( 1.22%) 24 288( 1.17) 9.3750 0
by_watched_authors 3426 1245(36.34%) 0 562516( 164.19) 0.2213 0
children_of_watched 3080 175( 5.68%) 5072 1078146( 350.05) 0.4867 0
coocs 3722 2103(56.50%) 40747 9915205(2663.95) 0.4322 0
coocs2 3754 2552(67.98%) 48276 24308132(6475.26) 0.2091 0
in_cluster_repo 4788 1300(27.15%) 0 2318617( 484.26) 0.0561 0
in_cluster_user 4788 1982(41.40%) 0 2391605( 499.50) 0.0829 0
in_id_range 4788 340( 7.10%) 2219 43466( 9.08) 5.8874 0
most_watched 4788 795(16.60%) 7754 478800( 100.00) 1.7855 0
parents_of_watched 2971 1458(49.07%) 4695 8143( 2.74) 75.5618 0
same_name 4242 1617(38.12%) 0 1470496( 346.65) 0.1100 0
GLZ fixed, half way through:
generator fired corr
ancestors_of_watched 454 3( 0.66%) 500( 1.10) 3( 0.66%) 500( 1.10) 6
by_watched_authors 7181 2611(36.36%) 565230( 78.71) 2368(32.98%) 564561( 78.62) 200
children_of_watched 6516 382( 5.86%) 383960( 58.93) 17( 0.26%) 151804( 23.30) 100
coocs 7782 2608(33.51%) 530524( 68.17) 403( 5.18%) 296155( 38.06) 100
coocs2 7837 2823(36.02%) 576014( 73.50) 107( 1.37%) 137146( 17.50) 100
in_cluster_repo 10000 1235(12.35%) 998196( 99.82) 214( 2.14%) 839579( 83.96) 100
in_cluster_user 10000 1812(18.12%) 1000000( 100.00) 968( 9.68%) 971915( 97.19) 100
in_id_range 10000 662( 6.62%) 83359( 8.34) 204( 2.04%) 82103( 8.21) 99
most_watched 10000 1708(17.08%) 983412( 98.34) 122( 1.22%) 492816( 49.28) 100
parents_of_watched 6229 2994(48.07%) 7551( 1.21) 2994(48.07%) 7551( 1.21) 29
same_name 8864 3326(37.52%) 521225( 58.80) 243( 2.74%) 509934( 57.53) 100
candidate source nfired correct already nentries maxsz
ancestors_of_watched 454 3( 0.66%) 64 564( 1.24) 11.8794 0
by_watched_authors 7181 2649(36.89%) 0 1235283( 172.02) 0.2144 0
children_of_watched 6516 420( 6.45%) 10874 2324524( 356.74) 0.4859 0
coocs 7782 4224(54.28%) 89346 19992584(2569.08) 0.4680 0
coocs2 7837 5219(66.59%) 107029 50354835(6425.27) 0.2229 0
in_cluster_repo 10000 1614(16.14%) 0 4840586( 484.06) 0.0333 0
in_cluster_user 10000 2602(26.02%) 0 4992600( 499.26) 0.0521 0
in_id_range 10000 662( 6.62%) 4550 87916( 8.79) 5.9284 0
most_watched 10000 1708(17.08%) 16588 1000000( 100.00) 1.8296 0
parents_of_watched 6229 2994(48.07%) 10196 17747( 2.85) 74.3224 0
same_name 8864 3379(38.12%) 0 3152813( 355.69) 0.1072 0
------
After GLZ classifier fix:
fake test:
generator fired corr
ancestors_of_watched 246 3( 1.22%) 264( 1.07) 3( 1.22%) 264( 1.07) 4
by_watched_authors 3426 1233(35.99%) 265847( 77.60) 1096(31.99%) 265512( 77.50) 200
children_of_watched 3080 164( 5.32%) 182070( 59.11) 7( 0.23%) 69443( 22.55) 100
coocs 3722 1301(34.95%) 254661( 68.42) 117( 3.14%) 139225( 37.41) 100
coocs2 3754 1413(37.64%) 274851( 73.22) 50( 1.33%) 64433( 17.16) 100
in_cluster_repo 4788 950(19.84%) 477770( 99.78) 122( 2.55%) 409969( 85.62) 100
in_cluster_user 4788 1344(28.07%) 478800( 100.00) 645(13.47%) 465372( 97.20) 100
in_id_range 4788 340( 7.10%) 41247( 8.61) 104( 2.17%) 40611( 8.48) 57
most_watched 4788 795(16.60%) 471046( 98.38) 21( 0.44%) 211197( 44.11) 100
parents_of_watched 2971 1458(49.07%) 3448( 1.16) 1458(49.07%) 3448( 1.16) 16
same_name 4242 1606(37.86%) 246959( 58.22) 120( 2.83%) 241763( 56.99) 100
candidate source nfired correct already nentries maxsz
ancestors_of_watched 246 3( 1.22%) 24 288( 1.17) 9.3750 0
by_watched_authors 3426 1245(36.34%) 0 562516( 164.19) 0.2213 0
children_of_watched 3080 175( 5.68%) 5072 1078146( 350.05) 0.4867 0
coocs 3722 2103(56.50%) 40747 9915205(2663.95) 0.4322 0
coocs2 3754 2552(67.98%) 48276 24308132(6475.26) 0.2091 0
in_cluster_repo 4788 1300(27.15%) 0 2318617( 484.26) 0.0561 0
in_cluster_user 4788 1982(41.40%) 0 2391605( 499.50) 0.0829 0
in_id_range 4788 340( 7.10%) 2219 43466( 9.08) 5.8874 0
most_watched 4788 795(16.60%) 7754 478800( 100.00) 1.7855 0
parents_of_watched 2971 1458(49.07%) 4695 8143( 2.74) 75.5618 0
same_name 4242 1617(38.12%) 0 1470496( 346.65) 0.1100 0
==========
With 2000 candidates from in_cluster sources:
Fake test:
ancestors_of_watched 246 3( 1.22%) 264( 1.07) 3( 1.22%) 264( 1.07) 4
by_watched_authors 3426 1233(35.99%) 265847( 77.60) 1096(31.99%) 265512( 77.50) 200
children_of_watched 3080 164( 5.32%) 182070( 59.11) 7( 0.23%) 69443( 22.55) 100
coocs 3722 1301(34.95%) 254661( 68.42) 111( 2.98%) 128361( 34.49) 100
coocs2 3754 1413(37.64%) 274851( 73.22) 46( 1.23%) 59735( 15.91) 100
in_cluster_repo 4788 981(20.49%) 477770( 99.78) 115( 2.40%) 413990( 86.46) 100
in_cluster_user 4788 1331(27.80%) 478800( 100.00) 653(13.64%) 466341( 97.40) 100
in_id_range 4788 340( 7.10%) 41247( 8.61) 103( 2.15%) 40590( 8.48) 57
most_watched 4788 795(16.60%) 471046( 98.38) 18( 0.38%) 190178( 39.72) 100
parents_of_watched 2971 1458(49.07%) 3448( 1.16) 1458(49.07%) 3448( 1.16) 16
same_name 4242 1606(37.86%) 246959( 58.22) 120( 2.83%) 241763( 56.99) 100
candidate source nfired correct already nentries maxsz
ancestors_of_watched 246 3( 1.22%) 24 288( 1.17) 9.3750 0
by_watched_authors 3426 1245(36.34%) 0 562516( 164.19) 0.2213 0
children_of_watched 3080 175( 5.68%) 5072 1078146( 350.05) 0.4867 0
coocs 3722 2103(56.50%) 40747 9915205(2663.95) 0.4322 0
coocs2 3754 2552(67.98%) 48276 24308132(6475.26) 0.2091 0
in_cluster_repo 4788 1429(29.85%) 0 3268971( 682.74) 0.0437 0
in_cluster_user 4788 2548(53.22%) 0 8265274(1726.25) 0.0308 0
in_id_range 4788 340( 7.10%) 2219 43466( 9.08) 5.8874 0
most_watched 4788 795(16.60%) 7754 478800( 100.00) 1.7855 0
parents_of_watched 2971 1458(49.07%) 4695 8143( 2.74) 75.5618 0
same_name 4242 1617(38.12%) 0 1470496( 346.65) 0.1100 0
============
Before excluding parent features:
Fake test:
ancestors_of_watched 246 3( 1.22%) 264( 1.07) 3( 1.22%) 264( 1.07) 4
authored_by_me 593 516(87.02%) 1065( 1.80) 502(84.65%) 1049( 1.77) 100
by_watched_authors 3426 1233(35.99%) 265847( 77.60) 594(17.34%) 264551( 77.22) 200
children_of_watched 3080 164( 5.32%) 182067( 59.11) 7( 0.23%) 69555( 22.58) 100
coocs 3722 1305(35.06%) 254646( 68.42) 112( 3.01%) 127102( 34.15) 100
coocs2 3754 1411(37.59%) 275172( 73.30) 45( 1.20%) 61047( 16.26) 100
in_cluster_repo 4788 1018(21.26%) 477770( 99.78) 122( 2.55%) 410999( 85.84) 100
in_cluster_user 4788 1323(27.63%) 478800( 100.00) 651(13.60%) 466407( 97.41) 100
in_id_range 4788 340( 7.10%) 41247( 8.61) 101( 2.11%) 40548( 8.47) 57
most_watched 4788 795(16.60%) 471046( 98.38) 18( 0.38%) 189649( 39.61) 100
parents_of_watched 2971 1458(49.07%) 3448( 1.16) 1458(49.07%) 3448( 1.16) 16
same_name 4242 1605(37.84%) 246959( 58.22) 120( 2.83%) 241771( 56.99) 100
candidate source nfired correct already nentries maxsz
ancestors_of_watched 246 3( 1.22%) 24 288( 1.17) 9.3750 0
authored_by_me 593 516(87.02%) 0 1125( 1.90) 45.8667 0
by_watched_authors 3426 1245(36.34%) 0 562516( 164.19) 0.2213 0
children_of_watched 3080 175( 5.68%) 5072 1078146( 350.05) 0.4867 0
coocs 3722 2103(56.50%) 40747 9915205(2663.95) 0.4322 0
coocs2 3754 2552(67.98%) 48276 24308132(6475.26) 0.2091 0
in_cluster_repo 4788 1601(33.44%) 0 8134971(1699.03) 0.0197 0
in_cluster_user 4788 2548(53.22%) 0 8265274(1726.25) 0.0308 0
in_id_range 4788 340( 7.10%) 2219 43466( 9.08) 5.8874 0
most_watched 4788 795(16.60%) 7754 478800( 100.00) 1.7855 0
parents_of_watched 2971 1458(49.07%) 4695 8143( 2.74) 75.5618 0
same_name 4242 1617(38.12%) 0 1470496( 346.65) 0.1100 0
376.17user 1.48system 1:51.33elapsed 339%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+8outputs (0major+370947minor)pagefaults 0swaps
mv fake-results.txt~ fake-results.txt
tail -n20 fake-results.txt
fake test results:
total: real: 2955/4788 = 61.72% poss: 3733/4788 = 77.97% avg num: 391.9
non-zero scores:
total: real: 2955/4788 = 61.72% poss: 3733/4788 = 77.97% avg num: 391.9
Real test:
calculating test result... done.
fake test results:
total: real: 2506/4788 = 52.34% poss: 3548/4788 = 74.10% avg num: 411.9
non-zero scores:
total: real: 2506/4788 = 52.34% poss: 3548/4788 = 74.10% avg num: 411.9
generator fired corr
ancestors_of_watched 240 3( 1.25%) 261( 1.09) 3( 1.25%) 261( 1.09) 5
authored_by_me 394 352(89.34%) 481( 1.22) 344(87.31%) 473( 1.20) 16
by_watched_authors 3756 1241(33.04%) 316581( 84.29) 796(21.19%) 315736( 84.06) 200
children_of_watched 3490 211( 6.05%) 214060( 61.34) 5( 0.14%) 82093( 23.52) 100
coocs 4002 1574(39.33%) 291597( 72.86) 189( 4.72%) 139466( 34.85) 100
coocs2 4038 1646(40.76%) 314955( 78.00) 52( 1.29%) 71284( 17.65) 100
in_cluster_repo 4786 752(15.71%) 477649( 99.80) 113( 2.36%) 402405( 84.08) 100
in_cluster_user 4788 1162(24.27%) 478800( 100.00) 656(13.70%) 463469( 96.80) 100
in_id_range 4788 304( 6.35%) 43042( 8.99) 80( 1.67%) 42176( 8.81) 46
most_watched 4788 772(16.12%) 468814( 97.91) 20( 0.42%) 191206( 39.93) 100
parents_of_watched 2833 1135(40.06%) 3851( 1.36) 1135(40.06%) 3851( 1.36) 35
same_name 4340 1338(30.83%) 265694( 61.22) 155( 3.57%) 259963( 59.90) 100
candidate source nfired correct already nentries maxsz
ancestors_of_watched 240 3( 1.25%) 39 300( 1.25) 14.0000 0
authored_by_me 394 352(89.34%) 0 481( 1.22) 73.1809 0
by_watched_authors 3756 1261(33.57%) 0 741156( 197.33) 0.1701 0
children_of_watched 3490 224( 6.42%) 5289 1402558( 401.88) 0.3931 0
coocs 4002 2496(62.37%) 50851 12703818(3174.37) 0.4199 0
coocs2 4038 2987(73.97%) 59633 30510866(7555.94) 0.2052 0
in_cluster_repo 4786 1132(23.65%) 0 8162619(1705.52) 0.0139 0
in_cluster_user 4788 2211(46.18%) 0 8146972(1701.54) 0.0271 0
in_id_range 4788 304( 6.35%) 1962 45004( 9.40) 5.0351 0
most_watched 4788 772(16.12%) 9986 478800( 100.00) 2.2469 0
parents_of_watched 2833 1136(40.10%) 4954 8825( 3.12) 69.0085 0
same_name 4340 1352(31.15%) 0 1794583( 413.50) 0.0753 0
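So, the difference is in the parent repos: parents_of_watched finds the correct repo far more often in the fake test (1458 users, 49.07% of those where it fired) than in the real test (1135 users, 40.06%).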
==============
After removing features:
Fake test:
calculating test result... done.
generator fired corr
ancestors_of_watched 246 3( 1.22%) 264( 1.07) 3( 1.22%) 264( 1.07) 4
authored_by_me 593 516(87.02%) 1065( 1.80) 502(84.65%) 1049( 1.77) 100
by_watched_authors 3426 1233(35.99%) 265847( 77.60) 595(17.37%) 264547( 77.22) 200
children_of_watched 3080 164( 5.32%) 182067( 59.11) 6( 0.19%) 71499( 23.21) 100
coocs 3722 1303(35.01%) 254630( 68.41) 110( 2.96%) 124967( 33.58) 100
coocs2 3754 1414(37.67%) 275170( 73.30) 41( 1.09%) 59768( 15.92) 100
in_cluster_repo 4788 1012(21.14%) 477770( 99.78) 119( 2.49%) 409596( 85.55) 100
in_cluster_user 4788 1298(27.11%) 478800( 100.00) 651(13.60%) 466323( 97.39) 100
in_id_range 4788 340( 7.10%) 41247( 8.61) 102( 2.13%) 40546( 8.47) 57
most_watched 4788 795(16.60%) 471046( 98.38) 18( 0.38%) 186752( 39.00) 100
parents_of_watched 2971 1458(49.07%) 3448( 1.16) 1458(49.07%) 3448( 1.16) 16
same_name 4242 1602(37.77%) 246959( 58.22) 122( 2.88%) 241495( 56.93) 100
candidate source nfired correct already nentries maxsz
ancestors_of_watched 246 3( 1.22%) 24 288( 1.17) 9.3750 0
authored_by_me 593 516(87.02%) 0 1125( 1.90) 45.8667 0
by_watched_authors 3426 1245(36.34%) 0 562516( 164.19) 0.2213 0
children_of_watched 3080 175( 5.68%) 5072 1078146( 350.05) 0.4867 0
coocs 3722 2103(56.50%) 40747 9915205(2663.95) 0.4322 0
coocs2 3754 2552(67.98%) 48276 24308132(6475.26) 0.2091 0
in_cluster_repo 4788 1601(33.44%) 0 8134971(1699.03) 0.0197 0
in_cluster_user 4788 2548(53.22%) 0 8265274(1726.25) 0.0308 0
in_id_range 4788 340( 7.10%) 2219 43466( 9.08) 5.8874 0
most_watched 4788 795(16.60%) 7754 478800( 100.00) 1.7855 0
parents_of_watched 2971 1458(49.07%) 4695 8143( 2.74) 75.5618 0
same_name 4242 1617(38.12%) 0 1470496( 346.65) 0.1100 0
376.66user 1.40system 1:49.34elapsed 345%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+8outputs (0major+417484minor)pagefaults 0swaps
mv fake-results.txt~ fake-results.txt
tail -n20 fake-results.txt
fake test results:
total: real: 2943/4788 = 61.47% poss: 3727/4788 = 77.84% avg num: 390.6
non-zero scores:
total: real: 2943/4788 = 61.47% poss: 3727/4788 = 77.84% avg num: 390.6
Real test:
calculating test result... done.
fake test results:
total: real: 2547/4788 = 53.20% poss: 3549/4788 = 74.12% avg num: 411.0
non-zero scores:
total: real: 2547/4788 = 53.20% poss: 3549/4788 = 74.12% avg num: 411.0
generator fired corr
ancestors_of_watched 240 3( 1.25%) 261( 1.09) 3( 1.25%) 261( 1.09) 5
authored_by_me 394 352(89.34%) 481( 1.22) 344(87.31%) 473( 1.20) 16
by_watched_authors 3756 1242(33.07%) 316581( 84.29) 797(21.22%) 315737( 84.06) 200
children_of_watched 3490 211( 6.05%) 214060( 61.34) 7( 0.20%) 83904( 24.04) 100
coocs 4002 1568(39.18%) 291581( 72.86) 181( 4.52%) 137421( 34.34) 100
coocs2 4038 1645(40.74%) 314946( 78.00) 55( 1.36%) 69929( 17.32) 100
in_cluster_repo 4786 753(15.73%) 477649( 99.80) 112( 2.34%) 400758( 83.74) 100
in_cluster_user 4788 1157(24.16%) 478800( 100.00) 656(13.70%) 463323( 96.77) 100
in_id_range 4788 304( 6.35%) 43042( 8.99) 80( 1.67%) 42177( 8.81) 46
most_watched 4788 772(16.12%) 468814( 97.91) 25( 0.52%) 190436( 39.77) 100
parents_of_watched 2833 1135(40.06%) 3851( 1.36) 1135(40.06%) 3851( 1.36) 35
same_name 4340 1334(30.74%) 265694( 61.22) 154( 3.55%) 259590( 59.81) 100
candidate source nfired correct already nentries maxsz
ancestors_of_watched 240 3( 1.25%) 39 300( 1.25) 14.0000 0
authored_by_me 394 352(89.34%) 0 481( 1.22) 73.1809 0
by_watched_authors 3756 1261(33.57%) 0 741156( 197.33) 0.1701 0
children_of_watched 3490 224( 6.42%) 5289 1402558( 401.88) 0.3931 0
coocs 4002 2496(62.37%) 50851 12703818(3174.37) 0.4199 0
coocs2 4038 2987(73.97%) 59633 30510866(7555.94) 0.2052 0
in_cluster_repo 4786 1132(23.65%) 0 8162619(1705.52) 0.0139 0
in_cluster_user 4788 2211(46.18%) 0 8146972(1701.54) 0.0271 0
in_id_range 4788 304( 6.35%) 1962 45004( 9.40) 5.0351 0
most_watched 4788 772(16.12%) 9986 478800( 100.00) 2.2469 0
parents_of_watched 2833 1136(40.10%) 4954 8825( 3.12) 69.0085 0
same_name 4340 1352(31.15%) 0 1794583( 413.50) 0.0753 0
So it doesn't change much: the fake test goes from 61.72% to 61.47%, while the real test improves slightly, from 52.34% to 53.20%. Probably mostly noise.
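A quick sanity check of the "mostly noise" reading, using the totals logged before and after removing the features (real test 2506/4788 vs 2547/4788, fake test 2955/4788 vs 2943/4788). This is only a sketch: it treats the two runs as independent binomial samples, which is an assumption (both runs score the same 4788 users, so a paired comparison would be more appropriate), and the helper name is hypothetical.

```python
import math

def two_proportion_z(hits_a, hits_b, n):
    """z-score for the difference between two hit rates, each measured on n users."""
    p_a, p_b = hits_a / n, hits_b / n
    pooled = (hits_a + hits_b) / (2 * n)
    # Standard error of the difference under the pooled (no real change) hypothesis.
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    return (p_b - p_a) / se

z_real = two_proportion_z(2506, 2547, 4788)  # real test, before vs after
z_fake = two_proportion_z(2955, 2943, 4788)  # fake test, before vs after
print(f"real: z = {z_real:.2f}, fake: z = {z_fake:.2f}")
```

Both differences come out at less than one standard error under this approximation, which fits the noise explanation.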
=================
Added in collaborators:
Fake test:
generator fired corr
ancestors_of_watched 246 3( 1.22%) 264( 1.07) 3( 1.22%) 264( 1.07) 4
authored_by_collaborator 736 10( 1.36%) 28791( 39.12) 7( 0.95%) 28748( 39.06) 100
authored_by_me 593 516(87.02%) 1065( 1.80) 502(84.65%) 1049( 1.77) 100
by_watched_authors 3426 1233(35.99%) 265847( 77.60) 570(16.64%) 258409( 75.43) 200
children_of_watched 3080 164( 5.32%) 182067( 59.11) 6( 0.19%) 71446( 23.20) 100
coocs 3722 1303(35.01%) 254630( 68.41) 109( 2.93%) 123165( 33.09) 100
coocs2 3754 1414(37.67%) 275170( 73.30) 39( 1.04%) 58973( 15.71) 100
in_cluster_repo 4788 1012(21.14%) 477770( 99.78) 116( 2.42%) 407635( 85.14) 100
in_cluster_user 4788 1298(27.11%) 478800( 100.00) 635(13.26%) 458018( 95.66) 100
in_id_range 4788 340( 7.10%) 41247( 8.61) 100( 2.09%) 40520( 8.46) 57
most_watched 4788 795(16.60%) 471046( 98.38) 18( 0.38%) 184785( 38.59) 100
parents_of_watched 2971 1458(49.07%) 3448( 1.16) 1458(49.07%) 3448( 1.16) 16
same_name 4242 1602(37.77%) 246959( 58.22) 120( 2.83%) 241164( 56.85) 100
watched_by_collaborator 736 379(51.49%) 34150( 46.40) 51( 6.93%) 31862( 43.29) 100
candidate source nfired correct already nentries maxsz
ancestors_of_watched 246 3( 1.22%) 24 288( 1.17) 9.3750 0
authored_by_collaborator 736 13( 1.77%) 964 61098( 83.01) 1.5991 0
authored_by_me 593 516(87.02%) 0 1125( 1.90) 45.8667 0
by_watched_authors 3426 1245(36.34%) 0 562516( 164.19) 0.2213 0
children_of_watched 3080 175( 5.68%) 5072 1078146( 350.05) 0.4867 0
coocs 3722 2103(56.50%) 40747 9915205(2663.95) 0.4322 0
coocs2 3754 2552(67.98%) 48276 24308132(6475.26) 0.2091 0
in_cluster_repo 4788 1601(33.44%) 0 8134971(1699.03) 0.0197 0
in_cluster_user 4788 2548(53.22%) 0 8265274(1726.25) 0.0308 0
in_id_range 4788 340( 7.10%) 2219 43466( 9.08) 5.8874 0
most_watched 4788 795(16.60%) 7754 478800( 100.00) 1.7855 0
parents_of_watched 2971 1458(49.07%) 4695 8143( 2.74) 75.5618 0
same_name 4242 1617(38.12%) 0 1470496( 346.65) 0.1100 0
watched_by_collaborator 736 443(60.19%) 6161 127935( 173.82) 5.1620 0
376.84user 1.21system 1:49.07elapsed 346%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+8outputs (0major+410550minor)pagefaults 0swaps
mv fake-results.txt~ fake-results.txt
tail -n20 fake-results.txt
fake test results:
total: real: 2943/4788 = 61.47% poss: 3734/4788 = 77.99% avg num: 398.8
non-zero scores:
total: real: 2943/4788 = 61.47% poss: 3734/4788 = 77.99% avg num: 398.8
Real test:
calculating test result... done.
fake test results:
total: real: 2526/4788 = 52.76% poss: 3561/4788 = 74.37% avg num: 418.4
non-zero scores:
total: real: 2526/4788 = 52.76% poss: 3561/4788 = 74.37% avg num: 418.4
generator fired corr
ancestors_of_watched 240 3( 1.25%) 261( 1.09) 3( 1.25%) 261( 1.09) 5
authored_by_collaborator 642 11( 1.71%) 25982( 40.47) 10( 1.56%) 25933( 40.39) 100
authored_by_me 394 352(89.34%) 481( 1.22) 344(87.31%) 473( 1.20) 16
by_watched_authors 3756 1242(33.07%) 316581( 84.29) 764(20.34%) 308775( 82.21) 200
children_of_watched 3490 211( 6.05%) 214060( 61.34) 7( 0.20%) 83860( 24.03) 100
coocs 4002 1568(39.18%) 291581( 72.86) 172( 4.30%) 135716( 33.91) 100
coocs2 4038 1645(40.74%) 314946( 78.00) 55( 1.36%) 69232( 17.15) 100
in_cluster_repo 4786 753(15.73%) 477649( 99.80) 109( 2.28%) 398571( 83.28) 100
in_cluster_user 4788 1157(24.16%) 478800( 100.00) 645(13.47%) 454999( 95.03) 100
in_id_range 4788 304( 6.35%) 43042( 8.99) 80( 1.67%) 42126( 8.80) 46
most_watched 4788 772(16.12%) 468814( 97.91) 25( 0.52%) 188592( 39.39) 100
parents_of_watched 2833 1135(40.06%) 3851( 1.36) 1135(40.06%) 3851( 1.36) 35
same_name 4340 1334(30.74%) 265694( 61.22) 153( 3.53%) 259186( 59.72) 100
watched_by_collaborator 643 281(43.70%) 33859( 52.66) 59( 9.18%) 31701( 49.30) 100
candidate source nfired correct already nentries maxsz
ancestors_of_watched 240 3( 1.25%) 39 300( 1.25) 14.0000 0
authored_by_collaborator 642 17( 2.65%) 2109 60879( 94.83) 3.4922 0
authored_by_me 394 352(89.34%) 0 481( 1.22) 73.1809 0
by_watched_authors 3756 1261(33.57%) 0 741156( 197.33) 0.1701 0
children_of_watched 3490 224( 6.42%) 5289 1402558( 401.88) 0.3931 0
coocs 4002 2496(62.37%) 50851 12703818(3174.37) 0.4199 0
coocs2 4038 2987(73.97%) 59633 30510866(7555.94) 0.2052 0
in_cluster_repo 4786 1132(23.65%) 0 8162619(1705.52) 0.0139 0
in_cluster_user 4788 2211(46.18%) 0 8146972(1701.54) 0.0271 0
in_id_range 4788 304( 6.35%) 1962 45004( 9.40) 5.0351 0
most_watched 4788 772(16.12%) 9986 478800( 100.00) 2.2469 0
parents_of_watched 2833 1136(40.10%) 4954 8825( 3.12) 69.0085 0
same_name 4340 1352(31.15%) 0 1794583( 413.50) 0.0753 0
watched_by_collaborator 643 345(53.65%) 8873 143180( 222.67) 6.4381 0
Adding collaborators didn't make much of a difference: the fake test stays at 61.47% and the real test goes from 53.20% to 52.76%.
===============
Before using keywords for repo clustering (fake test):
calculating test result... done.
generator fired corr
ancestors_of_watched 246 3( 1.22%) 264( 1.07) 3( 1.22%) 264( 1.07) 4
authored_by_collaborator 736 10( 1.36%) 28791( 39.12) 7( 0.95%) 28748( 39.06) 100
authored_by_me 593 516(87.02%) 1065( 1.80) 502(84.65%) 1049( 1.77) 100
by_watched_authors 3426 1233(35.99%) 265847( 77.60) 570(16.64%) 258409( 75.43) 200
children_of_watched 3080 164( 5.32%) 182067( 59.11) 6( 0.19%) 71446( 23.20) 100
coocs 3722 1303(35.01%) 254630( 68.41) 109( 2.93%) 123174( 33.09) 100
coocs2 3754 1414(37.67%) 275170( 73.30) 41( 1.09%) 59156( 15.76) 100
in_cluster_repo 4788 1131(23.62%) 477770( 99.78) 120( 2.51%) 399104( 83.36) 100
in_cluster_user 4788 1298(27.11%) 478800( 100.00) 635(13.26%) 458018( 95.66) 100
in_id_range 4788 340( 7.10%) 41247( 8.61) 100( 2.09%) 40538( 8.47) 57
most_watched 4788 795(16.60%) 471046( 98.38) 18( 0.38%) 184785( 38.59) 100
parents_of_watched 2971 1458(49.07%) 3448( 1.16) 1458(49.07%) 3448( 1.16) 16
same_name 4242 1602(37.77%) 246959( 58.22) 120( 2.83%) 241164( 56.85) 100
watched_by_collaborator 736 379(51.49%) 34150( 46.40) 51( 6.93%) 31862( 43.29) 100
candidate source nfired correct already nentries maxsz
ancestors_of_watched 246 3( 1.22%) 24 288( 1.17) 9.3750 0
authored_by_collaborator 736 13( 1.77%) 964 61098( 83.01) 1.5991 0
authored_by_me 593 516(87.02%) 0 1125( 1.90) 45.8667 0
by_watched_authors 3426 1245(36.34%) 0 562516( 164.19) 0.2213 0
children_of_watched 3080 175( 5.68%) 5072 1078146( 350.05) 0.4867 0
coocs 3722 2103(56.50%) 40747 9915205(2663.95) 0.4322 0
coocs2 3754 2552(67.98%) 48276 24308132(6475.26) 0.2091 0
in_cluster_repo 4788 1601(33.44%) 0 8134971(1699.03) 0.0197 0
in_cluster_user 4788 2548(53.22%) 0 8265274(1726.25) 0.0308 0
in_id_range 4788 340( 7.10%) 2219 43466( 9.08) 5.8874 0
most_watched 4788 795(16.60%) 7754 478800( 100.00) 1.7855 0
parents_of_watched 2971 1458(49.07%) 4695 8143( 2.74) 75.5618 0
same_name 4242 1617(38.12%) 0 1470496( 346.65) 0.1100 0
watched_by_collaborator 736 443(60.19%) 6161 127935( 173.82) 5.1620 0
After using keywords for repo clustering (fake test):
calculating test result... done.
generator fired corr
ancestors_of_watched 246 3( 1.22%) 264( 1.07) 3( 1.22%) 264( 1.07) 4
authored_by_collaborator 736 10( 1.36%) 28789( 39.12) 7( 0.95%) 28745( 39.06) 100
authored_by_me 593 516(87.02%) 1065( 1.80) 502(84.65%) 1049( 1.77) 100
by_watched_authors 3426 1232(35.96%) 265847( 77.60) 568(16.58%) 258384( 75.42) 200
children_of_watched 3080 164( 5.32%) 182055( 59.11) 7( 0.23%) 71533( 23.23) 100
coocs 3722 1301(34.95%) 254775( 68.45) 135( 3.63%) 127199( 34.17) 100
coocs2 3754 1422(37.88%) 273870( 72.95) 49( 1.31%) 57065( 15.20) 100
in_cluster_repo 4788 1502(31.37%) 478127( 99.86) 53( 1.11%) 394043( 82.30) 100
in_cluster_user 4788 1303(27.21%) 478800( 100.00) 638(13.32%) 458014( 95.66) 100
in_id_range 4788 340( 7.10%) 41247( 8.61) 102( 2.13%) 40537( 8.47) 57
most_watched 4788 795(16.60%) 471046( 98.38) 19( 0.40%) 183442( 38.31) 100
parents_of_watched 2971 1458(49.07%) 3448( 1.16) 1458(49.07%) 3448( 1.16) 16
same_name 4242 1601(37.74%) 246959( 58.22) 119( 2.81%) 241154( 56.85) 100
watched_by_collaborator 736 381(51.77%) 34150( 46.40) 52( 7.07%) 31868( 43.30) 100
candidate source nfired correct already nentries maxsz
ancestors_of_watched 246 3( 1.22%) 24 288( 1.17) 9.3750 0
authored_by_collaborator 736 13( 1.77%) 964 61098( 83.01) 1.5991 0
authored_by_me 593 516(87.02%) 0 1125( 1.90) 45.8667 0
by_watched_authors 3426 1245(36.34%) 0 562516( 164.19) 0.2213 0
children_of_watched 3080 175( 5.68%) 5072 1078146( 350.05) 0.4867 0
coocs 3722 2103(56.50%) 40747 9915205(2663.95) 0.4322 0
coocs2 3754 2552(67.98%) 48276 24308132(6475.26) 0.2091 0
in_cluster_repo 4788 1862(38.89%) 0 7518897(1570.36) 0.0248 0
in_cluster_user 4788 2548(53.22%) 0 8265274(1726.25) 0.0308 0
in_id_range 4788 340( 7.10%) 2219 43466( 9.08) 5.8874 0
most_watched 4788 795(16.60%) 7754 478800( 100.00) 1.7855 0
parents_of_watched 2971 1458(49.07%) 4695 8143( 2.74) 75.5618 0
same_name 4242 1617(38.12%) 0 1470496( 346.65) 0.1100 0
watched_by_collaborator 736 443(60.19%) 6161 127935( 173.82) 5.1620 0
After using both keywords and users (fake test):
generator fired corr
ancestors_of_watched 246 3( 1.22%) 264( 1.07) 3( 1.22%) 264( 1.07) 4
authored_by_collaborator 736 10( 1.36%) 28788( 39.11) 7( 0.95%) 28745( 39.06) 100
authored_by_me 593 516(87.02%) 1065( 1.80) 502(84.65%) 1049( 1.77) 100
by_watched_authors 3426 1232(35.96%) 265847( 77.60) 569(16.61%) 258394( 75.42) 200
children_of_watched 3080 164( 5.32%) 182071( 59.11) 7( 0.23%) 71009( 23.05) 100
coocs 3722 1303(35.01%) 254721( 68.44) 115( 3.09%) 121391( 32.61) 100
coocs2 3754 1421(37.85%) 273895( 72.96) 45( 1.20%) 55120( 14.68) 100
in_cluster_repo 4788 1200(25.06%) 477979( 99.83) 87( 1.82%) 391902( 81.85) 100
in_cluster_user 4788 1303(27.21%) 478800( 100.00) 636(13.28%) 458018( 95.66) 100
in_id_range 4788 340( 7.10%) 41247( 8.61) 106( 2.21%) 40551( 8.47) 57
most_watched 4788 795(16.60%) 471046( 98.38) 18( 0.38%) 182315( 38.08) 100
parents_of_watched 2971 1458(49.07%) 3448( 1.16) 1458(49.07%) 3448( 1.16) 16
same_name 4242 1601(37.74%) 246959( 58.22) 119( 2.81%) 241161( 56.85) 100
watched_by_collaborator 736 380(51.63%) 34146( 46.39) 51( 6.93%) 31850( 43.27) 100
candidate source nfired correct already nentries maxsz
ancestors_of_watched 246 3( 1.22%) 24 288( 1.17) 9.3750 0
authored_by_collaborator 736 13( 1.77%) 964 61098( 83.01) 1.5991 0
authored_by_me 593 516(87.02%) 0 1125( 1.90) 45.8667 0
by_watched_authors 3426 1245(36.34%) 0 562516( 164.19) 0.2213 0
children_of_watched 3080 175( 5.68%) 5072 1078146( 350.05) 0.4867 0
coocs 3722 2103(56.50%) 40747 9915205(2663.95) 0.4322 0
coocs2 3754 2552(67.98%) 48276 24308132(6475.26) 0.2091 0
in_cluster_repo 4788 1676(35.00%) 0 7321405(1529.12) 0.0229 0
in_cluster_user 4788 2548(53.22%) 0 8265274(1726.25) 0.0308 0
in_id_range 4788 340( 7.10%) 2219 43466( 9.08) 5.8874 0
most_watched 4788 795(16.60%) 7754 478800( 100.00) 1.7855 0
parents_of_watched 2971 1458(49.07%) 4695 8143( 2.74) 75.5618 0
same_name 4242 1617(38.12%) 0 1470496( 346.65) 0.1100 0
watched_by_collaborator 736 443(60.19%) 6161 127935( 173.82) 5.1620 0
393.50user 1.18system 1:57.96elapsed 334%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+8outputs (0major+383457minor)pagefaults 0swaps
mv fake-results.txt~ fake-results.txt
tail -n20 fake-results.txt
fake test results:
total: real: 2967/4788 = 61.97% poss: 3723/4788 = 77.76% avg num: 393.7
non-zero scores:
total: real: 2967/4788 = 61.97% poss: 3723/4788 = 77.76% avg num: 393.7
Looks better.
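To compare runs without eyeballing the summary lines, the `total:` line can be parsed and its percentages recomputed. A minimal sketch in Python; the regular expression and the name `parse_total` are assumptions based on the line format shown above, not part of the test program.

```python
import re

# Pattern for the "total:" summary line as it appears in fake-results.txt.
TOTAL_RE = re.compile(
    r"total:\s+real:\s+(\d+)/(\d+)\s+=\s+[\d.]+%\s+"
    r"poss:\s+(\d+)/(\d+)\s+=\s+[\d.]+%\s+avg num:\s+([\d.]+)"
)

def parse_total(line):
    """Return the raw counts from a summary line plus recomputed percentages."""
    m = TOTAL_RE.search(line)
    if m is None:
        raise ValueError("not a summary line: %r" % line)
    real, real_n, poss, poss_n = (int(g) for g in m.groups()[:4])
    return {
        "real": real,
        "poss": poss,
        "total": real_n,
        "real_pct": round(100.0 * real / real_n, 2),
        "poss_pct": round(100.0 * poss / poss_n, 2),
        "avg_num": float(m.group(5)),
    }

line = "total: real: 2967/4788 = 61.97% poss: 3723/4788 = 77.76% avg num: 393.7"
print(parse_total(line))
```

Keeping the parsed dict for each run makes it possible to diff `real` and `poss` between runs directly.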
===============
After disallowing in_cluster results by the same author or with the same name (fake test):
generator fired corr
ancestors_of_watched 246 3( 1.22%) 264( 1.07) 3( 1.22%) 264( 1.07) 4
authored_by_collaborator 736 10( 1.36%) 28788( 39.11) 7( 0.95%) 28745( 39.06) 100
authored_by_me 593 516(87.02%) 1065( 1.80) 502(84.65%) 1049( 1.77) 100
by_watched_authors 3426 1232(35.96%) 265847( 77.60) 569(16.61%) 258394( 75.42) 200
children_of_watched 3080 164( 5.32%) 182071( 59.11) 7( 0.23%) 71009( 23.05) 100
coocs 3722 1303(35.01%) 254721( 68.44) 115( 3.09%) 118068( 31.72) 100
coocs2 3754 1421(37.85%) 273895( 72.96) 43( 1.15%) 52906( 14.09) 100
in_cluster_repo 4788 338( 7.06%) 470245( 98.21) 118( 2.46%) 411393( 85.92) 100
in_cluster_user 4788 636(13.28%) 478800( 100.00) 615(12.84%) 470056( 98.17) 100
in_id_range 4788 340( 7.10%) 41247( 8.61) 107( 2.23%) 40555( 8.47) 57
most_watched 4788 795(16.60%) 471046( 98.38) 16( 0.33%) 163984( 34.25) 100
parents_of_watched 2971 1458(49.07%) 3448( 1.16) 1458(49.07%) 3448( 1.16) 16
same_name 4242 1601(37.74%) 246959( 58.22) 119( 2.81%) 241161( 56.85) 100
watched_by_collaborator 736 380(51.63%) 34146( 46.39) 51( 6.93%) 31850( 43.27) 100
candidate source nfired correct already nentries maxsz
ancestors_of_watched 246 3( 1.22%) 24 288( 1.17) 9.3750 0
authored_by_collaborator 736 13( 1.77%) 964 61098( 83.01) 1.5991 0
authored_by_me 593 516(87.02%) 0 1125( 1.90) 45.8667 0
by_watched_authors 3426 1245(36.34%) 0 562516( 164.19) 0.2213 0
children_of_watched 3080 175( 5.68%) 5072 1078146( 350.05) 0.4867 0
coocs 3722 2103(56.50%) 40747 9915205(2663.95) 0.4322 0
coocs2 3754 2552(67.98%) 48276 24308132(6475.26) 0.2091 0
in_cluster_repo 4788 503(10.51%) 0 7229320(1509.88) 0.0070 0
in_cluster_user 4788 1186(24.77%) 0 8192664(1711.08) 0.0145 0
in_id_range 4788 340( 7.10%) 2219 43466( 9.08) 5.8874 0
most_watched 4788 795(16.60%) 7754 478800( 100.00) 1.7855 0
parents_of_watched 2971 1458(49.07%) 4695 8143( 2.74) 75.5618 0
same_name 4242 1617(38.12%) 0 1470496( 346.65) 0.1100 0
watched_by_collaborator 736 443(60.19%) 6161 127935( 173.82) 5.1620 0
385.13user 0.63system 3:24.60elapsed 188%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+8outputs (0major+324309minor)pagefaults 0swaps
mv fake-results.txt~ fake-results.txt
tail -n20 fake-results.txt
fake test results:
total: real: 2943/4788 = 61.47% poss: 3730/4788 = 77.90% avg num: 395.3
non-zero scores:
total: real: 2943/4788 = 61.47% poss: 3730/4788 = 77.90% avg num: 395.3
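The in_cluster restriction tried in this run can be sketched as a filter over the cluster's candidates. A minimal sketch in Python, assuming repos are identified by hypothetical "author/name" strings and that "same author" and "same name" are measured against the repos the user already watches; the function and variable names are illustrative and not taken from the actual code.

```python
def filter_cluster_candidates(candidates, watched):
    """Drop candidates sharing an author or a name with any watched repo.

    Both arguments are iterables of "author/name" strings (an assumed
    representation, not the one used by the real program).
    """
    watched_authors = {repo.split("/", 1)[0] for repo in watched}
    watched_names = {repo.split("/", 1)[1] for repo in watched}
    kept = []
    for repo in candidates:
        author, name = repo.split("/", 1)
        if author in watched_authors or name in watched_names:
            continue  # assumed to be covered by by_watched_authors / same_name
        kept.append(repo)
    return kept

watched = ["alice/rails", "bob/git"]
candidates = ["alice/merb", "carol/rails", "dave/rack"]
print(filter_cluster_candidates(candidates, watched))
```

Here only "dave/rack" survives: "alice/merb" shares an author and "carol/rails" shares a name with a watched repo.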
===============
After removing same-name and same-user results from coocs (and repos watched only once):
ancestors_of_watched 246 3( 1.22%) 264( 1.07) 3( 1.22%) 264( 1.07) 4
authored_by_collaborator 736 10( 1.36%) 28788( 39.11) 7( 0.95%) 28745( 39.06) 100
authored_by_me 593 516(87.02%) 1065( 1.80) 502(84.65%) 1049( 1.77) 100
by_watched_authors 3426 1232(35.96%) 265847( 77.60) 569(16.61%) 258394( 75.42) 200
children_of_watched 3080 164( 5.32%) 182071( 59.11) 7( 0.23%) 71009( 23.05) 100
coocs 3296 575(17.45%) 264542( 80.26) 107( 3.25%) 123169( 37.37) 100
coocs2 3378 646(19.12%) 289417( 85.68) 38( 1.12%) 51336( 15.20) 100
in_cluster_repo 4788 338( 7.06%) 470245( 98.21) 118( 2.46%) 411393( 85.92) 100
in_cluster_user 4788 636(13.28%) 478800( 100.00) 615(12.84%) 470056( 98.17) 100
in_id_range 4788 340( 7.10%) 41247( 8.61) 107( 2.23%) 40555( 8.47) 57
most_watched 4788 795(16.60%) 471046( 98.38) 12( 0.25%) 150889( 31.51) 100
parents_of_watched 2971 1458(49.07%) 3448( 1.16) 1458(49.07%) 3448( 1.16) 16
same_name 4242 1601(37.74%) 246959( 58.22) 119( 2.81%) 241161( 56.85) 100
watched_by_collaborator 736 380(51.63%) 34146( 46.39) 51( 6.93%) 31850( 43.27) 100
candidate source nfired correct already nentries maxsz
ancestors_of_watched 246 3( 1.22%) 24 288( 1.17) 9.3750 0
authored_by_collaborator 736 13( 1.77%) 964 61098( 83.01) 1.5991 0
authored_by_me 593 516(87.02%) 0 1125( 1.90) 45.8667 0
by_watched_authors 3426 1245(36.34%) 0 562516( 164.19) 0.2213 0
children_of_watched 3080 175( 5.68%) 5072 1078146( 350.05) 0.4867 0
coocs 3296 1057(32.07%) 0 6174911(1873.46) 0.0171 0
coocs2 3378 1280(37.89%) 0 14879115(4404.71) 0.0086 0
in_cluster_repo 4788 503(10.51%) 0 7229320(1509.88) 0.0070 0
in_cluster_user 4788 1186(24.77%) 0 8192664(1711.08) 0.0145 0
in_id_range 4788 340( 7.10%) 2219 43466( 9.08) 5.8874 0
most_watched 4788 795(16.60%) 7754 478800( 100.00) 1.7855 0
parents_of_watched 2971 1458(49.07%) 4695 8143( 2.74) 75.5618 0
same_name 4242 1617(38.12%) 0 1470496( 346.65) 0.1100 0
watched_by_collaborator 736 443(60.19%) 6161 127935( 173.82) 5.1620 0
359.24user 1.30system 1:48.27elapsed 333%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+8outputs (0major+339664minor)pagefaults 0swaps
mv fake-results.txt~ fake-results.txt
tail -n20 fake-results.txt
fake test results:
total: real: 2934/4788 = 61.28% poss: 3713/4788 = 77.55% avg num: 393.3
non-zero scores:
total: real: 2934/4788 = 61.28% poss: 3713/4788 = 77.55% avg num: 393.3
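The coocs change in this run can be sketched as a co-occurrence count that skips the excluded pairs. A minimal sketch in Python, assuming a hypothetical `watches` mapping from user to a set of "author/name" repo ids; this is not the real coocs generator, and all names here are illustrative.

```python
from collections import Counter, defaultdict
from itertools import combinations

def build_coocs(watches):
    """Count how often two repos are watched by the same user.

    Skips repos watched by only one user, and pairs that share an author
    or a name. `watches` maps user -> set of "author/name" strings (an
    assumed representation).
    """
    watch_counts = Counter(r for repos in watches.values() for r in repos)
    coocs = defaultdict(Counter)
    for repos in watches.values():
        # Keep only repos that more than one user watches.
        shared = sorted(r for r in repos if watch_counts[r] > 1)
        for a, b in combinations(shared, 2):
            a_author, a_name = a.split("/", 1)
            b_author, b_name = b.split("/", 1)
            if a_author == b_author or a_name == b_name:
                continue  # same user or same name: excluded from coocs
            coocs[a][b] += 1
            coocs[b][a] += 1
    return coocs

watches = {
    "u1": {"a/x", "b/y", "a/z", "c/x", "d/once"},
    "u2": {"a/x", "b/y", "a/z", "c/x"},
}
print(dict(build_coocs(watches)["a/x"]))
```

In this toy input "a/x" keeps only "b/y" as a co-occurring repo: "a/z" has the same author, "c/x" has the same name, and "d/once" is watched by a single user.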
**********
elapsed: elapsed: [380.20s cpu, 251392.0456 mticks, 94.06s wall]
done