RULE INPUT FORMATS
Rule types and formats (one rule per line in *.rules file)
GENERAL FORM
SampWeight-StepWeight-RuleType-[rule parameters]
SampWeight = rule weight for purposes of stochastic gradient term sampling
StepWeight = rule weight for purposes of stochastic gradient step size
RuleType = type of rule (seed, sentincl, sentexcl, cl, ml, ...)
[rule parameters] = remaining arguments, which vary by rule type
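A minimal sketch of how one line in the general form could be split into its fields. The names Rule and parse_rule_line are hypothetical and not part of this project. The sketch assumes that weights and parameters are never negative (a leading minus sign would collide with the hyphen separator) and that a rule line holds only the rule itself, without the " = ..." explanations used in this file.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    samp_weight: float   # SampWeight: weight for gradient term sampling
    step_weight: float   # StepWeight: weight for gradient step size
    rule_type: str       # RuleType: seed, sentincl, sentexcl, cl, ml, ...
    params: List[str]    # remaining hyphen-separated fields, left uninterpreted


def parse_rule_line(line: str) -> Rule:
    """Split 'SampWeight-StepWeight-RuleType-[rule parameters]' into fields."""
    fields = line.strip().split("-")
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 hyphen-separated fields: {line!r}")
    samp, step, rule_type = fields[0], fields[1], fields[2]
    return Rule(float(samp), float(step), rule_type, fields[3:])


if __name__ == "__main__":
    print(parse_rule_line("1-50-seed-2,7-0"))
```

The parameters are kept as strings because their meaning depends on the rule type.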
SPECIFIC RULES
[seed]
1-50-seed-2-0 = assign all occurrences of word 2 to topic 0
1-50-seed-2,7-0 = assign all occurrences of word 2 or 7 to topic 0
1-50-seed-2 7-0 = assign all occurrences of bigram 2 7 to topic 0
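The three seed forms above differ only in how the word field is written: a single id, a comma-separated list (any of the words), or a space-separated sequence (the words in order). A sketch of that distinction, assuming the parameters have already been split on hyphens; parse_seed_params is a hypothetical helper, and rejecting a field that mixes commas and spaces is an assumption, since the examples do not cover that case.

```python
from typing import List, Tuple


def parse_seed_params(params: List[str]) -> Tuple[str, List[int], int]:
    """Interpret seed parameters such as ['2', '0'], ['2,7', '0'] or ['2 7', '0'].

    Returns (kind, word_ids, topic), where kind is 'words' (any listed word)
    or 'ngram' (the words as a consecutive sequence).
    """
    if len(params) != 2:
        raise ValueError(f"seed rule expects a word field and a topic: {params!r}")
    target, topic = params[0], int(params[1])
    if "," in target and " " in target:
        # Assumption: the documented examples never mix the two separators.
        raise ValueError(f"cannot mix ',' and ' ' in a seed word field: {target!r}")
    if " " in target:
        # Only bigrams appear in the examples; longer sequences are untested here.
        return ("ngram", [int(w) for w in target.split()], topic)
    return ("words", [int(w) for w in target.split(",")], topic)


if __name__ == "__main__":
    for fields in (["2", "0"], ["2,7", "0"], ["2 7", "0"]):
        print(fields, "->", parse_seed_params(fields))
```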
[sentincl]
100.0-100.0-sentincl-6-0 = allow topic 0 only if topic 6 occurs in the same sentence
[sentexcl]
1.0-10.0-sentexcl-0-7 = disallow topic 0 and topic 7 in the same sentence
[cl]
1.0-10.0-cl-3-5 = word 3 and word 5 should not be assigned to the same topic
[ml]
1.0-10.0-ml-2-4 = word 2 and word 4 should be assigned to the same topic
[doc]
1.0-10.0-doc-1-0,1,2 = assign words in docs labeled 1 to topic 0, 1, or 2
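In the examples above, sentincl, sentexcl, cl and ml each take two integer ids, while doc takes a document label followed by a comma-separated topic list. A sketch of interpreting those parameter fields once they have been split on hyphens; parse_params and PAIR_RULES are hypothetical names, and the parameter counts are inferred from the examples only.

```python
from typing import List, Tuple, Union

# Rule types whose examples show exactly two integer ids
# (topic ids for sentincl/sentexcl, word ids for cl/ml).
PAIR_RULES = {"sentincl", "sentexcl", "cl", "ml"}


def parse_params(rule_type: str,
                 params: List[str]) -> Union[Tuple[int, int], Tuple[int, List[int]]]:
    """Convert the string parameters of a non-seed rule into integers."""
    if len(params) != 2:
        raise ValueError(f"{rule_type} rule expects two parameter fields: {params!r}")
    first, second = params
    if rule_type in PAIR_RULES:
        return (int(first), int(second))
    if rule_type == "doc":
        # Document label, then the topics allowed for documents with that label.
        return (int(first), [int(t) for t in second.split(",")])
    raise ValueError(f"rule type not handled by this sketch: {rule_type!r}")


if __name__ == "__main__":
    print(parse_params("cl", ["3", "5"]))
    print(parse_params("doc", ["1", "0,1,2"]))
```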