Home
Welcome to the XCSF wiki! Please feel free to help improve these pages.
XCSF is rule-based and maintains a population of classifiers where each classifier cl consists of:
- a condition component cl.C that determines whether the rule matches input x
- an action component cl.A that selects an action a to be performed for a given x
- a prediction component cl.P that computes the expected payoff for performing a upon receipt of x
XCSF thus generates rules of the general form:
IF matches ← cl.C(x) THEN perform action a ← cl.A(x) and EXPECT payoff p ← cl.P(x).
For each step within a learning trial, XCSF constructs a match set [M] composed of classifiers in the population set [P] whose cl.C matches x. If [M] contains fewer than θmna actions, a covering mechanism generates classifiers with matching cl.C and random cl.A.
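The matching and covering steps can be sketched minimally as follows; the `Classifier` class, its set-based condition, and the `build_match_set` helper are illustrative assumptions rather than the project's actual data structures.

```python
import random

class Classifier:
    """Illustrative rule: a condition (here, a set of matching inputs) and an action."""
    def __init__(self, condition, action):
        self.condition = condition
        self.action = action

    def matches(self, x):
        return x in self.condition

def build_match_set(population, x, n_actions, theta_mna):
    """Construct [M] from [P] for input x, covering if too few actions are present."""
    match_set = [cl for cl in population if cl.matches(x)]
    # Covering: while [M] advocates fewer than theta_mna distinct actions,
    # generate classifiers that match x with a randomly chosen missing action.
    while len({cl.action for cl in match_set}) < theta_mna:
        missing = set(range(n_actions)) - {cl.action for cl in match_set}
        cl = Classifier(condition={x}, action=random.choice(sorted(missing)))
        population.append(cl)
        match_set.append(cl)
    return match_set
```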
For each possible action ak in [M], XCSF estimates the expected payoff by computing the fitness-weighted average prediction P(ak). That is, for each action ak and classifier prediction pj in [M], the system prediction Pk = ∑j Fjpj / ∑jFj.
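For illustration, the prediction array can be computed as below, assuming classifiers in [M] are reduced to hypothetical (action, prediction, fitness) tuples:

```python
def system_predictions(match_set):
    """Fitness-weighted average prediction P(a_k) for each action in [M]."""
    weighted = {}  # per action: sum of F_j * p_j
    fitness = {}   # per action: sum of F_j
    for action, p, f in match_set:
        weighted[action] = weighted.get(action, 0.0) + f * p
        fitness[action] = fitness.get(action, 0.0) + f
    return {a: weighted[a] / fitness[a] for a in weighted}
```

For example, two classifiers advocating action 0 with predictions 10 and 20 and equal fitness yield P(0) = 15.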
A system action is then randomly or probabilistically selected during exploration, and the highest payoff action Pk used during exploitation. Classifiers in [M] advocating the chosen action are subsequently used to construct an action set [A]. The action is then performed and a scalar reward r ∈ ℜ received, along with the next sensory input.
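Action selection over that prediction array might then look like the following sketch; pure random exploration is assumed here, though probabilistic schemes such as softmax are equally possible:

```python
import random

def select_action(predictions, explore):
    """Choose randomly when exploring, else the action with the highest P_k."""
    if explore:
        return random.choice(list(predictions))
    return max(predictions, key=predictions.get)
```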
Upon reaching a terminal state within the environment (as is always the case in single-step problems), each classifier clj ∈ [A] has its experience incremented, and its error, fitness, and set size estimate as updated using the Widrow-Hoff delta rule with learning rate β ∈ [0,1] as follows.
- Error: εj ← εj + β (| r − pj | − εj)
- Accuracy: κj = 1 if εj < ε0, otherwise κj = α (εj / ε0)^−ν, with target error threshold ε0, accuracy offset α ∈ [0,1], and slope ν.
- Relative accuracy: κj' = (κj · numj) / ∑j κj · numj, where classifiers are initialised with numerosity num = 1.
- Fitness: Fj ← Fj + β(κj' − Fj)
- Set size estimate: asj ← asj + β(|[A]| − asj)
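Taken together, the updates above can be sketched as follows, with classifiers as plain dicts; |[A]| is counted here in micro-classifiers (summing num), and all key names are assumptions for illustration:

```python
def update_action_set(action_set, r, beta, eps0, alpha, nu):
    """Widrow-Hoff updates for each classifier in [A] given reward r."""
    set_size = sum(cl["num"] for cl in action_set)  # |[A]| in micro-classifiers
    for cl in action_set:
        cl["exp"] += 1
        cl["err"] += beta * (abs(r - cl["pred"]) - cl["err"])
        cl["as"] += beta * (set_size - cl["as"])
    # Accuracy, then relative accuracy, then fitness.
    kappa = [1.0 if cl["err"] < eps0 else alpha * (cl["err"] / eps0) ** -nu
             for cl in action_set]
    total = sum(k * cl["num"] for k, cl in zip(kappa, action_set))
    for k, cl in zip(kappa, action_set):
        rel = k * cl["num"] / total  # relative accuracy
        cl["fit"] += beta * (rel - cl["fit"])
```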
Thereafter, cl.C, cl.A, and cl.P are updated according to the representation adopted.
The evolutionary algorithm (EA) is applied to classifiers within [A] if the average time since its previous execution exceeds θEA. Upon invocation, the time stamp ts of each classifier is updated. Two parents are chosen based on their fitness via roulette wheel (or tournament) selection, and λ offspring are created via crossover with probability χ and mutation with probability μ.
Offspring parameters are initialised by setting the error and fitness to the parental average, discounted by reduction parameters for error εR and fitness FR. Offspring exp and num are set to one. If subsumption is enabled and an offspring is subsumed by either parent with sufficient accuracy (εj < ε0) and experience (expj > θsub), it is not added to [P]; instead, the parent's num is incremented.
The resulting offspring are added to [P] and the maximum (micro-classifier) population size N is enforced by removing classifiers selected via roulette with the deletion vote.
The deletion vote is set proportionally to the set size estimate as. However, the vote is increased by a factor F̅ / Fj for classifiers that are sufficiently experienced (expj > θdel) and with small fitness (Fj < δF̅) where F̅ is the [P] mean fitness, and typically δ = 0.1. Classifiers selected for deletion have their (micro-classifier) num decremented, and in the event that num < 1, are removed from [P].
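A minimal sketch of this deletion scheme, again with classifiers as hypothetical dicts:

```python
import random

def deletion_vote(cl, mean_fit, theta_del, delta=0.1):
    """Vote proportional to the set size estimate, boosted for weak classifiers."""
    vote = cl["as"] * cl["num"]
    # Experienced classifiers with small fitness receive a larger vote.
    if cl["exp"] > theta_del and cl["fit"] < delta * mean_fit:
        vote *= mean_fit / cl["fit"]
    return vote

def delete_one(population, theta_del, delta=0.1):
    """Roulette-wheel deletion of one micro-classifier from [P]."""
    mean_fit = sum(cl["fit"] for cl in population) / sum(cl["num"] for cl in population)
    votes = [deletion_vote(cl, mean_fit, theta_del, delta) for cl in population]
    point = random.uniform(0.0, sum(votes))
    total = 0.0
    for cl, vote in zip(population, votes):
        total += vote
        if total >= point:
            cl["num"] -= 1
            if cl["num"] < 1:
                population.remove(cl)
            return
```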
In multi-step problems, the previous action set [A]₋₁ is updated after each step using a γ ∈ [0,1] discounted reward (similar to Q-learning), and the EA may be run therein.
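The discounted payoff target used to update the previous action set can be written as a one-line sketch, taking the maximum entry of the current prediction array:

```python
def payoff_target(prev_reward, predictions, gamma):
    """Q-learning-style target: r_{-1} + gamma * max_k P_k."""
    return prev_reward + gamma * max(predictions.values())
```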
A schematic illustration of XCSF is shown above. For supervised learning, a single (dummy) action can be used such that [A] = [M] and the system prediction made directly accessible to the environment.
A number of interacting pressures have been identified. A set pressure provides more frequent reproduction opportunities for more general rules. In opposition is a fitness pressure which represses the reproduction of inaccurate and over-general rules. Many forms of cl.C, cl.A, and cl.P have been used for classifier knowledge representation since the original ternary conditions, integer actions, and scalar predictions. Some of these are implemented here.
Related Literature:
- S. W. Wilson (1995) "Classifier fitness based on accuracy" https://doi.org/10.1162/evco.1995.3.2.149
- S. W. Wilson (2001) "Function approximation with a classifier system"
- S. W. Wilson (2002) "Classifiers that approximate functions" https://doi.org/10.1023/A:1016535925043
- M. V. Butz (2006) "Rule-based evolutionary online learning systems" https://doi.org/10.1007/b104669
- R. J. Urbanowicz and J. H. Moore (2009) "Learning classifier systems: A complete introduction, review, and roadmap" https://doi.org/10.1155/2009/736398
- L. Bull (2015) "A brief history of learning classifier systems: From CS-1 to XCS and its variants" https://doi.org/10.1007/s12065-015-0125-y
- R. J. Urbanowicz and W. N. Browne (2017) "Introduction to learning classifier systems" https://doi.org/10.1007/978-3-662-55007-6
This project implements both supervised learning, via updating match set predictions directly, and reinforcement learning, via updating action set predictions with an environment reward. All mutation rates are self-adapted.
See Python Library Usage.
Conditions:
- Always matching dummy conditions
- Hyperrectangles
- Hyperellipsoids
- Neural networks
- GP trees
- Dynamical GP graphs
- Ternary bitstrings (binarises inputs)
- Both conditions and actions in single dynamical GP graphs
- Both conditions and actions in single neural networks
Actions:
- Integers
- Neural networks
Predictions:
- Piece-wise constant
- Linear least squares
- Quadratic least squares
- Linear recursive least squares
- Quadratic recursive least squares
- Stochastic gradient descent neural networks
This project is released under the terms of the GNU General Public License v3.0 (GPLv3).