Concept: Sequence compression
TODO - defragment this page, document the Python prototype with it, and write a script to pull the literate-programming parts out into a *.md file and dump them here
One note about "term complexity" in NAR(S) / NAL: in all implementations known to me, "term complexity" was an integer, but a real number might make a lot of sense.
It would then map directly to bits (how many bits are needed to store the information in a compressed way) in an information-theoretic sense.
But I've used a "strange" (as in unusual, I didn't see it in any implementation or theory) "virtual" complexity "measure" (I can't find a better word for it, it's not a measure in the strict sense) as a real number which counts how much information was conveyed over time,
mainly for AIKR reasons.
I let it decay; it would be equal to the counter of observations if no decay were necessary to keep it under AIKR.
The more often an item was observed, the fewer bits it needs to encode.
But it's not possible to store counters for all encountered (sub)sequences, so some decay is necessary,
together with removing the items with the lowest counts, as usual.
(Note that this is also different from the budget of a concept:
budget tells how useful the item is expected to be to the system in the future,
while the "virtual" complexity discussed here measures how much information is conveyed by the sequence.)
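As a minimal sketch (not the actual prototype code; the class name, capacity, and decay schedule are illustrative assumptions), such a decaying counter table could look like this:

```python
# Minimal sketch of a decaying counter table under AIKR.
# Capacity, decay factor and names are illustrative assumptions, not the prototype's code.

class DecayingCounterTable:
    def __init__(self, capacity=1000, decay=0.999):
        self.capacity = capacity   # hard bound on stored (sub)sequences (AIKR)
        self.decay = decay         # multiplicative decay applied each step
        self.counts = {}           # (sub)sequence tuple -> "virtual" count (a real number)

    def observe(self, seq):
        """Count one observation; evict the weakest item when over capacity."""
        self.counts[seq] = self.counts.get(seq, 0.0) + 1.0
        if len(self.counts) > self.capacity:
            weakest = min(self.counts, key=self.counts.get)
            del self.counts[weakest]   # remove the item with the lowest count, as usual

    def step(self):
        """Apply decay; without decay the values would equal the raw observation counters."""
        for seq in self.counts:
            self.counts[seq] *= self.decay
```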
About the compression for perception:
Let's say the agent perceives the events A IS C and later A B IS D. How should it perceive the first sequence (A, IS, C) as something more general, and the later sequence (A, B, IS, D) as something more general? There is clearly a deeper structure (#0, IS, #2), but how do we uncover it, and what are the reasons for the conclusion?
One way I am experimenting with (and still need to experiment with further) is the following:
a) build possible perceptions as sequences with sub-sequences of the original sequence, and introduce variables to let it generalize structures (a sketch of this candidate generation follows the example below)
b) decide how the original sequence is perceived based only on how often the "view" of the sequence with variables was perceived.
So the first perception of the sequence (A, IS, C) leads to the following candidates (cnt = absolute observation count over the lifetime of the agent):
(A, IS, C) cnt=1
(#0, IS, C) cnt=1
(A, IS, #2) cnt=1
(#0, IS, #2) cnt=1
(#0, #1, C) cnt=1
etc.
So after perceiving (A, IS, C) it can be perceived as any of these candidates, simply because the counters all have the same value.
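As a minimal sketch of this candidate ("view") generation, assuming each position of the sequence can either be kept or replaced by a positional variable #i (the function name is an illustrative assumption; grouping adjacent elements into sub-sequences is sketched further below):

```python
# Minimal sketch of candidate ("view") generation: each position of the sequence
# is either kept or replaced by a positional variable #<i>.
# Sub-sequence grouping (e.g. ((A, B), IS, D)) is handled in a separate sketch below.

from itertools import product

def candidate_views(seq):
    """Yield all views of seq where elements are kept or replaced by #<position>."""
    options = [(elem, f"#{i}") for i, elem in enumerate(seq)]
    for choice in product(*options):
        yield tuple(choice)

# Example: first perception of (A, IS, C)
for view in candidate_views(("A", "IS", "C")):
    print(view)
# ('A', 'IS', 'C'), ('A', 'IS', '#2'), ('A', '#1', 'C'), ..., ('#0', '#1', '#2')
```

Each generated view would then get its observation counter bumped in the (decaying) counter table sketched above.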
But what happens when (A, B, IS, D) is perceived later?
It then also builds a possible "unification" with sub-sequences, (#0, IS, #2), and bumps the counter to
(#0, IS, #2) cnt=2
It can then clearly perceive (A, B, IS, D) as ((A, B), IS, D), because (#0, IS, #2) has the highest counter (2) of all fitting known variants.
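A minimal sketch of how such a sub-sequence "unification" could be enumerated, so that (A, B, IS, D) can also be viewed as ((A, B), IS, D) and matched against (#0, IS, #2); the enumeration strategy and names are illustrative assumptions, not the prototype's actual code:

```python
# Minimal sketch: enumerate groupings of adjacent elements into sub-sequences,
# so that (A, B, IS, D) can also be "viewed" as ((A, B), IS, D).

from itertools import product

def groupings(seq):
    """Yield every split of seq into contiguous blocks (single elements stay atomic)."""
    if not seq:
        yield ()
        return
    for cut in range(1, len(seq) + 1):
        head = seq[:cut] if cut > 1 else seq[0]
        for rest in groupings(seq[cut:]):
            yield (head,) + rest

def candidate_views(seq):
    """Same abstraction as in the sketch above: keep each position or replace it by #<position>."""
    options = [(elem, f"#{i}") for i, elem in enumerate(seq)]
    for choice in product(*options):
        yield tuple(choice)

# ((A, B), IS, D) is one grouping of (A, B, IS, D); abstracting its positions yields
# ('#0', 'IS', '#2') among its views, which bumps that view's counter to 2.
grouped = (("A", "B"), "IS", "D")
assert grouped in groupings(("A", "B", "IS", "D"))
print(("#0", "IS", "#2") in set(candidate_views(grouped)))   # True
```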
The same selection can't be made if only complexity is considered among the candidates.
Only the information-theoretic aspect matters for this compression (candidates with a higher cnt encode something which needs fewer bits than something encoded by a candidate with a lower cnt),
while "term complexity" ignores this consideration completely.
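A minimal sketch of this count-based selection, assuming the counts live in a table like the one sketched above; the simple matcher here only handles views of the same length as the input, so the sub-sequence grouping case would first need the grouping step from the previous sketch (names are illustrative assumptions):

```python
# Minimal sketch of count-based selection of a perception.
# 'table' maps candidate views (with variables) to observation counts,
# e.g. DecayingCounterTable.counts from the sketch above.

def matches(view, seq):
    """A view fits a same-length sequence if every non-variable element is equal."""
    return len(view) == len(seq) and all(
        v.startswith("#") or v == s for v, s in zip(view, seq)
    )

def select_perception(seq, table):
    """Pick the fitting view with the highest count (not the lowest term complexity)."""
    fitting = [view for view in table if matches(view, seq)]
    return max(fitting, key=lambda view: table[view], default=tuple(seq))

# After (#0, IS, #2) reached cnt=2 it wins over the cnt=1 candidates, so the input
# is perceived through the generalized view (#0, IS, #2).
counts = {("A", "IS", "C"): 1.0, ("#0", "IS", "C"): 1.0, ("#0", "IS", "#2"): 2.0}
print(select_perception(("A", "IS", "C"), counts))   # ('#0', 'IS', '#2')
```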
Properties of the current approach:
- runs under AIKR
- latency to compress is relatively low on current machines
- finds good-enough compressions for the current small-scale experiments
Universal Artificial Intelligence - Hutter's assumption is that "better compressors lead to AGI"; this is related to the present work because what is described here is a compressor with an information-theoretic motivation.