Future Work minor updates (#126)
nielsleadholm authored Jan 21, 2025
1 parent d089d92 commit 6093d26
Showing 15 changed files with 64 additions and 20 deletions.
@@ -6,4 +6,6 @@ In Monty systems, low-level LMs project to high-level LMs, where this projection

For example, a high-level LM of a dinner-set might have learned that the fork is present at a particular location in its internal reference frame. When at that location, it would therefore predict that the low-level LM should be sensing a fork, enabling the perception of a fork in the low-level LM even when there is a degree of noise or other source of uncertainty in the low-level LM's representation.

In the brain, these top-down projections correspond to L6 to L1 connections, where the synapses at L1 would support predictions about object ID. However, these projections also form local synapses en-route through the L6 layer of the lower-level cortical column. In a Monty LM, this would correspond to the top-down connection predicting not just the object that the low-level LM should be sensing, but also the specific location that it should be sensing it at. This could be complemented with predicting a particular pose of the low-level object (see [Use Better Priors for Hypothesis Initialization](../learning-module-improvements/use-better-priors-for-hypothesis-initialization.md)).

This location-specific association in both models is key to how we believe compositional objects are represented. For example, if you had a coffee mug with a logo on it, that logo might make an (unusual) 90 degree bend half-way along its length. This could be learned by associating the logo with the mug multiple times, where different locations in the logo's space, as well as different poses of the logo, would be associated with different locations on the mug's surface.
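
To make the idea concrete, here is a minimal sketch (in Python, with illustrative names that are not part of Monty's codebase) of how such location-specific associations, including the child object's pose, could be stored in a parent object's reference frame and used for top-down prediction:

```python
# A minimal sketch, not Monty's actual data structures: a higher-level LM stores which
# child object, at which location within the child and in which pose, is bound to each
# location in the parent object's reference frame.
from dataclasses import dataclass


@dataclass
class ChildAtLocation:
    child_id: str          # e.g. "logo"
    child_location: tuple  # location within the child's own reference frame
    child_pose: tuple      # rotation of the child relative to the parent (Euler angles, degrees)


class ParentModel:
    """Parent object (e.g. a mug) with child objects bound to locations in its reference frame."""

    def __init__(self):
        self.associations = {}  # parent-frame location -> ChildAtLocation

    def associate(self, parent_location, child_id, child_location, child_pose):
        self.associations[parent_location] = ChildAtLocation(child_id, child_location, child_pose)

    def predict_child(self, parent_location):
        """Top-down prediction: which child object (and where on it, in what pose) should a
        lower-level LM be sensing at this location on the parent?"""
        return self.associations.get(parent_location)


# The mug-with-logo example: because the logo bends 90 degrees half-way along its length,
# different locations on the mug map to different locations and poses within the logo.
mug = ParentModel()
mug.associate((0.00, 0.02, 0.04), "logo", child_location=(0.0, 0.01, 0.0), child_pose=(0, 0, 0))
mug.associate((0.00, 0.02, 0.06), "logo", child_location=(0.0, 0.03, 0.0), child_pose=(90, 0, 0))
```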
@@ -1,7 +1,7 @@
---
title: Figure out Performance Measures and Supervision in Heterarchy
---
As we introduce hierarchy and compositional objects, such as a dinner-table setting, we need to figure out both how to measure the performance of the system, and how to supervise the learning. For the latter, we might choose to train the system on component objects in isolation (a fork, a knife, etc.) before then showing Monty the full compositional object (the dinner-table setting). When evaluating performance, we might then see how well the system retrieves representations at different levels of the hierarchy. However, in the more core setting of unsupervised learning, representations of the sub-objects would likely also emerge at the high level (a coarse knife representation, etc.), while we may also find some representations of the dinner scene in low-level LMs. Deciding then how we measure performance will be more difficult.
As we introduce hierarchy and compositional objects, such as a mug with a logo on it, we need to figure out both how to measure the performance of the system and how to supervise the learning. For the latter, we might choose to train the system on component objects in isolation (various logos, mugs, bowls, etc.) before then showing Monty the full compositional object (a mug with a logo on it). When evaluating performance, we might then see how well the system retrieves representations at different levels of the hierarchy. However, in the more fundamental setting of unsupervised learning, representations of the sub-objects would likely also emerge at the high level (a coarse logo representation, etc.), while we may also find some representations of the mug in low-level LMs. Deciding how to measure performance will then be more difficult.

When we move to objects with less obvious composition (i.e. where the sub-objects must be disentangled in a fully unsupervised manner), representations will emerge at different levels of the system that may not correspond to any labels present in our datasets. For example, handles, or the head of a spoon, may emerge as object representations in low-level LMs, even though the dataset only contains labels like "mug" and "spoon".

@@ -2,11 +2,13 @@
title: Make Dataset to Test Compositional Objects
---

We have developed an initial dataset based on recognizing a variety of dinner table sets with different arrangements of plates and cutlery. For example, the objects can be arranged in a normal setting, or aligned in a row (i.e. not a typical dinner-table setting). Similarly, the component objects can be those of a modern dining table, or those from a "medieval" time-period. As such, this dataset can be used to test the ability of Monty systems to recognize compositional objects based on the specific arrangement of objects, and to test generalization to novel compositions.
To test compositional objects, we would like to develop a minimal dataset based on common objects (such as mugs and bowls) with logos on their surfaces. This will enable us to learn on the component objects in isolation, while moving towards a more realistic setting where the component objects must be disentangled from one another. The logo-on-surface setup also enables exploring interesting challenges of object distortion, and learning multiple location-specific associations, such as when a logo has a 90 degree bend half-way along its length.

By using explicit objects to compose multi-part objects, this dataset has the advantage that we can learn on the component objects in isolation, using supervised learning signals if necessary. It's worth noting that this is often how learning of complex compositional objects takes place in humans. For example, when learning to read, children begin by learning individual letters, which are themselves composed of a variety of strokes. Only when letters are learned can they learn to combine them into words. More generally, disentangling an object from other objects is difficult without the ability to interact with it, or see it in a sufficient range of contexts that it's separation from other objects becomes clear.
It's worth noting that observing objects and sub-objects in isolation is often how compositional objects are learned in humans. For example, when learning to read, children begin by learning individual letters, which are themselves composed of a variety of strokes. Only when letters are learned can they learn to combine them into words. More generally, disentangling an object from other objects is difficult without the ability to interact with it, or see it in a sufficient range of contexts that its separation from other objects becomes clear.

However, we would eventually expect compositional objects to be learned in an unsupervised manner. When this is consistently possible, we can consider more diverse datasets where the component objects may not be as explicit. At that time, the challenges described in [Figure out Performance Measure and Supervision in Heterarchy](../cmp-hierarchy-improvements/figure-out-performance-measure-and-supervision-in-heterarchy.md) will become more relevant.
We would eventually expect compositional objects to be learned in an unsupervised manner, for example learning that a wing on a bird is a sub-object, even though it may never have been observed in isolation. When this is consistently possible, we can consider more diverse datasets where the component objects may not be as explicit. At that time, the challenges described in [Figure out Performance Measure and Supervision in Heterarchy](../cmp-hierarchy-improvements/figure-out-performance-measure-and-supervision-in-heterarchy.md) will become more relevant.

In the future, we will move towards policies that change the state of the world. At this time, an additional dataset that may prove useful is a "dinner-table setting" with different arrangements of plates and cutlery. For example, the objects can be arranged in a normal setting, or aligned in a row (i.e. not a typical dinner-table setting). Similarly, the component objects can be those of a modern dining table, or those from a "medieval" time-period. As such, this dataset can be used to test the ability of Monty systems to recognize compositional objects based on the specific arrangement of objects, and to test generalization to novel compositions. Because of the nature of the objects, they can also be re-arranged in a variety of ways, which will enable testing policies that change the state of the world.

![Dinner table set](../../figures/future-work/dinner_variations_standard.png)
*Example of compositional objects made up of modern cutlery and plates.*
2 changes: 1 addition & 1 deletion docs/future-work/learning-module-improvements.md
@@ -7,7 +7,7 @@ We have a guide on [customizing learning modules](learning-module-improvements/c
These are the things we would like to implement:

- [Use off-object observations](learning-module-improvements/use-off-object-observations.md) #numsteps #multiobj
- [Reinitialize hypotheses when starting to recognize a new object](learning-module-improvements/reinitialize-hypotheses-when-starting-to-recognize-a-new-object.md) #multiobj
- [Implement and test rapid evidence decay as form of unsupervised memory resetting](learning-module-improvements/implement-and-test-rapid-evidence-decay-as-form-of-unsupervised-memory-resetting.md) #multiobj
- [Improve bounded evidence performance](learning-module-improvements/improve-bounded-evidence-performance.md) #multiobj
- [Use models with fewer points](learning-module-improvements/use-models-with-fewer-points.md) #speed #generalization
- [Make it possible to store multiple feature maps on one graph](learning-module-improvements/make-it-possible-to-store-multiple-feature-maps-on-one-graph.md) #featsandmorph
@@ -0,0 +1,13 @@
---
title: Implement and Test Rapid Evidence Decay as Form of Unsupervised Memory Resetting
---

In a natural, unsupervised setting, the object that a Monty system is observing will change from time to time. Currently, Monty's internal representations of objects are only explicitly reset by the experimenter, for example at the start of an episode of training.

We would like to see if we can achieve reasonable performance when objects change (including during learning) by having a shorter memory horizon that rapidly decays. Assuming policies are sufficiently efficient in their exploration of objects, this should enable us to effectively determine whether we are still on the same object, on a different (but known) object, or on an entirely new object. This can subsequently inform changes such as switching to an exploration-focused policy (see [Implement Switching Between Learning and Inference-Focused Policies](../motor-system-improvements/implement-switching-between-learning-and-inference-focused-policies.md)).

Note that we already have the `past_weight` and `present_weight` parameters, which can be used for this approach. As such, the main task is to set up experiments where objects are switched out without resetting the LM's evidence values, and then evaluate the performance of the system.
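
As a rough illustration (the configuration shape below is an assumption; only `past_weight` and `present_weight` are taken from the existing parameters), such an experiment could configure the learning module so that previously accumulated evidence decays each step:

```python
# Hypothetical learning-module arguments for a rapid-decay experiment. With past_weight < 1,
# evidence carried over from earlier steps shrinks each time new evidence arrives
# (roughly: evidence = past_weight * evidence + present_weight * current_evidence), so
# observations of a previous object stop dominating once the object is switched out.
rapid_decay_lm_args = dict(
    past_weight=0.8,     # decay previously accumulated evidence
    present_weight=1.0,  # full weight on evidence from the current observation
)
```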

If this fails to achieve the results we hope for, we might add a mechanism to explicitly reset evidence values when an LM believes it has moved on to a new object. In particular, we have implemented a method to detect when we have moved on to a new object based on significant changes in the accumulated evidence values for hypotheses. Integrating this method into the LMs is still in progress, but once complete, we would like to complement it with a process to reinitialize the evidence scores within the learning module. That way, when the LM detects it is on a new object, it can cleanly estimate what this new object might be.
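
A hypothetical sketch of that reinitialization step (the detection method itself already exists; the function below is only illustrative, with an arbitrary threshold) might look like:

```python
import numpy as np


def maybe_reset_evidence(evidence, previous_best, drop_threshold=2.0):
    """Reinitialize an LM's evidence scores if the best hypothesis suddenly lost evidence,
    taken here as a signal that the sensor has moved onto a new object.

    `evidence` is an array of accumulated evidence values, one per hypothesis.
    """
    current_best = float(np.max(evidence))
    if previous_best - current_best > drop_threshold:
        evidence = np.zeros_like(evidence)  # clean slate for estimating the new object
    return evidence, current_best
```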

Eventually this could be complemented with top-down feedback from a higher-level LM modeling a scene or compositional object. In this case, the high-level LM biases the evidence values initialized in the low-level LM, based on what object should be present there according to the higher-level LM's model. Improvements here could also interact with the tasks of [Re-Anchor Hypotheses](../learning-module-improvements/re-anchor-hypotheses.md), and [Use Better Priors for Hypothesis Initialization](../learning-module-improvements/use-better-priors-for-hypothesis-initialization.md).
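
A small sketch of how such top-down feedback might bias the reinitialized evidence (the names and the magnitude of the bias are assumptions, not existing Monty code):

```python
def initialize_evidence_with_feedback(hypothesis_ids, predicted_object=None, bias=1.0):
    """Give the object predicted by a higher-level LM a head start when (re)initializing
    the evidence values of a lower-level LM's hypotheses."""
    evidence = {obj_id: 0.0 for obj_id in hypothesis_ids}
    if predicted_object in evidence:
        evidence[predicted_object] += bias
    return evidence


# e.g. the higher-level LM's model of the dinner scene predicts a fork at the current location:
evidence = initialize_evidence_with_feedback(["fork", "knife", "spoon"], predicted_object="fork")
```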

This file was deleted.

4 changes: 3 additions & 1 deletion docs/future-work/motor-system-improvements.md
@@ -8,11 +8,13 @@ These are the things we would like to implement:
- [Interpret goal states in motor system & switch policies](motor-system-improvements/interpret-goal-states-in-motor-system-switch-policies.md) #goalpolicy
- [Implement switching between learning and inference-focused policies](motor-system-improvements/implement-switching-between-learning-and-inference-focused-policies.md) #learning
- [Bottom-up exploration policy for surface agent](motor-system-improvements/bottom-up-exploration-policy-for-surface-agent.md) #learning
- [Top-down exploration policy](motor-system-improvements/top-down-exploration-policy.md) #learning #numsteps
- [Model-based exploration policy](motor-system-improvements/model-based-exploration-policy.md) #learning #numsteps
- [Implement efficient saccades driven by model-free and model-based signals](motor-system-improvements/implement-efficient-saccades-driven-by-model-free-and-model-based-signals.md) #numsteps #multiobj
- [Learn policy using RL and simplified action space](motor-system-improvements/learn-policy-using-rl.md) #numsteps #speed
- [Decompose goals into subgoals & communicate](motor-system-improvements/decompose-goals-into-subgoals-communicate.md) #goalpolicy
- [Reuse hypothesis testing policy target points](motor-system-improvements/reuse-hypothesis-testing-policy-target-points.md) #goalpolicy #numsteps
- [Implement a simple cross-modal policy](motor-system-improvements/implement-a-simple-cross-modal-policy-for-sensory-guidance.md) #learning #multiobj #goalpolicy #numsteps
- [Model-based policy to recognize an object before moving onto a new object](motor-system-improvements/model-based-policy-to-recognize-an-object-before-moving-on-to-a-new-object.md) #multiobj #compositional
- [Policy to quickly move to a new object](motor-system-improvements/policy-to-quickly-move-to-a-new-object.md) #speed #multiobj #compositional

Please see the [instructions here](project-roadmap.md#how-you-can-contribute) if you would like to tackle one of these tasks.
@@ -4,4 +4,4 @@ title: Bottom-Up Exploration Policy for Surface Agent

For the distant agent, we have a policy specifically tailored to learning, the naive scan policy, which systematically explores the visible surface of an object. We would like a similar policy for the surface agent that systematically spirals or scans across the surface of an object, at least in a local area.

This would likely be complemented by [Top-Down Exploration Policies](top-down-exploration-policy.md).
This would likely be complemented by [Model-Based Exploration Policies](model-based-exploration-policy.md).
@@ -9,3 +9,8 @@ In the model-free case, salient information available in the view-finder could d
In the model-based case, two primary settings should be considered:
- A single LM has determined that the agent should move to a particular location in order to test a hypothesis, and it sends a goal-state that can be satisfied with a saccade, rather than the entire agent jumping/teleporting to a new location. For example, saccading to where the handle of a mug is believed to be will refute or confirm the current hypothesis. This is the more important/immediate use case.
- Multiple LMs are present, including a smaller subset of more peripheral LMs. If one of these peripheral LMs observes something of interest, it can direct a goal-state to the motor system to perform a saccade such that a dense sub-set of LMs are able to visualize the object. This is analogous to cortical feedback bringing the fovea to an area of interest.

Such policies are particularly important in an unsupervised setting, where we will want to explore objects more efficiently in order to rapidly determine their identity, given that we have no supervised signal to tell us whether this is a familiar object or an entirely new one. This will be compounded by the fact that [evidence for objects will rapidly decay](../learning-module-improvements/implement-and-test-rapid-evidence-decay-as-form-of-unsupervised-memory-resetting.md) in order to better support the unsupervised setting.

Unlike the hypothesis-testing policy, we would specifically like to implement these as a saccade policy for the distant agent (i.e. no translation of the agent, only rotation), as this is a step towards an agent that can sample efficiently in the real world without having to physically translate sensors through space. Ideally, this would be implemented via a unified mechanism of specifying a location in e.g. ego-centric 3D space, and the policy determining the necessary rotation to focus at that point in space.
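
For instance, a minimal sketch of that unified mechanism (the coordinate convention and function below are assumptions, not existing Monty code) could convert a target point in the agent's egocentric frame into the rotation needed to fixate it:

```python
import numpy as np


def saccade_to_point(target_xyz):
    """Return the (yaw, pitch) rotation, in degrees, that points a distant agent's sensor
    at a target location given in the agent's egocentric frame.

    Assumed convention: x to the right, y up, z straight ahead, origin at the sensor.
    """
    x, y, z = target_xyz
    yaw = np.degrees(np.arctan2(x, z))                 # rotate left/right toward the target
    pitch = np.degrees(np.arctan2(y, np.hypot(x, z)))  # then tilt up/down
    return yaw, pitch


# e.g. fixate a goal-state location 1 m ahead, slightly right of and above the line of sight:
yaw, pitch = saccade_to_point((0.1, 0.05, 1.0))
```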

@@ -6,4 +6,4 @@ Currently, a Monty system cannot flexibly switch between a learning-focused poli

This would be a specific example of a more general mechanism for switching between different policies, as discussed in [Switching Policies via Goal States](interpret-goal-states-in-motor-system-switch-policies.md).

Similarly, an LM should be able to determine the most appropriate *model-based* policies to initialize, such as the hypothesis-testing policy vs. a [top-down exploration policy](top-down-exploration-policy.md).
Similarly, an LM should be able to determine the most appropriate *model-based* policies to initialize, such as the hypothesis-testing policy vs. a [model-based exploration policy](model-based-exploration-policy.md).