
Commit

Add vectorized figures (#53)
nielsleadholm authored Nov 22, 2024
1 parent 70d610a commit 0d33c9c
Showing 30 changed files with 5 additions and 5 deletions.
Binary file modified docs/figures/how-monty-works/agent_vs_sensors_w_labels.png
Binary file modified docs/figures/how-monty-works/evidence_init_all.png
Binary file modified docs/figures/how-monty-works/evidence_update1.png
Binary file modified docs/figures/how-monty-works/evidence_update2.png
Binary file modified docs/figures/how-monty-works/example_movement.png
Binary file modified docs/figures/how-monty-works/five_lm_monty.png
Binary file removed docs/figures/how-monty-works/five_lm_views.jpg
Binary file added docs/figures/how-monty-works/five_lm_views.png
Binary file modified docs/figures/how-monty-works/full_graph.png
Binary file modified docs/figures/how-monty-works/learn_from_scratch.png
Binary file modified docs/figures/how-monty-works/observations_w_labels.png
Binary file modified docs/figures/how-monty-works/overview_diagram.png
Binary file modified docs/figures/how-monty-works/policies.png
Binary file modified docs/figures/how-monty-works/possible_matches.png
Binary file modified docs/figures/how-monty-works/search_radius.png
Binary file modified docs/figures/how-monty-works/step_episode_epoch.png
Binary file modified docs/figures/how-monty-works/touch_vs_vision.png
Binary file modified docs/figures/how-monty-works/voting.png
Binary file modified docs/figures/overview/cc_voting.png
Binary file modified docs/figures/overview/cortical_columns_lm.png
Binary file modified docs/figures/overview/overview_diagram.png
Binary file modified docs/figures/overview/reference_frames.png
Binary file removed docs/figures/overview/s_mand_lm.png
Binary file modified docs/figures/overview/scaling.png
Binary file added docs/figures/overview/sm_and_lm.png
2 changes: 1 addition & 1 deletion docs/how-monty-works/environment-agent.md
@@ -5,7 +5,7 @@ The 3D environment used for most experiments is **Habitat**, wrapped into a `Emb…

The environment is currently initialized with one agent that has N sensors attached to it using the `PatchAndViewFinderMountConfig`. This config by default has two sensors. The first sensor is the **sensor patch** which will be used for learning. It is a **camera, zoomed in 10x** such that it only perceives a small patch of the environment. The second sensor is the view-finder which is at the same location as the patch and moves together with it but its camera is not zoomed in. This one is only used at the beginning of an episode to get a good view of the object (more details in the policy section) and for visualization, but not for learning or inference. The agent setup can also be customized to use more than one sensor patch (such as in `TwoLMMontyConfig` or `FiveLMMontyConfig`, see figure below). The configs also specify the type of sensor used, the features that are being extracted, and the motor policy used by the agent.

-![Example of six sensors in Habitat. The view-finder is not connected to any sensor or learning module and is only used by the policy at the beginning of an episode and for visualization. Each patch connects to one sensor module.](../figures/how-monty-works/five_lm_views.jpg)
+![Example of six sensors in Habitat. The view-finder is not connected to any sensor or learning module and is only used by the policy at the beginning of an episode and for visualization. Each patch connects to one sensor module.](../figures/how-monty-works/five_lm_views.png)


Generally, one can also initialize multiple agents and connect them to the same Monty model but there is currently no policy to coordinate them. The difference between adding more agents vs. adding more sensors to the same agent is that **all sensors connected to one agent have to move together** (like neighboring patches on the retina) while **separate agents can move independently** like fingers on a hand (see figure below).
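
To make the sensor/agent mount described in this hunk more concrete, here is a small, hypothetical sketch. The class and field names below are illustrative only and are not the actual `PatchAndViewFinderMountConfig`, `TwoLMMontyConfig`, or `FiveLMMontyConfig` definitions from the codebase; the sketch just mirrors the structure the documentation describes: one agent carrying a zoomed-in learning patch plus an un-zoomed view finder, optionally extended to several patches that all move together.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative sketch only -- NOT the real tbp.monty config classes.
@dataclass
class SensorSpec:
    sensor_id: str
    zoom: float = 1.0  # 10x zoom for learning patches, 1x for the view finder

@dataclass
class AgentMountSpec:
    agent_id: str
    sensors: List[SensorSpec] = field(default_factory=list)

# Analogous to the default setup described above: one zoomed-in patch used
# for learning and inference, plus a view finder used only for the initial
# view of the object and for visualization.
default_mount = AgentMountSpec(
    agent_id="agent_0",
    sensors=[
        SensorSpec("patch", zoom=10.0),
        SensorSpec("view_finder", zoom=1.0),
    ],
)

# In the spirit of a five-patch setup: five learning patches plus the view
# finder, all mounted on one agent and therefore moving together.
five_patch_mount = AgentMountSpec(
    agent_id="agent_0",
    sensors=[SensorSpec(f"patch_{i}", zoom=10.0) for i in range(5)]
    + [SensorSpec("view_finder", zoom=1.0)],
)
```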
2 changes: 1 addition & 1 deletion docs/how-monty-works/how-learning-modules-work.md
@@ -15,7 +15,7 @@ There are two modes the learning module could be in: **training** and **evaluati…

The training mode is split into two phases that alternate: The matching phase and the exploration phase. During the **matching phase** the module tries to determine the object ID and pose from a series of observations and actions. This is the same as in evaluation. After a terminal condition is met (object recognized or no possible match found) the module goes into the **exploration phase**. This phase continues to collect observations and adds them into the buffer the same way as during the previous phase, only the matching step is skipped. The exploration phase is used to add more information to the graph memory at the end of an episode. For example, the matching procedure could be done after three steps telling us that the past three observations are not consistent with any models in memory. Therefore we would want to store a new graph in memory but a graph made of only three observations is not very useful. Hence, we keep moving for `num_exploratory_steps` to collect more information about this object before adding it to memory. This is not necessary during evaluation since we do not update our models then.

-![First two episodes (separated by a vertical double line) during learning. After we recognize an object (matching phase) we can explore the object further to collect new information about it (green lines). This information can then be added to the model of the object in memory. The top row shows the agent’s movements during the episodes. The bottom row shows the models in memory. As we are learning from scratch, we have no model in memory during the first episode.](../figures/how-monty-works/learn_from_scratch.png)
+![First two episodes (separated by a vertical double line) during learning. After we recognize an object (matching phase) we can explore the object further to collect new information about it (pink lines). This information can then be added to the model of the object in memory. The top row shows the agent’s movements during the episodes. The bottom row shows the models in memory. As we are learning from scratch, we have no model in memory during the first episode.](../figures/how-monty-works/learn_from_scratch.png)


<br />
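
To make the matching/exploration split above concrete, here is a minimal, hypothetical sketch of one training episode. The `lm` and `env` interfaces (`terminal_condition_met`, `match`, `propose_action`, `observe`, `move`, `update_graph_memory`) are assumed names for illustration and do not correspond to the actual methods in the codebase; only `num_exploratory_steps` comes from the text above.

```python
def run_training_episode(lm, env, num_exploratory_steps):
    """Hypothetical sketch of one training episode (not the real implementation)."""
    buffer = []

    # Matching phase: observe, match against models in memory, and move,
    # until a terminal condition is met (object recognized, or no possible
    # match remains).
    while not lm.terminal_condition_met():
        obs = env.observe()
        buffer.append(obs)
        lm.match(obs)
        env.move(lm.propose_action())

    # Exploration phase: keep collecting observations exactly as before,
    # but skip the matching step, so the graph stored at the end of the
    # episode contains more than just a few points.
    for _ in range(num_exploratory_steps):
        obs = env.observe()
        buffer.append(obs)
        env.move(lm.propose_action())

    # At the end of the episode, add the buffered observations to memory:
    # extend the recognized object's model, or store a new graph if the
    # observations matched nothing in memory.
    lm.update_graph_memory(buffer)
```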
@@ -23,7 +23,7 @@ The Cortical Messaging Protocol is defined in the State class. The output of eve…

The State class is quite general and depending on who outputs it, it can be interpreted in different ways. As output of the sensor module, it can be seen as the observed state. When output by the learning module it can be interpreted as the hypothesized or most likely state. When it is the motor output of the LM it can be seen as a goal state (for instance specifying the desired location and orientation of a sensor or object in the world). Lastly, when sent as lateral votes between LMs, we send a list of state class instances which can be interpreted as all possible states (where states do not contain non-morphological, modality-specific features but only pose information associated with object IDs).

-![Observation processing into Cortical Messaging Protocol. The sensor patch comprises a small area of the object (yellow square) and if the sensor is a camera it returns an RGBD image. We apply a transform to this image which calculates the x, y, z locations relative to the agent’s body for each pixel using the depth values and the sensor location. From these points in space, the sensor module then calculates the point normal and principal curvature directions at the center point of the patch (pose). Additionally, the sensor module can extract pose-independent features such as color and the magnitude of curvature. The pose (location + point normal and curvature direction) and features make up the observation at time step t and are the output of the sensor module.](../figures/how-monty-works/observations_w_labels.png)
+![Observation processing into Cortical Messaging Protocol. The sensor patch comprises a small area of the object (blue square) and if the sensor is a camera it returns an RGBD image. We apply a transform to this image which calculates the x, y, z locations relative to the agent’s body for each pixel using the depth values and the sensor location. From these points in space, the sensor module then calculates the point normal and principal curvature directions at the center point of the patch (pose). Additionally, the sensor module can extract pose-independent features such as color and the magnitude of curvature. The pose (location + point normal and curvature direction) and features make up the observation at time step t and are the output of the sensor module.](../figures/how-monty-works/observations_w_labels.png)
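
The message described above can be pictured with a small, self-contained sketch. The class below is illustrative only (it is not the State class from the codebase); it simply lists the kind of pose and feature information the Cortical Messaging Protocol is said to carry, which the caption above breaks down into location, point normal, curvature directions, and pose-independent features.

```python
from dataclasses import dataclass, field
from typing import Any, Dict
import numpy as np

# Illustrative sketch only -- NOT the real State class from the codebase.
@dataclass
class CMPMessage:
    location: np.ndarray                 # x, y, z relative to the body
    point_normal: np.ndarray             # surface normal at the patch center
    curvature_directions: np.ndarray     # two principal curvature directions
    features: Dict[str, Any] = field(default_factory=dict)  # e.g. color, curvature magnitude
    sender: str = "sensor_module"        # who produced it changes how it is read

# The same structure is read differently depending on the sender: an observed
# state (sensor module output), a hypothesized / most-likely state (learning
# module output), a goal state (motor output), or one of a list of possible
# states (lateral vote, carrying pose and object ID but no modality-specific features).
observed = CMPMessage(
    location=np.array([0.10, 0.25, 0.03]),
    point_normal=np.array([0.0, 0.0, 1.0]),
    curvature_directions=np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),
    features={"hue": 0.6, "principal_curvatures": (0.2, 0.05)},
)
```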

## Point Normals and Principle Curvatures
[Point Normals and Principle Curvatures](https://res.cloudinary.com/dtnazefys/video/upload/v1731342526/point_normal.mp4)
@@ -5,4 +5,4 @@ To consolidate these concepts, please see the figure below for a potential instanti…

While this hopefully makes the key concepts described above more concrete and tangible, keep in mind that this is just one way in which the architecture can be instantiated. By design, the Platform can be applied to any application that involves sensing and active interaction with an environment. Indeed, this might include more abstract examples such as browsing the web, or interacting with the instruments that control a scientific experiment.

-![High-level overview the architecture with all the main conceptual components applied to a concrete example. Green lines indicate the main flow of information up the hierarchy. Purple lines show top-down connections, biasing the lower-level learning modules. Light blue lines show lateral voting connections. Red lines show the communication of goal states which eventually translate into motor commands in the motor system. Every LM has a direct motor output. Information communicated along solid lines follows the CMP (contains features and pose). Discontinuations in the diagram are marked with dots on line-ends. Dashed lines are the interface of the system with the world and subcortical compute units and do not need to follow the CMP. Green dashed lines communicate raw sensory input from sensors. Red dashed lines communicate motor commands to the actuators. The dark red dashed lines send sensory information directly to the motor system and implement a fast, model-free policies. The large, semi-transparent green arrow is an example of a connection carrying sensory outputs from a larger receptive field directly to the higher-level LM](../../figures/overview/overview_diagram.png)
+![High-level overview the architecture with all the main conceptual components applied to a concrete example. Blue lines indicate the main flow of information up the hierarchy. Purple lines show top-down connections, biasing the lower-level learning modules. Green lines show lateral voting connections. Pink lines show the communication of goal states which eventually translate into motor commands in the motor system. Every LM has a direct motor output. Information communicated along solid lines follows the CMP (contains features and pose). Discontinuations in the diagram are marked with dots on line-ends. Dashed lines are the interface of the system with the world and subcortical compute units and do not need to follow the CMP. Blue dashed lines communicate raw sensory input from sensors. Pink dashed lines communicate motor commands to the actuators. The direct blue line from the SM to the motor system sends sensory information to support fast, model-free policies. The large, semi-transparent blue arrow is an example of a connection carrying sensory outputs from a larger receptive field directly to the higher-level LM](../../figures/overview/overview_diagram.png)
2 changes: 1 addition & 1 deletion docs/overview/architecture-overview/sensor-modules.md
@@ -7,4 +7,4 @@ The sensor module contains a sensor and associates the input to the sensor with…

A general principle of the system is that **any processing specific to a modality happens in the sensor module**. The output of the sensor module is not modality-specific anymore and can be processed by any learning module. A crucial requirement here is that each sensor module knows the pose of the feature relative to the sensor. This means that sensors need to be able to detect features and poses of features. The system can work with any type of sensor (vision, touch, radar, LiDAR,...) and integrate information from multiple sensory modalities without effort. For this to work, sensors need to communicate sensory information in a common language.

-![Sensor modules receive and process the raw sensory input. This is then communicated via a common messaging protocol to a learning module which uses this to learn and recognize models of anything in the environment.](../../figures/overview/s_mand_lm.png)
+![Sensor modules receive and process the raw sensory input. This is then communicated via a common messaging protocol to a learning module which uses this to learn and recognize models of anything in the environment.](../../figures/overview/sm_and_lm.png)
