Welcome to pymdp Discussions! #4
-
Thank you for creating the library and the groundwork for bringing Active Inference to the Python environment. I have a few points on the Grid World example.

First, because you use an identity matrix for the likelihood (A), the ambiguity term (`entropy.dot(Qs_pi)`) is always zero, i.e. there is no ambiguity at all. Yet the algorithm still "looks as if" it is exploring. This comes from `random.multinomial()`, which samples a policy at random, so the behaviour merely resembles exploration; in reality it is closer to Brownian motion. As the observations get closer to the preference, the probability of sampling a policy whose actions produce the matching observation increases, but the exploration we see is not driven by uncertainty, which is one of the core concepts of the FEP.

Second, when you evaluate the expected free energy of a policy by integrating over its actions, you keep the same best-guess belief distribution (Qs) constant. I think you should update Qs after each action, using something like `Qs = maths.softmax(log_stable(Qo_pi) + log_stable(Qs_pi))` (see the sketch at the end of this comment). I think this is important.

Lastly, the way you select the best action does not seem to be based purely on the best policy: you sum the normalized expected free energies of a given action over all policies and then select the action. If you instead update the best-guess belief distribution while calculating the expected free energy (as suggested above), you can simply pick the first action of the best policy. With this method the agent finds the preferred observation quickly, but ergodicity is low, since there is no exploration due to the identity likelihood matrix (i.e. no ambiguity). Let's explore ways to bring ambiguity into the algorithm other than relying on Python's "random" functions.

Thank you~ Yasas Ponweera
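To make these points concrete, here is a rough sketch (this is not pymdp's actual code; `softmax`, `log_stable`, and `update_Qs` are stand-in helpers, and the 4-state example is made up) of why an identity A matrix makes the ambiguity term vanish, and of the per-action belief update I am suggesting:

```python
import numpy as np

def softmax(x):
    # numerically stable softmax
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def log_stable(x, eps=1e-16):
    # log with a small offset so log(0) never occurs
    return np.log(x + eps)

# 1) With an identity likelihood A, every column of p(o|s) has zero entropy,
#    so the ambiguity term entropy.dot(Qs_pi) is always ~0.
A = np.eye(4)
entropy = -(A * log_stable(A)).sum(axis=0)   # per-state entropy of p(o|s)
Qs_pi = softmax(np.random.randn(4))          # a predicted state posterior under some policy
print(entropy.dot(Qs_pi))                    # ~0.0: no ambiguity, hence no epistemic drive

# A slightly "blurred" likelihood re-introduces ambiguity the agent could resolve:
A_noisy = 0.8 * np.eye(4) + 0.2 / 4          # mix identity with a uniform distribution (columns still sum to 1)
entropy_noisy = -(A_noisy * log_stable(A_noisy)).sum(axis=0)
print(entropy_noisy.dot(Qs_pi))              # > 0

# 2) The per-action belief update I am proposing: instead of reusing the same
#    best-guess Qs for every action of a policy, roll the belief forward.
def update_Qs(Qo_pi, Qs_pi):
    return softmax(log_stable(Qo_pi) + log_stable(Qs_pi))
```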
-
In line 95 of agent.py, the assertion reads `assert utils.is_normalized(self.B), "A matrix is not normalized (i.e. A.sum(axis = 0) must all equal 1.0"`. I believe the message is an error: it should refer to the B matrix, not the A matrix.
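If the intent is only to fix the message, a one-line sketch of the correction (same assertion, just renaming the matrix and closing the parenthesis) would be:

```python
assert utils.is_normalized(self.B), "B matrix is not normalized (i.e. B.sum(axis = 0) must all equal 1.0)"
```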
-
Hi, the specific things I was wondering are
-
👋 Welcome!
Here is a discussion thread where we can discuss things like issues, bugs, and ideas for extensions that relate to the pymdp package.