[UPDATE] Misleading characterization of state in the Q-Table #452

Closed
lutzvdb opened this issue Dec 22, 2023 · 3 comments

Comments

@lutzvdb
Contributor

lutzvdb commented Dec 22, 2023

What do you want to improve?

In the images depicting the Q-table in Unit 2, I am assuming that the first column is supposed to represent the state. In this course's glossary, the state is defined as the "Complete description of the state of the world.", so I'd expect the state to be something like a screenshot of a given game situation. However, the column entries depict the six individual tiles of the board.

It is not really clear how this relates to the state of the game, which, if I understand it correctly, should describe all six tiles of the board at the same time. Two examples:

  • Since we moved right and then down, the cheese in tile A2 has already been eaten and we are on the poison tile.
  • If we moved right twice, the cheese on A2 has already been eaten and has disappeared. Therefore, the consequences of going back to tile A2 are different than they were before when we were on A1 and there still was a cheese on A2.

I hope I'm not misreading the concepts here, but the way I see it, the table does not exactly aid the understanding of what a "state" is. In particular, as I understand it, the Q-table should have far more than six rows, since there are far more than six possible states.
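To make the row-count concern concrete, here is a minimal sketch that compares the two ways of defining a state. It assumes a six-tile board and, purely for illustration, two cheese items that can each be either present or eaten (`N_CHEESES` is a hypothetical value, not taken from the course). The second count is an upper bound, since some combinations may not be reachable in practice.

```python
from itertools import product

N_TILES = 6    # positions on the board shown in the Q-table images
N_CHEESES = 2  # assumption: number of items that can be eaten and disappear

# State = agent position only: one Q-table row per tile.
position_only_states = list(range(N_TILES))

# State = agent position plus a present/eaten flag for each cheese.
cheese_flags = product([False, True], repeat=N_CHEESES)
full_states = list(product(range(N_TILES), cheese_flags))

print(len(position_only_states))  # 6
print(len(full_states))           # 24 (upper bound: 6 * 2**2)
```

Under these assumptions, the position-only table has six rows, while a table that also tracks what has been eaten would need up to 24.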

@Ivan-267
Contributor

Ivan-267 commented Dec 22, 2023

Hello, as I am relatively new to RL terminology, my comment below is partially guessing.

I think in this case, the state is only the current position of the agent. In the hands-on, this environment is used: https://www.gymlibrary.dev/environments/toy_text/frozen_lake/

If we take a look at the observations:

Observation Space
The observation is a value representing the agent’s current position as current_row * nrows + current_col (where both the row and col start at 0). For example, the goal position in the 4x4 map can be calculated as follows: 3 * 4 + 3 = 15. The number of possible observations is dependent on the size of the map. For example, the 4x4 map has 16 possible observations.

So for an environment such as this, we can calculate the best action for the agent to take based on its position in the grid.
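As a rough illustration of the observation encoding quoted above, the sketch below maps a grid position to a single integer and back. The helper names are hypothetical. It multiplies by the number of columns (row-major indexing), which on a square map such as 4x4 gives the same result as the quoted formula.

```python
def position_to_state(row: int, col: int, ncols: int = 4) -> int:
    """Encode a (row, col) grid position as one integer, row by row."""
    return row * ncols + col


def state_to_position(state: int, ncols: int = 4) -> tuple[int, int]:
    """Decode the integer back into a (row, col) pair."""
    return divmod(state, ncols)


# The goal tile of the 4x4 map sits in the bottom-right corner.
print(position_to_state(3, 3))  # 15
print(state_to_position(15))    # (3, 3)
```

Each of the 16 integers then identifies one row of the Q-table.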

While the full state of the environment itself may include various additional information (position of everything on the grid, image data for rendering graphical elements, additional internal game state), the agent receives only the observations that are necessary for learning, and in a simple environment, that can be only the current position of the agent.

However, the inputs to the Q-function are called states (the function gives you the value for any state and action that you input).
From: https://huggingface.co/learn/deep-rl-course/unit2/q-learning

Given a state and action, our Q-function will search its Q-table for the corresponding value.
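A minimal sketch of that lookup, assuming the 4x4 map (16 position-based states) and four actions. The table is a plain list of lists, and the helper names and the value written into the table are hypothetical, chosen only to show the mechanics.

```python
N_STATES = 16   # one row per agent position on the 4x4 map
N_ACTIONS = 4   # one column per action

# Q-table initialised to zero: q_table[state][action] -> estimated value.
q_table = [[0.0] * N_ACTIONS for _ in range(N_STATES)]


def q_value(q_table, state, action):
    """Look up the stored value for a (state, action) pair."""
    return q_table[state][action]


def greedy_action(q_table, state):
    """Return the index of the action with the highest value in this state."""
    row = q_table[state]
    return max(range(len(row)), key=row.__getitem__)


# Hypothetical learned value, for illustration only.
q_table[14][2] = 1.0

print(q_value(q_table, 14, 2))     # 1.0
print(greedy_action(q_table, 14))  # 2
```

Note that the "state" used as the row index here is just the agent's position, which is why the table needs only as many rows as there are tiles.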

There is a note here that clarifies the terminology used:
https://huggingface.co/learn/deep-rl-course/unit1/rl-framework

In this course, we use the term "state" to denote both state and observation, but we will make the distinction in implementations.

Of course, for a more complex environment, we may have to provide the agent with more information about the current state of the environment.

@lutzvdb
Contributor Author

lutzvdb commented Dec 28, 2023

Thank you for weighing in. I think you're right that for these simple environments, the position alone is considered sufficient to describe the state of the environment. However, a clarification in the text explaining this would make the concept much easier to understand. I'll create a PR adding a clarification statement!

@lutzvdb
Contributor Author

lutzvdb commented Dec 28, 2023

See PR 454. Closing this issue!

@lutzvdb lutzvdb closed this as completed Dec 28, 2023