[RBC] Various issues encountered during the use of RBC #811
Comments
Thanks @acxz. @michalsustr can you take a look and respond to Akash?
Hi @acxz, I will try to have a look on the weekend. If you have an implementation that would fix these, I could review the PR :)
Sweet! Sadly I'm a bit busy for ~2 weeks, but after that I can put in the effort to help resolve this issue. For me, 1) & 5) should be pretty quick. I think 2) and 4) (recording the previous taken action) are the most important, and I would appreciate it if you could tackle those.
Thanks! Sorry, this is not high priority for me at the moment due to the NeurIPS deadline coming up, but I will try to have a look then :)
For book-keeping: a PR was opened for issue 1) here: #817 but was never finished. |
@acxz @michalsustr just circling back to this now. Wondering what we should do here. Are these issues serious enough to warrant removal? I don't like the idea of keeping buggy games in the code base. Will these ever be fixed? |
Actually, I think I have a better idea for games in this category: #1163 |
Hello @lanctot! I definitely want to get back to this, but I can't say when I'll get back to RBC related work. Adding the warning is, I think, a wonderful idea that still lets others use it and work on top of it!
Hello, first off thanks for creating this amazing group of environments for board games! It is the only place where I have found an implementation of Reconnaissance Blind Chess besides APL's own implementation.
I've been trying to utilize the RBC environment and here are some issues/pain points that I have encountered and thought I'd share:
Note: I am using the RBC environment through a gym-like wrapper created by the ray-project team here.
So far my main goal has been to recreate the random, attacker, and trout bots that APL provides, located here.
Here is my progress, which I have been able to recreate with some slight issues and workarounds.
Ultimately, I'd like to have the same API for creating bots as APL does, so that a class that defines an APL bot can be used one-to-one as a ray/rllib policy with their open_spiel wrapper. Right now I have a rough skeleton of where the APL bot functions go in the ray/rllib pipeline, but I don't separate them into their own functions.
Basically, I want to be able to train an RBC agent using rllib with the open_spiel environment and then deploy it on the APL RBC servers.
Anyway, here are the issues I have had concerning open_spiel's RBC implementation.
1) Not all sense actions are encoded.
https://github.com/deepmind/open_spiel/blob/b6f88b6f2fddadd7e0f8e01c78d2becb61c51a33/open_spiel/games/rbc.cc#L421-L423
From what I understand of the above code, since a sense grid is 3x3, it makes less sense for an agent to sense at the edges of the board, i.e. ranks 1 and 8 and files a and h. First off, this is different from APL's implementation, as they do allow you to sense any location. Secondly, this is currently not implemented properly: `cached_legal_actions_` is simply resized, which gives the first (inner_size x inner_size = 6x6) 36 actions instead of the 36 actions corresponding to the inner board. Effectively, this allows the agent to sense file a, but not files g or h. I believe the proper way to fix this is to allow the agent to sense anywhere on the game board, even on the edges, for consistency with APL, but honestly just fixing the logic for `cached_legal_actions_` should be good enough.
Related commit: d1f6735
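For reference, here is a minimal sketch of how the inner 6x6 sense squares could be enumerated, next to the APL-style full 64-square set. This assumes a row-major square indexing (index = rank * 8 + file), which may not match the encoding actually used inside rbc.cc:

```python
# Sketch: enumerate sense squares for RBC.
# Assumption: row-major square indexing (index = rank * 8 + file) with
# files a..h -> 0..7 and ranks 1..8 -> 0..7; this may differ from the
# encoding used inside open_spiel's rbc.cc.

BOARD_SIZE = 8
INNER_SIZE = 6


def square_name(rank, file):
    return "abcdefgh"[file] + str(rank + 1)


# APL-style: any of the 64 squares is a legal sense location.
all_sense_squares = [
    square_name(r, f) for r in range(BOARD_SIZE) for f in range(BOARD_SIZE)
]

# Inner-board-only: the 36 squares b2..g7, i.e. centers whose 3x3 sense
# window stays entirely on the board.
inner_sense_squares = [
    square_name(r, f)
    for r in range(1, 1 + INNER_SIZE)
    for f in range(1, 1 + INNER_SIZE)
]

assert len(all_sense_squares) == 64 and len(inner_sense_squares) == 36
```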
2) Return the previous taken_action in ObservationString/ObservationTensor
RBC is a game with uncertainty, and the requested action may not be the action that is actually taken due to the board state. This information is required for agents to properly keep track of the game state, and it is information that APL's RBC provides. See 6i) for more info on how I currently (and incorrectly) work around this issue using state.history(). But essentially, the previous taken_action should be provided to the agents via both the ObservationString and the ObservationTensor. See APL's `handle_move_result` for more info. Maybe this information already exists, but I have not been able to extract it from ObservationString.
Current rationale:
https://github.com/deepmind/open_spiel/blob/0c265fbaa56c418788333fdf3bd373ff12620d14/open_spiel/games/rbc.h#L93-L94
I believe this should be changed to provide the information explicitly in the ObservationString/ObservationTensor.
This could also help speed up learning, as this important information wouldn't have to be extracted.
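For context, my workaround (expanded on in 6i) looks roughly like the sketch below; it only recovers the requested action from the history, not the taken action, which is precisely the gap. It assumes the game is registered under the short name "rbc":

```python
import pyspiel

# Sketch of the imperfect workaround: pull the last action out of
# state.history(). This is the *requested* action, not the *taken* action,
# so it can disagree with what actually happened on the board.
# Assumes the game is registered under the short name "rbc".
game = pyspiel.load_game("rbc")
state = game.new_initial_state()

# ... sense and move actions get applied here during play ...

history = state.history()
last_requested_action = history[-1] if history else None
# The agent must separately track whether this entry was a sense action or a
# move action, since the phase is not part of the observation.
```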
3) Return your captured piece and location in ObservationString/ObservationTensor
Previous related issue: #696
APL's RBC will return your captured piece and its location, so you can handle any internal logic based on the opponent's move. This allows you to update, say, your internal board representation based on pieces you have lost.
From what I can tell, in open_spiel you are told that your piece was captured, but not which one or where.
https://github.com/deepmind/open_spiel/blob/b6f88b6f2fddadd7e0f8e01c78d2becb61c51a33/open_spiel/games/rbc.cc#L220-L221
This looks to be missing functionality.
See `handle_opponent_move_result` for more info.
Current rationale:
https://github.com/deepmind/open_spiel/blob/0c265fbaa56c418788333fdf3bd373ff12620d14/open_spiel/games/rbc.h#L75-L76
I believe this should be changed to provide the information explicitly in the ObservationString/ObservationTensor.
This could also help speed up learning, as this important information wouldn't have to be extracted.
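To make the request concrete, here is a rough illustration (hypothetical callback data, not an existing open_spiel or APL API) of what an agent needs in order to keep its private board in sync when one of its pieces is captured:

```python
# Illustration only: these are hypothetical callback arguments, not an
# existing open_spiel or APL API. The point is that the agent needs both
# *whether* a capture happened and *where*, so it can remove the right piece
# from its private board model.

def on_my_piece_captured(my_board, capture_happened, capture_square):
    """Update a private board dict {square: piece} after the opponent's move.

    capture_happened: bool, whether one of my pieces was captured.
    capture_square:   algebraic square (e.g. 'e4') or None.
    """
    if capture_happened and capture_square is not None:
        my_board.pop(capture_square, None)
    return my_board


# Toy usage:
board = {"e4": "P", "e1": "K"}
board = on_my_piece_captured(board, True, "e4")
assert "e4" not in board
```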
4) Return the previous sense action in ObservationString/ObservationTensor
This is similar to (2) (return the previous taken_action). The ObservationString contains the current observed board, and it does contain the newly sensed information when calling state.observation_string() during the move phase. However, the open_spiel env should also return which squares were requested for sensing, so that the agent can parse the ObservationString to determine the latest sense results. See `handle_sense_result` for what APL provides the agents.
5) Pawn promotion to queen by default
Certain moves that are illegal will still have an effect on the board, and this case is not captured in open_spiel's implementation. In particular, say my pawn is at d7 and the move my algorithm chooses is d7e8, to capture an opponent piece. Based on the legal moves (for both open_spiel and APL), d7e8 does not exist, but d7e8q does (with the queen promotion). APL's RBC will convert my d7e8 to d7e8q, but open_spiel's rbc will throw an error saying d7e8 does not exist. A possible client-side fallback is sketched below.
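The sketch retries with a queen-promotion suffix when the bare move string is rejected. This assumes string_to_action fails with an exception for a move string that is not legal, which may differ from the actual behavior:

```python
# Sketch of a client-side fallback mimicking APL's auto-queen behavior.
# Assumption: state.string_to_action fails with an exception for a move
# string that is not in the legal action set (the exact overload and error
# behavior may differ in practice).

def to_action_with_auto_queen(state, uci_move):
    """Convert a UCI move string to an open_spiel action, retrying with a
    queen promotion ('q') appended if the bare move is not accepted."""
    try:
        return state.string_to_action(uci_move)
    except Exception:
        # e.g. 'd7e8' is rejected but 'd7e8q' is legal.
        return state.string_to_action(uci_move + "q")
```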
Note: The qualms below are only valid because (2)-(4) are not currently properly implemented. This has made me try to use state.action_to_string() to obtain the missing information in UCI form, which made me realize the limitations that the current `action_to_string()` method has. (6) may not be worth the time if (2)-(4) are properly handled and the information is provided in ObservationString.
6) Cannot call `action_to_string()` and `string_to_action()` with the different action spaces (sense action space, move action space)
https://github.com/deepmind/open_spiel/blob/b6f88b6f2fddadd7e0f8e01c78d2becb61c51a33/open_spiel/games/rbc.cc#L444-L454
This one might be a bit harder to explain, but essentially, when creating an agent that uses the UCI action encoding (for example an agent based on Stockfish), moving between the open_spiel action representation and the UCI representation is important. Moreover, it is important to be able to use the `action_to_string` method on the space of move actions while the state is currently in sense mode, and vice versa. Here are the cases where it is particularly needed:
i) During an agent's sense phase, the agent would like to `handle_move_result` for its last move. This means that (see (2)) the open_spiel env should provide the agent the taken_move (note: this is different from the requested move, as RBC is a game with uncertainty). This taken_move should be provided in the ObservationString, but since it is not, I actually use state.history() to obtain the last requested_move by the player (which is in the open_spiel move action space). For me to convert this into something that my agent can utilize (say, to keep track of its internal board representation), I need to use action_to_string(taken_action). This will fail, since action_to_string is being called during the sense phase and not the move phase. To handle this, I actually keep a deep copy of a state which is in the sense phase and a copy of a state which is in the move phase, so that I can call move_state.action_to_string(taken_action) during the sense phase to get the proper string of the corresponding previous move action. See the following for how I handle this in detail:
Note: I am assuming that the previous requested action was the taken action, which is not the case (see (2)).
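Roughly, the dual-state workaround looks like the simplified sketch below. It assumes the game is registered under the short name "rbc" and leaves abstract how the agent tracks which phase it is in:

```python
import pyspiel

# Rough sketch of the dual-state workaround described above. A copy of the
# most recent move-phase state is kept so that move actions can still be
# decoded with action_to_string while the live state is already in the
# sense phase. Assumes the game is registered under the short name "rbc";
# how the agent detects the move phase is left abstract here.

game = pyspiel.load_game("rbc")
state = game.new_initial_state()

move_phase_state = None   # deep copy of a state that is in the move phase
last_move_action = None   # last *requested* move action (caveat: see (2))


def before_applying_move(current_state, chosen_action):
    """Call this right before applying a move-phase action."""
    global move_phase_state, last_move_action
    move_phase_state = current_state.clone()  # copy still in the move phase
    last_move_action = chosen_action


def last_move_string():
    """Decode the previous move action even if the live state is now in the
    sense phase, by going through the cached move-phase state."""
    if move_phase_state is None or last_move_action is None:
        return None
    return move_phase_state.action_to_string(last_move_action)
```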
There are two other places (see (3)/(4)) where I utilized this action_to_string, but they are only needed because of the lack of functionality elsewhere. Similarly, in case i) I would not have to resort to using state.action_to_string if taken_move were provided to me in the ObservationString.
cc'ing as original implementer and avid user
@michalsustr and @VitamintK
if you got here then have a cookie: 🍪