Reinforcement Learning formalization #7

Open
johnnyflame opened this issue Aug 25, 2018 · 0 comments
Hi, I was wondering if you could clarify some assumptions regarding the RL aspect of the paper and code.

In an RL context, we have State, Action, Reward, and Next State (SARS).

In this particular problem, it appears that the state is the original query, and the action is essentially a bit vector obtained by running a sliding window over the candidate terms (1 to append the term to q', 0 otherwise). The reward is the optimization metric (in this instance, MAP@40).

Is this interpretation correct?
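
To make my reading concrete, here is a minimal sketch of the formulation as I currently understand it. The names (`reformulate`, `episode`, `policy`, `retrieve`, `map_at_40`) are my own placeholders, not identifiers from the paper or the repository:

```python
def reformulate(original_query, candidate_terms, action_bits):
    """Append each candidate term whose action bit is 1 to the query."""
    selected = [t for t, keep in zip(candidate_terms, action_bits) if keep == 1]
    return original_query + " " + " ".join(selected)

def episode(original_query, candidate_terms, policy, retrieve, map_at_40):
    # State: the original query (and the candidate terms it induces).
    action_bits = policy(original_query, candidate_terms)  # one bit per candidate term
    # Action: the bit vector selecting which terms to append, forming q'.
    new_query = reformulate(original_query, candidate_terms, action_bits)
    ranked_docs = retrieve(new_query)
    # Reward: the optimization metric over the new ranking (MAP@40 here).
    return map_at_40(ranked_docs)
```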

The paper mentions that the next state is represented by the set of documents retrieved by the reformulated query. But it appears that this next state isn't actually being fed back into the network, i.e., it's a one-shot approach. How does the agent learn the correlation between states and the next states that result from taking various actions?
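
For illustration, here is a hypothetical contrast between the single-step loop I believe the code implements and a multi-step loop in which the retrieved documents are fed back as the next state. All names are placeholders of mine, not taken from the repository:

```python
def one_shot(q0, policy, retrieve, reward_fn):
    # Single decision: the retrieved set is only used to compute the reward,
    # never fed back into the policy.
    docs = retrieve(policy(q0))
    return reward_fn(docs)

def multi_step(q0, policy, retrieve, reward_fn, horizon=3):
    # The retrieved documents define the next state and are fed back
    # into the policy at every step.
    query, docs = q0, retrieve(q0)
    total = 0.0
    for _ in range(horizon):
        query = policy(query, docs)  # next state = (query, retrieved docs)
        docs = retrieve(query)
        total += reward_fn(docs)
    return total
```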

Thanks
