Hi, I was wondering if you would clarify some assumptions regarding the RL aspect of the paper and code.
In an RL context, we have State, Action, Reward, and Next State (SARS).
In this particular problem, it appears that the state is the original query, and the action is essentially a bit-vector obtained by running a sliding window over the candidate terms (1 to append the term to q', 0 otherwise). The reward is the optimization metric (in this instance, MAP@40).
Is this interpretation correct?
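To make my interpretation concrete, here is a rough sketch of what I have in mind. Every name and value below is my own placeholder for illustration, not something taken from the paper or your code:

```python
import numpy as np

rng = np.random.default_rng(0)

def select_terms(query_terms, candidate_terms, term_probs):
    """Action: a binary vector over the candidate terms (1 = append to q', 0 = skip)."""
    action = (rng.random(len(candidate_terms)) < term_probs).astype(int)
    q_prime = list(query_terms) + [t for t, keep in zip(candidate_terms, action) if keep]
    return q_prime, action

def average_precision_at_k(retrieved, relevant, k=40):
    """Reward: the retrieval metric, e.g. AP@40 of the documents returned for q'."""
    hits, score = 0, 0.0
    for rank, doc in enumerate(retrieved[:k], start=1):
        if doc in relevant:
            hits += 1
            score += hits / rank
    return score / max(1, min(len(relevant), k))

# Toy usage: the "state" is just the original query.
query = ["reinforcement", "learning", "query", "reformulation"]
candidates = ["expansion", "terms", "from", "a", "sliding", "window"]
probs = np.full(len(candidates), 0.5)   # stand-in for the policy network's output
q_prime, action = select_terms(query, candidates, probs)
print(q_prime, action)
```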
The paper mentions that the next state is represented by the set of documents retrieved by the reformulated query. But it appears that this next state isn't actually fed back into the network, i.e., it's a one-shot approach. How does the agent learn the correlation between states and the next states that result from taking various actions?
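In other words, my current reading is that the update amounts to a bandit-style REINFORCE step, roughly like the sketch below, where the reward arrives immediately and the episode ends, so there is no next state and no bootstrapping term. Again, all identifiers here are placeholders of my own, not the actual implementation:

```python
import torch

policy_net = torch.nn.Sequential(torch.nn.Linear(8, 5), torch.nn.Sigmoid())
optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-3)

def map_at_40(state, action):
    # Stand-in for running the reformulated query against the index and
    # computing MAP@40; here it just returns a dummy score.
    return torch.rand(())

state = torch.randn(8)                    # encoded original query (toy size)
probs = policy_net(state)                 # per-candidate-term selection probabilities
dist = torch.distributions.Bernoulli(probs)
action = dist.sample()                    # the bit-vector over candidate terms
reward = map_at_40(state, action)         # terminal reward; the episode ends here
baseline = 0.0                            # e.g. a value-network or running-average baseline
loss = -(reward - baseline) * dist.log_prob(action).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```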
Thanks