Hi, I was wondering if you would clarify some assumptions regarding the RL aspect of the paper and code.
In an RL context, we have State, Action, Reward, and Next State (SARS).
In this particular problem, it appears that the state is the original query, and the action is essentially a bit-vector obtained by running a sliding window over the candidate terms (1 to append the term to q', 0 otherwise). The reward is the optimization metric (in this instance, MAP@40).
Is this interpretation correct?
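To make my interpretation concrete, here is a rough sketch of what I have in mind. Every name and value below is my own placeholder for illustration, not something taken from the paper or your code:

```python
import numpy as np

rng = np.random.default_rng(0)

def select_terms(query_terms, candidate_terms, term_probs):
    """Action: a binary vector over the candidate terms (1 = append to q', 0 = skip)."""
    action = (rng.random(len(candidate_terms)) < term_probs).astype(int)
    q_prime = list(query_terms) + [t for t, keep in zip(candidate_terms, action) if keep]
    return q_prime, action

def average_precision_at_k(retrieved, relevant, k=40):
    """Reward: the retrieval metric, e.g. AP@40 of the documents returned for q'."""
    hits, score = 0, 0.0
    for rank, doc in enumerate(retrieved[:k], start=1):
        if doc in relevant:
            hits += 1
            score += hits / rank
    return score / max(1, min(len(relevant), k))

# Toy usage: the "state" is just the original query.
query = ["reinforcement", "learning", "query", "reformulation"]
candidates = ["expansion", "terms", "from", "a", "sliding", "window"]
probs = np.full(len(candidates), 0.5)   # stand-in for the policy network's output
q_prime, action = select_terms(query, candidates, probs)
print(q_prime, action)
```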
The paper mentions that the next state is represented by the set of documents retrieved by the reformulated query. But it appears that this next state isn't actually fed back into the network, i.e., it's a one-shot approach. How does the agent learn the correlation between states and the next states that result from taking various actions?
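In other words, my current reading is that the update amounts to a bandit-style REINFORCE step, roughly like the sketch below, where the reward arrives immediately and the episode ends, so there is no next state and no bootstrapping term. Again, all identifiers here are placeholders of my own, not the actual implementation:

```python
import torch

policy_net = torch.nn.Sequential(torch.nn.Linear(8, 5), torch.nn.Sigmoid())
optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-3)

def map_at_40(state, action):
    # Stand-in for running the reformulated query against the index and
    # computing MAP@40; here it just returns a dummy score.
    return torch.rand(())

state = torch.randn(8)                    # encoded original query (toy size)
probs = policy_net(state)                 # per-candidate-term selection probabilities
dist = torch.distributions.Bernoulli(probs)
action = dist.sample()                    # the bit-vector over candidate terms
reward = map_at_40(state, action)         # terminal reward; the episode ends here
baseline = 0.0                            # e.g. a value-network or running-average baseline
loss = -(reward - baseline) * dist.log_prob(action).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```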
Thanks