The key idea of Beam Search is to keep the most likely partial outputs at each time step and use them as input for the following steps. Like greedy search, Beam Search is commonly used for decoding in encoder-decoder architectures. The main difference is that it keeps the B most likely candidates instead of only the single most likely word. Each of these candidates is extended in parallel. The next B best candidates are then chosen according to the probability of the whole prediction so far.
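Compared with greedy search, beam search is less biased towards common words. Greedy search commits to the single most probable word at each step. Beam search keeps several likely candidates for further calculations, so a less likely early word can still lead to a more likely complete sequence.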
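A minimal Python sketch of the idea follows. It assumes a hypothetical toy decoder (`TOY_MODEL`, `next_token_probs`) that returns a fixed next-word distribution for each prefix. It also uses a fixed output length instead of an end-of-sentence token. Scores are summed log probabilities, which is equivalent to multiplying probabilities but numerically safer.

```python
import math

# Hypothetical toy decoder: next-word probabilities for each prefix.
# In a real encoder-decoder these would come from the decoder's softmax,
# conditioned on the input sentence x.
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"a": 0.55, "b": 0.45},
    ("b",): {"a": 0.9, "b": 0.1},
}


def next_token_probs(prefix):
    """Return the (hypothetical) distribution over the next word."""
    return TOY_MODEL[prefix]


def beam_search(beam_width, steps):
    """Return the best (sequence, log probability) found with beam width B."""
    beams = [((), 0.0)]  # (word sequence, summed log probability)
    for _ in range(steps):
        candidates = []
        # Extend every kept sequence by every possible next word.
        for seq, score in beams:
            for token, p in next_token_probs(seq).items():
                candidates.append((seq + (token,), score + math.log(p)))
        # Keep only the B highest-scoring sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]


seq, score = beam_search(beam_width=2, steps=2)
print(seq, round(math.exp(score), 2))  # ('b', 'a') 0.36
```

Greedy search would commit to "a" (0.6) at the first step and end with probability 0.6 × 0.55 = 0.33. Keeping two candidates lets the search recover "b a" with 0.4 × 0.9 = 0.36.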
- Define the beam width B:
  - A large B gives better results but is slower, because more candidates are kept and evaluated
  - A small B gives worse results but is faster (B = 1 is equivalent to greedy search)
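The trade-off for B can be made concrete with a small sketch. It uses a hypothetical toy model (`TOY_MODEL`) and a fixed output length of two words. It counts how many candidate sequences are scored for B = 1 (greedy) and B = 2.

```python
import math

# Hypothetical toy decoder: next-word probabilities for each prefix.
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"a": 0.55, "b": 0.45},
    ("b",): {"a": 0.9, "b": 0.1},
}


def beam_search(beam_width, steps):
    """Return (best sequence, its probability, number of candidates scored)."""
    beams = [((), 0.0)]  # (word sequence, summed log probability)
    scored = 0
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for token, p in TOY_MODEL[seq].items():
                candidates.append((seq + (token,), score + math.log(p)))
        scored += len(candidates)  # work per step grows with B
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    best_seq, best_score = beams[0]
    return best_seq, math.exp(best_score), scored


for b in (1, 2):
    seq, prob, scored = beam_search(b, steps=2)
    print(f"B={b}: {seq} p={prob:.2f} candidates scored={scored}")
# B=1: ('a', 'a') p=0.33 candidates scored=4
# B=2: ('b', 'a') p=0.36 candidates scored=6
```

In this toy case the larger beam finds the more probable output at the cost of scoring more candidates. Real vocabularies make that cost grow roughly with B × vocabulary size per step.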
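Repeat picking B values:
- In each iteration only B candidates propagate to the next step
- The B candidates are chosen over all results, i.e. among all B × (vocabulary size) extensions, not B per candidate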
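A single iteration can be isolated as follows. The beams and next-word distributions below are hypothetical, with B = 2 and a vocabulary of three words. All B × V extensions are scored and the top B are kept globally. Here both survivors come from the first beam, so the second beam is dropped entirely. `heapq.nlargest` is used instead of sorting the full candidate list.

```python
import heapq
import math

# Hypothetical partial hypotheses (B = 2) with their probabilities so far.
beams = {("jane",): 0.5, ("in",): 0.3}

# Hypothetical next-word distributions for each beam (vocabulary of 3 words).
next_probs = {
    ("jane",): {"is": 0.7, "visits": 0.25, "in": 0.05},
    ("in",): {"september": 0.4, "jane": 0.35, "is": 0.25},
}


def expand(beams, next_probs, beam_width):
    """One step: score all B * V extensions and keep the top B overall."""
    candidates = [
        (seq + (tok,), math.log(p_seq) + math.log(p_tok))
        for seq, p_seq in beams.items()
        for tok, p_tok in next_probs[seq].items()
    ]
    return heapq.nlargest(beam_width, candidates, key=lambda c: c[1])


for seq, score in expand(beams, next_probs, beam_width=2):
    print(seq, round(math.exp(score), 3))
# ('jane', 'is') 0.35
# ('jane', 'visits') 0.125
```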
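Refinement (error analysis): compare the probability the model assigns to the correct output y* with the probability of the beam search output ŷ.
Case 1: P(y*|x) > P(ŷ|x)
- Beam search is at fault -> increase the beam width B
Case 2: P(y*|x) <= P(ŷ|x)
- RNN is at fault -> change the architecture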
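The error analysis can be sketched in code. The sketch assumes the same hypothetical toy model and that a correct reference output y* is available for each error. An error is blamed on beam search when the model scores y* higher than the beam output ŷ. Otherwise it is blamed on the model (the RNN). Counting the two cases over many errors shows which part is worth improving.

```python
import math

# Hypothetical toy decoder: next-word probabilities for each prefix.
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"a": 0.55, "b": 0.45},
    ("b",): {"a": 0.9, "b": 0.1},
}


def sequence_log_prob(seq):
    """log P(seq | x) under the toy model, summed word by word."""
    total = 0.0
    for t, token in enumerate(seq):
        total += math.log(TOY_MODEL[tuple(seq[:t])][token])
    return total


def blame(reference, beam_output):
    """Attribute one error to 'beam search' or to the 'model' (the RNN)."""
    if sequence_log_prob(reference) > sequence_log_prob(beam_output):
        return "beam search"  # model prefers y*, but the search missed it
    return "model"  # model scores the wrong output at least as high as y*


# Hypothetical (y*, y-hat) pairs for two observed errors.
errors = [
    (("b", "a"), ("a", "a")),
    (("a", "b"), ("a", "a")),
]
print([blame(ref, out) for ref, out in errors])  # ['beam search', 'model']
```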