First of all, thanks a lot for providing this library; it is super useful! I'm opening this "issue" merely as a means to brainstorm some algorithmic ideas. Feel free to just close the ticket at any time.
I noticed the following behavior: in general, the algorithm seems to be sensitive to the time direction. The confidence is rather weak directly after the onset of a note, but once a stable pitch has been established, the pitch is tracked robustly into the decay phase of the note. To verify this, I simply applied the algorithm to the time-reversed signal, which shows exactly the opposite behavior. For example, the following plot shows the transition between 3 consecutive chromatic notes:
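The time-reversal check described above can be sketched as a small helper. This is a minimal sketch, not this library's API: `estimate_pitch` is a hypothetical callable that returns per-frame pitch and confidence arrays, and the helper assumes the framing is symmetric, so that flipping the backward-pass outputs lines them up frame by frame with the forward pass (with padded or hop-based framing this may be off by a frame or so).

```python
import numpy as np

def forward_backward_estimates(estimate_pitch, audio):
    """Run a frame-wise pitch estimator on a signal and on its time reversal.

    `estimate_pitch` is hypothetical: audio (1-D array) -> (pitch, confidence),
    two 1-D arrays with one value per frame. Assumes symmetric framing, so that
    reversing the backward outputs aligns them with the forward outputs.
    """
    audio = np.asarray(audio)
    f_pitch, f_conf = estimate_pitch(audio)
    b_pitch, b_conf = estimate_pitch(audio[::-1])
    # Flip the backward-pass outputs so that frame t refers to the same instant
    b_pitch = np.asarray(b_pitch)[::-1]
    b_conf = np.asarray(b_conf)[::-1]
    if len(b_pitch) != len(f_pitch):
        raise ValueError("forward and backward passes produced different frame counts")
    return (np.asarray(f_pitch), np.asarray(f_conf)), (b_pitch, b_conf)
```

With an estimator whose confidence builds up over time (for instance a running sum), the forward pass is confident late in a note and the flipped backward pass is confident early, which mirrors the shift visible in the plot.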
The middle plot shows pitch/frequency and the bottom plot shows periodicity/confidence. The blue line corresponds to applying the algorithm in the forward direction; the yellow line corresponds to applying it in the backward time direction. Note that the pitch estimates and confidences are essentially "shifted" forward or backward in time around the "area of uncertainty".
As a very naive approach, I simply combined the information from the forward and backward passes (the green and red curves). I'm basically just weighting the two results with weights proportional to |confidence|^p, where p is an exponent that lets you transition from plain averaging (p = 0) to taking the estimate with maximum confidence (p → ∞). Even this naive forward+backward combination seems to improve the results quite a bit.
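The weighting described above can be sketched in NumPy as follows. This is a minimal sketch under assumptions, not the exact code behind the plot: the function name and the `log_domain` switch are hypothetical, the combined confidence is weighted the same way as the pitch (one of several reasonable choices), and the backward-pass outputs are assumed to be already flipped back and aligned with the forward pass.

```python
import numpy as np

def combine_forward_backward(f_pitch, f_conf, b_pitch, b_conf, p=2.0, log_domain=False):
    """Blend forward- and backward-pass pitch tracks frame by frame.

    Each pass is weighted by |confidence|**p: p = 0 gives a plain average,
    larger p favors the more confident pass, p = np.inf picks it outright.
    Inputs are 1-D arrays aligned frame by frame; pitches assumed positive (Hz).
    """
    if p < 0:
        raise ValueError("p must be non-negative")
    f_pitch, b_pitch = np.asarray(f_pitch, float), np.asarray(b_pitch, float)
    f_conf = np.abs(np.asarray(f_conf, float))
    b_conf = np.abs(np.asarray(b_conf, float))

    if np.isinf(p):
        # Limit p -> inf: take whichever pass is more confident in each frame
        return np.where(f_conf >= b_conf, f_pitch, b_pitch), np.maximum(f_conf, b_conf)

    # Divide by the per-frame maximum first so a large p cannot underflow both weights
    scale = np.maximum(np.maximum(f_conf, b_conf), np.finfo(float).tiny)
    w_f = (f_conf / scale) ** p
    w_b = (b_conf / scale) ** p
    total = w_f + w_b
    # Both confidences zero (only possible for p > 0): fall back to equal weights
    zero = total == 0
    w_f = np.where(zero, 0.5, w_f / np.where(zero, 1.0, total))
    w_b = 1.0 - w_f

    if log_domain:
        # Weighted geometric mean, i.e. averaging in log-frequency
        pitch = np.exp(w_f * np.log(f_pitch) + w_b * np.log(b_pitch))
    else:
        pitch = w_f * f_pitch + w_b * b_pitch
    # Combined confidence uses the same weights (a design choice)
    conf = w_f * f_conf + w_b * b_conf
    return pitch, conf
```

Averaging in log-frequency (`log_domain=True`) may be preferable when the two passes disagree by a large interval: a linear average of 100 Hz and 400 Hz lands at 250 Hz, whereas the midpoint on a logarithmic (musical) scale is 200 Hz.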
I'm wondering whether it would be worthwhile to incorporate such "forward + backward" logic directly into the decoder. Originally I assumed that Viterbi decoding would be invariant to the time direction, but that doesn't seem to be the case. Perhaps exploiting information from both time directions at the decoder level could lead to even better results?
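One standard way to use both time directions inside a decoder is HMM forward-backward smoothing, which gives every frame a posterior over pitch bins conditioned on the whole signal, past and future. The sketch below is a textbook version of that technique, not something this library is known to implement; the function name and the input format (a frames × pitch-bins matrix of non-negative likelihoods) are assumptions. As a side note: if the transition matrix is symmetric and the initial distribution uniform, the HMM's joint probability is unchanged when time is reversed, so Viterbi (and these posteriors) would give mirror-image results on mirror-image observations. Under those conditions, a direction dependence would have to come from the frame-wise observations themselves rather than from the decoder.

```python
import numpy as np

def forward_backward_posteriors(obs, trans, init=None):
    """Per-frame state posteriors p(s_t | all frames) of a discrete HMM.

    obs:   (T, K) non-negative observation likelihoods, e.g. per-frame
           pitch-bin scores (hypothetical input format).
    trans: (K, K) row-stochastic matrix, trans[i, j] = p(s_t = j | s_{t-1} = i).
    init:  (K,) initial state distribution; uniform if None.
    Each pass is normalized per frame to avoid numerical underflow; assumes
    every frame keeps some nonzero probability mass.
    """
    obs = np.asarray(obs, float)
    trans = np.asarray(trans, float)
    T, K = obs.shape
    init = np.full(K, 1.0 / K) if init is None else np.asarray(init, float)

    # Forward pass: alpha[t] is proportional to p(s_t, o_1..o_t)
    alpha = np.empty((T, K))
    a = init * obs[0]
    alpha[0] = a / a.sum()
    for t in range(1, T):
        a = (alpha[t - 1] @ trans) * obs[t]
        alpha[t] = a / a.sum()

    # Backward pass: beta[t] is proportional to p(o_{t+1}..o_T | s_t)
    beta = np.empty((T, K))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        b = trans @ (obs[t + 1] * beta[t + 1])
        beta[t] = b / b.sum()

    # Combine both directions and renormalize each frame to a distribution
    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)
```

Taking `post.argmax(axis=1)` gives a per-frame pitch-bin decision, and `post.max(axis=1)` could serve as a bidirectional confidence. Unlike Viterbi, this maximizes per-frame accuracy rather than returning a single globally consistent path.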