Algorithmic idea: Implement a forward-backward decoder? #35

Open
bluenote10 opened this issue Jan 21, 2024 · 0 comments


First of all: Thanks a lot for providing this library, it is super useful! I'm opening this "issue" merely as a means to brainstorm some algorithmic ideas. Feel free to just close the ticket at any time.

I noticed the following behavior: the algorithm appears to be sensitive to the time direction. The confidence is rather weak directly after the onset of a note, but once a stable pitch has been established, the pitch is tracked robustly into the decay phase of the note. To verify this, I applied the algorithm to the time-reversed signal, which shows exactly the opposite behavior. For example, the following plot shows the transition between 3 consecutive chromatic notes:
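For anyone who wants to reproduce the time-reversal check, here is a minimal sketch. It is not tied to this library's API: `estimate_pitch` is a hypothetical callable that takes a 1-D audio array and returns per-frame `(pitch, confidence)` arrays, so the real estimator would need a small wrapper.

```python
import numpy as np

def run_reversed(estimate_pitch, audio):
    """Run a frame-wise pitch estimator on the time-reversed signal.

    `estimate_pitch` is a hypothetical callable (an assumption, not this
    library's API): audio -> (pitch, confidence), one value per frame.
    """
    audio = np.asarray(audio)
    # Reverse the samples, then run the estimator as usual.
    pitch_rev, conf_rev = estimate_pitch(audio[::-1])
    # Flip the per-frame outputs so they line up with the forward time axis.
    return np.asarray(pitch_rev)[::-1], np.asarray(conf_rev)[::-1]
```

Depending on how the estimator pads and centers its frames, the flipped frames may be offset from the forward frames by a fraction of a hop, so the two time axes are worth checking before comparing the curves.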

image

The middle plot shows pitch/frequency, the bottom plot the periodicity/confidence. The blue line corresponds to applying the algorithm in the forward direction, the yellow line to applying it in the backward time direction. Note that the pitch estimates and confidences are basically "shifted" in either the forward or the backward time direction around the "area of uncertainty".

As a very naive approach, I have simply combined the information from the forward and backward passes (the green and the red curves). I'm basically just weighting the two results with weights proportional to |confidence|^p, where p is an exponent that allows a smooth transition from plain averaging (p = 0) to picking the result with the higher confidence (large p). Even this naive forward+backward combination seems to improve the results quite a bit.
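Roughly, the weighting described above could look like the sketch below. The function and argument names are made up for illustration, and it assumes the backward-pass outputs have already been flipped back onto the forward time axis. It averages pitch in whatever unit it is given; averaging in a log-frequency domain (e.g. cents) may be the more sensible choice.

```python
import numpy as np

def combine_passes(pitch_fwd, conf_fwd, pitch_bwd, conf_bwd, p=2.0):
    """Blend forward and backward estimates with weights |confidence|**p.

    p = 0 gives a plain average; large p approaches "take the more
    confident pass". All inputs are per-frame arrays of equal length.
    """
    pitch_fwd, conf_fwd, pitch_bwd, conf_bwd = (
        np.asarray(a, dtype=float)
        for a in (pitch_fwd, conf_fwd, pitch_bwd, conf_bwd)
    )
    w_fwd = np.abs(conf_fwd) ** p
    w_bwd = np.abs(conf_bwd) ** p
    total = w_fwd + w_bwd
    # If both weights are zero for a frame, fall back to equal weights.
    zero = total == 0
    w_fwd = np.where(zero, 0.5, w_fwd / np.where(zero, 1.0, total))
    w_bwd = 1.0 - w_fwd
    pitch = w_fwd * pitch_fwd + w_bwd * pitch_bwd
    conf = w_fwd * conf_fwd + w_bwd * conf_bwd
    return pitch, conf
```

Blending the confidence with the same weights is just one option; taking the per-frame maximum of the two confidences would be another.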

I'm wondering if it would be worthwhile to incorporate such a "forward + backward" logic directly into the decoder. Originally my assumption was that Viterbi decoding would be invariant to the time direction, but that doesn't seem to be the case. Perhaps exploiting the information from both time directions at the decoder level could lead to even better results?
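To make the decoder-level idea a bit more concrete, here is a sketch of the textbook HMM forward-backward (posterior smoothing) recursion, which is one possible reading of a "forward-backward decoder". It is not this library's decoder, and the inputs are assumptions: `emission` holds per-frame likelihoods over pitch bins, `transition` is a row-stochastic bin-to-bin matrix, and `initial` is a prior over bins.

```python
import numpy as np

def forward_backward(emission, transition, initial):
    """Per-frame posteriors over states given the whole sequence.

    emission:   (T, S) likelihood of each frame under each state
    transition: (S, S) row-stochastic, transition[i, j] = P(j | i)
    initial:    (S,)   prior over states at the first frame
    """
    emission = np.asarray(emission, dtype=float)
    transition = np.asarray(transition, dtype=float)
    T, S = emission.shape
    alpha = np.zeros((T, S))
    beta = np.ones((T, S))

    # Forward pass: evidence from the past, rescaled per frame for stability.
    alpha[0] = np.asarray(initial, dtype=float) * emission[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ transition) * emission[t]
        alpha[t] /= alpha[t].sum()

    # Backward pass: evidence from the future, rescaled the same way.
    for t in range(T - 2, -1, -1):
        beta[t] = transition @ (emission[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()

    # Combine both directions and normalise each frame to a distribution.
    posterior = alpha * beta
    posterior /= posterior.sum(axis=1, keepdims=True)
    return posterior
```

A pitch track could then be read off per frame, e.g. via `posterior.argmax(axis=1)`, with the maximum posterior serving as a confidence. Whether this actually helps with the onset behavior shown above would need to be tested.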
