Optimize fill_holes_and_remove_small_masks #1116
Open
Identifying and replacing bottlenecks in the `fill_holes_and_remove_small_masks` function

For my own projects, I was depending on `fill_holes_and_remove_small_masks`, which would often slow down the processing throughput of large datasets. I managed to narrow down the bottleneck to `scipy.ndimage.morphology.binary_fill_holes`. When this is replaced by a more optimized algorithm like `fill_voids.fill` (documentation), we can get massive speedups, especially since it can easily be rewritten as a multithreaded calculation.

Another change proposed in this PR is the separation of small mask filtering from the fill_holes operation. Small masks are now filtered quickly by counting the labels in the flattened label image with `np.bincount`, so the approach supports both 2D and 3D. Any label whose count is below `min_size` is set to 0 through an `np.isin` filter. The advantage of splitting the for-loop into two components is that we no longer need to calculate the sum of every individual mask, which saves more time in the end.
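The small-mask filtering step described above could look roughly like the sketch below. It is an illustration of the `np.bincount` / `np.isin` idea only, not the code from this PR: the function name `remove_small_masks` is hypothetical, and it assumes the label image holds non-negative integers with 0 as background.

```python
import numpy as np


def remove_small_masks(masks, min_size):
    """Zero out every label with fewer than min_size pixels/voxels.

    Hypothetical helper for illustration; assumes `masks` is an integer
    label image (2D or 3D) with non-negative labels and 0 as background.
    """
    masks = np.asarray(masks)
    # Count how often each label occurs in the flattened image.
    # Flattening makes the same code work for 2D and 3D inputs.
    counts = np.bincount(masks.ravel())
    labels = np.arange(counts.size)
    # Labels that are present but smaller than min_size.
    small = labels[(counts > 0) & (counts < min_size)]
    # Never treat the background label as a mask.
    small = small[small != 0]
    out = masks.copy()
    # One vectorized pass instead of summing each mask in a loop.
    out[np.isin(masks, small)] = 0
    return out
```

Because the counts for all labels come from a single `np.bincount` call, no per-mask sum is needed, which is the time saving mentioned above. Hole filling would then run as a separate pass over the labels that remain.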