HerdNet requires input images of 512x512 at minimum #553

calebrob6 · 2025-01-14T23:07:44Z

Search before asking

I have searched the Pytorch-Wildlife issues and found no similar bug report.

Bug

Following https://github.com/microsoft/CameraTraps/blob/main/demo/image_detection_demo_herdnet.ipynb with a path to an image that is 256x256 results in the following error:

RuntimeError: Expected size of input's dimension 1 to be divisible by the product of kernel_size, but got input.size(1)=81920 and kernel_size=(256, 256).

Resizing the image to 512x512 makes this error go away. It seems to be caused by the default parameters of the stitcher.
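A pre-flight size check can surface this problem before the stitcher raises its less obvious error. The sketch below is only an illustration: `MIN_SIDE = 512` is taken from the behaviour observed in this report rather than from the HerdNet code, and `needs_upscaling` is a hypothetical helper, not part of Pytorch-Wildlife.

```python
# Assumption: 512 is the smallest side length that worked in this report.
MIN_SIDE = 512


def needs_upscaling(width: int, height: int, min_side: int = MIN_SIDE) -> bool:
    """Return True if either side of the image is shorter than min_side."""
    return width < min_side or height < min_side


print(needs_upscaling(256, 256))  # the failing case from this report
print(needs_upscaling(512, 512))  # the size that worked
```

An image flagged by this check would need to be resized before being passed to the demo notebook.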

Environment

No response

Minimal Reproducible Example

No response

Additional

No response

Are you willing to submit a PR?

Yes I'd like to help by submitting a PR!


idchacon28 · 2025-01-27T19:51:49Z

Hello @calebrob6,

Thank you for reporting this issue. You are correct that HerdNet requires a minimum image size of 512x512 due to the default parameters of the stitcher, and inputting an image with smaller dimensions causes a runtime error.

To resolve this, I have implemented a ResizeIfSmaller class that resizes an image to the minimum required size if it is smaller. This ensures compatibility with the default stitcher parameters without manual resizing. Here's the code for the class:

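import cv2
import numpy as np
from PIL import Image


class ResizeIfSmaller:
    """Upscale an image so that both sides are at least min_size pixels."""

    def __init__(self, min_size, interpolation=Image.BILINEAR):
        self.min_size = min_size
        self.interpolation = interpolation

    def __call__(self, img):
        # NumPy arrays are assumed to be in OpenCV's BGR channel order.
        if isinstance(img, np.ndarray):
            img = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
        assert isinstance(img, Image.Image), "Image should be a PIL Image"

        width, height = img.size
        if height < self.min_size or width < self.min_size:
            # Scale by the larger ratio so the shorter side reaches min_size
            # while the aspect ratio is preserved.
            ratio = max(self.min_size / height, self.min_size / width)
            # Round and clamp: int() truncation could leave a side one pixel short.
            new_height = max(self.min_size, round(height * ratio))
            new_width = max(self.min_size, round(width * ratio))
            img = img.resize((new_width, new_height), self.interpolation)
        return img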

You can use this class to preprocess your images before feeding them into HerdNet. This should prevent the error you were experiencing. To incorporate this fix into your workflow, please pull the latest changes from the repository. I am going to close this issue now, but please feel free to reopen it if the problem persists or if you have any more questions.
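To make the resizing arithmetic concrete, here is a small dependency-free sketch that mirrors the ratio logic of `ResizeIfSmaller` and reports the output size for a few inputs. `scaled_size` is a hypothetical helper written for illustration only, it is not part of the repository, and its rounding and clamping are choices of this sketch.

```python
def scaled_size(width, height, min_size=512):
    """Return (width, height) after scaling so both sides are >= min_size.

    Images that are already large enough are returned unchanged.
    """
    if width >= min_size and height >= min_size:
        return width, height
    # The larger ratio brings the shorter side up to min_size.
    ratio = max(min_size / width, min_size / height)
    new_width = max(min_size, round(width * ratio))
    new_height = max(min_size, round(height * ratio))
    return new_width, new_height


print(scaled_size(256, 256))  # (512, 512)
print(scaled_size(384, 256))  # (768, 512)
print(scaled_size(800, 600))  # (800, 600)
```

Because the aspect ratio is preserved, a non-square image ends up with its shorter side at the minimum and its longer side above it.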

Best regards,
@idchacon28

calebrob6 added the bug Something isn't working label Jan 14, 2025

idchacon28 added a commit that referenced this issue Jan 25, 2025

issue #553 and #554 solved. now images can be numpy arrays and images with dimension less than 512 are scaled to use default herdnet values (dfaf20d)

idchacon28 mentioned this issue Jan 25, 2025

Issue #553 and #554 solved #559

Merged

zhmiao added a commit that referenced this issue Jan 25, 2025

Merge pull request #559 from microsoft/PreRelease

f903c9c

Issue #553 and #554 solved

idchacon28 closed this as completed Jan 27, 2025
