Misbehaviour of dimensions for some warped images #17

Open
StaelTchinda opened this issue Nov 18, 2023 · 4 comments
Labels
bug Something isn't working


@StaelTchinda

Hi,

First, I want to thank you for your work.

The algorithm works fine for images where the whole document is visible and flat.
However, I have a case where the page dimensions are computed incorrectly (see the excerpt of the verbose output below). Perhaps more constraints are needed on the dimension, corner, or coordinate computations.

Loaded 67c656c099c941ae759.jpeg at size='1800x1013' --> resized='900x506'
  got 3 spans with 17 points.
  initial objective is 0.00017673946556242466
  optimizing 28 parameters...
  optimization took 0.21 sec.
  final objective is 7.04562825312913e-05
  got page dims 811571190.8768755 x 1.1532271561006338
  output will be 416613624176x592
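For reference, the page dims in this log have an aspect ratio of roughly 7 × 10^8 (811571190.88 / 1.15), and the output size 416613624176x592 has about the same ratio, so the bad estimate propagates straight into the output. Below is a minimal sketch of the kind of constraint suggested above, assuming the page dimensions are available as a `(width, height)` pair before the output is rendered. `check_page_dims` and its `max_aspect` threshold are hypothetical, not part of page-dewarp:

```python
import math


def check_page_dims(page_dims, max_aspect=10.0):
    """Hypothetical sanity check on optimized page dimensions.

    page_dims is a (width, height) pair in the optimizer's units.
    Returns True when the dimensions look like a plausible page, and
    False when they are non-finite, non-positive, or far too elongated.
    """
    width, height = page_dims
    if not (math.isfinite(width) and math.isfinite(height)):
        return False
    if width <= 0 or height <= 0:
        return False
    # Compare the longer side to the shorter one, whichever it is.
    aspect = max(width / height, height / width)
    return aspect <= max_aspect


# Values copied from the verbose log above.
bad_dims = (811571190.8768755, 1.1532271561006338)
print(check_page_dims(bad_dims))    # False
print(check_page_dims((1.2, 1.6)))  # True
```

The threshold of 10 is arbitrary. What to do when the check fails (skip the page, keep the original image, or retry with different settings) is a separate design choice.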

I am not very familiar with how the code works, but if you know how the problem might be solved, could you briefly explain it to me? Then I can implement the fix and open a pull request.

Best regards,

@SpicyCatGames

Having the document would be very helpful. Please provide it if it's something you can share.

@joguy56

joguy56 commented Apr 9, 2024

I encountered the same issue on pages that are nearly blank, for example title pages with only one line or one block of text.

I observed that pages containing multiple lines of text are processed fine.
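This observation suggests measuring how much of the page the detected text actually covers. Below is a minimal sketch, assuming the detected text lines are available as arrays of (x, y) points (the 506-pixel height matches the resized image in the first log). `text_vertical_coverage` is a hypothetical helper, and any cut-off below which a page counts as near-blank would need tuning. Note that the first report still found 3 spans, so this heuristic alone may not catch every failing page.

```python
import numpy as np


def text_vertical_coverage(spans, image_height):
    """Hypothetical heuristic: fraction of the image height spanned by text lines.

    spans is a list of (N, 2) arrays of (x, y) pixel coordinates, one array
    per detected text line. A title page with a single line gives a value
    close to 0; a full page of text gives a value closer to 1.
    """
    if not spans:
        return 0.0
    # Gather the y coordinate of every point on every line.
    ys = np.concatenate([np.asarray(s, dtype=float)[:, 1] for s in spans])
    return float(ys.max() - ys.min()) / float(image_height)


one_line = [np.array([[100, 250], [400, 252], [700, 251]])]
many_lines = [np.array([[100, y], [700, y + 2]]) for y in range(50, 460, 40)]
print(round(text_vertical_coverage(one_line, 506), 3))    # 0.004
print(round(text_vertical_coverage(many_lines, 506), 3))  # 0.794
```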

@lmmx lmmx self-assigned this Sep 8, 2024
@lmmx lmmx added this to Planner Sep 8, 2024
@lmmx lmmx added the bug Something isn't working label Sep 8, 2024
@lmmx lmmx moved this to 🔮 Future in Page Dewarp Release Planner Sep 8, 2024
@lmmx lmmx moved this to 🐣 Hatching in Planner Sep 10, 2024
@lmmx
Owner

lmmx commented Sep 14, 2024

To reiterate what @SpicyCatGames has said: thank you @StaelTchinda and @joguy56 for the bug reports! I am working on upgrading my triage process to drive them to completion. It would be super if anyone has a reproducible demo image they can share; this has proven key to solving issues in the past (even though I know how it works, my intuition can still be off!).

On that note, there is a blog post and a repository wiki explaining how it works, and I've made a note to increase the prominence and clarity of the docs in forthcoming releases.

Edit: it was a wiki I made notes in. I recall it being quite extensively detailed (probably too detailed for most users, but interesting for anyone curious about the inner workings). You can find it at https://github.com/lmmx/page-dewarp/wiki

It was the original author Matt Zucker who wrote the blog post, which you can find here: https://mzucker.github.io/2016/08/15/page-dewarping.html

I'm triaging right now, but I think a good way to debug this could be to simply attach the repo wiki docs and the source code files to a Claude Project and see what the LLM thinks 😸 I will also give it some good old-fashioned human investigation, haha.

@lmmx
Owner

lmmx commented Sep 14, 2024

> I encountered the same issue on pages that are near to be blank pages for example title pages where there is only one line, one block of text.
>
> I observed that if the page contains multiple lines of text, it is fine.

Yes, this seems like a hint. The algorithm works by finding "line contours" and then using them to find the orientation of the page, which is the input to the "dewarping" algorithm.
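To make the line-contour step concrete, here is a minimal numpy sketch of estimating the text direction from detected lines with PCA and then measuring the page extent along and across that direction. It is in the spirit of the approach described in Matt Zucker's blog post, but `page_axes_and_extents` is a hypothetical helper, not page-dewarp's actual code. It also illustrates one plausible (unconfirmed) reading of the near-blank-page failures: with a single line, the extent across the text direction is only a pixel or two, so any estimate built on it is badly conditioned.

```python
import numpy as np


def page_axes_and_extents(spans):
    """Hypothetical sketch: estimate text orientation from line contours via PCA.

    Each span is an (N, 2) array of (x, y) points along one text line.
    The principal axis of each span is averaged (weighted by span length)
    to get the text direction x_dir; y_dir is perpendicular to it. All
    points are then projected onto both axes to get the page extents.
    """
    weighted_sum = np.zeros(2)
    total_weight = 0.0
    for span in spans:
        pts = np.asarray(span, dtype=float)
        centered = pts - pts.mean(axis=0)
        # The first right-singular vector is the principal axis of the points.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        axis = vt[0]
        if axis[0] < 0:  # the sign is arbitrary; make it point rightwards
            axis = -axis
        weight = np.linalg.norm(pts[-1] - pts[0])  # rough span length
        weighted_sum += weight * axis
        total_weight += weight
    x_dir = weighted_sum / total_weight
    x_dir /= np.linalg.norm(x_dir)
    y_dir = np.array([-x_dir[1], x_dir[0]])

    all_pts = np.concatenate([np.asarray(s, dtype=float) for s in spans])
    px = all_pts @ x_dir
    py = all_pts @ y_dir
    return x_dir, y_dir, px.max() - px.min(), py.max() - py.min()


one_line = [np.array([[100.0, 250.0], [400.0, 252.0], [700.0, 251.0]])]
three_lines = [np.array([[100.0, y], [700.0, y]]) for y in (150.0, 250.0, 350.0)]
for name, spans in [("one line", one_line), ("three lines", three_lines)]:
    _, _, width, height = page_axes_and_extents(spans)
    print(name, round(width, 1), round(height, 1))
```

With three lines the extents come out as 600 x 200; with one line the height collapses to a pixel or two while the width stays near 600.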

To see the intermediate states, I highly recommend running with the debug flag (-d) set to its top value of 3, which will produce them for manual inspection. This was used last week to get to the bottom of an issue with poor results, caused by the default page margin removing valid lines (which can be fixed by lowering the page margin value).
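To illustrate why a large page margin can remove valid lines, here is a simplified sketch that treats the margin as an inset rectangle and drops any text line not fully inside it. `keep_lines_inside_margin`, the box format, and the margin values are hypothetical; page-dewarp's own margin handling may work differently (for example by masking pixels rather than dropping whole lines).

```python
def keep_lines_inside_margin(line_boxes, image_size, margin_x, margin_y):
    """Hypothetical filter: keep text-line boxes that lie inside the page margin.

    line_boxes: list of (x0, y0, x1, y1) pixel boxes, one per text line.
    image_size: (width, height) of the image in pixels.
    A box is kept only if it falls entirely inside the rectangle obtained by
    insetting the image by margin_x horizontally and margin_y vertically.
    """
    width, height = image_size
    x_min, y_min = margin_x, margin_y
    x_max, y_max = width - margin_x, height - margin_y
    return [
        (x0, y0, x1, y1)
        for (x0, y0, x1, y1) in line_boxes
        if x0 >= x_min and y0 >= y_min and x1 <= x_max and y1 <= y_max
    ]


# A title line close to the top edge of a 900x506 image, plus one body line.
boxes = [(120, 18, 780, 40), (120, 200, 780, 222)]
print(len(keep_lines_inside_margin(boxes, (900, 506), 50, 20)))  # 1
print(len(keep_lines_inside_margin(boxes, (900, 506), 50, 10)))  # 2
```

With a 20-pixel vertical margin the title line is discarded; lowering the margin to 10 keeps it, which mirrors the fix described above.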

I recently reviewed the code here, and now that a few years have gone by, I can more easily see how to simplify it. I'll be scheduling these upgrades in the coming months.

@lmmx lmmx moved this from 🐣 Hatching to 🔙 Backlog in Planner Sep 16, 2024
Projects
Status: 🔮 Future
Status: 📚 Shelved
Development

No branches or pull requests

4 participants