marker_single bbox detection crash on non-simple PDFs #127

Open
bjpcjp opened this issue May 16, 2024 · 4 comments

@bjpcjp

bjpcjp commented May 16, 2024

about:

  • ubuntu 23.10, Linux 6.5.0-28
  • Intel i7-2650Hx16
  • 16GB DDR3

history:

$ marker_single <file1.pdf>

  • crashed when detecting bboxes (0/6)
  • crashed when detecting bboxes (2/6)

$ marker_single <file2.pdf>

  • crashed when detecting bboxes (0/6)

$ marker_single <file3.pdf>

  • successful

file1 is from arXiv 2401.14295v1 (Topologies of Reasoning).
file2 is a chapter from a book on game theory. Lots of images.
file3 is a simple HTML-to-PDF glossary doc. No images, just a list of terms & definitions.

@VikParuchuri
Owner

Try again after updating the package. I fixed a memory leak after you posted this.

@bjpcjp
Author

bjpcjp commented May 17, 2024

TY @VikParuchuri!

This time file1 (arXiv 2401.14295v1) made it through the first bbox detection loop (5/5 successful). It crashed on the second bbox detection loop (0/4).

I'm using marker-pdf v0.2.6. There are some dependency errors that need to be sorted out:

langchain-core 0.1.48 --> packaging<24.0,>=23.2; 24.0 installed.
mkdocs 1.4.2 --> markdown<3.4,>=3.2.1; 3.4.4 installed.
torchvision 0.16.1 --> torch 2.1.1; 2.3.0 installed.
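
For anyone wanting to reproduce a listing like the one above, here is a minimal sketch using only the standard library's `importlib.metadata`. The helper names (`installed_version`, `declared_requirements`, `report`) are made up for illustration and are not part of marker. It only prints what each distribution declares next to what is installed; it does not evaluate the version specifiers, which is what `pip check` does from the command line.

```python
from importlib import metadata


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def declared_requirements(name):
    """Return the requirement strings a distribution declares (possibly empty)."""
    try:
        return metadata.requires(name) or []
    except metadata.PackageNotFoundError:
        return []


def report(names):
    """Build one line per distribution, followed by its declared requirements."""
    lines = []
    for name in names:
        version = installed_version(name)
        lines.append(f"{name}: {version or 'not installed'}")
        for requirement in declared_requirements(name):
            lines.append(f"    requires {requirement}")
    return lines
```

Calling `print("\n".join(report(["torch", "torchvision", "marker-pdf"])))` in the affected environment would show, for example, which `torch` version `torchvision` asks for next to the one actually installed.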

@VikParuchuri
Owner

VikParuchuri commented May 17, 2024

If you can share the files, it would help me debug. Langchain and mkdocs aren't marker dependencies. Installing marker in a virtualenv might help with isolating other dependencies.
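
The virtualenv suggestion can be sketched with Python's standard `venv` module. This is an illustration under assumptions, not marker's documented install procedure: the helper names and the `marker-env` directory are hypothetical, and `marker-pdf` is the package name mentioned earlier in this thread.

```python
import subprocess
import sys
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a fresh virtual environment and return the path to its interpreter."""
    env_dir = Path(env_dir)
    # clear=True wipes any existing directory so stale packages cannot leak in.
    venv.EnvBuilder(with_pip=with_pip, clear=True).create(env_dir)
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def install_command(python, package="marker-pdf"):
    """Build the pip command that installs `package` into that environment."""
    return [str(python), "-m", "pip", "install", package]


def install(python, package="marker-pdf"):
    """Run the install; needs network access and raises if pip fails."""
    subprocess.run(install_command(python, package), check=True)


# Example (hypothetical directory name):
#   python = create_env("marker-env")
#   install(python)
```

Because the new environment starts empty, packages such as langchain-core or mkdocs from the system site-packages are not visible inside it, so only marker's own requirements get resolved.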

@bjpcjp
Author

bjpcjp commented May 17, 2024
