Marker Improvements and Bugfixes #403

Merged: 41 commits, Dec 3, 2024

@iammosespaulr (Collaborator) commented Nov 30, 2024

  • Add CLI option --paginate_output for paginating the output
  • Add CLI option --disable_image_extraction to disable image extraction, along with a config option extract_image, which defaults to True
  • Add ListProcessor for merging lists across pages and columns
  • Clean up dangling newlines in the outputs
  • Make OUTPUT_ENCODING configurable
  • Fix bug with PageFooter being moved to the top of the page
  • Fix Section Hierarchy bug
  • Fix text inputs from all sources: the PdfProvider, surya OCR, and tabled
  • Standardize DebugProcessor outputs
  • Add BlockquoteProcessor
  • Add support for nested list handling in ListProcessor
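
To illustrate what merging lists across pages and columns could look like, here is a minimal sketch. It is not the ListProcessor code from this PR: the Block class, its fields, and merge_adjacent_lists are hypothetical names invented for the example, and the merge rule (consecutive list blocks on the same or the next page) is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    # Hypothetical stand-in for a layout block; not marker's actual class.
    kind: str   # e.g. "List" or "Text"
    page: int   # zero-based index of the page the block starts on
    items: List[str] = field(default_factory=list)


def merge_adjacent_lists(blocks: List[Block]) -> List[Block]:
    """Merge consecutive list blocks that sit on the same or the next page.

    Assumed rule for illustration: a list that directly follows another list
    (new column on the same page, or top of the following page) continues it.
    """
    merged: List[Block] = []
    last_page = None  # page of the most recently seen block
    for block in blocks:
        can_merge = (
            bool(merged)
            and merged[-1].kind == "List"
            and block.kind == "List"
            and 0 <= block.page - last_page <= 1
        )
        if can_merge:
            prev = merged[-1]
            # Build a new block so the input blocks are left untouched.
            merged[-1] = Block("List", prev.page, prev.items + block.items)
        else:
            merged.append(block)
        last_page = block.page
    return merged


blocks = [
    Block("List", 0, ["a", "b"]),
    Block("List", 1, ["c"]),      # continues the list from page 0
    Block("Text", 1, ["para"]),
    Block("List", 3, ["d"]),      # separate list: preceded by a text block
]
result = merge_adjacent_lists(blocks)
```

In this sketch the first two list blocks collapse into one block that keeps the starting page of the first, while the text block and the later list stay separate.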

@VikParuchuri VikParuchuri merged commit f446e56 into master Dec 3, 2024
2 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Dec 3, 2024