Printing equation division lines in pdf2txt text output #5

Merged: 9 commits merged into master on Sep 28, 2022

dakaza98
Collaborator

We have implemented issue euske#305, which asks for equation division lines to be displayed when pdf2txt outputs text to the console.

before
image

after
image

Note: the file I used in both commands is the same; the two copies are just in different directories.

Some of the files contain a lot of formatting changes because they have not been merged from sprint 1, but the relevant changes are in:
converter.py around line 184
layout.py around line 177

@kakann kakann requested review from kakann and ghmo2789 September 26, 2022 07:18
Owner

@kakann kakann left a comment

Mostly everything looks great, just fix the dynamic resizing of the equation line.

If possible, could the extra line breaks in the output from reading an equation be removed? They can be seen in the example screenshot of the output in this pull request: after the numerator of the first equation there is a line break before the vinculum, which does not appear in equation.pdf. Is that possible to fix?

@@ -177,8 +181,17 @@ def render(item):
            elif isinstance(item, LTImage):
                if self.imagewriter is not None:
                    self.imagewriter.export_image(item)
            elif isinstance(item, LTLine):
                self.write_text("-----\n")
Owner

The line length is currently fixed, so for a very large equation the line will not scale with the equation's size.

Collaborator Author

Yes, this is because it is difficult to know how long the line should be in the console. We do have access to how wide the line was in the PDF, but converting that width into a number of console characters is really tricky and tedious. So for now it stays like this.
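
For reference, one rough way to scale the line could look like the sketch below. It is not what this PR does: it simply divides the line's PDF width by an assumed average character width. The helper name `vinculum_text` and the default of 6 points per character are hypothetical; in the converter, the width could be read from the `LTLine` item's `width` attribute.

```python
import math


def vinculum_text(line_width_pt: float, char_width_pt: float = 6.0,
                  min_chars: int = 1) -> str:
    """Return a row of dashes roughly as wide as a PDF line.

    line_width_pt: width of the line in PDF points.
    char_width_pt: ASSUMED average width of one console character in
                   points; this is a guess, not a value from pdfminer.
    min_chars:     lower bound so that very short lines stay visible.
    """
    if char_width_pt <= 0:
        raise ValueError("char_width_pt must be positive")
    # Round up so the dashes never end up shorter than the original line.
    n_chars = max(min_chars, math.ceil(line_width_pt / char_width_pt))
    return "-" * n_chars + "\n"
```

A better estimate would probably use the font size of the neighbouring characters instead of a constant, which is where the tricky part mentioned above comes in.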

Collaborator

@ghmo2789 ghmo2789 left a comment

Martin and I did the review together, so my comments are the same as his.

@dakaza98
Collaborator Author

> Mostly everything looks great, just fix the dynamic resizing of the equation line.
>
> If possible, could the extra line breaks in the output from reading an equation be removed? They can be seen in the example screenshot of the output in this pull request: after the numerator of the first equation there is a line break before the vinculum, which does not appear in equation.pdf. Is that possible to fix?

The line breaks could be removed, but it is very tricky to do. They come from the step where pdfminer groups all elements into text boxes, and something seems to be going on there. Basically it is a challenging task that we leave as a future problem to solve.
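
As a possible stopgap until the grouping itself is fixed, the extracted text could be post-processed. The sketch below is only an illustration and does not touch pdfminer: it assumes a vinculum shows up as a line made only of dashes, and the helper name `drop_blank_lines_before_vinculum` is hypothetical.

```python
import re

# Assumption: a vinculum is printed as a line that contains only dashes.
_DASH_LINE = re.compile(r"^-+$")


def drop_blank_lines_before_vinculum(text: str) -> str:
    """Remove empty lines that directly precede a dash-only line."""
    kept = []
    for line in text.split("\n"):
        if _DASH_LINE.match(line.strip()):
            # Discard blank lines collected just before the vinculum.
            while kept and kept[-1].strip() == "":
                kept.pop()
        kept.append(line)
    return "\n".join(kept)
```

This would also strip a blank line in front of any other dash-only line, for example a horizontal rule, so it is a workaround rather than a real fix.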

@ghmo2789 ghmo2789 self-requested a review September 28, 2022 13:07
Collaborator

@ghmo2789 ghmo2789 left a comment

Approve

@ghmo2789 ghmo2789 merged commit a288cde into master Sep 28, 2022
4 participants