Printing equation division lines in pdf2txt text output #5
Everything mostly looks great; just fix the dynamic resizing of the equation line.
If possible, could the extra line breaks in the output for an equation also be removed? They are visible in the example screenshot of the output in this pull request: after the numerator of the first equation there is a line break before the vinculum, which does not appear in equation.pdf. Is that possible to fix?
@@ -177,8 +181,17 @@ def render(item):
elif isinstance(item, LTImage):
    if self.imagewriter is not None:
        self.imagewriter.export_image(item)
elif isinstance(item, LTLine):
    self.write_text("-----\n")
The line length is fixed at the moment, so if an equation is very large, the line will not scale with it.
Yes, this is because it is difficult to know how long the line should be in the console. We do have access to the line's width in the PDF, but converting that width into a number of characters in the console is tricky and tedious, so for now the length is fixed.
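One possible way to approach the scaling, sketched under assumptions: divide the line's PDF width by an estimated average glyph width and round up. `vinculum_text` and `average_char_width` are hypothetical helpers, not part of pdfminer, and the fallback width of 5.0 PDF units is an arbitrary guess. With proportional fonts the result can only ever be approximate.

```python
import math
from typing import Iterable


def average_char_width(widths: Iterable[float], fallback: float = 5.0) -> float:
    """Estimate a glyph width from character widths (e.g. LTChar.width values).

    The fallback of 5.0 PDF units is an arbitrary assumption used when no
    usable character widths are available.
    """
    positive = [w for w in widths if w > 0]
    return sum(positive) / len(positive) if positive else fallback


def vinculum_text(line_width: float, char_width: float, min_len: int = 3) -> str:
    """Render a horizontal PDF line as a run of dashes scaled to its width."""
    if char_width <= 0:
        raise ValueError("char_width must be positive")
    # Round up so the bar is never shorter than the text it spans,
    # and never print fewer than min_len dashes.
    n = max(min_len, math.ceil(line_width / char_width))
    return "-" * n + "\n"
```

In the converter this could replace the fixed string with something like `self.write_text(vinculum_text(item.width, char_width))`, where `char_width` would have to be estimated from the characters near the line; that wiring is hypothetical and not part of this PR.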
Martin and I did the review together, so I have the same comment as him.
The line breaks could be removed, but it is very tricky. They come from the step where pdfminer groups all elements into text boxes, and something in that grouping introduces them. It is a challenging task that we leave as a future problem to solve.
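Until the text-box grouping is understood, one possible workaround is to post-process the converter's output and drop a blank line that directly touches a dash-only line. This is a sketch, not a fix of pdfminer's grouping: `drop_blank_lines_around_bars` is a hypothetical helper, it assumes division lines are written as runs of three or more dashes, and it would also remove a blank line that legitimately separates a paragraph from a bar.

```python
def drop_blank_lines_around_bars(text: str) -> str:
    """Remove blank lines directly above or below a dash-only 'bar' line."""
    lines = text.splitlines(keepends=True)  # keep "\n" so joining is lossless
    stripped = [line.strip() for line in lines]
    # Assumption: a division line is rendered as 3 or more dashes and nothing else.
    is_bar = [len(s) >= 3 and set(s) == {"-"} for s in stripped]
    kept = []
    for i, s in enumerate(stripped):
        if s == "":
            after_bar = i > 0 and is_bar[i - 1]
            before_bar = i + 1 < len(lines) and is_bar[i + 1]
            if after_bar or before_bar:
                continue  # skip the blank line touching a division bar
        kept.append(lines[i])
    return "".join(kept)
```

Only a single blank line adjacent to a bar is removed; a run of several blank lines would be shortened by one, which keeps the rule simple and predictable.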
Approve
This PR implements issue euske#305, which asks for division lines to be displayed when outputting to the console.
Screenshots: before and after the change.
Note: the file I used in these commands is the same; it is just located in different directories.
Some of the files contain a lot of formatting changes because they have not been merged from sprint 1, but the relevant changes are in:
converter.py around line 184
layout.py around line 177
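The change writes `-----` for every `LTLine`. A possible refinement, sketched here under assumptions, is to treat only (nearly) horizontal lines as division bars, so that vertical rules or other stray line segments are not also printed as dashes. `Line` below is a hypothetical stand-in exposing the same `x0`, `y0`, `x1`, `y1` bounding-box attributes as pdfminer layout objects, and the 0.5-unit tolerance is an arbitrary guess.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Line:
    """Hypothetical stand-in for the bounding box of pdfminer's LTLine."""
    x0: float
    y0: float
    x1: float
    y1: float


def is_division_line(line: Line, tol: float = 0.5) -> bool:
    # Assumption: a fraction bar is horizontal (almost no height) and has a
    # positive width; the tolerance value is a guess, not a measured constant.
    return abs(line.y1 - line.y0) <= tol and (line.x1 - line.x0) > 0


def render_line(line: Line, write_text: Callable[[str], None]) -> None:
    # Same fixed-width output as the PR, but only for horizontal lines.
    if is_division_line(line):
        write_text("-----\n")
```

Because the check only reads bounding-box attributes, the existing branch could in principle become `elif isinstance(item, LTLine) and is_division_line(item):` without other changes.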