automatically make an enter paragraph #662

Open
cutegitcat opened this issue Dec 21, 2023 · 2 comments

Comments

@cutegitcat

Hello everybody,
I find this software a good tool for converting images of text into plain text.
I have a request:
When I convert, for example, 100 (short) text image files to text,
the results are all joined together without a paragraph break (a blank line) between them.
It should be technically feasible to insert a paragraph break automatically after the text of each image.
In other words:
at present, 100 (short) text images produce one continuous block of text with no paragraph breaks.
Instead, 100 (short) text images should produce text with a paragraph break after each image's text.
The result would be: text, paragraph break, text, paragraph break, text, paragraph break,
text, paragraph break, text...
Thank you very much in advance, if this is feasible. :-)

@manisandro
Owner

Can you elaborate? When recognizing in plain text mode, the text from each recognized image is separated with a line break. If you recognize to hOCR, the output is split into separate pages. Not sure where you get one continuous text block with the text of all recognized images?

@cutegitcat
Author

Hello manisandro!

Thanks for the message.
In my experience, converting a single image file to text with OCR works fine: the lines come out in the right order. The problem only appears when I convert many files: there is no blank line between the text of one image file and the text of the next.

The following is a simplified explanation based on an example:

Image-File 1: The melting Arctic is a crime scene.

Image-File 2: J7 is the anonymous perpetrator leaving evidence and clues for me to discover,

Image-File 3: like breadcrumbs leading back to him. James, he had said,

Image-File 4: the day we first met at the research institute,

Image-File 5: "If you are going to make it up here, don’t lock your doors."

Image-File 6: It seemed like a life philosophy, rather than a survival tip.

This was converted without a blank line between the files and looks like this:

The melting Arctic is a crime scene.
J7 is the anonymous perpetrator leaving evidence and clues for me to discover,
like breadcrumbs leading back to him. James, he had said,
the day we first met at the research institute,
“If you are going to make it up here, don’t lock your doors.”
It seemed like a life philosophy, rather than a survival tip.

It should instead look like this, with a blank line after the text of each image file (this is what I mean by automatically making an enter paragraph):

The melting Arctic is a crime scene.

J7 is the anonymous perpetrator leaving evidence and clues for me to discover,

like breadcrumbs leading back to him. James, he had said,

the day we first met at the research institute,

“If you are going to make it up here, don’t lock your doors.”

It seemed like a life philosophy, rather than a survival tip.
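
As a possible workaround in the meantime, the desired output could be produced by post-processing. The following is only a minimal Python sketch, assuming the text recognized from each image file is already available as a separate string. The function name `join_with_paragraph_breaks` is hypothetical and is not part of this project's code.

```python
def join_with_paragraph_breaks(texts):
    """Join per-image OCR results, separated by exactly one blank line.

    `texts` is assumed to hold one string per image file, in order.
    """
    # Strip surrounding whitespace so that stray trailing newlines from the
    # OCR step do not produce more than one blank line between results.
    cleaned = [t.strip() for t in texts]
    # Skip images for which nothing was recognized.
    cleaned = [t for t in cleaned if t]
    if not cleaned:
        return ""
    # "\n\n" ends the last line of one result and adds one empty line.
    return "\n\n".join(cleaned) + "\n"


if __name__ == "__main__":
    results = [
        "The melting Arctic is a crime scene.\n",
        "J7 is the anonymous perpetrator leaving evidence and clues for me to discover,\n",
        "like breadcrumbs leading back to him. James, he had said,\n",
    ]
    print(join_with_paragraph_breaks(results), end="")
```

Stripping each result first is a design choice: it keeps the spacing uniform even if some results end with a newline and others do not.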
