I hope to find a way to remove headers and footers #433

bjtangseng · 2024-12-19T12:40:16Z

I have been using the marker project and find it very good. I am not sure whether what follows is a problem with how I use it or a detail I overlooked.
I would like a way to filter the headers and footers out of PDFs, because the content in those areas is generally irrelevant, such as logos or boilerplate text. Could a parameter be added to reduce the interference of this useless information with the conversion results?

Thank you.

VikParuchuri · 2024-12-19T17:45:07Z

Can you please share an example PDF?

bjtangseng · 2024-12-20T06:50:46Z

Thank you very much for your reply.
I have attached a sample file. It is a PDF that is publicly searchable in China and does not involve any confidentiality issues.
The header of the first page contains a logo and the address of the organization that wrote the file. From the second page onward, there are smaller headers with logos. Some files also have footers, mainly with information such as an introduction to the organization and a disclaimer.

I would like a parameter that skips this information. I see that Surya can analyze the layout and gives clear positions for the header and footer regions. Could those regions be used as an exclusion list, so that no recognition or further processing is performed on them?
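
To illustrate the idea, here is a minimal sketch of the filtering step I have in mind. It does not call marker or Surya. It assumes the layout results have already been turned into plain dictionaries with a `label` and a `text` field, and that header and footer regions are labeled `Page-header` and `Page-footer`. The function name, the dictionary shape, the label names, and the sample data are all assumptions for illustration only.

```python
# Assumed label names; check them against what the layout model actually emits.
EXCLUDED_LABELS = frozenset({"Page-header", "Page-footer"})


def drop_headers_footers(blocks, excluded=EXCLUDED_LABELS):
    """Return the blocks whose layout label is not in `excluded`.

    `blocks` is a hypothetical list of dicts such as
    {"label": "Text", "text": "..."}; it is not marker's or Surya's real type.
    """
    return [block for block in blocks if block.get("label") not in excluded]


# Made-up page content, only to show the behavior.
page = [
    {"label": "Page-header", "text": "Organization logo and address"},
    {"label": "Text", "text": "Main body paragraph."},
    {"label": "Page-footer", "text": "Organization introduction and disclaimer"},
]

kept = drop_headers_footers(page)
print([block["text"] for block in kept])
```

If the excluded regions were dropped before text recognition rather than after it, the skipped areas would not need to be recognized at all, which is what I am hoping a parameter could control.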

Thank you

fileView.pdf
