Handling of Meta Data from Tika Parser for Office Documents #356

Open
a-nau opened this issue Jan 20, 2025 · 2 comments
Assignees
Labels
bugfix Inconsistencies or issues which will cause a problem for users or implementers. stale There has not been activity on this issue or PR for quite some time.

Comments


a-nau commented Jan 20, 2025

Hi @tb1337,

thanks for the awesome work! I'm using this script from The-Compiler to double-check imports.

I use Tika + Gotenberg (see docs) to handle office documents, and when I run the mentioned script, I get:

Value ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.odf.OpenDocumentParser'] of type <class 'list'> is invalid for DocumentMetadataType.value, expected value of type str | None
Value ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.microsoft.ooxml.OOXMLParser'] of type <class 'list'> is invalid for DocumentMetadataType.value, expected value of type str | None
Value ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.microsoft.OfficeParser'] of type <class 'list'> is invalid for DocumentMetadataType.value, expected value of type str | None

It seems that docx and similar documents are still found when running remote = get_remote(), but I wanted to point this out in case it's easy to fix.
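
For illustration, here is a minimal sketch of one way such list-valued metadata could be coerced into the str | None shape that the error message asks for. The helper name normalize_metadata_value and the ", " separator are assumptions made up for this example; this is not the library's actual code, and the maintainers may prefer a different fix (for example, widening the accepted type instead).

```python
from __future__ import annotations

from typing import Any


def normalize_metadata_value(value: Any, sep: str = ", ") -> str | None:
    """Coerce a raw metadata value into str | None.

    Hypothetical helper, not part of any library: Tika can report
    multi-valued metadata as a list, so join the items into one string.
    """
    if value is None:
        return None
    if isinstance(value, (list, tuple)):
        # Multi-valued entry, like the parser lists in the messages above.
        return sep.join(str(item) for item in value)
    return str(value)


raw = [
    "org.apache.tika.parser.DefaultParser",
    "org.apache.tika.parser.odf.OpenDocumentParser",
]
print(normalize_metadata_value(raw))
```

Joining keeps every parser name visible in a single string, at the cost of losing the list structure.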

Best
Alex

Owner

tb1337 commented Feb 2, 2025

Thank you for reporting this issue; I'm going to check it.

@tb1337 tb1337 self-assigned this Feb 2, 2025
@tb1337 tb1337 added the bugfix Inconsistencies or issues which will cause a problem for users or implementers. label Feb 2, 2025
Contributor

github-actions bot commented Mar 4, 2025

There hasn't been any activity on this issue recently, so we have to clean up some inactive issues.
Please make sure to update to the latest version and check if that solves the issue. Let us know if that works for you by leaving a comment 👍
This issue has now been marked as stale and will be closed if no further activity occurs. Thank you.

@github-actions github-actions bot added the stale There has not been activity on this issue or PR for quite some time. label Mar 4, 2025
2 participants