Describe the bug
When I call partition("/path/file.doc"), the text output also contains what look like names of interface elements (from LibreOffice?), such as:
Luisteren
Fonetisch lezen
Woordenboek - Gedetailleerd woordenboek weergeven
Note: The text itself is in English, as is my system -- these elements are for some reason in Dutch (I am indeed in the Netherlands).
To Reproduce
from unstructured.partition.auto import partition
elements = partition("/path/file.doc")
print("\n".join([str(el) for el in elements[:100]]))
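To narrow down where the stray strings come from, it may help to list exactly which elements contain them. The following is only a sketch: `SUSPECT_STRINGS`, `find_suspect_elements` and `report` are hypothetical helper names, not part of unstructured, and the suspect list is simply the three strings reported above.

```python
# Strings reported in the output that are not present in the .doc itself.
SUSPECT_STRINGS = (
    "Luisteren",
    "Fonetisch lezen",
    "Woordenboek - Gedetailleerd woordenboek weergeven",
)


def find_suspect_elements(texts, suspects=SUSPECT_STRINGS):
    """Return (index, text) pairs for texts that contain any suspect string."""
    return [
        (i, text)
        for i, text in enumerate(texts)
        if any(s in text for s in suspects)
    ]


def report(path):
    """Partition `path` and print the index, element type and text of each hit."""
    # Imported lazily so the helper above can be used without unstructured installed.
    from unstructured.partition.auto import partition

    elements = partition(path)
    for i, text in find_suspect_elements([str(el) for el in elements]):
        print(i, type(elements[i]).__name__, repr(text))
```

Calling `report("/path/file.doc")` would show the position and element type of each hit, which could help tell whether the strings sit together in one place in the document or are scattered through it.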
Expected behavior
I would not expect interface elements to be part of the processed output.
@SlawaLoev-KSO can you be more specific about what you mean by "interface elements"? For example, do you mean menu-bar options perhaps? or something like form-field labels?
tbh I don't exactly know what it is -- just weird words (sounding like it could be interface elements) in the text output that are not in the .doc itself, I just guessed at what it is.