get_mime_type bugfix
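pdf_object.stream.attrs.get() returns a PSLiteral object, not a dict.
The fallback default was a dict, so calling .name on it raised an AttributeError whenever the key was missing.
The default is now PSLiteral(None), which exposes .name the same way a real value does.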
dongcartney92 authored Oct 24, 2024
1 parent 0986c20 commit e997dd5
Showing 1 changed file with 3 additions and 2 deletions.
5 changes: 3 additions & 2 deletions src/openparse/text/pdfminer/core.py
@@ -10,6 +10,7 @@
     LTTextContainer,
     LTTextLine,
 )
+from pdfminer.psparser import PSLiteral
 from pydantic import BaseModel, model_validator

 from openparse.pdf import Pdf
@@ -64,8 +65,8 @@ def _extract_chars(text_line: LTTextLine) -> List[CharElement]:


 def get_mime_type(pdf_object: LTImage) -> Optional[str]:
-    subtype = pdf_object.stream.attrs.get("Subtype", {"name": None}).name
-    filter_ = pdf_object.stream.attrs.get("Filter", {"name": None}).name
+    subtype = pdf_object.stream.attrs.get("Subtype", PSLiteral(None)).name
+    filter_ = pdf_object.stream.attrs.get("Filter", PSLiteral(None)).name
     if subtype == "Image":
         if filter_ == "DCTDecode":
             return "image/jpeg"
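The failure mode this commit addresses can be reproduced without pdfminer. The sketch below is illustrative only: StandInLiteral is a hypothetical stand-in for PSLiteral (assumed here to be nothing more than an object with a .name attribute), and the two helper functions are not part of openparse. It contrasts the old dict default with the new literal default when a key is missing from the stream attributes.

```python
from typing import Any, Dict, Optional


class StandInLiteral:
    """Hypothetical stand-in for pdfminer's PSLiteral: an object exposing `.name`."""

    def __init__(self, name: Any) -> None:
        self.name = name


def name_with_dict_default(attrs: Dict[str, Any], key: str) -> Optional[str]:
    # Old pattern: the fallback is a dict, and a dict has no `.name` attribute.
    return attrs.get(key, {"name": None}).name


def name_with_literal_default(attrs: Dict[str, Any], key: str) -> Optional[str]:
    # New pattern: the fallback has the same shape as a real value.
    return attrs.get(key, StandInLiteral(None)).name


# Stream attributes with a "Subtype" entry but no "Filter" entry.
attrs = {"Subtype": StandInLiteral("Image")}

# Both patterns agree when the key is present.
assert name_with_dict_default(attrs, "Subtype") == "Image"
assert name_with_literal_default(attrs, "Subtype") == "Image"

# Only the literal default survives a missing key.
assert name_with_literal_default(attrs, "Filter") is None
try:
    name_with_dict_default(attrs, "Filter")
except AttributeError as exc:
    print(f"dict default failed: {exc}")
```

The old code only failed when "Subtype" or "Filter" was absent, which is presumably why the bug went unnoticed for images that carry both entries.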
