-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add doughnut http endpoint #230
base: main
Are you sure you want to change the base?
Conversation
…ngest into feat/doughnut-http-endpoint-1
@@ -62,7 +64,8 @@ def _update_metadata(row: pd.Series, cached_client: Any, deplot_client: Any, tra | |||
# Only modify if content type is structured and subtype is 'chart' and chart_metadata exists | |||
if ((content_metadata.get("type") != "structured") or | |||
(content_metadata.get("subtype") != "chart") or | |||
(chart_metadata is None)): | |||
(chart_metadata is None) or | |||
(chart_metadata.get("table_format") != TableFormatEnum.IMAGE)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is correct. Even if the table_format is image, we can still extract the content in the chart extractor. Am I thinking about this wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this so that the tables extracted from the Doughnut model don't go through the table/chart extraction stages. Doughnut tables will already have text (as LaTex), so they don't need to go through the table/chart extraction stages. YOLOX tables need text extraction in table/chart extraction stages, and they are tagged as IMAGE tables so they do get processed in these stages. I'm not sure if that made sense, but I needed a way to skip table/chart extraction for tables identified by Doughnut, and thought TableFormat could be useful here to distinguish between yolox (== TableFormatEnum.IMAGE) and doughnut (==TableFormatEnum.LATEX).
@@ -63,7 +67,8 @@ def _update_metadata(row: pd.Series, paddle_client: Any, paddle_version: Any, tr | |||
# Only modify if content type is structured and subtype is 'table' and table_metadata exists | |||
if ((content_metadata.get("type") != "structured") or | |||
(content_metadata.get("subtype") != "table") or | |||
(table_metadata is None)): | |||
(table_metadata is None) or | |||
(table_metadata.get("table_format") != TableFormatEnum.IMAGE)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question here.
|
||
def _call_image_inference_http_client(client, model_name: str, image_data: np.ndarray): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're getting a bit too specific in terms of data handling here. We probably want to standardize on a data format for each model and process it into that before we get to this function.
Small nitpick: please use base64_img(s) or base64_image(s), rather than base64_img and base64_images
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cleaned it up a bit in c5b5d3e by basically grouping doughnut together with deplot and removing batching. Similarly to deplot, request-level batching is currently not supported in doughnut, and they have the same output format (for now). Note: The input/output format is not finalized and will likely change and actually be closer to the yolox output format so there will be more PRs to follow.
321052f adds support for preserving the text bounding boxes in the metadata (in the |
Description
Part of https://github.com/NVIDIA/nv-ingest-private/issues/52
Also closes https://github.com/NVIDIA/nv-ingest-private/issues/49.
Checklist