
Add text classification to inference client #1606

Merged

Conversation

martinbrose (Contributor)

Add text-classification to the Hugging Face 🤗 Hub

References #1539

This is part of an ongoing list of model tasks to implement in the Hugging Face Hub inference client. Each task is planned as its own PR; this one covers text-classification.

Key Features

  • Modifies _client.py to call a text-classification model (a usage sketch follows this list).
  • Updates tests and documentation to reflect the changes.
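
A minimal usage sketch of the method this PR adds (model resolution and the example output values follow the docstring examples discussed below; treat the exact numbers as illustrative):

```python
from huggingface_hub import InferenceClient

client = InferenceClient()

# Classify a single string; when no `model` is passed, the client falls
# back to a recommended model for the "text-classification" task.
output = client.text_classification("I like you")

# `output` is a list of {"label", "score"} dicts, e.g.:
# [{'label': 'POSITIVE', 'score': 0.99987...}, {'label': 'NEGATIVE', 'score': 0.00013...}]
for item in output:
    print(item["label"], item["score"])
```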

HuggingFaceDocBuilderDev commented Aug 20, 2023

The documentation is not available anymore as the PR was closed or merged.

Wauplin (Contributor) left a comment

Hey @martinbrose, I'm starting to review all your inference-related PRs. Thanks a ton for the massive work; it's good quality from what I've seen! 👏 🙏 I've left some comments on this PR that are also relevant to the other PRs (especially simplifying the method signatures as much as possible, plus the merge-conflict issue). In the meantime, I'll take the time to thoroughly review the other PRs.

FYI, I'm off this Thursday/Friday and will be fully back on the project starting next week :)

Resolved review threads (outdated):

  • src/huggingface_hub/inference/_client.py (two threads)
  • docs/source/guides/inference.md
  • tests/test_inference_client.py
martinbrose (Contributor, Author)

Thanks for the review!
Apologies for this long list of separate PRs. I started with one... and then couldn't stop using them as bite-sized challenges at night.


codecov bot commented Sep 4, 2023

Codecov Report

Patch coverage: 62.50% and project coverage change: -0.53% ⚠️

Comparison is base (b94f891) 82.30% compared to head (2cca3fd) 81.78%.

❗ Current head 2cca3fd differs from pull request most recent head 206c06b. Consider uploading reports for the commit 206c06b to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1606      +/-   ##
==========================================
- Coverage   82.30%   81.78%   -0.53%     
==========================================
  Files          62       60       -2     
  Lines        6964     6785     -179     
==========================================
- Hits         5732     5549     -183     
- Misses       1232     1236       +4     
| Files Changed | Coverage Δ |
| --- | --- |
| ...gingface_hub/inference/_generated/_async_client.py | 58.37% <25.00%> (-0.62%) ⬇️ |
| src/huggingface_hub/inference/_client.py | 79.51% <100.00%> (+0.40%) ⬆️ |

... and 7 files with indirect coverage changes


Wauplin (Contributor) left a comment

Thanks for making the changes @martinbrose :) I left some comments but most of them are due to the change between a multi-text input and a single-text input. Once those are addressed, I think we'll be good to merge 🚀
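
For context, a sketch of the input-type change that drives most of the comments below. The signatures are inferred from the suggestions in this thread, not copied from the diff:

```python
from typing import List, Optional, TypedDict

# Assumed output shape; the real type lives in huggingface_hub.inference.
class ClassificationOutput(TypedDict):
    label: str
    score: float

# Earlier revision (multi-text input): one result list per input string.
# def text_classification(self, text: List[str], *, model: Optional[str] = None) -> List[List[ClassificationOutput]]: ...

# This revision (single-text input): a single flat result list.
# def text_classification(self, text: str, *, model: Optional[str] = None) -> List[ClassificationOutput]: ...
```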

Comment on lines 794 to 795
>>> output
{'label': 'POSITIVE', 'score': 0.9998695850372314}

Revert back to list output

Suggested change:
  >>> output
- {'label': 'POSITIVE', 'score': 0.9998695850372314}
+ [[{'label': 'POSITIVE', 'score': 0.9998695850372314}, {'label': 'NEGATIVE', 'score': 0.0001304351753788069}]]

Comment on lines 798 to 803
payload: Dict[str, Any] = {"inputs": text}
response = self.post(
    json=payload,
    model=model,
    task="text-classification",
)

(nit) Having a separate payload variable is fine as well, but when it's tiny like this one (no parameter other than inputs), I prefer to pass it directly to the .post method. No big deal anyway.

Suggested change:
- payload: Dict[str, Any] = {"inputs": text}
- response = self.post(
-     json=payload,
-     model=model,
-     task="text-classification",
- )
+ response = self.post(json={"inputs": text}, model=model, task="text-classification")

Comment on lines 802 to 803
>>> output
{'label': 'POSITIVE', 'score': 0.9998695850372314}

Suggested change:
  >>> output
- {'label': 'POSITIVE', 'score': 0.9998695850372314}
+ [{'label': 'POSITIVE', 'score': 0.9998695850372314}, {'label': 'NEGATIVE', 'score': 0.0001304351753788069}]

(same as sync version)

Comment on lines 806 to 811
payload: Dict[str, Any] = {"inputs": text}
response = await self.post(
    json=payload,
    model=model,
    task="text-classification",
)

Suggested change:
- payload: Dict[str, Any] = {"inputs": text}
- response = await self.post(
-     json=payload,
-     model=model,
-     task="text-classification",
- )
+ response = await self.post(json={"inputs": text}, model=model, task="text-classification")

(same as sync version)

@@ -0,0 +1,48 @@
interactions:
- request:
    body: '{"inputs": ["I like you", "I love you."]}'

Suggested change:
- body: '{"inputs": ["I like you", "I love you."]}'
+ body: '{"inputs": ["I like you"]}'

should be only 1 sample now

    uri: https://api-inference.huggingface.co/models/distilbert-base-uncased-finetuned-sst-2-english
  response:
    body:
      string: '[[{"label":"POSITIVE","score":0.9998695850372314},{"label":"NEGATIVE","score":0.0001304351753788069}],[{"label":"POSITIVE","score":0.9998705387115479},{"label":"NEGATIVE","score":0.00012938841246068478}]]'

Suggested change:
- string: '[[{"label":"POSITIVE","score":0.9998695850372314},{"label":"NEGATIVE","score":0.0001304351753788069}],[{"label":"POSITIVE","score":0.9998705387115479},{"label":"NEGATIVE","score":0.00012938841246068478}]]'
+ string: '[[{"label":"POSITIVE","score":0.9998695850372314},{"label":"NEGATIVE","score":0.0001304351753788069}]]'

... and therefore only 1 response

Comment on lines 208 to 209
self.assertIsInstance(item[0]["score"], float)
self.assertIsInstance(item[0]["label"], str)

Suggested change:
- self.assertIsInstance(item[0]["score"], float)
- self.assertIsInstance(item[0]["label"], str)
+ self.assertIsInstance(item["score"], float)
+ self.assertIsInstance(item["label"], str)

1 level less
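
For context, a sketch of how the assertion block plausibly reads once this suggestion is applied (the test name and surrounding setup are assumed, not taken from the diff):

```python
# Hypothetical shape of the updated test in tests/test_inference_client.py.
def test_text_classification(self) -> None:
    output = self.client.text_classification("I like you")
    self.assertIsInstance(output, list)
    for item in output:  # one dict per label, no extra nesting
        self.assertIsInstance(item["score"], float)
        self.assertIsInstance(item["label"], str)
```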

    model=model,
    task="text-classification",
)
return _bytes_to_list(response)

Suggested change:
- return _bytes_to_list(response)
+ return _bytes_to_list(response)[0]

Since we now take only a str as input (not a List[str]), we need to return the first item, because the server returns a list of lists of items.
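
A standalone sketch of that unwrapping, using json.loads in place of the client's internal _bytes_to_list helper (the raw bytes mirror the cassette above):

```python
import json

# Server response for a single-string input: a list of lists, with one
# inner list per input string (here, exactly one).
raw = b'[[{"label": "POSITIVE", "score": 0.9998695850372314}, {"label": "NEGATIVE", "score": 0.0001304351753788069}]]'

parsed = json.loads(raw)  # [[{'label': 'POSITIVE', ...}, {'label': 'NEGATIVE', ...}]]
result = parsed[0]        # [{'label': 'POSITIVE', ...}, {'label': 'NEGATIVE', ...}]

assert result[0]["label"] == "POSITIVE"
```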

    model=model,
    task="text-classification",
)
return _bytes_to_list(response)

Suggested change:
- return _bytes_to_list(response)
+ return _bytes_to_list(response)[0]

(same as sync version)


Wauplin commented Sep 6, 2023

I've merged the suggested changes (see above) and tried it locally. It works great! :)
Will merge the PR once CI is green.

@Wauplin Wauplin merged commit 0cd7405 into huggingface:main Sep 6, 2023
14 checks passed
@martinbrose martinbrose deleted the 1539-InferenceClient-text-classification branch September 6, 2023 16:33