
[RFC] Deprecate/Stop TorchText releases starting with PyTorch release 2.4 #2250

Open

atalman opened this issue Mar 28, 2024 · 9 comments
@atalman
Contributor

atalman commented Mar 28, 2024

🚀 Deprecation of TorchText releases

As of September 2023, we have paused active development of TorchText because our focus has shifted away from building out this library.

We would like to do the following:

  • For TorchText releases 0.17.2 and 0.18.x, the TorchData dependency has been removed from TorchText [COMPLETED]
    • Users can still install torchdata manually to use the datasets (see the sketch after this list)
  • The PyTorch 2.3 minor release will be the last release for which we publish TorchText (0.18)
  • Starting with PyTorch release 2.4, we would like to stop releasing TorchText
  • TorchText will still be available from nightlies on a best-effort basis, with no guarantee that we will fix issues and breakages
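
For illustration, here is a minimal sketch (not part of the original proposal) of what using a torchtext dataset looks like once torchdata has been installed manually; AG_NEWS is just an example dataset choice:

```python
# Assumes `pip install torchdata` has been run manually, since torchtext
# 0.17.2+ no longer declares torchdata as a dependency.
from torchtext.datasets import AG_NEWS

# The dataset is backed by torchdata datapipes; iterating yields (label, text) pairs.
train_iter = AG_NEWS(split="train")
label, text = next(iter(train_iter))
print(label, text[:80])
```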

For reference, here is the PyTorch release schedule:
https://github.com/pytorch/pytorch/blob/main/RELEASE.md#release-cadence

cc @seemethere @malfet @matthewdzmura @NicolasHug

@agunapal

Do we recommend any alternatives? For example, TorchServe has a text_classifier handler, and tests associated with it, that use TorchText:
https://github.com/pytorch/serve/blob/master/ts/torch_handler/text_classifier.py

So I'm wondering what the strategy is. Should we replace it with HuggingFace, and would PyTorch come up with another solution at a later date?

@agunapal

Can we release TorchText with PyTorch 2.3 for all platforms (e.g. aarch64; I'm not sure which other platforms were missing it for PyTorch 2.2)?

@atalman
Contributor Author

atalman commented Mar 28, 2024

Yes, we will release the same set of binaries as for PyTorch 2.2:
https://hud2.pytorch.org/hud/pytorch/text/release%2F0.18/1?per_page=50

@NicolasHug
Member

> Do we recommend any alternatives?

This would be decided case by case. For the TorchServe example, the simple alternative is to copy/paste the one piece of functionality that was used from torchtext into the example. It's very short and simple, so that's a viable solution.

https://github.com/pytorch/text/blob/main/torchtext/data/utils.py#L207-L228

atalman pinned this issue Apr 2, 2024
@agunapal

agunapal commented Apr 2, 2024

> Do we recommend any alternatives?
>
> This would be decided case by case. For the TorchServe example, the simple alternative is to copy/paste the one piece of functionality that was used from torchtext into the example. It's very short and simple, so that's a viable solution.
>
> https://github.com/pytorch/text/blob/main/torchtext/data/utils.py#L207-L228

Thanks, this seems like a good idea. We also use `from torchtext.data.utils import get_tokenizer`. Looking at the code, it doesn't seem too complicated to copy/paste it for basic_english; a rough sketch is below.
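
For reference, here is a rough sketch (not from the thread) of what copying the basic_english tokenizer out of torchtext/data/utils.py might look like; the regex patterns below approximate what that file does rather than being a verbatim copy:

```python
import re

# Hypothetical stand-in for torchtext's basic_english tokenizer: lowercase the
# input, pad punctuation with spaces, collapse whitespace, then split on spaces.
_PATTERNS = [
    (re.compile(r"\'"), " ' "),
    (re.compile(r"\""), ""),
    (re.compile(r"\."), " . "),
    (re.compile(r"<br \/>"), " "),
    (re.compile(r","), " , "),
    (re.compile(r"\("), " ( "),
    (re.compile(r"\)"), " ) "),
    (re.compile(r"\!"), " ! "),
    (re.compile(r"\?"), " ? "),
    (re.compile(r"\;"), " "),
    (re.compile(r"\:"), " "),
    (re.compile(r"\s+"), " "),
]

def basic_english_tokenize(line: str) -> list[str]:
    line = line.lower()
    for pattern, replacement in _PATTERNS:
        line = pattern.sub(replacement, line)
    return line.split()

# Example: a drop-in for simple handlers that only used get_tokenizer("basic_english").
print(basic_english_tokenize("Hello, World! This is a test."))
# ['hello', ',', 'world', '!', 'this', 'is', 'a', 'test', '.']
```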

@atalman
Contributor Author

atalman commented Apr 10, 2024

cc @matthewdzmura @seemethere: the releng team and @malfet propose to stop releasing TorchText as of release 2.3, since we can't ensure the quality of the release.

@HadiSDev

What would be the alternative if I need preprocessing for BERT / vocab / regex operations compiled with my model?

@ffquintella

keras.io is a viable alternative ...

@gluefox

gluefox commented May 2, 2024

Is there any alternative for C++-only environments that need native tokenizers now?
