Tokenizer in bert.cpp is not good enough, how about tokenizers-cpp
#36
Comments
Using a mature implementation is helpful
|
Great! |
This is a very exciting direction, and huge props to FFengIll for getting this working. The use case I originally had for this project is no longer valid, so I'm not as invested in making this library "production quality". So I have 2 suggestions on how to share your changes:
|
@skeskinen Thanks for your suggestions. After thinking it over, I would like to create a new repo (actually still a fork) with name
The main reason is that what I want is just an efficient text embedding tool that can be deployed standalone. I've worked in this area for some time, and am very glad to see Some other minor reasons might be
|
@cgisky1980 both of them work well. |
Thanks. Where is the new repo? |
@FFengIll need embedding.cpp |
Here is the repo: https://github.com/FFengIll/embedding.cpp Note that it is still a WIP and not yet stable. |
As mentioned in the title,
https://github.com/mlc-ai/tokenizers-cpp
is a good tokenizer implementation. Some people may not want another dependency, but it is worth it.