Why is MLC_ENABLE_SENTENCEPIECE_TOKENIZER OFF by default? #45
Comments
Hi, have you figured that out? I have the same issue now.
cc @MasterJH5574 Shall we turn it on by default? (We can always turn it off by default downstream.)
Sorry for the delayed response. Yes, we can enable it. I will follow up in the next two days.
We enabled SentencePiece by default in PR #47 and have bumped the dependency in mlc-llm accordingly (mlc-ai/mlc-llm#3025). Please check out the latest code, thanks!
Hi, I turned MLC_ENABLE_SENTENCEPIECE_TOKENIZER ON, but the tokenized result is different from transformers. How can I resolve this?
============================== tokenizers-cpp ==============================
void TestTokenizer(std::unique_ptr tok, bool print_vocab = false, //......
============================== python transformers ==============================
Should MLC_ENABLE_SENTENCEPIECE_TOKENIZER be on by default in CMakeLists.txt? I had to turn it on in order to successfully run ./build_and_run.sh to build the example target. Otherwise, I get an assertion failure at src/sentencepiece_tokenizer.cc.
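For anyone on a checkout where the option still defaults to OFF, one possible workaround is to force it on from the consuming project before tokenizers-cpp is added. This is only a sketch under assumptions: the project name `my_app`, the source file `main.cc`, and the vendored path `3rdparty/tokenizers-cpp` are hypothetical, and the library target name `tokenizers_cpp` is assumed rather than confirmed in this thread.

```cmake
cmake_minimum_required(VERSION 3.18)
project(my_app CXX)  # hypothetical downstream project

# Set the cache entry BEFORE add_subdirectory so the value is already
# in place when tokenizers-cpp's own CMakeLists.txt is processed.
set(MLC_ENABLE_SENTENCEPIECE_TOKENIZER ON CACHE BOOL
    "Build tokenizers-cpp with SentencePiece support" FORCE)

# Hypothetical location of the vendored tokenizers-cpp sources.
add_subdirectory(3rdparty/tokenizers-cpp tokenizers EXCLUDE_FROM_ALL)

add_executable(my_app main.cc)  # hypothetical source file
# Target name assumed; check the library's CMakeLists.txt.
target_link_libraries(my_app PRIVATE tokenizers_cpp)
```

When configuring tokenizers-cpp directly, passing `-DMLC_ENABLE_SENTENCEPIECE_TOKENIZER=ON` on the `cmake` command line should have the same effect. Whether `./build_and_run.sh` forwards extra flags to `cmake` is not stated here, so the script may need to be edited by hand.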