Sorry, I am new to this.
Say I have a book full of independent articles, and I am trying to run Alpaca over them in a loop, but on every run I want it to forget everything and start fresh (basically the opposite of chat: the model should remember zero previous context). Based on my limited understanding, I need to set `last_n_tokens_size` to 0 to achieve this (is that even correct?), and then make predictions like this:
```python
llm = Llama(model_path="some_model.bin", n_ctx=1024, n_batch=1024,
            last_n_tokens_size=0, n_gpu_layers=2000000)
for article in book:                 # book holds the independent articles
    output = evaluate(llm, article)  # evaluate() is my own prediction helper
```
Unfortunately the code above will cause an error:
File "/***/lib/python3.10/site-packages/llama_cpp/llama.py", line 1328, in __call__
return self.create_completion(
File "/***/lib/python3.10/site-packages/llama_cpp/llama.py", line 1280, in create_completion
completion: Completion = next(completion_or_chunks) # type: ignore
File "/***/lib/python3.10/site-packages/llama_cpp/llama.py", line 872, in _create_completion
for token in self.generate(
File "/***/lib/python3.10/site-packages/llama_cpp/llama.py", line 695, in generate
token = self.sample(
File "/***/lib/python3.10/site-packages/llama_cpp/llama.py", line 620, in sample
last_n_tokens_data=(llama_cpp.llama_token * self.last_n_tokens_size)(
IndexError: invalid index
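If it is useful: the error seems reproducible with plain ctypes. This is only my guess at what happens inside `sample()`, reconstructed from the traceback, but a zero-length ctypes array rejects any initializer, and a `-0` slice returns the whole list rather than an empty one:

```python
import ctypes

size = 0                # what last_n_tokens_size=0 would make self.last_n_tokens_size
history = [11, 22, 33]  # stand-in for the recent-token history

# history[-0:] is history[0:], i.e. the FULL list, so three values get passed
# to a zero-length array type, and ctypes raises "IndexError: invalid index".
arr = (ctypes.c_int * size)(*history[-size:])
```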
Interestingly, if I change `last_n_tokens_size` to 1, it runs smoothly.
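For reference, this is the variant that runs; I assume (but am not sure) that `last_n_tokens_size` is the window used for the repetition penalty rather than the model's memory of previous runs:

```python
# Identical setup, but last_n_tokens_size=1 instead of 0 -- no IndexError.
llm = Llama(model_path="some_model.bin", n_ctx=1024, n_batch=1024,
            last_n_tokens_size=1, n_gpu_layers=2000000)
```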
My questions are:
1. Is `last_n_tokens_size=0` even the correct way to make the model forget everything from previous runs? If not, how should I do it?
2. Is it OK to put `llm = Llama(model_path=***)` outside the for loop if I want the runs not to interfere with each other's results, or do I need to put it inside the loop? (See the sketch after this list.)
3. If anyone else is making predictions that are independent between runs, could you share some examples?
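To make question 2 concrete, these are the two structures I am choosing between (`evaluate()` is the same helper as in the first snippet):

```python
# Variant A: construct once, reuse across runs -- cheaper, but only if runs stay independent.
llm = Llama(model_path="some_model.bin", n_ctx=1024, n_batch=1024, n_gpu_layers=2000000)
for article in book:
    output = evaluate(llm, article)

# Variant B: construct a fresh model per run -- surely independent, but reloads the weights every time.
for article in book:
    llm = Llama(model_path="some_model.bin", n_ctx=1024, n_batch=1024, n_gpu_layers=2000000)
    output = evaluate(llm, article)
```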
I apologize if the questions are silly.
Thanks in advance!