Example with stream = True? #319
-
Hi, is there an example on how to use stream=True? (In general, I think a few more examples in the documentation would be great.)
Replies: 4 comments 1 reply
-
Here is one way to do it.

from llama_cpp import Llama

llm = Llama(model_path="path/to/model.gguf")  # placeholder: point this at your GGUF model file

prompt = """
# Task
Name the planets in the solar system?
# Answer
"""

# With stream=True, the output is of type `Iterator[CompletionChunk]`.
output = llm.create_completion(prompt, stop=["# Question"], echo=True, stream=True)

# Iterate over the output and print each chunk's text as it arrives.
for item in output:
    print(item['choices'][0]['text'], end='')
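If you also want the complete text after streaming, you can accumulate the chunks as you print them. A minimal sketch, using a stubbed generator in place of a real model so the accumulation logic is self-contained (`fake_stream` is illustrative, not part of llama-cpp-python; each chunk mimics the CompletionChunk dict shape):

```python
# Stub standing in for llm.create_completion(..., stream=True).
def fake_stream():
    for text in ["Mercury", ", Venus", ", Earth"]:
        yield {'choices': [{'text': text}]}

pieces = []
for item in fake_stream():
    piece = item['choices'][0]['text']
    print(piece, end='')   # stream to the terminal as chunks arrive
    pieces.append(piece)   # keep each chunk for later

full_text = ''.join(pieces)
print()
print(f"Full text: {full_text}")
```

The same pattern works with the real iterator: print for responsiveness, append for the final string.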
-
Here is how streaming works for me:
-
And here is how to use it with llama-cpp-python[server]:

import time, requests, json

# record the time before the request is sent
start_time = time.time()

# prepare the request payload
payload = {
    'messages': [
        {'role': 'user', 'content': 'Count to 100, with a comma between each number and no newlines. E.g., 1, 2, 3, ...'}
    ],
    'temperature': 0,
    'stream': True  # streaming
}

# send the request to the llama-cpp-python server
response = requests.post('http://localhost:8000/v1/chat/completions', json=payload, stream=True)

# create variables to collect the stream of chunks
collected_chunks = []
collected_messages = []

# iterate through the stream of events
for line in response.iter_lines():
    if line:
        decoded_line = line.decode('utf-8')
        print(f"Raw line received: {decoded_line}")
        if decoded_line.startswith('data:'):
            json_data = decoded_line[len('data:'):].strip()  # remove the 'data:' prefix and whitespace
            if json_data == '[DONE]':  # end of the stream
                print("Stream completed")
                break
            try:
                chunk = json.loads(json_data)
                chunk_time = time.time() - start_time
                collected_chunks.append(chunk)
                if 'choices' in chunk and chunk['choices']:
                    chunk_message = chunk['choices'][0]['delta'].get('content', '')  # get the message delta
                    if chunk_message:
                        collected_messages.append(chunk_message)  # save the message
                        print(f"Message received {chunk_time:.2f} seconds after request: {chunk_message}")
            except json.JSONDecodeError as e:
                print(f"Failed to decode JSON: {e}")
                continue

if 'chunk_time' in locals():
    print(f"Full response received {chunk_time:.2f} seconds after request")
else:
    print("No valid response received from the server")

collected_messages = [m for m in collected_messages if m]
full_reply_content = ''.join(collected_messages)
print(f"Full conversation received: {full_reply_content}")
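The 'data:'-line handling above can be factored into a small helper and exercised without a running server. A minimal sketch under that assumption (`parse_sse_line` is a name introduced here, not part of llama-cpp-python; the sample lines mimic the server's SSE output shape):

```python
import json

def parse_sse_line(decoded_line):
    """Parse one SSE line. Returns ('done', None) at end of stream,
    ('chunk', dict) for a data chunk, or ('skip', None) otherwise."""
    if not decoded_line.startswith('data:'):
        return ('skip', None)
    json_data = decoded_line[len('data:'):].strip()
    if json_data == '[DONE]':
        return ('done', None)
    try:
        return ('chunk', json.loads(json_data))
    except json.JSONDecodeError:
        return ('skip', None)

# Exercise the helper on sample lines shaped like the server's output.
lines = [
    'data: {"choices": [{"delta": {"content": "1, "}}]}',
    'data: {"choices": [{"delta": {"content": "2"}}]}',
    ': a comment line, ignored',
    'data: [DONE]',
]
messages = []
for line in lines:
    kind, chunk = parse_sse_line(line)
    if kind == 'done':
        break
    if kind == 'chunk' and chunk['choices']:
        messages.append(chunk['choices'][0]['delta'].get('content', ''))

print(''.join(messages))  # 1, 2
```

Keeping the parsing separate from the request loop makes it easy to unit-test the stream handling before pointing it at a live endpoint.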
-
I'm using this hyperparameter in conjunction with the enumerate function to capture the order of each generated token. The strings in place of the values of each hyperparameter are variables adjusted by the user in the LLM interface we are developing, the Samantha Interface Assistant project:
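Capturing the order of each generated token with enumerate can be sketched like this, with a stub generator standing in for the real model stream (`fake_token_stream` and the variable names are illustrative, not from the project):

```python
# Stub standing in for a stream=True completion iterator.
def fake_token_stream():
    for text in ["The", " planets", " orbit"]:
        yield {'choices': [{'text': text}]}

ordered_tokens = []
for index, item in enumerate(fake_token_stream()):
    token = item['choices'][0]['text']
    ordered_tokens.append((index, token))  # record (position, token) in generation order
    print(f"{index}: {token!r}")
```

After the loop, ordered_tokens holds each token paired with its position, which the interface can use to display or replay the generation sequence.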