Hi @everyone, hope you are all doing well.
I am using this with Flask-SocketIO, and the code is:
from flask_socketio import emit

# socketio (the SocketIO instance), prompt_template and lcpp_llm (the
# llama-cpp-python Llama object) are created elsewhere in the app
@socketio.on('chat_message')
def handle_message(data):
    system = data.get('system', '')
    user_prompt = data.get('prompt', '')
    print(user_prompt)
    user_id = data.get("room", "")
    if system != "":
        prompt = prompt_template(system=system, prompt=user_prompt)
    else:
        prompt = prompt_template(prompt=user_prompt)
    stream = data.get('stream', False)
    print("stream==", stream)
    # cap max_tokens at 4096
    max_token = min(data.get('max_token', 4096), 4096)
    try:
        response = lcpp_llm(
            prompt=prompt,
            max_tokens=max_token,
            temperature=0.5,
            top_p=0.95,
            repeat_penalty=1.2,
            top_k=50,
            echo=False,
            stream=stream)  # stream arrives as True here
        if not stream:
            # non-streaming: the whole completion comes back as one dict
            emit('response', response["choices"][0]["text"], broadcast=True)
        else:
            # streaming: response is a generator that yields one chunk at a time
            for i in response:
                chunk = i["choices"][0]["text"]
                print("==>", chunk)
                emit('response', chunk, room=user_id)
    except Exception as e:
        error_message = f"error: {str(e)}"
        emit('response', error_message, room=user_id)
The problem: I can see the streamed response chunks being printed on the server, but they are not emitted to the client until the whole response has completed. Why?
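For reference, the front end is basically just a listener on the 'response' event. A minimal python-socketio test client that shows what I mean (a hypothetical sketch, not my real front end; it assumes the connection has already been put into the room used below by a separate join handler on the server, which I have left out):

import socketio  # pip install "python-socketio[client]"

sio = socketio.Client()

@sio.on('response')
def on_response(chunk):
    # with real streaming I would expect this to fire once per chunk,
    # not once at the end with everything buffered up
    print("client got:", repr(chunk))

sio.connect('http://localhost:5000')  # hypothetical local server URL
sio.emit('chat_message', {
    'system': '',
    'prompt': 'write two line of best quote',
    'room': 'some-room-id',  # hypothetical room id
    'stream': True,
    'max_token': 4096,
})
sio.wait()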
Terminal output:
write two line of best quote
stream== True
Llama.generate: prefix-match hit
==> Sure
==> !
==> Here
==> are
==> two
==> lines
==> of
==> insp
==> iring
==> quotes
==> for
==> you
==> :
==>
==>
==> "
==> The
==> future
==> belongs
==> to
==> those
==> who
==> believe
==> in
==> the
==> beauty
==> of
==> their
==> dream
==> s
==> ."
==> -
==> Ele
==> an
==> or
==> Ro
==> ose
==> vel
==> t
==>
==>
==> "
==> Bel
==> ieve
==> you
==> can
==> and
==> you
==> '
==> re
==> half
==> way
==> there
==> ."
==> -
==> The
==> odore
==> Ro
==> ose
==> vel
llama_print_timings: load time = 907.76 ms
llama_print_timings: sample time = 35.10 ms / 62 runs ( 0.57 ms per token, 1766.38 tokens per second)
llama_print_timings: prompt eval time = 546.49 ms / 14 tokens ( 39.03 ms per token, 25.62 tokens per second)
llama_print_timings: eval time = 3314.18 ms / 61 runs ( 54.33 ms per token, 18.41 tokens per second)
llama_print_timings: total time = 4036.60 ms
Only after all of this has printed do I get the data on the front end, all at once. Why doesn't it behave like the ChatGPT API does?
I use the same kind of for loop with the ChatGPT API and it emits each chunk in real time, but with this one I get the problem described above.
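This is roughly the loop I mean for the ChatGPT case (a from-memory sketch using the pre-1.0 openai client, so it may not match my real code exactly):

import openai

def stream_openai(user_prompt, user_id):
    # with stream=True the old ChatCompletion API also returns a generator of chunks
    for chunk in openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": user_prompt}],
            stream=True):
        delta = chunk["choices"][0]["delta"].get("content", "")
        if delta:
            emit('response', delta, room=user_id)  # these do reach the client one by one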
Any help would be appreciated.