Hi! Thank you for your great work!
We are trying to measure the latency of MagicPIG. We also adjusted the hyperparameters K and L to K=10 and L=100 to try to improve latency.
However, we find that TTFT and TPOT seem high on InfiniteBench.
Our partial results are as follows:
for k in range(GEN_LEN):
    st = time.time()
    input_ids = logits.argmax(dim=-1)  # greedy token from the previous step's logits
    logits = llm.inference(input_ids=input_ids,
                           position_ids=position_ids[:, PREFIX_LEN + k:PREFIX_LEN + k + 1])
    output.append(input_ids.item())
    en = time.time()
    total_decode_time.append(en - st)
    if input_ids.item() in config["eos"]:
        break
TPOT = sum(total_decode_time) / len(total_decode_time)
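Two measurement details may be worth double-checking (these are general timing pitfalls, not claims about MagicPIG itself): `time.time()` returns as soon as a CUDA kernel is *launched*, so without a `torch.cuda.synchronize()` before each timestamp the per-step numbers can be misleading; and some definitions of TPOT exclude the first decode step, since that step is closer to TTFT. A minimal CPU-only sketch of the timing structure, with `step_fn` as a hypothetical stand-in for one `llm.inference` call:

```python
import time

def measure_decode(step_fn, gen_len, eos_ids=()):
    """Time each decode step; return (tokens, per-step times, TPOT).

    step_fn(k) stands in for one model forward pass and returns a token id.
    time.perf_counter() is monotonic and preferred over time.time() for intervals.
    """
    times, outputs = [], []
    for k in range(gen_len):
        st = time.perf_counter()
        tok = step_fn(k)  # on GPU, synchronize before reading the clock again
        times.append(time.perf_counter() - st)
        outputs.append(tok)
        if tok in eos_ids:
            break
    # report TPOT excluding the first step, which is dominated by TTFT effects
    tpot = sum(times[1:]) / max(len(times) - 1, 1)
    return outputs, times, tpot

# usage with a dummy "model" that emits token k and stops at a fake EOS id of 5
outs, times, tpot = measure_decode(lambda k: k, gen_len=10, eos_ids={5})
```

Even if the averaging convention differs, the ordering of the `argmax`/`inference`/timestamp calls in your loop looks correct.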
We would like to know the reason for the high latency and whether there is any error in our implementation.
Usually, there are several reasons for high latency:
1. Other programs are running on the CPUs. Currently, we bind the attention computation to 64 CPU cores; if any of them are occupied or shared by other processes, latency will be high.
2. Check the number of physical cores on your cluster. We use 64 cores by default; if you have fewer than 64 physical cores (not hyper-threads), you need to manually decrease the number of OpenMP threads in lsh.cc and sparse_attention.cc.
The performance of MagicPIG depends heavily on the state of the CPUs.
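As a convenience for the check above, the OpenMP thread count can also be capped from the launching script via the standard `OMP_NUM_THREADS` environment variable, which must be set before the OpenMP runtime initializes (i.e. before the extension is imported). A sketch, where `cap_omp_threads` is a hypothetical helper and `psutil` is an assumed, optional third-party dependency for counting physical cores:

```python
import os

def cap_omp_threads(max_threads=64):
    """Set OMP_NUM_THREADS to min(physical cores, max_threads) and return it.

    os.cpu_count() reports *logical* cores (hyper-threads included); if
    psutil is available, use it to get the physical-core count instead.
    """
    try:
        import psutil  # third-party; optional
        physical = psutil.cpu_count(logical=False) or os.cpu_count() or 1
    except ImportError:
        physical = os.cpu_count() or 1  # falls back to the logical count
    n = min(physical, max_threads)
    os.environ["OMP_NUM_THREADS"] = str(n)
    return n

# call this before importing any module that loads the OpenMP-backed extension
n = cap_omp_threads()
```

Note this only helps when done early enough; changing the variable after the runtime has spun up its thread pool has no effect, which is why editing lsh.cc and sparse_attention.cc is the reliable route.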
Besides, could you run the models/bench.sh script? We added it in the v0.2 branch.