
Question in testing latency #3

Open
alkane7 opened this issue Dec 14, 2024 · 2 comments


alkane7 commented Dec 14, 2024

Hi! Thank you for your great work!
We are trying to test the latency of MagicPIG. We also adjusted the hyperparameters to K=10 and L=100 in search of better latency.
However, TTFT and TPOT seem high on InfiniteBench.
Our partial results are as follows:

            code_debug   code_run
TTFT (s)      630.37      140
TPOT (s)        1.246       0.84
# Greedy decoding loop used to measure TPOT
for k in range(GEN_LEN):
    st = time.time()
    input_ids = logits.argmax(dim=-1)  # pick the next token greedily
    logits = llm.inference(input_ids=input_ids,
                           position_ids=position_ids[:, PREFIX_LEN + k:PREFIX_LEN + k + 1])
    output.append(input_ids.item())
    en = time.time()
    total_decode_time.append(en - st)
    if input_ids.item() in config["eos"]:  # stop at end-of-sequence
        break
TPOT = sum(total_decode_time) / len(total_decode_time)

We would like to know the reason for the high latency and whether there is an error in our implementation.
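For reference, here is a minimal sketch of how TTFT and TPOT can be measured separately. `prefill` and `decode_step` are hypothetical stand-ins for the real model calls, not MagicPIG APIs:

```python
import time

def measure_latency(prefill, decode_step, gen_len, eos_ids):
    """Hypothetical timing harness: `prefill` runs the prompt once and
    returns the first token; `decode_step` generates one token per call."""
    st = time.time()
    token = prefill()                        # prompt processing
    ttft = time.time() - st                  # time to first token
    decode_times, output = [], [token]
    for _ in range(gen_len - 1):
        st = time.time()
        token = decode_step(token)           # one autoregressive step
        decode_times.append(time.time() - st)
        output.append(token)
        if token in eos_ids:                 # stop at end-of-sequence
            break
    tpot = sum(decode_times) / max(len(decode_times), 1)  # avg per-token time
    return ttft, tpot, output
```

This mirrors the loop above but keeps prefill time (TTFT) out of the per-token average (TPOT).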

Contributor

dreaming-panda commented Dec 15, 2024

Usually, there are several reasons for high latency:

  1. Other programs are running on the CPUs. Currently, we bind the attention computation to 64 CPU cores; if any of them are occupied or shared by other processes, latency will be high.
  2. Check the number of physical cores on your cluster. We use 64 cores by default; if you have fewer than 64 physical cores (not hyper-threads), you need to manually decrease the OpenMP thread count in lsh.cc and sparse_attention.cc.
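The core-count check in point 2 can be sketched as follows. This is a heuristic only: `os.cpu_count()` reports logical CPUs, so halving it is a rough guess at physical cores on hyper-threaded machines, and `OMP_NUM_THREADS` must be set before the OpenMP runtime initializes:

```python
import os

logical = os.cpu_count() or 1          # logical CPUs, hyper-threads included
physical_guess = max(1, logical // 2)  # crude physical-core estimate

# Cap at the 64 threads MagicPIG uses by default.
os.environ["OMP_NUM_THREADS"] = str(min(64, physical_guess))
print(os.environ["OMP_NUM_THREADS"])
```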

The performance of MagicPIG depends heavily on the state of the CPUs.

Besides, can you run the models/bench.sh file? We added it in the v0.2 branch.

@dreaming-panda
Contributor

In some cases, when you cannot use numactl, adding OMP_NUM_THREADS=64 can help.
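For example (a sketch only; the NUMA node number and script path are assumptions to adjust for your machine):

```shell
# Without numactl: set the OpenMP thread count explicitly before launching.
export OMP_NUM_THREADS=64
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
# With numactl, one would instead bind to a NUMA node, e.g.:
#   numactl -N 0 -m 0 bash models/bench.sh
```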
