🏆 Results

The tables below report the quantitative results of the CHAIC benchmark: the average Transport Rate (TR), Efficiency Improvement (EI), Goal Inference Accuracy (IA), Completion Ratio of Helper (CR), and Standard Error of Transport Rate (STD_TR, listed under STD). In the TR table, EI is given in parentheses after each TR value. The w/o row means the main agent performs the task alone, without a helper. The Emergency Rate (ER) is additionally reported for the shopping task.
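For readers unfamiliar with how a standard error such as STD_TR is typically obtained, the sketch below computes a mean and its standard error from per-episode transport rates. The episode values are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption; the benchmark's own evaluation code may differ.

```python
import math
import statistics


def mean_and_standard_error(values):
    """Return (mean, standard error of the mean) for a list of numbers.

    Assumes the sample standard deviation (n - 1 denominator); this is an
    illustrative choice, not necessarily what the benchmark uses.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate a standard error")
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem


# Hypothetical per-episode transport rates for one helper on one task.
episode_tr = [0.50, 0.60, 0.40, 0.70, 0.45]

mean_tr, sem_tr = mean_and_standard_error(episode_tr)
print(f"TR = {mean_tr:.2f} +/- {sem_tr:.2f}")
```

With more episodes the standard error shrinks roughly as one over the square root of the episode count, which is why it is a useful companion to the averaged TR values.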

**TR(EI)** (Indoor: Normal through Wheelchair; Outdoor: Shopping, Furniture)

| Helper Agent | Normal | High Target | High Container | High Goalplace | Lowthing | Wheelchair | Shopping | Furniture | Average |
|---|---|---|---|---|---|---|---|---|---|
| w/o | 0.53 | 0.30 | 0.37 | 0.28 | 0.51 | 0.07 | 0.37 | 0.17 | 0.33 |
| Random | 0.52(-0.02) | 0.27(-0.05) | 0.36(0.00) | 0.33(0.10) | 0.50(-0.01) | 0.21(0.56) | 0.39(0.05) | 0.48(0.68) | 0.38(0.16) |
| RHP | 0.64(0.15) | 0.35(0.11) | 0.45(0.19) | 0.35(0.18) | 0.66(0.23) | 0.44(0.77) | 0.49(0.22) | 0.65(0.72) | 0.50(0.32) |
| RL | 0.45(-0.19) | 0.26(-0.16) | 0.28(-0.25) | 0.25(-0.22) | 0.43(-0.16) | 0.11(0.07) | 0.32(-0.13) | 0.67(0.74) | 0.35(-0.04) |
| SmartHelp | 0.46(-0.12) | 0.24(-0.17) | 0.26(-0.28) | 0.31(0.01) | 0.49(-0.04) | 0.13(0.11) | 0.32(-0.13) | 0.57(0.70) | 0.35(0.01) |
| VLM | 0.63(0.14) | 0.33(0.06) | 0.43(0.12) | 0.26(-0.20) | 0.69(0.26) | 0.40(0.86) | 0.50(0.25) | 0.70(0.78) | 0.49(0.28) |
| LLM+BM | 0.65(0.17) | 0.38(0.19) | 0.49(0.24) | 0.36(0.23) | 0.70(0.27) | 0.42(0.89) | 0.58(0.33) | 0.69(0.77) | 0.53(0.39) |
| Oracle | 0.77(0.31) | 0.49(0.37) | 0.69(0.47) | 0.61(0.56) | 0.82(0.38) | 0.60(0.87) | 0.61(0.39) | 0.76(0.80) | 0.67(0.52) |

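The Average column of the TR results appears consistent with an unweighted mean over the eight tasks. The sketch below checks this against the TR values reported above; it is a reader-side sanity check under that assumption, not the benchmark's own aggregation code, and the `macro_average` helper and the 0.01 tolerance (to absorb rounding of the two-decimal entries) are illustrative choices.

```python
# Per-task TR values and the reported Average, copied from the TR results.
# Task order: Normal, High Target, High Container, High Goalplace,
#             Lowthing, Wheelchair, Shopping, Furniture.
tr_rows = {
    "w/o":       ([0.53, 0.30, 0.37, 0.28, 0.51, 0.07, 0.37, 0.17], 0.33),
    "Random":    ([0.52, 0.27, 0.36, 0.33, 0.50, 0.21, 0.39, 0.48], 0.38),
    "RHP":       ([0.64, 0.35, 0.45, 0.35, 0.66, 0.44, 0.49, 0.65], 0.50),
    "RL":        ([0.45, 0.26, 0.28, 0.25, 0.43, 0.11, 0.32, 0.67], 0.35),
    "SmartHelp": ([0.46, 0.24, 0.26, 0.31, 0.49, 0.13, 0.32, 0.57], 0.35),
    "VLM":       ([0.63, 0.33, 0.43, 0.26, 0.69, 0.40, 0.50, 0.70], 0.49),
    "LLM+BM":    ([0.65, 0.38, 0.49, 0.36, 0.70, 0.42, 0.58, 0.69], 0.53),
    "Oracle":    ([0.77, 0.49, 0.69, 0.61, 0.82, 0.60, 0.61, 0.76], 0.67),
}


def macro_average(values):
    """Unweighted mean across tasks (hypothetical helper for this check)."""
    return sum(values) / len(values)


TOLERANCE = 0.01  # allows for rounding in the two-decimal table entries

for helper, (per_task, reported) in tr_rows.items():
    computed = macro_average(per_task)
    status = "ok" if abs(computed - reported) <= TOLERANCE else "mismatch"
    print(f"{helper:<10} computed={computed:.3f} reported={reported:.2f} {status}")
```

The same check does not apply to the EI values in parentheses, since the source does not state how EI is aggregated.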
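
**IA** (Indoor: Normal through Wheelchair; Outdoor: Shopping)

| Helper Agent | Normal | High Target | High Container | High Goalplace | Lowthing | Wheelchair | Shopping | Average |
|---|---|---|---|---|---|---|---|---|
| Random | 0.24 | 0.29 | 0.25 | 0.14 | 0.31 | 0.24 | 0.34 | 0.26 |
| RHP | 0.15 | 0.29 | 0.21 | 0.21 | 0.28 | 0.17 | 0.44 | 0.25 |
| VLM | 0.24 | 0.32 | 0.40 | 0.33 | 0.46 | 0.35 | 0.72 | 0.40 |
| LLM+BM | 0.25 | 0.29 | 0.30 | 0.35 | 0.43 | 0.47 | 0.74 | 0.40 |
| Oracle | 0.88 | 0.91 | 0.91 | 0.90 | 0.91 | 0.82 | 0.87 | 0.89 |
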
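
**CR** (Indoor: Normal through Wheelchair; Outdoor: Shopping, Furniture)

| Helper Agent | Normal | High Target | High Container | High Goalplace | Lowthing | Wheelchair | Shopping | Furniture | Average |
|---|---|---|---|---|---|---|---|---|---|
| Random | 0.09 | 0.10 | 0.12 | 0.06 | 0.09 | 0.09 | 0.07 | 0.73 | 0.17 |
| RHP | 0.15 | 0.43 | 0.29 | 0.39 | 0.36 | 0.19 | 0.34 | 0.74 | 0.36 |
| VLM | 0.13 | 0.08 | 0.34 | 0.18 | 0.39 | 0.17 | 0.34 | 0.82 | 0.31 |
| LLM+BM | 0.22 | 0.30 | 0.30 | 0.35 | 0.38 | 0.45 | 0.46 | 0.78 | 0.41 |
| Oracle | 0.51 | 0.64 | 0.66 | 0.73 | 0.59 | 0.38 | 0.45 | 0.77 | 0.59 |
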
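
**STD** (Indoor: Normal through Wheelchair; Outdoor: Shopping, Furniture)

| Helper Agent | Normal | High Target | High Container | High Goalplace | Lowthing | Wheelchair | Shopping | Furniture | Average |
|---|---|---|---|---|---|---|---|---|---|
| w/o | 0.03 | 0.02 | 0.03 | 0.05 | 0.03 | 0.04 | 0.02 | 0.04 | 0.03 |
| Random | 0.04 | 0.03 | 0.03 | 0.04 | 0.04 | 0.04 | 0.02 | 0.05 | 0.04 |
| RHP | 0.02 | 0.04 | 0.03 | 0.05 | 0.03 | 0.04 | 0.02 | 0.04 | 0.03 |
| VLM | 0.03 | 0.02 | 0.04 | 0.05 | 0.02 | 0.03 | 0.03 | 0.05 | 0.03 |
| LLM+BM | 0.03 | 0.03 | 0.03 | 0.04 | 0.03 | 0.05 | 0.03 | 0.05 | 0.04 |
| Oracle | 0.03 | 0.04 | 0.03 | 0.04 | 0.03 | 0.03 | 0.03 | 0.04 | 0.03 |
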

**ER** (Outdoor)

| Helper Agent | Shopping |
|---|---|
| Random | 0.32 |
| RHP | 0.30 |
| VLM | 0.39 |
| LLM+BM | 0.38 |
| Oracle | 0.17 |