The tables below show the quantitative results on the CHAIC benchmark. We report the average Transport Rate (TR), Efficiency Improvement (EI), Goal Inference Accuracy (IA), Completion Ratio of Helper (CR), and Standard Error of Transport Rate (STD_TR, shown in the table labeled STD). "w/o" means the main agent performs the task alone, without a helper. The Emergency Rate (ER) is also reported for the shopping task. An upward arrow (↑) marks a metric where higher is better, and a downward arrow (↓) marks one where lower is better.

TR(EI)↑ (Indoor: Normal, High Target, High Container, High Goalplace, Lowthing, Wheelchair; Outdoor: Shopping, Furniture)

| Helper Agent | Normal | High Target | High Container | High Goalplace | Lowthing | Wheelchair | Shopping | Furniture | Average |
|---|---|---|---|---|---|---|---|---|---|
| w/o | 0.53 | 0.30 | 0.37 | 0.28 | 0.51 | 0.07 | 0.37 | 0.17 | 0.33 |
| Random | 0.52(-0.02) | 0.27(-0.05) | 0.36(0.00) | 0.33(0.10) | 0.50(-0.01) | 0.21(0.56) | 0.39(0.05) | 0.48(0.68) | 0.38(0.16) |
| RHP | 0.64(0.15) | 0.35(0.11) | 0.45(0.19) | 0.35(0.18) | 0.66(0.23) | 0.44(0.77) | 0.49(0.22) | 0.65(0.72) | 0.50(0.32) |
| RL | 0.45(-0.19) | 0.26(-0.16) | 0.28(-0.25) | 0.25(-0.22) | 0.43(-0.16) | 0.11(0.07) | 0.32(-0.13) | 0.67(0.74) | 0.35(-0.04) |
| SmartHelp | 0.46(-0.12) | 0.24(-0.17) | 0.26(-0.28) | 0.31(0.01) | 0.49(-0.04) | 0.13(0.11) | 0.32(-0.13) | 0.57(0.70) | 0.35(0.01) |
| VLM | 0.63(0.14) | 0.33(0.06) | 0.43(0.12) | 0.26(-0.20) | 0.69(0.26) | 0.40(0.86) | 0.50(0.25) | 0.70(0.78) | 0.49(0.28) |
| LLM+BM | 0.65(0.17) | 0.38(0.19) | 0.49(0.24) | 0.36(0.23) | 0.70(0.27) | 0.42(0.89) | 0.58(0.33) | 0.69(0.77) | 0.53(0.39) |
| Oracle | 0.77(0.31) | 0.49(0.37) | 0.69(0.47) | 0.61(0.56) | 0.82(0.38) | 0.60(0.87) | 0.61(0.39) | 0.76(0.80) | 0.67(0.52) |
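
As a quick sanity check on the TR(EI) results, the sketch below recomputes the Average column of TR from the per-task values. It assumes the Average is the unweighted mean over the eight tasks, rounded to two decimals; the source does not state this, but the reported numbers are consistent with it for the rows shown. The names `tr_by_helper`, `reported_average`, and `unweighted_average` are illustrative only.

```python
from statistics import mean

# Per-task Transport Rate (TR) values copied from the TR(EI) table,
# in column order: Normal, High Target, High Container, High Goalplace,
# Lowthing, Wheelchair, Shopping, Furniture.
tr_by_helper = {
    "RHP":    [0.64, 0.35, 0.45, 0.35, 0.66, 0.44, 0.49, 0.65],
    "LLM+BM": [0.65, 0.38, 0.49, 0.36, 0.70, 0.42, 0.58, 0.69],
    "Oracle": [0.77, 0.49, 0.69, 0.61, 0.82, 0.60, 0.61, 0.76],
}

# Values from the Average column of the same table.
reported_average = {"RHP": 0.50, "LLM+BM": 0.53, "Oracle": 0.67}


def unweighted_average(values):
    """Assumed definition: plain mean over tasks, rounded to 2 decimals."""
    return round(mean(values), 2)


for helper, values in tr_by_helper.items():
    recomputed = unweighted_average(values)
    status = "matches" if recomputed == reported_average[helper] else "differs"
    print(f"{helper}: recomputed {recomputed:.2f}, "
          f"reported {reported_average[helper]:.2f} ({status})")
```

The same check does not apply to the EI values in parentheses, since the source does not say how they are aggregated.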

IA↑ (Indoor: Normal, High Target, High Container, High Goalplace, Lowthing, Wheelchair; Outdoor: Shopping)

| Helper Agent | Normal | High Target | High Container | High Goalplace | Lowthing | Wheelchair | Shopping | Average |
|---|---|---|---|---|---|---|---|---|
| Random | 0.24 | 0.29 | 0.25 | 0.14 | 0.31 | 0.24 | 0.34 | 0.26 |
| RHP | 0.15 | 0.29 | 0.21 | 0.21 | 0.28 | 0.17 | 0.44 | 0.25 |
| VLM | 0.24 | 0.32 | 0.40 | 0.33 | 0.46 | 0.35 | 0.72 | 0.40 |
| LLM+BM | 0.25 | 0.29 | 0.30 | 0.35 | 0.43 | 0.47 | 0.74 | 0.40 |
| Oracle | 0.88 | 0.91 | 0.91 | 0.90 | 0.91 | 0.82 | 0.87 | 0.89 |

CR↑ (Indoor: Normal, High Target, High Container, High Goalplace, Lowthing, Wheelchair; Outdoor: Shopping, Furniture)

| Helper Agent | Normal | High Target | High Container | High Goalplace | Lowthing | Wheelchair | Shopping | Furniture | Average |
|---|---|---|---|---|---|---|---|---|---|
| Random | 0.09 | 0.10 | 0.12 | 0.06 | 0.09 | 0.09 | 0.07 | 0.73 | 0.17 |
| RHP | 0.15 | 0.43 | 0.29 | 0.39 | 0.36 | 0.19 | 0.34 | 0.74 | 0.36 |
| VLM | 0.13 | 0.08 | 0.34 | 0.18 | 0.39 | 0.17 | 0.34 | 0.82 | 0.31 |
| LLM+BM | 0.22 | 0.30 | 0.30 | 0.35 | 0.38 | 0.45 | 0.46 | 0.78 | 0.41 |
| Oracle | 0.51 | 0.64 | 0.66 | 0.73 | 0.59 | 0.38 | 0.45 | 0.77 | 0.59 |

STD (Indoor: Normal, High Target, High Container, High Goalplace, Lowthing, Wheelchair; Outdoor: Shopping, Furniture)

| Helper Agent | Normal | High Target | High Container | High Goalplace | Lowthing | Wheelchair | Shopping | Furniture | Average |
|---|---|---|---|---|---|---|---|---|---|
| w/o | 0.03 | 0.02 | 0.03 | 0.05 | 0.03 | 0.04 | 0.02 | 0.04 | 0.03 |
| Random | 0.04 | 0.03 | 0.03 | 0.04 | 0.04 | 0.04 | 0.02 | 0.05 | 0.04 |
| RHP | 0.02 | 0.04 | 0.03 | 0.05 | 0.03 | 0.04 | 0.02 | 0.04 | 0.03 |
| VLM | 0.03 | 0.02 | 0.04 | 0.05 | 0.02 | 0.03 | 0.03 | 0.05 | 0.03 |
| LLM+BM | 0.03 | 0.03 | 0.03 | 0.04 | 0.03 | 0.05 | 0.03 | 0.05 | 0.04 |
| Oracle | 0.03 | 0.04 | 0.03 | 0.04 | 0.03 | 0.03 | 0.03 | 0.04 | 0.03 |

ER↓ (Outdoor: Shopping)

| Helper Agent | Shopping |
|---|---|
| Random | 0.32 |
| RHP | 0.30 |
| VLM | 0.39 |
| LLM+BM | 0.38 |
| Oracle | 0.17 |