DriveMLLM: A Benchmark for Spatial Understanding with Multimodal Large Language Models in Autonomous Driving

Xianda Guo*, Ruijun Zhang*, Yiqun Duan*, Yuhang He, Chenming Zhang, Long Chen.

News

  • [2024/11] Paper released on arXiv (arXiv:2411.13112).

Overall

[Figure: overview visualization of the DriveMLLM benchmark]

Getting Started

1. Prepare Dataset

We use the Hugging Face dataset DriveMLLM for evaluation. The images are sourced from the nuScenes validation set. A metadata.jsonl file covering all images is provided, so users can easily access properties such as bboxes2D; a small inspection sketch follows.
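As a minimal sketch (assuming metadata.jsonl is a standard JSON Lines file with one record per image; only the bboxes2D field name comes from this README, the rest is illustrative), the metadata can be inspected like this:

import json

# Load metadata.jsonl: one JSON object per line, one record per image.
records = []
with open("metadata.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if line:
            records.append(json.loads(line))

print(f"{len(records)} images described")
print(sorted(records[0].keys()))   # which properties are available
print(records[0]["bboxes2D"])      # 2D bounding boxes of the first image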

2. VQA Generation

Run the following command to download the dataset, generate the VQAs, and save them to the eval_vqas folder:

python hfdata_to_VQA.py
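As a quick sanity check (a sketch only; the file naming and format inside eval_vqas are not specified here, so nothing beyond the directory name is assumed), you can list what was generated:

from pathlib import Path

# Enumerate whatever hfdata_to_VQA.py wrote into eval_vqas.
vqa_dir = Path("eval_vqas")
for vqa_file in sorted(vqa_dir.rglob("*")):
    if vqa_file.is_file():
        print(vqa_file.relative_to(vqa_dir))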

3. Inference

Run inference according to your requirements:

  • For GPT API calls:

    export OPENAI_API_KEY=your_api_key

    python inference/get_mllm_output.py \
        --model_type gpt \
        --model gpt-4o \
        --vqas_dir eval_vqas \
        --save_dir inference/mllm_outputs

  • For Gemini API calls:

    export GOOGLE_API_KEY=your_api_key

    python inference/get_mllm_output.py \
        --model_type gemini \
        --model models/gemini-2.0-flash \
        --vqas_dir eval_vqas \
        --save_dir inference/mllm_outputs

  • For local LLaVA-NeXT inference:

    python inference/get_mllm_output.py \
        --model_type llava \
        --model lmms-lab/llava-onevision-qwen2-7b-si \
        --vqas_dir eval_vqas \
        --save_dir inference/mllm_outputs

  • For local Qwen2-VL inference:

    python inference/get_mllm_output.py \
        --model_type qwen \
        --model Qwen/Qwen2.5-VL-7B-Instruct \
        --vqas_dir eval_vqas \
        --save_dir inference/mllm_outputs

After executing the script, the results will be saved in the directory: {save_dir}/{model_type}/{model}.
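For reference, here is a sketch that composes those output paths for the example runs above (how the script handles "/" inside model names, nested folders versus sanitized names, is an assumption to verify locally):

from pathlib import Path

# Per the README, outputs are written to {save_dir}/{model_type}/{model}.
save_dir = Path("inference/mllm_outputs")
runs = [
    ("gpt", "gpt-4o"),
    ("gemini", "models/gemini-2.0-flash"),
    ("llava", "lmms-lab/llava-onevision-qwen2-7b-si"),
    ("qwen", "Qwen/Qwen2.5-VL-7B-Instruct"),
]
for model_type, model in runs:
    out_dir = save_dir / model_type / model
    n_files = sum(1 for p in out_dir.rglob("*") if p.is_file()) if out_dir.exists() else 0
    print(f"{out_dir}: {n_files} result file(s)")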

Citation

@article{DriveMLLM,
    title={DriveMLLM: A Benchmark for Spatial Understanding with Multimodal Large Language Models in Autonomous Driving},
    author={Guo, Xianda and Zhang, Ruijun and Duan, Yiqun and He, Yuhang and Zhang, Chenming and Chen, Long},
    journal={arXiv preprint arXiv:2411.13112},
    year={2024}
}
