ROS 2 workspace for an LLM-driven robot (object tracker, semantic map, etc.), with hardware including a Jetson Orin, a 4-wheel rover, and a Mid-360 LiDAR, plus packages for localization, navigation, and LLM command control: https://github.com/Innowing-robotics-interns/Rover_Official
- locate the object's mask in the first frame, then track the object's center in each subsequent frame
- pipeline: OWL-ViT -> Segment Anything -> XMem -> depth deprojection to 3D
- Hugging Face OWL-ViT: https://huggingface.co/docs/transformers/model_doc/owlvit
- Segment Anything: https://github.com/facebookresearch/segment-anything
- XMem: https://github.com/hkchengrex/XMem
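The final "depth deproject to 3D" step of the pipeline above can be sketched with a standard pinhole camera model. This is a minimal sketch: the intrinsics (`fx`, `fy`, `cx`, `cy`) below are placeholder values for a 640x480 image, not the rover's actual camera calibration.

```python
import numpy as np

def deproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with metric depth into a camera-frame 3D point."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

# Example: the tracked object center (e.g. as reported by XMem) at 2.0 m depth.
# Placeholder intrinsics, not the real calibration.
point = deproject(320, 240, 2.0, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
print(point)  # the principal point deprojects onto the optical axis: [0. 0. 2.]
```

In practice, `depth` would be read from the depth image aligned to the color frame at the tracked center pixel.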
- a replication of VLMaps (https://github.com/vlmaps/vlmaps) with some custom modifications
- builds a 3D semantic map; given an object name, returns its 3D location
- pipeline: LSeg encoding -> cluster to a few points -> deproject to 3D
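The "cluster to a few points" step can be sketched as a small k-means over the pixel coordinates whose LSeg embedding matches the query. This is an illustrative sketch, not the VLMaps implementation: the matched-pixel array below is synthetic, and the deterministic farthest-point initialization is an assumption (the source only says "cluster").

```python
import numpy as np

def kmeans(points, k, iters=20):
    """Reduce many matched pixels to k representative cluster centers (Lloyd's algorithm)."""
    points = np.asarray(points, dtype=float)
    # Farthest-point initialization: deterministic, spreads seeds across the data.
    centers = [points[0]]
    for _ in range(k - 1):
        d = np.linalg.norm(points[:, None] - np.array(centers)[None], axis=2).min(axis=1)
        centers.append(points[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return centers

# Synthetic stand-in for pixels whose LSeg features match the query text:
# two blobs of matched pixels at different image locations.
rng = np.random.default_rng(1)
matched = np.vstack([
    rng.normal((50, 60), 2.0, size=(100, 2)),
    rng.normal((300, 200), 2.0, size=(100, 2)),
])
centers = kmeans(matched, k=2)
```

Each resulting center would then be deprojected to a 3D location using the aligned depth image, per the last stage of the pipeline.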