- Tsinghua University
- Beijing, China
- https://shihao1895.github.io
Highlights
- Pro
Stars
[ICCV 2023] Adaptive Rotated Convolution for Rotated Object Detection
World's First Large-scale High-quality Robotic Manipulation Benchmark
Code for paper: Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection
[CVPR 2022] Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
Official repository of Slide-Transformer (CVPR2023)
Official repository of Uni-AdaFocus (TPAMI 2024).
[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
[NeurIPS 2024] ENAT: Rethinking Spatial-temporal Interactions in Token-based Image Synthesis
Official repository of InLine attention (NeurIPS 2024)
1.5−3.0× lossless training or pre-training speedup. An off-the-shelf, easy-to-implement algorithm for the efficient training of foundation visual backbones.
Official repository of FLatten Transformer (ICCV2023)
[TPAMI 2024] Probabilistic Contrastive Learning for Long-Tailed Visual Recognition
[ICML 2024] SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning
[ECCV 2024] AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation
[ECCV 2024] Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators
ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation
[CVPR2024] GSVA: Generalized Segmentation via Multimodal Large Language Models
(TPAMI 2024) A Survey on Open Vocabulary Learning