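A collection of video-text retrieval papers from major top conferences in recent years, mirrored on my blog: https://blog.csdn.net/AAliuxiaolei/article/details/121433833
Also attached is a GitHub repository with a good summary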
- ICCV
2021 https://openaccess.thecvf.com/ICCV2021 |
---|
TeachText: CrossModal Generalized Distillation for Text-Video Retrieval |
HiT: Hierarchical Transformer With Momentum Contrast for Video-Text Retrieval |
TACo: Token-Aware Cascade Contrastive Learning for Video-Text Alignment |
2019 https://openaccess.thecvf.com/ICCV2019 |
Neighborhood Preserving Hashing for Scalable Video Retrieval |
SVD: A Large-Scale Short Video Dataset for Near-Duplicate Video Retrieval |
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips |
- SIGIR
2021 https://sigir.org/sigir2021/accepted-papers/ |
---|
Improving Video Retrieval by Adaptive Margin |
Hierarchical Cross-Modal Graph Consistency Learning for Video-Text Retrieval |
2020 http://www.sigir.org/sigir2020/accepted-papers/ |
Tree-augmented Cross-Modal Encoding for Complex-Query Video Retrieval |
2019 http://sigir.org/sigir2019/program/accepted/ |
None |
- ACM MM
2021 https://2021.acmmm.org/main-track-list |
---|
Understanding Chinese Video and Language via Contrastive Multimodal Pre-Training |
HANet: Hierarchical Alignment Networks for Video-Text Retrieval |
Discriminative Latent Semantic Graph for Video Captioning |
Fine-grained Cross-modal Alignment Network for Text-Video Retrieval |
Learning Segment Similarity and Alignment in Large-Scale Content Based Video Retrieval |
Progressive Semantic Matching for Video-Text Retrieval |
CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising |
2020 https://2020.acmmm.org/main-track-list.html |
Interpretable Embedding for Ad-Hoc Video Search |
Memory Enhanced Embedding Learning for Cross-Modal Video-Text Retrieval |
A W2VV++ Case Study with Automated and Interactive Text-to-Video Retrieval |
2019 https://2019.acmmm.org/accepted-papers/index.html |
You Only Recognize Once: Towards Fast Video Text Spotting |
- AAAI
2021 https://aaai.org/Conferences/AAAI-21/wp-content/uploads/2020/12/AAAI-21_Accepted-Paper-List.Main_.Technical.Track_.pdf |
---|
Mind-the-Gap! Unsupervised Domain Adaptation for Text-Video Retrieval |
2020 https://aaai.org/Conferences/AAAI-20/wp-content/uploads/2020/01/AAAI-20-Accepted-Paper-List.pdf |
None |
2019 https://aaai.org/Conferences/AAAI-19/wp-content/uploads/2018/11/AAAI-19_Accepted_Papers.pdf |
None |
- IJCAI
2021 https://www.ijcai.org/proceedings/2021/ |
---|
Dig into Multi-modal Cues for Video Retrieval with Hierarchical Alignment |
2020 https://www.ijcai.org/proceedings/2020/ |
Exploiting Visual Semantic Reasoning for Video-Text Retrieval |
2019 https://www.ijcai.org/proceedings/2019/ |
None |
- CVPR
2021 https://openaccess.thecvf.com/CVPR2021 |
---|
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval |
On Semantic Similarity in Video Retrieval |
Thinking Fast and Slow: Efficient Text-to-Visual Retrieval With Transformers |
Adaptive Cross-Modal Prototypes for Cross-Domain Visual-Language Retrieval |
Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling |
MDMMT: Multidomain Multimodal Transformer for Video Retrieval |
2020 https://openaccess.thecvf.com/CVPR2020_search |
ActBERT: Learning Global-Local Video-Text Representations |
Fine-Grained Video-Text Retrieval With Hierarchical Graph Reasoning |
2019 https://openaccess.thecvf.com/CVPR2019_search |
None |
- WACV
2021 https://openaccess.thecvf.com/WACV2021 |
---|
Temporal Context Aggregation for Video Retrieval With Contrastive Learning |
2020 https://openaccess.thecvf.com/WACV2020_search |
None |
- ECCV
2020 |
---|
Multi-modal Transformer for Video Retrieval |
Graph Wasserstein Correlation Analysis for Movie Retrieval |
- ICLR
2022 https://openreview.net/group?id=ICLR.cc/2022/Conference |
---|
Learning Context-Adapted Video-Text Retrieval by Attending to User Comments |
2021 https://openreview.net/group?id=ICLR.cc/2021/Conference |
Parameter Efficient Multimodal Transformers for Video Representation Learning |
Support-Set Bottlenecks for Video-Text Representation Learning |
2020 https://openreview.net/group?id=ICLR.cc/2020/Conference |
None |
2019 https://openreview.net/group?id=ICLR.cc/2019/Conference |
None |
- TIP
2021 https://ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=9263394&punumber=83&sortType=vol-only-seq&searchWithin=retrieval&pageNumber=6 |
---|
Semantics-Aware Spatial-Temporal Binaries for Cross-Modal Video Retrieval |
2020 |
None |
2019 |
None |
- TPAMI
2021 https://ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=4359286&searchWithin=video&pageNumber=1 |
---|
Generalized Few-Shot Video Classification with Video Retrieval and Feature Generation |
Dual Encoding for Video Retrieval by Text |
Universal Weighting Metric Learning for Cross-Modal Retrieval |
Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics |