Stars
10x Faster Long-Context LLM By Smart KV Cache Optimizations
[CVPR 2025] EgoLife: Towards Egocentric Life Assistant
The source code of IEEE TPAMI 2025 "Hyper-YOLO: When Visual Object Detection Meets Hypergraph Computation".
GIM: Learning Generalizable Image Matcher From Internet Videos (ICLR 2024 Spotlight)
Implementation of XFeat (CVPR 2024). Do you need robust and fast local feature extraction? You are in the right place!
Generate high-definition short story videos with one click using large AI models.
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
The easiest way to use deep metric learning in your application. Modular, flexible, and extensible. Written in PyTorch.
A repository for MobileNetV4-Pytorch-model, which can be used to train on your image datasets for vision tasks.
PDF scientific paper translation with preserved formats: AI-based full-text bilingual translation of PDF documents that fully preserves the layout, supports services such as Google/DeepL/Ollama/OpenAI, and provides CLI/GUI/Docker/Zotero.
OpenOCR: A general OCR system with accuracy and efficiency. It supports 24 Scene Text Recognition methods trained from scratch on large-scale real datasets, and the latest methods will continue to be added.
Torchreid: Deep learning person re-identification in PyTorch.
The official PyTorch implementation of Logical Consistency and Greater Descriptive Power for Facial Hair Attribute Learning - CVPR 2023
Hu1ce / chatgpt_academic
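Forked from binary-husky/gpt_academic. A ChatGPT for research work at the Chinese Academy of Sciences, specially optimized for polishing academic papers. Supports custom shortcut buttons, markdown table display, dual display of TeX formulas, and full-featured code display, and adds local Python project analysis and self-analysis features.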
Iris Segmentation Groundtruth Database
Iris based security system using techniques of iris segmentation by Canny edge detection and Hough transformation.
We write your reusable computer vision tools. 💜
BoxMOT: pluggable SOTA tracking modules for segmentation, object detection and pose estimation models
E5-V: Universal Embeddings with Multimodal Large Language Models
🛠️ Class-imbalanced Ensemble Learning Toolbox. | A class-imbalanced / long-tail machine learning library
A lightweight 2D graphics library for rendering texts, geometries, and images with high-performance APIs that work across various platforms.
Metric learning and retrieval pipelines, models and zoo.
Fast and accurate automatic speech recognition (ASR) for edge devices
[AAAI 2025] Official implementation of "DrivingForward: Feed-forward 3D Gaussian Splatting for Driving Scene Reconstruction from Flexible Surround-view Input"