- Singapore
- @nczhu
Starred repositories
World's First Large-scale High-quality Robotic Manipulation Benchmark
Affordance-based Robot Manipulation with Flow Matching
A generative world for general-purpose robotics & embodied AI learning.
PhoWhisper: Automatic Speech Recognition for Vietnamese (2024)
Towards Human-Friendly, Fast Learning and Adaptable Agent Communities
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
PyTorch code and models for V-JEPA self-supervised learning from video.
Model Context Protocol Servers
The official Python SDK for Model Context Protocol servers and clients
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Out-of-the-box (OOTB) GUI Agent for Windows and macOS
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.
Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), Audio Language Model, auto-speech-recognition (…
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement
Speech To Speech: an effort for an open-sourced and modular GPT4-o
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
SGLang is a fast serving framework for large language models and vision language models.
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate