Stars
Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders.
Large Concept Models: Language modeling in a sentence representation space
Code for the NAACL 2024 HCI+NLP Workshop paper "LLMCheckup: Conversational Examination of Large Language Models via Interpretability Tools and Self-explanation" (Wang et al. 2024)
InterroLang: Exploring NLP Models and Datasets through Dialogue-based Explanations [EMNLP 2023 Findings]
A Topic Modeling System Toolkit (ACL 2024 Demo)
A Fast, Adaptive, Stable, and Transferable Topic Model (NeurIPS 2024)
Implementation of the proposed minGRU in PyTorch
Code implementing "Efficient Parallelization of a Ubiquitous Sequential Computation" (Heinsen, 2023)
Top2Vec learns jointly embedded topic, document and word vectors.
The PyTorch implementation of Generative Pre-trained Transformers (GPTs) using Kolmogorov-Arnold Networks (KANs) for language modeling
A comprehensive collection of KAN (Kolmogorov-Arnold Network)-related resources, including libraries, projects, tutorials, papers, and more, for researchers and developers in the Kolmogorov-Arnold N…
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Decoding platform for machine translation research
The source code of the paper "WebUltron: An Ultimate Retriever on Webpages under the Model-centric Paradigm"
Probabilistic time series modeling in Python
Multilingual/multidomain question generation datasets, models, and Python library for question generation.
This includes the original implementation of SELF-RAG: Learning to Retrieve, Generate and Critique through self-reflection by Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi.
ICML 2022: NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework
MIL-RBERT: A Data-driven Approach for Noise Reduction in Distantly Supervised Biomedical Relation Extraction (BioNLP @ ACL 2020)
The ChatGPT Retrieval Plugin lets you easily find personal or work documents by asking questions in natural language.
StableLM: Stability AI Language Models