AD2000X
  • Computational Linguist
  • London, UK

Stars

Data

Dataset, Converter, Text, Augmentation, Tagging
20 repositories
Jupyter Notebook 42 30 Updated May 15, 2019

A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.

Python 1,527 247 Updated Nov 29, 2024

A data augmentations library for audio, image, text, and video.

Python 4,990 303 Updated Feb 28, 2025

Get your documents ready for gen AI

Python 23,200 1,340 Updated Mar 3, 2025
Python 49 17 Updated Sep 3, 2019

Hundreds of strange attractors

Python 439 69 Updated Feb 10, 2025

Effortless data labeling with AI support from Segment Anything and other awesome models.

Python 4,934 558 Updated Feb 26, 2025
Shell 66 6 Updated Jan 13, 2025

Refine high-quality datasets and visual AI models

Python 9,245 605 Updated Mar 3, 2025

Label Studio is a multi-type data labeling and annotation tool with standardized output format

JavaScript 20,974 2,578 Updated Mar 3, 2025

GitHub repository accompanying the CrateDB Fundamentals Course at the CrateDB Academy.

Jupyter Notebook 6 3 Updated Mar 3, 2025

CrateDB is a distributed and scalable SQL database for storing and analyzing massive amounts of data in near real-time, even with complex queries. It is PostgreSQL-compatible, and based on Lucene.

Java 4,192 575 Updated Mar 3, 2025

The open source Firebase alternative. Supabase gives you a dedicated Postgres database to build your web, mobile, and AI applications.

TypeScript 78,307 7,846 Updated Mar 3, 2025

Python package for dataset imports from UCI ML Repository

Jupyter Notebook 286 116 Updated Aug 6, 2024

Neon: Serverless Postgres. We separated storage and compute to offer autoscaling, code-like database branching, and scale to zero.

Rust 16,340 498 Updated Mar 3, 2025

Annotated dataset of 100 works of fiction to support tasks in natural language processing and the computational humanities.

Python 347 50 Updated Dec 8, 2022

Chinese and English NER datasets, English-Chinese machine translation datasets, and Chinese word segmentation datasets.

Python 361 76 Updated Feb 3, 2021

A registry of publicly available datasets on AWS

Python 1,480 953 Updated Mar 3, 2025

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

Python 19,721 2,776 Updated Feb 28, 2025

Chinese question-answer corpus from the PTT Gossiping board

Jupyter Notebook 242 36 Updated Oct 18, 2024