Computational Linguist, London, UK
Data
A collection of corpora for named entity recognition (NER) and related entity recognition tasks. These annotated datasets cover a variety of languages, domains, and entity types.
A data augmentations library for audio, image, text, and video.
Effortless data labeling with AI support from Segment Anything and other awesome models.
Refine high-quality datasets and visual AI models
Label Studio is a multi-type data labeling and annotation tool with a standardized output format.
GitHub repository accompanying the CrateDB Fundamentals Course at the CrateDB Academy.
CrateDB is a distributed and scalable SQL database for storing and analyzing massive amounts of data in near real-time, even with complex queries. It is PostgreSQL-compatible, and based on Lucene.
The open source Firebase alternative. Supabase gives you a dedicated Postgres database to build your web, mobile, and AI applications.
Python package for importing datasets from the UCI ML Repository.
Neon: Serverless Postgres. We separated storage and compute to offer autoscaling, code-like database branching, and scale to zero.
Annotated dataset of 100 works of fiction to support tasks in natural language processing and the computational humanities.
Chinese and English NER datasets, an English-Chinese machine translation dataset, and a Chinese word segmentation dataset.
A registry of publicly available datasets on AWS
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools