Audiostock Data Card

Dataset Overview

Size of dataset	Number of audios
5.8 GB	22572

Data Collection

Source	Collection Method
Audiostock website	1. Scrape the sound-effect audio file URLs and the effect titles from the Audiostock website. 2. Create meta.csv, a file listing the audio file URLs and titles. 3. Download the mp3 audio file for each row in meta.csv and create an audio-JSON pair for each.
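
Step 3 of the collection method could look roughly like the sketch below. It is an illustration only, not the project's actual script: the column names `url` and `title`, the `<row index>.mp3` naming scheme, and the helper names `read_meta` and `download_all` are all assumptions, since the layout of meta.csv is not described here.

```python
import csv
import urllib.request
from pathlib import Path


def read_meta(csv_lines):
    """Yield (index, url, title) for each data row of a meta.csv-like source.

    Assumes a header row with columns named 'url' and 'title'
    (hypothetical names; the real meta.csv may differ).
    """
    for index, row in enumerate(csv.DictReader(csv_lines)):
        yield index, row["url"].strip(), row["title"].strip()


def download_all(meta_path, out_dir, fetch=urllib.request.urlretrieve):
    """Download one mp3 per row of meta.csv and return the written paths.

    `fetch` is injectable so the function can be exercised without network access.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(meta_path, newline="", encoding="utf-8") as f:
        for index, url, _title in read_meta(f):
            target = out_dir / f"{index}.mp3"  # assumed naming scheme
            fetch(url, str(target))
            written.append(target)
    return written
```

Naming files by row index keeps each audio file easy to pair with its JSON file later; any other stable identifier would work equally well.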

Preprocessing Principles

You may refer to preprocess_audiostock.py for all the details. Here is a concise summary:

We retrieve information from the metadata (meta.csv) and build a three-field .json file for each audio file. Here is an example audio-JSON pair selected from the processed dataset:

data_cards_Audiostock_1.mov

{
    "text": [
        "Bubble 02"
    ],
    "tag": [
        "foam",
        "Bubble sound",
        "dangerous",
        "bubble",
        "air",
        "water",
        "water sound",
        "liquid",
        "rupture",
        "plosive sound",
        "asmr",
        "everyday life sound",
        "noise",
        "image",
        "boiling",
        "science",
        "chemical reaction",
        "experiment",
        "cooking",
        "horror",
        "suspense",
        "painful",
        "Bukku",
        "bukubuku",
        "bokkoshi",
        "bokoboko",
        "poke",
        "pokopoko",
        "Copop",
        "copacopo"
    ],
    "original_data": {
        "title": "Audiostock dataset",
        "Description": "Sound effects scraped from the audiostock.net website",
        "URL": "https://audiostock.net/audio/1150592/play",
        "scene": "",
        "purpose": "Video",
        "impression": "Horror",
        "audio_size": 2.0
    }
}

I. JSON file generation principles

text entry: We take the title of the sound effect and put it into the text attribute.
tag entry: We use the tags from the Audiostock website.
original_data entry: For every audio file we save the URL, scene, purpose and impression (all of these attributes are present on the Audiostock website), as well as the audio duration, the dataset name and the dataset description.
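
As a rough illustration of these three rules, the sketch below assembles a record with the same keys as the example JSON shown earlier. The function name `build_record` and its parameters are hypothetical, and mapping the audio duration to the `audio_size` key is an inference from that example rather than something confirmed by preprocess_audiostock.py.

```python
import json


def build_record(title, tags, url, scene, purpose, impression, duration):
    """Assemble the three-field record (text, tag, original_data) for one audio file."""
    return {
        "text": [title],       # title of the sound effect
        "tag": list(tags),     # tags taken from the Audiostock website
        "original_data": {
            "title": "Audiostock dataset",
            "Description": "Sound effects scraped from the audiostock.net website",
            "URL": url,
            "scene": scene,
            "purpose": purpose,
            "impression": impression,
            "audio_size": duration,  # assumed to hold the audio duration
        },
    }


record = build_record(
    title="Bubble 02",
    tags=["foam", "bubble", "water"],
    url="https://audiostock.net/audio/1150592/play",
    scene="",
    purpose="Video",
    impression="Horror",
    duration=2.0,
)
# ensure_ascii=False keeps any non-ASCII tags readable in the output file
print(json.dumps(record, indent=4, ensure_ascii=False))
```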

II. Audio filtering principles

Keep audio files with a sampling rate higher than 16 kHz and discard the rest.
Discard all audio files that soundfile.read() fails to read or that FFmpeg rejects during processing.
Split every audio file into segments, with one sentence in each.
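
The first two filtering rules might be expressed along the following lines. This is a minimal sketch under stated assumptions: the helper name `should_keep` is hypothetical, "higher than 16 kHz" is read as a strict comparison, and the segment-splitting step is not covered. `soundfile.read()` returns the audio data together with its sampling rate, which is what the check relies on.

```python
MIN_SAMPLE_RATE = 16_000  # Hz; threshold taken from the filtering rules


def should_keep(path, read_fn=None):
    """Return True if the file can be read and its sampling rate exceeds 16 kHz.

    `read_fn` must return (data, sample_rate); it defaults to soundfile.read.
    """
    if read_fn is None:
        import soundfile  # third-party dependency, imported lazily
        read_fn = soundfile.read
    try:
        _data, sample_rate = read_fn(path)
    except Exception:
        # Any read failure means the file is discarded.
        return False
    # Strict comparison: a file at exactly 16 kHz is dropped (an assumption).
    return sample_rate > MIN_SAMPLE_RATE
```

Passing the reader as a parameter makes the rule easy to check without real audio files, for example `should_keep("x.mp3", read_fn=lambda p: ([0.0], 44100))`.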

III. Audio format specifications

After preprocessing, all audio files should be in FLAC format with a sampling rate of 48 kHz (converted with FFmpeg).
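
One plausible way to perform this conversion is to call the ffmpeg command-line tool from Python, as sketched below. The exact options used by preprocess_audiostock.py are not shown in this card, so the command line, the output naming, and the helper names `ffmpeg_command` and `convert` are assumptions.

```python
import subprocess
from pathlib import Path

TARGET_SAMPLE_RATE = 48_000  # Hz, as required by the format specification


def ffmpeg_command(src, dst_dir):
    """Build an ffmpeg command that resamples `src` to 48 kHz FLAC in `dst_dir`."""
    src = Path(src)
    dst = Path(dst_dir) / (src.stem + ".flac")
    return [
        "ffmpeg",
        "-y",                            # overwrite the output if it exists
        "-i", str(src),                  # input file
        "-ar", str(TARGET_SAMPLE_RATE),  # output sampling rate
        "-c:a", "flac",                  # encode audio as FLAC
        str(dst),
    ]


def convert(src, dst_dir):
    """Run ffmpeg; return True on success, False if ffmpeg rejects the file."""
    result = subprocess.run(ffmpeg_command(src, dst_dir), capture_output=True)
    return result.returncode == 0
```

A `False` return from `convert` corresponds to the case in the filtering rules where a file is rejected by FFmpeg and should be discarded.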
