This repository contains structured datasets in categories such as "bank", "beer", "coffee", "commerce", "company", "computer", "credit_card", "dessert", "device", "food", "keywords", "movies", "ratings", "restaurant", "stripe", "subscription", and "user". Each category provides its data in three formats (CSV, JSON, and XML), with information current as of January 16, 2024. The layout is meant to make the data easy to access and use for analysis and development.
The repository is organized as follows:
.
├── bank
│   ├── csv
│   │   ├── csv_bank_20240116_1.csv
│   │   ├── csv_bank_20240116_2.csv
│   │   ├── csv_bank_20240116_3.csv
│   │   ├── csv_bank_20240116_4.csv
│   │   └── csv_bank_20240116_5.csv
│   ├── json
│   │   ├── json_bank_20240116_1.json
│   │   ├── json_bank_20240116_2.json
│   │   ├── json_bank_20240116_3.json
│   │   ├── json_bank_20240116_4.json
│   │   └── json_bank_20240116_5.json
│   └── xml
│       ├── xml_bank_20240116_1.xml
│       ├── xml_bank_20240116_2.xml
│       ├── xml_bank_20240116_3.xml
│       ├── xml_bank_20240116_4.xml
│       └── xml_bank_20240116_5.xml
├── bank.py
├── [Other Categories]
└── [Corresponding Files]
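The file names in the tree appear to follow the pattern `<format>_<category>_<date>_<n>.<format>`. As a minimal sketch of that convention, the hypothetical helper `dataset_paths` below builds the expected paths for one category and format. The defaults (date `20240116`, five parts) are taken from the `bank` listing only; other categories may differ.

```python
from pathlib import Path

FORMATS = ("csv", "json", "xml")


def dataset_paths(root, category, fmt, date="20240116", parts=5):
    """Build expected paths of the form <fmt>_<category>_<date>_<n>.<fmt>.

    Hypothetical helper: the date and part count default to what the
    bank listing shows and are assumptions for every other category.
    """
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    folder = Path(root) / category / fmt
    return [
        folder / f"{fmt}_{category}_{date}_{n}.{fmt}"
        for n in range(1, parts + 1)
    ]


paths = dataset_paths(".", "bank", "csv")
for p in paths:
    print(p.as_posix())
```

The function only constructs paths; it does not check that the files exist.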
Each category comes with a Python script (for example, bank.py or beer.py) to facilitate interaction with the data. These scripts are designed to import and process data in CSV, JSON, and XML formats. Users can build on them to develop applications or perform data analysis.
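The contents of the per-category scripts are not shown here, so the following is not their implementation. It is a minimal standard-library sketch of how files in the three formats could be read into a common list-of-dicts shape. The function name `load_records` is hypothetical, and the record layouts are assumptions: a CSV header row, a JSON object or list of objects, and an XML root whose children are flat records.

```python
import csv
import json
import xml.etree.ElementTree as ET
from pathlib import Path


def load_records(path):
    """Return a list of dicts read from a .csv, .json, or .xml file."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        # Assumption: the first row is a header with the field names.
        with path.open(newline="", encoding="utf-8") as fh:
            return [dict(row) for row in csv.DictReader(fh)]
    if suffix == ".json":
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
        # Assumption: a file holds either one object or a list of objects.
        return data if isinstance(data, list) else [data]
    if suffix == ".xml":
        root = ET.parse(str(path)).getroot()
        # Assumption: each child of the root is one record whose
        # sub-elements are flat fields (tag -> text).
        return [{field.tag: field.text for field in record} for record in root]
    raise ValueError(f"unsupported file type: {suffix!r}")
```

Note that CSV and XML values come back as strings, while JSON keeps its native types, so numeric fields may need converting before comparison.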
All categories share a common user_id field as the primary reference key, so records can be joined and compared consistently across categories.
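As an illustration of joining on user_id, the sketch below groups records from several categories under each user. The function `join_on_user_id` and every field name other than `user_id` are hypothetical, the sample rows are made up, and the code assumes at most one record per user in each category (a later record would overwrite an earlier one).

```python
from collections import defaultdict


def join_on_user_id(**categories):
    """Group records from several categories under their shared user_id."""
    merged = defaultdict(dict)
    for name, records in categories.items():
        for record in records:
            # str() evens out "1" (CSV/XML text) versus 1 (JSON number).
            merged[str(record["user_id"])][name] = record
    return dict(merged)


# Hypothetical sample rows, not taken from the real datasets.
bank = [{"user_id": "1", "bank_name": "Example Bank"}]
beer = [
    {"user_id": 1, "brand": "Example Ale"},
    {"user_id": 2, "brand": "Other Ale"},
]

joined = join_on_user_id(bank=bank, beer=beer)
print(sorted(joined))
```

Users that appear in only some categories simply have fewer keys in their entry, which behaves like an outer join.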
Contributions to the repository are welcome. Please follow the contribution guidelines to submit your changes or additions.
For questions or comments, contact Stefen Taime at [email protected].