We use uv to manage dependencies and the project environment.
Clone the GitHub repository:
git clone git@github.com:MDverse/md_data_schema.git
cd md_data_schema
Sync dependencies:
uv sync
Download parquet files from Zenodo to build the database:
uv run src/download_data.py
Files will be downloaded to data/parquet_files:
data
└── parquet_files
    ├── datasets.parquet
    ├── files.parquet
    ├── gromacs_gro_files.parquet
    ├── gromacs_mdp_files.parquet
    ├── gromacs_xtc_files.parquet
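Before building the database, it can help to confirm that the download produced the files listed above. The following is a minimal sketch, not part of the repository: it assumes the five file names shown in the listing and the data/parquet_files location, and the helper name `missing_parquet_files` is hypothetical.

```python
from pathlib import Path

# File names taken from the directory listing above (an assumption:
# the real download may contain additional files).
EXPECTED_FILES = [
    "datasets.parquet",
    "files.parquet",
    "gromacs_gro_files.parquet",
    "gromacs_mdp_files.parquet",
    "gromacs_xtc_files.parquet",
]


def missing_parquet_files(directory="data/parquet_files", expected=EXPECTED_FILES):
    """Return the expected file names that are absent from `directory`."""
    base = Path(directory)
    return [name for name in expected if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_parquet_files()
    if missing:
        print("Missing parquet files:", ", ".join(missing))
    else:
        print("All expected parquet files are present.")
```

Run from the repository root, an empty result suggests the download step completed and the database can be created.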
Create the empty database:
uv run src/create_database.py
Populate the tables with the data from parquet files:
uv run src/ingest_data.py
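The download, create, and populate steps have to run in that order, so they can be chained in a small wrapper that stops at the first failure. This is only a sketch under stated assumptions: `uv` is on the PATH, the script is started from the repository root, and the names `STEPS` and `run_steps` are hypothetical rather than part of the project.

```python
import subprocess

# Commands copied from the steps above, in the order they are described.
STEPS = [
    ["uv", "run", "src/download_data.py"],
    ["uv", "run", "src/create_database.py"],
    ["uv", "run", "src/ingest_data.py"],
]


def run_steps(steps=STEPS, runner=subprocess.run):
    """Run each command in order and return the commands that completed.

    With check=True, subprocess.run raises CalledProcessError on a
    non-zero exit code, so later steps are skipped after a failure.
    """
    completed = []
    for command in steps:
        runner(command, check=True)
        completed.append(command)
    return completed
```

Calling `run_steps()` would then execute the three commands in sequence. The `runner` parameter exists only so the wrapper can be exercised without invoking `uv`.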