246 update python functions to adhere to simpler standards and pre format data #269

Open
wants to merge 24 commits into
base: main
24 commits
b14d1c3
first draft of `Dataset` / `DataLoader` rewrite
ymahlich Nov 18, 2024
2781309
added `Dataset.types()` function
ymahlich Nov 18, 2024
748e383
added `train_test_validate` as instance method
ymahlich Nov 18, 2024
8995b10
moved `train_test_validate` out of `Dataset`. Is now called from with…
ymahlich Nov 19, 2024
f689d00
added `dataset.save()` function
ymahlich Dec 3, 2024
b822acd
added option to load from pickled object file to `dataset.load()` fun…
ymahlich Dec 3, 2024
37d5f90
added skeleton for `dataset.format()`
ymahlich Dec 3, 2024
e899ec9
added "mutations" data_type to `dataset.format()`
ymahlich Dec 3, 2024
68a5719
added handling of 'combinations', 'drugs', 'genes' & 'samples' in `d…
ymahlich Dec 3, 2024
50e4ee2
added basic handling of 'proteomics' to `dataset.format()`
ymahlich Dec 3, 2024
8ebb432
added handling of 'transcriptomics' in `dataset.format()`
ymahlich Dec 3, 2024
64ce81a
generalized error handling
ymahlich Dec 3, 2024
101ad74
added handling of 'experiments' in `dataset.format()`
ymahlich Dec 3, 2024
a397281
added handling of `copy_number`
ymahlich Dec 9, 2024
6bd844a
added handling of `drug_descriptor`
ymahlich Dec 9, 2024
7c7342c
changed `format('mutations')` to use `pd.pivot_table` instead of `pd.…
ymahlich Dec 9, 2024
8ab2d77
renamed `download_by_prefix` to `download`
ymahlich Dec 10, 2024
8cba605
added option to download to specified folder in `download()`
ymahlich Dec 10, 2024
7c0d4b2
fixed import
ymahlich Dec 12, 2024
9184363
added `copy_number` -> `copy_call` conversion
ymahlich Dec 12, 2024
7804cd6
added utilization of __version__
ymahlich Dec 16, 2024
9862154
added missing __init__ file
ymahlich Dec 16, 2024
1b16b82
added helper function to list all available datasets
ymahlich Dec 16, 2024
28ea7e3
removed coderdata.DatasetLoader / coderdata.loader.*
ymahlich Dec 16, 2024
21 changes: 19 additions & 2 deletions coderdata/__init__.py
@@ -1,3 +1,20 @@
-from .download.downloader import download_data_by_prefix
-from .load.loader import DatasetLoader, join_datasets
+from .download.downloader import download
 from .split.splitter import train_test_validate
+from .dataset.dataset import (
+    Dataset,
+    load,
+)
+
+# '_version.py' will be generated by hatchling once the switch away from
+# setuptools.py is finished
+try:
+    from ._version import __version__
+except ImportError:
+    __version__ = '0.1.40'
+try:
+    from ._version import __version_tuple__
+except ImportError:
+    __version_tuple__ = (0, 1, 40)
+
+from .utils.utils import version
+from .utils.utils import list_datasets
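The version fallback in this hunk hardcodes the same release twice, once as a string and once as a tuple. The following is a minimal sketch of one way to keep the two in step, not code from this PR: `resolve_version` and `version_tuple` are hypothetical helper names, and the default `'0.1.40'` is copied from the hunk above.

```python
import importlib


def resolve_version(module_name, default="0.1.40"):
    """Return `<module_name>.__version__`, or `default` if it is unavailable.

    Hypothetical helper: mirrors the try/except ImportError fallback in the
    hunk above, but takes the module name as a parameter so it can be tested.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The generated version module does not exist (yet): use the default.
        return default
    return getattr(module, "__version__", default)


def version_tuple(version):
    """Derive the tuple form from the string form.

    Assumes a purely numeric dotted version such as '0.1.40'; suffixes like
    '.dev1' would raise ValueError here.
    """
    return tuple(int(part) for part in version.split("."))


fallback = resolve_version("module_that_does_not_exist_for_this_sketch")
print(fallback, version_tuple(fallback))
```

Deriving the tuple from the string means only one literal has to be bumped per release while the generated `_version.py` is not yet available.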
4 changes: 2 additions & 2 deletions coderdata/cli.py
@@ -1,5 +1,5 @@
 import argparse
-from .download.downloader import download_data_by_prefix
+from .download.downloader import download

 def main():
     parser = argparse.ArgumentParser(prog='coderdata')
@@ -9,7 +9,7 @@ def main():
     parser_download = subparsers.add_parser('download', help='Download datasets')
     parser_download.add_argument('--prefix', type=str, default=None,
                                  help='Prefix of the dataset to download (e.g., "hcmi"), "all", or leave empty for all files.')
-    parser_download.set_defaults(func=download_data_by_prefix)
+    parser_download.set_defaults(func=download)

     args = parser.parse_args()
     if hasattr(args, 'func'):
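The rename only changes which callable is attached to the `download` subcommand through `set_defaults(func=...)`. A minimal, self-contained sketch of that dispatch pattern follows. The `download` function here is a hypothetical stand-in that returns a string instead of fetching files, and the way `run` calls `args.func` is an assumption, since the body of the `if hasattr(args, 'func'):` branch is not visible in this diff.

```python
import argparse


def download(prefix=None):
    # Hypothetical stand-in for coderdata's `download`; the real signature
    # and behavior are not shown in this diff.
    return f"would download: {prefix or 'all'}"


def build_parser():
    parser = argparse.ArgumentParser(prog="coderdata")
    subparsers = parser.add_subparsers(dest="command")
    parser_download = subparsers.add_parser("download", help="Download datasets")
    parser_download.add_argument("--prefix", type=str, default=None)
    # Attach the handler to the subcommand; it surfaces later as `args.func`.
    parser_download.set_defaults(func=download)
    return parser


def run(argv):
    args = build_parser().parse_args(argv)
    # `func` is only set when a subcommand that defines it was selected.
    if hasattr(args, "func"):
        return args.func(prefix=args.prefix)  # assumed calling convention
    return None


print(run(["download", "--prefix", "hcmi"]))
```

Because the handler is looked up through `args.func`, renaming `download_data_by_prefix` to `download` needs no change to the dispatch logic itself, only to the import and the `set_defaults` call.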
2 changes: 2 additions & 0 deletions coderdata/dataset/__init__.py
@@ -0,0 +1,2 @@
+from .dataset import Dataset
+from .dataset import load