Closes part of #682 #684

shamikbose · 2022-06-05T02:41:13Z

Updated `medical_data` to the `bigbio_pairs` schema.
Passes all tests.

Confirm that this PR is linked to the dataset issue.
Create the dataloader script `biodatasets/my_dataset/my_dataset.py` (please use only lowercase and underscore for dataset naming).
Provide values for the `_CITATION`, `_DATASETNAME`, `_DESCRIPTION`, `_HOMEPAGE`, `_LICENSE`, `_URLs`, `_SUPPORTED_TASKS`, `_SOURCE_VERSION`, and `_BIGBIO_VERSION` variables.
Implement `_info()`, `_split_generators()` and `_generate_examples()` in the dataloader script.
Make sure that the `BUILDER_CONFIGS` class attribute is a list with at least one `BigBioConfig` for the source schema and one for a bigbio schema.
Confirm that the dataloader script works with the `datasets.load_dataset` function.
Confirm that your dataloader script passes the test suite run with `python -m tests.test_bigbio biodatasets/my_dataset/my_dataset.py`.
If my dataset is local, I have provided an output of the unit-tests in the PR (please copy and paste). This is OPTIONAL for public datasets, as we can test these without access to the data files.
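
The `BUILDER_CONFIGS` item in the checklist can be sanity-checked with a few lines of plain Python. This is only a sketch: `ConfigStub` is a hypothetical stand-in for `BigBioConfig` that keeps just a `name` and a `schema` field, and the `"source"` / `"bigbio_"` naming is assumed from the schema names used in this PR, not taken from the test suite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigStub:
    """Hypothetical stand-in for BigBioConfig with only the fields used here."""

    name: str
    schema: str


def missing_schema_kinds(configs):
    """Return the required schema kinds ("source", "bigbio") that no config covers."""
    has_source = any(c.schema == "source" for c in configs)
    # Assumption: harmonized schemas are named with a "bigbio_" prefix.
    has_bigbio = any(c.schema.startswith("bigbio_") for c in configs)
    missing = []
    if not has_source:
        missing.append("source")
    if not has_bigbio:
        missing.append("bigbio")
    return missing


# Example list shaped like the one the checklist asks for (names are illustrative).
BUILDER_CONFIGS = [
    ConfigStub(name="my_dataset_source", schema="source"),
    ConfigStub(name="my_dataset_bigbio_pairs", schema="bigbio_pairs"),
]
```

An empty result from `missing_schema_kinds(BUILDER_CONFIGS)` means both kinds are present. The official test suite remains the authoritative check.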

bigbio/biodatasets/medical_data/medical_data.py

mariosaenger · 2024-10-24T15:15:43Z

@phlobo Please have a look at this dataset. I refactored the implementation to the new HF-hub-based style. I'm not completely happy with the modeling: essentially it's aspect-based text classification rather than text similarity. However, the former isn't properly supported by any of the BigBio schemas.
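
To illustrate the modeling concern, here is one way an aspect-labelled example could be squeezed into a pairs-shaped record. This is a hypothetical sketch, not the mapping used in `samd.py`: the field names (`id`, `document_id`, `text_1`, `text_2`, `label`) are assumed to match the `bigbio_pairs` schema, and the sample text, aspect, and label are made up.

```python
def to_pairs_record(uid, document_id, text, aspect, label):
    """Map an (aspect, text, label) example onto a pairs-style record.

    Assumption: the aspect name is placed in text_2, which is the awkward
    part, because text_2 is then a category name and not a second text.
    """
    return {
        "id": str(uid),
        "document_id": str(document_id),
        "text_1": text,
        "text_2": aspect,
        "label": label,
    }


# Made-up example, for illustration only.
record = to_pairs_record(
    uid=0,
    document_id="doc-1",
    text="The tablets helped, but I felt dizzy all day.",
    aspect="side effects",
    label="negative",
)
```

The record has the right shape, but the second "text" is really a class-like aspect, which is why the fit with a pairs schema is imperfect.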

phlobo

I would pick a different task, but apart from that, the PR looks good!

bigbio/hub/hub_repos/samd/README.md

bigbio/hub/hub_repos/samd/samd.py

phlobo · 2024-10-26T07:37:04Z

@mariosaenger I realized the old version of the dataset is already on the hub: https://huggingface.co/datasets/bigbio/medical_data
Should we delete that one?

Update medical_data.py

f77a877

Updated to `bigbio_pairs` schema. Passes all tests.

shamikbose requested review from hakunanatasha, jason-fries, sunnnymskang, ruisi-su, galtay, leonweber, sg-wbi and debajyotidatta as code owners June 5, 2022 02:41

shamikbose mentioned this pull request Jun 5, 2022

Fix Datasets Currently Mapped as Tasks.TEXTUAL_ENTAILMENT #682

Closed

galtay reviewed Jun 6, 2022

bigbio/biodatasets/medical_data/medical_data.py (outdated, resolved)

Update medical_data.py

e392364

mariosaenger self-assigned this Oct 23, 2024

Mario Sänger added 2 commits October 24, 2024 16:45

Merge branch 'main' into medical_data_update

62b92ef

refactor: Refactor SAMD dataset implementation to hub-based schema

03b9004

mariosaenger requested a review from phlobo October 24, 2024 15:12

phlobo reviewed Oct 25, 2024

bigbio/hub/hub_repos/samd/README.md (outdated, resolved)

bigbio/hub/hub_repos/samd/README.md (outdated, resolved)

bigbio/hub/hub_repos/samd/samd.py (outdated, resolved)

bigbio/hub/hub_repos/samd/samd.py (outdated, resolved)

fix: Change task for SAMD dataset to TEXT_PAIRS_CLASSIFICATION

ad926a3

phlobo previously approved these changes Oct 26, 2024

Fixed license

9cd63f5

phlobo dismissed their stale review via 9cd63f5 October 26, 2024 07:33

phlobo approved these changes Oct 26, 2024

phlobo merged commit 7373ea6 into bigscience-workshop:main Oct 26, 2024
