Add function that verifies if the input dataset is valid #3

Open

EdoardoAbatiTR opened this issue Sep 30, 2024 · 0 comments

Collaborator

EdoardoAbatiTR commented Sep 30, 2024

As described in the README, we have some requirements for the input dataset if the user decides to use the built-in pipelines:

If you are planning to use any of the included pipelines, you must have a dataset split into 3 files (train.csv, dev.csv and test.csv) that contain train, validation and test sets respectively. Each file must have the following columns:
- id: an identifier for each sample, e.g. a document id
- text: the input text
- labels: the labels list as a string (e.g. "[LabelA, OtherLabel, LabelB]")

It would be great to have a function that helps users verify that their datasets fulfill these requirements.
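
As a starting point for discussion, here is a minimal sketch of what such a check could look like. The function name `validate_dataset`, its signature, and the choice to return a list of error messages are all assumptions for illustration, not part of the existing codebase. It uses only the standard library, and it checks only that `labels` looks like a bracketed list, since the README does not specify the format any further.

```python
import csv
from pathlib import Path

# Requirements taken from the README excerpt quoted above.
REQUIRED_FILES = ("train.csv", "dev.csv", "test.csv")
REQUIRED_COLUMNS = ("id", "text", "labels")


def validate_dataset(dataset_dir):
    """Hypothetical helper: return a list of problems found in dataset_dir.

    An empty list means none of the checks below failed.
    """
    errors = []
    for name in REQUIRED_FILES:
        path = Path(dataset_dir) / name
        if not path.is_file():
            errors.append(f"{name}: file not found")
            continue
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            header = reader.fieldnames or []
            missing = [col for col in REQUIRED_COLUMNS if col not in header]
            if missing:
                errors.append(f"{name}: missing columns {missing}")
                continue
            # Assumption: a valid labels cell starts with "[" and ends with "]".
            for row_number, row in enumerate(reader, start=1):
                labels = (row["labels"] or "").strip()
                if not (labels.startswith("[") and labels.endswith("]")):
                    errors.append(
                        f"{name}, data row {row_number}: "
                        f"labels is not a bracketed list: {labels!r}"
                    )
    return errors
```

A caller could run `validate_dataset("path/to/data")` before starting a pipeline and stop if the returned list is non-empty. Returning messages instead of raising on the first failure lets the user see every problem in one pass, but raising an exception would be an equally reasonable design.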