Add function that verifies if the input dataset is valid #3

Open
EdoardoAbatiTR opened this issue Sep 30, 2024 · 0 comments

Comments

@EdoardoAbatiTR
Collaborator

As described in the README, we have some requirements for the input dataset if the user decides to use the built-in pipelines:

If you are planning to use any of the included pipelines, you must have a dataset split into 3 files (train.csv, dev.csv and test.csv) that contain train, validation and test sets respectively. Each file must have the following columns:
id: an identifier for each sample, e.g. a document id
text: the input text
labels: the labels list as a string (e.g. "[LabelA, OtherLabel, LabelB]")

It would be great to have a function that helps users verify that their datasets fulfill these requirements.
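As a starting point for discussion, here is a minimal sketch of what such a check could look like. It is not part of the project's API: the name `validate_dataset`, the choice to return a list of problems, and the rule that `labels` only needs to be a string wrapped in square brackets are all assumptions made for illustration. It uses only the standard library so it can run without extra dependencies.

```python
import csv
from pathlib import Path

# File and column names are taken from the README requirements quoted above.
REQUIRED_FILES = ("train.csv", "dev.csv", "test.csv")
REQUIRED_COLUMNS = ("id", "text", "labels")


def validate_dataset(data_dir):
    """Hypothetical helper: return a list of problems found in data_dir.

    An empty list means every required file exists, has the required
    columns, and every `labels` value looks like "[LabelA, LabelB]".
    """
    problems = []
    for name in REQUIRED_FILES:
        path = Path(data_dir) / name
        if not path.is_file():
            problems.append(f"{name}: file not found")
            continue
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            header = reader.fieldnames or []
            missing = [col for col in REQUIRED_COLUMNS if col not in header]
            if missing:
                problems.append(f"{name}: missing columns {missing}")
                continue
            # Assumption: a valid labels cell is any string enclosed in [ ].
            for row_number, row in enumerate(reader, start=1):
                labels = (row["labels"] or "").strip()
                if not (labels.startswith("[") and labels.endswith("]")):
                    problems.append(
                        f"{name}, data row {row_number}: "
                        f"labels is not a bracketed list string: {labels!r}"
                    )
    return problems
```

A caller could then report everything at once, for example `for problem in validate_dataset("path/to/data"): print(problem)`. Collecting all problems instead of raising on the first one lets users fix their files in a single pass; a stricter variant could raise a `ValueError` when the list is non-empty.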
