Add dataframe validation before stage execution #197
Comments
O_o snakemake looks quite interesting indeed! Joining a broader "pipeline" community would make a lot of sense. Regarding the 2nd point, I think I would prefer defining everything inside the script, but I see how that might lead to a certain amount of code duplication (if the df_persons structure doesn't change much across many scripts, for example...).
FYI, I'm using pandera right now in another pipeline, and I find it very verbose if you want to validate the whole dataframe at every stage... I'll have a better opinion in a few weeks.
I think it would be a good idea to use Pandera to describe and check the input dataframes of a given stage at runtime.
It has the benefit of:
I don't think it can or should be imposed in every existing stage, but it can be strongly encouraged by the community.
For example: