@gkiar, I have a question regarding the use of DataLad within a Jupyter notebook. I can use the Python API to track the generation and/or movement of files from one directory location to another, but I cannot track commands executed on the data as I see can be done using 'datalad run'. As far as I can tell, that only applies to executing some external script. I really like the concept of sharing my code in a notebook, but I think that tracking the data manipulation is more important for my particular project. Do you happen to have any suggestions?
Hey @stephaniealley - great question, thanks for reaching out 😄
If the provenance of what you do to your dataset matters most for your project, but you still like the look and feel of notebooks, I think we can come up with a hybrid approach that gets you both. What I'd propose is this:
Write the logic of what you want to do to your dataset in your notebook and make sure it works as expected.
Once it's working, move this code to a set of scripts, and make sure that they still work as expected when being called with datalad run.
Once you've done that, you can go into your notebook and do all the movement of files or plotting or what-have-you natively in the notebook, but replace the commands-now-belonging-to-scripts with something like this:
! datalad run python my_script.py arg1 arg2 --flag1
where the args are replaced with those corresponding to your script, if applicable.
This way, you can still keep everything in a notebook, and even walk your readers through the commands you're running outside of it (the ! at the start of the line tells the notebook to run the rest of that line as a shell command rather than as Python code).
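If you'd rather build that command from Python instead of typing the `!` line by hand, here is a rough sketch of how it could look. The helper names `build_datalad_run_command` and `run_tracked` are made up for illustration and are not part of DataLad. I'm assuming `datalad` is on your PATH and that the notebook's working directory is inside the dataset; the `-m`, `-i`, and `-o` options are the message, input, and output options of `datalad run`.

```python
import shlex
import subprocess


def build_datalad_run_command(script, script_args=(), message=None,
                              inputs=(), outputs=()):
    """Assemble a `datalad run` call for a Python script (hypothetical helper)."""
    cmd = ["datalad", "run"]
    if message:
        cmd += ["-m", message]      # commit message for the run record
    for path in inputs:
        cmd += ["-i", path]         # files the script reads
    for path in outputs:
        cmd += ["-o", path]         # files the script writes
    # A leading "--" separates DataLad's options from the command itself,
    # so flags such as --flag1 are passed on to the script.
    cmd += ["--", "python", script, *script_args]
    return cmd


def run_tracked(script, script_args=(), **kwargs):
    """Print and execute the command; assumes we are inside a DataLad dataset."""
    cmd = build_datalad_run_command(script, script_args, **kwargs)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

In a notebook cell you could then call something like `run_tracked("my_script.py", ["arg1", "arg2", "--flag1"], message="Run my_script")`, which should behave like the `!` line above while keeping the arguments as ordinary Python values.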