Using DataLad in Jupyter notebook #7

Open
stephaniealley opened this issue Jun 5, 2020 · 2 comments

Comments

@stephaniealley
Collaborator

@gkiar, I have a question about using DataLad within a Jupyter notebook. I can use the Python API to track the generation and/or movement of files from one directory to another, but I cannot track the commands executed on the data the way 'datalad run' does. As far as I can tell, 'datalad run' only applies to executing some external script. I really like the idea of sharing my code in a notebook, but tracking the data manipulation is more important for my particular project. Do you have any suggestions?

@gkiar

gkiar commented Jun 6, 2020

Hey @stephaniealley - great question, thanks for reaching out 😄

If the provenance of what you do to your dataset is what matters most for your project, but you still like the look and feel of notebooks, I think we can come up with a hybrid approach that gets you both. Here's what I'd propose:

  1. Write the logic of what you want to do to your dataset in your notebook and make sure it works as expected.
  2. Once it's working, move this code into a set of scripts, and make sure they still work as expected when called with 'datalad run'.
  3. Once you've done that, you can go into your notebook and do all the movement of files or plotting or what-have-you natively in the notebook, but replace the commands-now-belonging-to-scripts with something like this:
! datalad run python my_script.py arg1 arg2 --flag1

where the args are replaced with those corresponding to your script, if applicable.
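As a minimal sketch of steps 1 and 2, assuming a hypothetical CSV-processing task (the file names, the `value` column, and the `process`/`main` functions are all made up for illustration), the script can keep its logic in a plain function that the notebook can import while prototyping, plus a thin command-line wrapper for 'datalad run' to call:

```python
"""my_script.py: hypothetical example of notebook logic moved into a script."""
import argparse
import csv


def process(in_path, out_path, scale=1.0):
    """Hypothetical step: multiply the 'value' column by `scale`.

    Returns the number of data rows written to `out_path`.
    """
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        n_rows = 0
        for row in reader:
            row["value"] = str(float(row["value"]) * scale)
            writer.writerow(row)
            n_rows += 1
    return n_rows


def main(argv=None):
    """Command-line entry point; `argv=None` means "use sys.argv[1:]"."""
    parser = argparse.ArgumentParser(description="Hypothetical processing step.")
    parser.add_argument("in_path")
    parser.add_argument("out_path")
    parser.add_argument("--scale", type=float, default=1.0)
    args = parser.parse_args(argv)
    n_rows = process(args.in_path, args.out_path, args.scale)
    print(f"wrote {n_rows} rows to {args.out_path}")
    return 0
```

Saved as `my_script.py` with the usual `if __name__ == "__main__": raise SystemExit(main())` guard appended, it could then be called from the notebook as `! datalad run python my_script.py raw.csv scaled.csv --scale 2`. Because `main` accepts an explicit argument list, you can also call `main([...])` directly in a notebook cell while debugging, without DataLad recording anything.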

This way, you still have everything in a notebook, and you can even walk readers through the commands you're running outside of it (the ! at the start of the line tells Jupyter/IPython to run the rest of the line as a command in a system shell, started from the notebook's current working directory, rather than as Python code).
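If you'd rather build the call in Python than with !, a sketch like the following assembles a 'datalad run' invocation and runs it with `subprocess`. The helper names `datalad_run_cmd` and `run_tracked` are hypothetical; `-m`, `--input` and `--output` are existing 'datalad run' options for the commit message, for files to retrieve before the command runs, and for files to unlock so the command can overwrite them. DataLad's Python API also exposes this command as `datalad.api.run`, which may be a cleaner option; check its documentation for the exact parameters.

```python
import shlex
import subprocess


def datalad_run_cmd(script, script_args=(), message=None, inputs=(), outputs=()):
    """Build an argv list for `datalad run` wrapping `python <script> ...`."""
    cmd = ["datalad", "run"]
    if message:
        cmd += ["-m", message]
    for path in inputs:
        cmd += ["--input", path]  # retrieved by DataLad before running
    for path in outputs:
        cmd += ["--output", path]  # unlocked by DataLad before running
    # Everything after the options is the command DataLad executes and records.
    cmd += ["python", script, *script_args]
    return cmd


def run_tracked(cmd, dry_run=False):
    """Run `cmd` (unless dry_run) and return it as a copy-pasteable string."""
    printable = shlex.join(cmd)
    if not dry_run:
        # Runs in the kernel's current working directory, like `!` does;
        # raises CalledProcessError if datalad or the script fails.
        subprocess.run(cmd, check=True)
    return printable
```

Passing `dry_run=True` only returns the shell string, which is handy for checking the exact command before DataLad records it. Using "python" rather than `sys.executable` keeps the recorded command portable for `datalad rerun` on other machines, at the cost of relying on the right interpreter being first on `PATH`.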

Does that make sense / seem to suit your needs?

@stephaniealley
Collaborator Author

Yes, that is exactly what I need! I just wasn't exactly sure how to work it out. Thank you!
