Sarah Gibson, The Alan Turing Institute
UKRI Cloud Workshop, 12 February 2019
You can follow along with this demo at http://bit.ly/sgibson-ukri-demo-2019
- "Reproducibility" has a lot of different meanings across research fields
- In this context, "reproducible" means that the same results (e.g. those published in a paper) are generated when the same input data are pushed through the same analysis pipeline
An independent person should be able to easily check my (our) work.
(AKA: I'm guilty of this too...)
- Astrophysics PhD, graduated January 2019
- Researched a phenomenon known as Gamma-Ray Bursts (not your average supernova...) and the neutron stars that power them
- This is a figure published in my first journal paper
- It describes the evolution of the spin of a neutron star (middle panel) and the mass of its accretion disc (top panel) as it was being fed by fallback accretion
- The bottom panel shows the property we were interested in: when a derived parameter crosses a specific threshold (the dashed line)
What my research was about isn't necessarily important. What is important is whether other scientists in my field could verify my work.
This is a GIF of my PhD laptop producing the figure, which turned out not to be reproducible for a number of reasons:
- The code was not version controlled
- The computing environment(s) were not documented
- Computing environment no longer exists - the laptop has been returned and wiped 😱
With a little bit of work, I've managed to reproduce my figure using Binder (https://mybinder.org).
- My code is in a public GitHub repo - now version controlled ☑️
- The computing environment has been documented in an environment.yml file ☑️ (other config file types are available)
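For anyone who hasn't seen one, here is a rough sketch of what a pinned environment.yml might contain, rendered from a small Python helper. The helper name `render_environment_yml`, the environment name, the channel and every version number are assumptions for illustration only; they are not the actual contents of my repo.

```python
def render_environment_yml(name, python_version, packages, channel="conda-forge"):
    """Return the text of a minimal conda environment.yml with pinned versions."""
    lines = [
        f"name: {name}",
        "channels:",
        f"  - {channel}",
        "dependencies:",
        f"  - python={python_version}",
    ]
    # Sort the packages so the file stays stable under version control.
    for package, version in sorted(packages.items()):
        lines.append(f"  - {package}={version}")
    return "\n".join(lines) + "\n"


# Hypothetical environment name and versions, not the real magprop pins.
text = render_environment_yml(
    "magprop", "3.7", {"numpy": "1.15.4", "matplotlib": "3.0.2"}
)
print(text, end="")
# name: magprop
# channels:
#   - conda-forge
# dependencies:
#   - python=3.7
#   - matplotlib=3.0.2
#   - numpy=1.15.4
```

Pinning exact versions is the point: an unpinned file documents which packages you used, but not which behaviour you got from them.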
My PhD repo:
- github.com/sgibson91/magprop
- binder link: http://bit.ly/sgibson-phd-demo - points to a fixed commit for fast(ish) binder loading time
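A Binder link that points to a fixed commit is just a URL of the form `https://mybinder.org/v2/gh/<org>/<repo>/<ref>`. The sketch below builds one; the helper name `binder_url` is my own, and the commit hash `abc1234` is a placeholder, not the commit the short link above resolves to.

```python
from urllib.parse import quote


def binder_url(org, repo, ref, host="https://mybinder.org"):
    """Build a Binder launch URL for a GitHub repo at a branch, tag or commit."""
    # Encode the ref fully so that branch names containing "/" stay one path segment.
    return f"{host}/v2/gh/{quote(org)}/{quote(repo)}/{quote(ref, safe='')}"


# "abc1234" is a placeholder commit hash for illustration only.
url = binder_url("sgibson91", "magprop", "abc1234")
print(url)
# https://mybinder.org/v2/gh/sgibson91/magprop/abc1234
```

Pointing at a commit rather than a branch means everyone who follows the link gets the same code, and Binder can reuse an image it has already built for that commit.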
Link to full workflow GIF: phd_demo.gif 🚫 Emergency back-up GIF
Courtesy of Juliette Belin
Read the docs on making your own repo Binder-ready at https://mybinder.readthedocs.io
By design, because it costs the Binder Team about 5000 USD per month to run, the public Binder instance:
- Only works for public repos, so it cannot host private code or sensitive data
- Discourages large datasets
- Provides minimal computing resources
- With your own BinderHub, the host institution/organisation/RSE group can choose whether to make repos public or private
- This is an on-going project at the Turing Institute
BinderHub is an umbrella for:
- Building a docker image from a code repository
- repo2docker
- Launching an interactive browser displaying that code repository
- JupyterHub
- Distributing multiple instances of that code repository across the Cloud
- Kubernetes with Microsoft Azure/Google Cloud/Amazon Web Services
Some useful links:
- Zero-to-JupyterHub
- Zero-to-JupyterHub with Kubernetes
- Step Zero: Kubernetes on Microsoft Azure
- repo2docker
- Binder discussions at Jupyter discourse
Version updates to software packages can fundamentally change what your code does without raising a fatal error, so the change passes without you realising.
Here's a little demo repo to highlight this: binder-examples/matplotlib-versions
Link to full workflow GIF: ukri_demo.gif 🚫 Emergency back-up GIF
OK, so you may not worry too much about reproducing "style" in this way, but imagine if the difference were numerical, or if a suite of interacting libraries were updated and were no longer compatible with each other.
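One cheap safeguard is to compare what is actually installed against the versions you pinned before trusting any output. This is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); the function name `find_version_drift` and the pinned version below are assumptions for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def find_version_drift(pins, get_version=version):
    """Return {package: (pinned, installed)} for each package that differs from its pin."""
    drift = {}
    for package, pinned in pins.items():
        try:
            installed = get_version(package)
        except PackageNotFoundError:
            # A package that is not installed at all is reported as None.
            installed = None
        if installed != pinned:
            drift[package] = (pinned, installed)
    return drift


# Hypothetical pin; swap in the versions from your own environment file.
drift = find_version_drift({"matplotlib": "3.0.2"})
for package, (pinned, installed) in drift.items():
    print(f"{package}: pinned {pinned}, found {installed}")
```

A check like this does not make the analysis reproducible by itself, but it turns a silent change into a visible one.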
Thanks to The Turing Way team!
- Becky Arnold 💬 💻 📖 👀
- Louise Bowler 💬 💻 📖 💡 📋 👀
- Sarah Gibson 💬 💻 📖 🔧 👀 📢
- Patricia Herterich 💬 📖 👀
- Rosie Higman 💬 📋 👀
- Anna Krystalli 💬 💡 📋 👀
- Alexander Morley 💬 👀 ⚠️
- Martin O'Reilly 💬 🔧
- Kirstie Whitaker 💬 🎨 🔍 🤔 👀 ⚠️ 📢
The Turing Way is a lightly opinionated guide to reproducible data science. Our goal is to provide all the information that researchers need at the start of their projects to ensure that they are easy to reproduce at the end.
Please visit our repo and help us deliver our dream!
Also, thanks to the Binder team for sharing their knowledge!
- Tim Head 💬 🤔
- Chris Holdgraf 💬 🤔
- Benjamin Ragan-Kelley 💬 🤔
- and many others!
- Boost your Research Reproducibility with Binder - Manchester, 1st March
- Sign up here: http://bit.ly/binder-manchester
- Boost your Research Reproducibility with Binder - Turing Institute London, 12th March
- Sign up here: http://bit.ly/binder-london
- Build a BinderHub - Sheffield, 18th March
- Sign up here: http://bit.ly/binderhub-sheffield
Emoji | Represents |
---|---|
💬 | Answering Questions (on gitter, GitHub, or in person) |
💻 | Code |
📖 | Documentation and specification |
🎨 | Design |
💡 | Examples |
📋 | Event Organizers |
🔍 | Funding/Grant Finders |
🤔 | Ideas & Planning |
👀 | Reviewed Pull Requests |
🔧 | Tools |
⚠️ | Tests |
📢 | Talks |