A collection of Jupyter notebooks for working with data in OCDS Kingfisher Process.
Notebook | Description |
---|---|
Publisher analysis template | Use this notebook to analyse data from a specific publisher |
Meta analysis template | Use this notebook to analyse data from multiple publishers, or to perform other types of analysis on the Kingfisher database |
Structure and format feedback template | Use this notebook to provide feedback on structure and format issues reported by the Data Review Tool |
Data quality feedback template | Use this notebook to provide detailed feedback on structure, format, conformance and coherence issues |
Usability checks template | Use this notebook to provide feedback on data usability |
To ease maintenance, the notebooks are made up of reusable components. To see which components are used in each notebook, refer to the `NOTEBOOKS` variable in `manage.py`.
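To illustrate the idea of notebooks built from reusable components, here is a minimal sketch. It is not the code in `manage.py`: the layout of `NOTEBOOKS`, the notebook and component names, and the `build_notebook` and `markdown_cell` helpers are all assumptions made for illustration. It only assumes the standard Jupyter notebook JSON layout, in which a notebook holds a list of `cells`.

```python
# Hypothetical stand-in for the NOTEBOOKS variable. The real names, the
# component assignments and the structure used in manage.py may differ.
NOTEBOOKS = {
    "example_publisher_analysis": [
        "setup_environment",
        "choose_data",
        "check_errors",
        "check_scope",
    ],
    "example_usability_checks": ["setup_environment", "choose_data"],
}


def markdown_cell(text):
    """Return a minimal markdown cell in notebook JSON form."""
    return {"cell_type": "markdown", "metadata": {}, "source": [text]}


def build_notebook(name, components):
    """Concatenate the cells of each component, in order, into one notebook."""
    cells = []
    for component in NOTEBOOKS[name]:
        cells.extend(components[component]["cells"])
    return {"nbformat": 4, "nbformat_minor": 0, "metadata": {}, "cells": cells}


# In-memory stand-ins for component notebooks, one title cell each.
component_names = ["setup_environment", "choose_data", "check_errors", "check_scope"]
components = {
    name: {"cells": [markdown_cell(f"## {name}")]} for name in component_names
}

notebook = build_notebook("example_publisher_analysis", components)
titles = [cell["source"][0] for cell in notebook["cells"]]
print(titles)
```

Because each component appears once and is referenced by name, a fix to one component would reach every notebook that lists it.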
Component | Tasks |
---|---|
Setup environment | Install requirements, import functions, load extensions and set config. Connect to the Kingfisher database. |
Choose data | Choose a data source, collection and schema to work with. |
Check for errors | Check for data collection and processing errors. |
Check scope | Check how many releases and records your data contains. Check the date range and stages of the contracting process covered by your data. |
Check structure and format | Check for structure and format errors reported by the Data Review Tool. |
Check quality | Check for conformance and coherence errors. |
Use the buttons above to open the components from the `main` branch for editing in Google Colaboratory (Colab).
To open a component from a different branch, use Colab's GitHub browser.
Alternatively, you can use the Open in Colab browser extension (Chrome, Firefox). It adds a button that opens the Jupyter notebook you are viewing on GitHub in Colab.
- Create a new notebook.
- Set a title using H2 formatting and add your cells, following the style guide for SQL statements.
- Open the component in Colab.
- Add or edit cells, following the style guide for SQL statements.
In Colab:
- Click Edit -> Clear all outputs.
- Click File -> Save a copy in GitHub.
- Uncheck 'Include a link to Colaboratory'.
- Select your branch, enter a commit message and click OK.
- Add the component to the entry for the notebook in the `NOTEBOOKS` variable in `manage.py`.
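As a rough illustration of what 'Clear all outputs' does to the saved file, the sketch below strips outputs and execution counts from the code cells of a notebook. The `clear_outputs` function is hypothetical and not part of this repository. It assumes the standard notebook JSON layout (nbformat 4), in which code cells carry `outputs` and `execution_count` fields.

```python
import copy
import json


def clear_outputs(notebook):
    """Return a copy of a notebook dict with all code cell outputs removed."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# A tiny in-memory notebook, standing in for a component's .ipynb file.
example = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["## Check scope"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 3,
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["42\n"]}
            ],
            "source": ["print(42)"],
        },
    ],
}

cleaned = clear_outputs(example)
print(json.dumps(cleaned["cells"][1], indent=2))
```

Committing notebooks without outputs keeps diffs limited to the cells' source, which makes the changes easier to review.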
- Add an entry for the notebook and its components to the `NOTEBOOKS` variable in `manage.py`.
- Update the 'Notebooks' section of `README.md`.
- Create a pull request.
- Request a review from a helpdesk analyst.
- If the reviewer requests changes, make the changes then repeat this step.
Once approved, you can merge your own changes.
For small changes, you can review the raw diff in the GitHub review interface.
For larger changes, you can review and comment on a visual diff by clicking the button. You need to authorize the app the first time you open it.
- Install pg_format.
- Run `./manage.py pre-commit`.