Merge pull request #120 from nhsengland/ah_data_validation_project
AH Added /sde_data_validation project page
amaiaita authored Jun 6, 2024
2 parents f03428e + 7108d02 commit d4d25d4
Showing 3 changed files with 33 additions and 0 deletions.
30 changes: 30 additions & 0 deletions docs/our_work/sde_data_validation.md
@@ -0,0 +1,30 @@
---
title: 'Reusable New Data Product Validation Functions'
summary: 'More Efficient, More Consistent Data Through Shared Validation Functions'
origin: 'NHS England Secure Data Environment Service Data Wranglers'
tags: ['DATA WRANGLERS', 'NHSE_SDE', 'SDE', 'DATA VALIDATION', 'RAP', 'PYTHON']
---
![An image showing a stack of boxes on the left and a single box with robotic legs on the right. The stack of boxes has a label "old validation process" along with titles on boxes such as "code not shared", "inconsistent approach", "unreliable" and "manual process". Above the boxes it says "3 days". Next to the boxes an unhappy man is struggling to move them. To the right is a single box with robotic legs, with a happy looking man stood next to it. The box with robotic legs is labeled "new validation process" and has words nearby such as "reusable code", "consistent process" and "easy to re-run". Above the box is a label stating it takes about 30 minutes.](../images/sde_resuable_data_validation_functions.png)

All data provisioned into the NHS England Secure Data Environment (SDE) must be validated first. The old data product validation process was manual, time-consuming and slow to re-run.

Our objectives were to:
- Boost the efficiency and consistency of the data validation process for the Data Access Request Service (DARS)
- Make the process reusable to save time and uphold best practice
- Share the code so others can benefit

## Results

- Validation time reduced from around three days to approximately 30 minutes
- The validation code is reusable on other datasets and has already been reused
- A more consistent methodology than the manual approach
- Multiple potential issues that could have hampered research efforts were addressed earlier
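
As an illustration only of what shared, reusable validation functions might look like in Python: the project's code is not yet published, so every function name, column name, and check below is a hypothetical assumption, not the team's implementation. The sketch shows small, independent checks that return a common result type, so the same checks can be re-run on any dataset.

```python
from dataclasses import dataclass
from functools import partial
from typing import Callable, Dict, List

# Hypothetical representation: a dataset is a list of dict rows.
Row = Dict[str, object]


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def check_required_columns(rows: List[Row], required: List[str]) -> CheckResult:
    """Pass if every required column appears in at least one row."""
    present = set().union(*(row.keys() for row in rows))
    missing = sorted(set(required) - present)
    return CheckResult("required_columns", not missing, f"missing: {missing}")


def check_no_nulls(rows: List[Row], column: str) -> CheckResult:
    """Pass if no row has None or an empty string in the column."""
    nulls = sum(1 for row in rows if row.get(column) in (None, ""))
    return CheckResult(f"no_nulls[{column}]", nulls == 0, f"{nulls} null value(s)")


def check_unique(rows: List[Row], column: str) -> CheckResult:
    """Pass if every value in the column is distinct."""
    values = [row.get(column) for row in rows]
    duplicates = len(values) - len(set(values))
    return CheckResult(f"unique[{column}]", duplicates == 0, f"{duplicates} duplicate(s)")


def run_checks(
    rows: List[Row], checks: List[Callable[[List[Row]], CheckResult]]
) -> List[CheckResult]:
    """Run each check against the same rows and collect the results."""
    return [check(rows) for check in checks]


# Made-up sample data; the column names are placeholders.
sample = [
    {"record_id": 1, "event_date": "2024-01-05"},
    {"record_id": 2, "event_date": ""},
    {"record_id": 2, "event_date": "2024-01-07"},
]

results = run_checks(
    sample,
    [
        partial(check_required_columns, required=["record_id", "event_date"]),
        partial(check_no_nulls, column="event_date"),
        partial(check_unique, column="record_id"),
    ],
)

for result in results:
    status = "PASS" if result.passed else "FAIL"
    print(f"{status} {result.name}: {result.detail}")
```

Because each check takes only the rows and returns a `CheckResult`, validating a new dataset means passing a different list of checks to `run_checks`, not rewriting the logic.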

Output|Link
---|---
Open Source Code & Documentation| Coming soon!
Case Study| N/A
Technical report| N/A
Algorithmic Impact Assessment| N/A

#
3 changes: 3 additions & 0 deletions mkdocs.yml
@@ -34,6 +34,7 @@ nav:
- Reproducible Analytical Pipelines Squad: our_work/ds218_rap_community_of_practice.md
- Tool to Asses Privacy Risk of Text Data - Extended: our_work/ds255_privacyfp.md
- A&E Forecasting Tool: our_work/a_and_e_forecasting_tool.md
- Reusable Data Validation Process: our_work/sde_data_validation.md
- Past Projects:
- 2023:
- AI Models for Shortlisting Interview Candidates: our_work/casestudy-recruitment-shortlisting.md
@@ -125,6 +126,8 @@ nav:
- Synthetic Data From Real Data: our_work/casestudy-synthetic-data-pipeline.md
- Synthetic Data Generation Pipeline: our_work/synthetic-data-pipeline.md
- Nursing Placement Scheduled Optimisation: our_work/nursing-placement-optimisation.md
- SDE Service Data Wranglers:
- Reusable Data Validation Process: our_work/sde_data_validation.md
- Our Team's Publications: our_work/Publications.md
# This allows any projects that get added to appear in the main page of the navigation, makes it easy to spot when people have forgotten to categorize them.
- ... | flat | regex=our_work/(?!template-project\.md).*\.md
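
The `regex=` entry above uses a negative lookahead, `(?!template-project\.md)`, to pick up every page under `our_work/` except the project template. The sketch below checks that behaviour with Python's `re` module. It assumes the pattern is matched against the whole docs-relative path, which may differ from how the navigation plugin applies it, and the helper name `is_auto_listed` is hypothetical.

```python
import re

# Pattern copied verbatim from the nav entry in mkdocs.yml.
PATTERN = re.compile(r"our_work/(?!template-project\.md).*\.md")


def is_auto_listed(path: str) -> bool:
    """Return True if the path would be caught by the catch-all nav entry.

    Assumption: the pattern must match the full docs-relative path.
    """
    return PATTERN.fullmatch(path) is not None


for candidate in (
    "our_work/sde_data_validation.md",
    "our_work/template-project.md",
    "index.md",
):
    print(candidate, is_auto_listed(candidate))
```

Under this assumption, a page added under `our_work/` matches the catch-all even if nobody has placed it in a nav category, which fits the comment about spotting uncategorized projects.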