diff --git a/docs/images/MPS_fig1.png b/docs/images/MPS_fig1.png new file mode 100644 index 00000000..768304fb Binary files /dev/null and b/docs/images/MPS_fig1.png differ diff --git a/docs/images/QAFDL.png b/docs/images/QAFDL.png new file mode 100644 index 00000000..7421417e Binary files /dev/null and b/docs/images/QAFDL.png differ diff --git a/docs/images/copdl.png b/docs/images/copdl.png new file mode 100644 index 00000000..803c264c Binary files /dev/null and b/docs/images/copdl.png differ diff --git a/docs/images/dl_prob_mod_scheme.png b/docs/images/dl_prob_mod_scheme.png new file mode 100644 index 00000000..45624dbd Binary files /dev/null and b/docs/images/dl_prob_mod_scheme.png differ diff --git a/docs/images/linkage_visual_abstract.png b/docs/images/linkage_visual_abstract.png index 96d66d7d..0b7382ce 100644 Binary files a/docs/images/linkage_visual_abstract.png and b/docs/images/linkage_visual_abstract.png differ diff --git a/docs/our_work/data-linkage-hub/index.md b/docs/our_work/data-linkage-hub/index.md new file mode 100644 index 00000000..14dfd770 --- /dev/null +++ b/docs/our_work/data-linkage-hub/index.md @@ -0,0 +1,44 @@ +--- +title: 'Overview - Data Linkage Hub' +summary: 'The data linkage hub encompasses all things data linkage, from documenting the existing state of linkage in NHS England in the Person_ID handbook, to exploring better matching algorithms using probabilistic models and Splink, to creating a Quality Assurance Framework for Data Linkage.' +origin: 'NHS Digital' +tags: ['BEST PRACTICE', 'LINKAGE', 'PYTHON', 'QUALITY'] +--- +![Diagram representing the four current areas of the data linkage hub: DL Quality Assurance, Better Matching Algorithm, MPS Documentation, and the DL Community of Practice.](../../images/linkage_visual_abstract.png) + +Data Linkage is a business-critical process within many government organisations, including NHS England. 
Being able to link patients across their care journey enables both direct care services and research studies on administrative data, which in turn influence healthcare policy. The Data Linkage Hub was created in the new NHS England to take care of this important service. + +The role of the Data Linkage Hub in NHS England includes: + +- identifying points of collaboration with other government departments +- mapping the stakeholders involved in data linkage - both internal and external +- feeding user needs into the Data Linkage vision + +## **Work we do** + +!!! info + + Click each heading to find out more! + +### [Quality Assurance Framework](./linkage-projects/qaf.md) +If we want to achieve a consistent, high-quality approach to linking data, which allows for robust, transparent and auditable results, we also need a framework to operate within. Hence, this workstream aims to create and test a Quality Assurance Framework for Data Linkage, and to embed it in the business process. + +### [Better Matching Algorithm](./linkage-projects/better-matching.md) +We're currently working on implementing a [probabilistic linkage model](https://www.bristol.ac.uk/media-library/sites/cmm/migrated/documents/problinkage.pdf) using [Splink](https://moj-analytical-services.github.io/splink/index.html) to improve linkage outcomes and, by extension, patient outcomes. + +### [Community of Practice](./linkage-projects/cop.md) +We are fostering a community of practice in NHS England to help people do the best linkage they can, and encouraging them to connect with the cross-government Data Linkage Champions network. The community of practice is open to all data linkage stakeholders in NHS England; you can join it via its [Teams channel](https://teams.microsoft.com/l/channel/19%3A7AGd-QLqWT3CEPP2MGz_Zf7o4_aWQvIK2DoqjJm6L-01%40thread.tacv2/General?groupId=4fc6024c-60fe-4723-8aff-3d139f37b1ef&tenantId=37c354b2-85b0-47f5-b222-07b48d774ee3).
+ +### [MPS Documentation](./linkage-projects/mps-handbook.md) +We have been [documenting how the Person_ID is generated via the Master Person Service](https://digital.nhs.uk/services/personal-demographics-service/master-person-service/the-person_id-handbook), to make the current process of linking data in the NHS more transparent and easier to understand. + + +[comment]: <> (The below header stops the title from being rendered (as mkdocs adds it to the page from the "title" attribute) - this way we can add it in the main.html, along with the summary.) +# + +|Output | Link| +|---|---| +MPS Diagnostics|[Github](https://github.com/NHSDigital/mps_diagnostics) +Person_ID Handbook | [NHS England Website](https://digital.nhs.uk/services/personal-demographics-service/master-person-service/the-person_id-handbook) +Quality Assurance Framework | [Work in Progress Link](https://musical-journey-mzj2woo.pages.github.io/) +Community of Practice (*internal only*) | [Teams Channel](https://teams.microsoft.com/l/channel/19%3A7AGd-QLqWT3CEPP2MGz_Zf7o4_aWQvIK2DoqjJm6L-01%40thread.tacv2/General?groupId=4fc6024c-60fe-4723-8aff-3d139f37b1ef&tenantId=37c354b2-85b0-47f5-b222-07b48d774ee3) diff --git a/docs/our_work/data-linkage-hub/linkage-projects/better-matching.md b/docs/our_work/data-linkage-hub/linkage-projects/better-matching.md new file mode 100644 index 00000000..939926d2 --- /dev/null +++ b/docs/our_work/data-linkage-hub/linkage-projects/better-matching.md @@ -0,0 +1,36 @@ +--- +title: 'Probabilistic Linkage Model' +summary: 'This project is creating a probabilistic linkage model using Splink to improve linkage outcomes and, by extension, patient outcomes.'
+category: 'Projects' +origin: 'NHSD' +tags: ['LINKAGE', 'PYTHON', 'PROBABILISTIC MODEL'] +--- +## Crafting a model that suits NHS England data linkage needs +This project aims to develop an alternative data linkage model to [MPS (Master Person Service)](./mps-handbook.md) by creating a [probabilistic linkage model](https://www.bristol.ac.uk/media-library/sites/cmm/migrated/documents/problinkage.pdf) using [Splink](https://moj-analytical-services.github.io/splink/index.html), a Python package developed by the Ministry of Justice (MoJ). + +The linkage pipeline consists of a few steps: + +- Pre-processing +- Distance Metrics +- Blocking +- Training +- Prediction +- Evaluation + +Each of these steps requires research into linkage best practice, testing on samples of our data, feasibility studies of the computational power required, and then thorough evaluation. We are working with an incremental improvement plan and a series of iterative MVPs to ensure that the pipeline reaches the highest quality we can achieve within our computational limits. + +Here is an overview of our current pipeline: +![Splink linkage pipeline scheme](../../../images/dl_prob_mod_scheme.png) + +## Building a model with transparency in mind +Users of linked data often have to rely on the accuracy of a process created by others, because the linking itself is not under their control. That is why one of the main focuses of the model we are building is the transparency of its methods and the explainability of its results. + + +[comment]: <> (The below header stops the title from being rendered (as mkdocs adds it to the page from the "title" attribute) - this way we can add it in the main.html, along with the summary.) +# + +|Output | Link| +|---|---| +| Splink Linkage Pipeline * | [Github](https://github.com/NHSDigital/splink-linkage-pipeline) | + +\* This is currently private and available for internal access only.
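The blocking, training and prediction steps above rest on the Fellegi-Sunter model of probabilistic linkage, which Splink implements. The following is a minimal, self-contained sketch in plain Python, not the Splink API and not our pipeline: it blocks candidate pairs on a hypothetical `postcode_area` key, uses exact agreement in place of the distance metrics (such as Jaro-Winkler) a real model would use, and sums log2 Bayes factors built from hypothetical m and u probabilities (which a real model would estimate during training) to produce a match weight and a match probability. All records, field names and parameter values are invented for illustration.

```python
import math

# Hypothetical parameters, for illustration only (not NHS England's model).
# m = P(field agrees | records are a true match)
# u = P(field agrees | records are not a match)
M_U_PARAMS = {
    "first_name": (0.95, 0.01),
    "surname": (0.95, 0.005),
    "dob": (0.98, 0.001),
}
# Hypothetical prior probability that a candidate pair is a true match.
PRIOR_MATCH_PROB = 0.0001


def blocked_pairs(left, right, key):
    """Blocking: only yield pairs whose blocking-key values are equal."""
    index = {}
    for rec in right:
        index.setdefault(rec[key], []).append(rec)
    for rec in left:
        for other in index.get(rec[key], []):
            yield rec, other


def match_weight(a, b):
    """Fellegi-Sunter weight: prior log2-odds plus one log2 Bayes factor per field."""
    weight = math.log2(PRIOR_MATCH_PROB / (1 - PRIOR_MATCH_PROB))
    for field, (m, u) in M_U_PARAMS.items():
        if a[field] == b[field]:
            # Agreement: evidence in favour of a match when m > u.
            weight += math.log2(m / u)
        else:
            # Disagreement: evidence against a match when m > u.
            weight += math.log2((1 - m) / (1 - u))
    return weight


def match_probability(weight):
    """Convert a log2-odds match weight back into a probability."""
    return 1 / (1 + 2 ** -weight)


# Toy records, invented for the example.
left = [{"id": "L1", "first_name": "amy", "surname": "smith",
         "dob": "1990-01-01", "postcode_area": "LS"}]
right = [
    {"id": "R1", "first_name": "amy", "surname": "smith",
     "dob": "1990-01-01", "postcode_area": "LS"},
    {"id": "R2", "first_name": "amy", "surname": "jones",
     "dob": "1985-05-05", "postcode_area": "LS"},
    # Different block, so never compared even though name and DOB agree.
    {"id": "R3", "first_name": "amy", "surname": "smith",
     "dob": "1990-01-01", "postcode_area": "M"},
]

for a, b in blocked_pairs(left, right, "postcode_area"):
    w = match_weight(a, b)
    print(a["id"], b["id"], f"weight={w:.2f}", f"p={match_probability(w):.4f}")
```

With these made-up values, the pair L1/R1 scores a high match probability, L1/R2 a very low one, and R3 is never scored. This illustrates the trade-off behind blocking: it keeps the number of comparisons within computational limits, but a true match that disagrees on the blocking key is never considered.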
diff --git a/docs/our_work/data-linkage-hub/linkage-projects/cop.md b/docs/our_work/data-linkage-hub/linkage-projects/cop.md new file mode 100644 index 00000000..bd25fb9f --- /dev/null +++ b/docs/our_work/data-linkage-hub/linkage-projects/cop.md @@ -0,0 +1,28 @@ +--- +title: 'Data Linkage Community of Practice (DL CoP)' +summary: 'We are creating and leading a community of practice to help people do the best linkage they can, with support not only from the data linkage team but also from fellow analysts who are actively working on data linkage.' +category: 'Projects' +origin: 'NHSD' +tags: ['BEST PRACTICE','EXPLAINABILITY','LINKAGE'] +--- + +## Why do we want a Community of Practice? +In NHS England, data linkage occurs at various stages of the data lifecycle, involving different stakeholders (from data engineers to analysts) and happening across different platforms. There are pockets of knowledge and expertise that operate independently of each other. +The Community of Practice aims to help Data Linkage stakeholders in NHS England share their expertise and best practices with colleagues across the organisation. +It also responds to the Data Linkage Survey, in which colleagues expressed a clear interest in cultivating a collaboration space. + +![Results from the Data Linkage Survey Community of Practice question](../../../images/copdl.png) + +## Data Linkage Community of Practice: Mission +The mission of our community of practice is to **facilitate collaboration and an exchange of knowledge, tools and innovative solutions** among data linkage stakeholders within NHS England, with an outlook onto other government and research institutions, enabling members to share and adopt effective practices. + +## How can I join?
+You can request access via the [Community of Practice Teams channel](https://teams.microsoft.com/l/team/19%3A7AGd-QLqWT3CEPP2MGz_Zf7o4_aWQvIK2DoqjJm6L-01%40thread.tacv2/conversations?groupId=4fc6024c-60fe-4723-8aff-3d139f37b1ef&tenantId=37c354b2-85b0-47f5-b222-07b48d774ee3). +This Teams channel is restricted to NHS England employees. + +## Data Linkage Champions Network +The CoP aims to connect closely with the Data Linkage Champions Network, a cross-government initiative created by the Government Data Quality Hub (DQ Hub) in the Office for National Statistics (ONS) to help practitioners work better as a community to improve data linkage methods, their applications, and skills. +To find out more, visit the [Data Linkage Champions Network page](https://analysisfunction.civilservice.gov.uk/support/data-linkage/data-linkage-champion-network/). + +[comment]: <> (The below header stops the title from being rendered (as mkdocs adds it to the page from the "title" attribute) - this way we can add it in the main.html, along with the summary.) +# diff --git a/docs/our_work/data-linkage-hub/linkage-projects/mps-handbook.md b/docs/our_work/data-linkage-hub/linkage-projects/mps-handbook.md new file mode 100644 index 00000000..519c918c --- /dev/null +++ b/docs/our_work/data-linkage-hub/linkage-projects/mps-handbook.md @@ -0,0 +1,21 @@ +--- +title: 'MPS Documentation - the Person_ID handbook' +summary: 'Documenting how the Person_ID is generated via the Master Person Service (MPS), to make the current process of linking data in the NHS more transparent and easier to understand.' +category: 'Projects' +origin: 'NHSD' +tags: ['BEST PRACTICE','EXPLAINABILITY','LINKAGE'] +--- + +The Person_ID is a unique patient identifier used by NHS England to standardise the approach to patient-level data linkage across different data sets.
+The Data Linkage team has produced detailed documentation of the data linkage algorithm used to create the Person_ID, namely the Master Person Service (MPS). +The figure below gives an overview of the process. The full documentation is available on the [NHS England website](https://digital.nhs.uk/services/personal-demographics-service/master-person-service/the-person_id-handbook). + +![Creation of Person_ID via the Master Person Service process flow, high-level representation](../../../images/MPS_fig1.png) + +|Output | Link| +|---|---| +MPS Diagnostics|[Github](https://github.com/NHSDigital/mps_diagnostics) +Person_ID Handbook | [NHS England Website](https://digital.nhs.uk/services/personal-demographics-service/master-person-service/the-person_id-handbook) + +[comment]: <> (The below header stops the title from being rendered (as mkdocs adds it to the page from the "title" attribute) - this way we can add it in the main.html, along with the summary.) +# diff --git a/docs/our_work/data-linkage-hub/linkage-projects/qaf.md b/docs/our_work/data-linkage-hub/linkage-projects/qaf.md new file mode 100644 index 00000000..6a48817e --- /dev/null +++ b/docs/our_work/data-linkage-hub/linkage-projects/qaf.md @@ -0,0 +1,26 @@ +--- +title: 'Quality Assurance Framework for Data Linkage' +summary: 'This project aims to create, test, and distribute a quality assurance framework for data linkage to ensure robust, transparent and auditable results.' +category: 'Projects' +origin: 'NHSD' +tags: ['BEST PRACTICE','EXPLAINABILITY','LINKAGE'] +--- + +Data Linkage is a business-critical process within many government organisations, including NHS England. Research publications and official statistics, as well as many direct care applications, depend on data linkage. Its importance is further amplified by privacy-preserving principles that require minimising the use of patients' personally identifiable information.
Consequently, data linkage is initiated early in the data lifecycle, which means that **downstream applications rely heavily on the quality of the linkage process**. + +However, data linkage is too often seen purely as a software development and data engineering exercise rather than a modelling challenge, and an appropriate level of quality assurance is not applied at the different stages of the process. This is why we have worked on the **Quality Assurance Framework for Data Linkage**, a tool for data linkage practitioners **to determine the necessary quality assurance levels at every stage of the data linkage process**: + +![Quality Assurance Framework for Data Linkage screenshot](../../../images/QAFDL.png) + +The required level of quality assurance varies by project and is determined by the data linker and data users. The triage questions in the framework provide a structured approach to deciding the minimum expected levels by type of project. + +The Quality Assurance Framework guides stakeholders in making well-informed choices based on a clear understanding of potential risks and benefits. It can also be used as a detailed record-keeping tool that helps evaluate and manage the different aspects of a data linkage project. + +Reach out to the Data Linkage Hub if you want to contribute to this project. + +[comment]: <> (The below header stops the title from being rendered (as mkdocs adds it to the page from the "title" attribute) - this way we can add it in the main.html, along with the summary.)
+# + +|Output | Link| +|---|---| +Quality Assurance Framework | *link to come soon when website is released* \ No newline at end of file diff --git a/docs/our_work/linkage.md b/docs/our_work/linkage.md deleted file mode 100644 index 60e35f50..00000000 --- a/docs/our_work/linkage.md +++ /dev/null @@ -1,34 +0,0 @@ ---- -title: 'Data Linkage Enhancement' -summary: 'The data linkage enhancement project encompasses all things data linkage, from documenting the existing state of linkage in NHS England in the Person_ID handbook, to exploring better matching algorithms using probabilistic models and Splink, to creating a Quality Assurance Framework for Data Linkage.' -origin: 'NHS Digital' -tags: ['BEST PRACTICE', 'LINKAGE', 'PYTHON', 'QUALITY'] ---- -![Diagram representing the three current areas of the data linkage project: Quality Assurance, Better Matching Algorithm, and Data Linkage as a service.](../images/linkage_visual_abstract.png) - -Data Linkage is a business-critical process within many government organisations, including NHS England. Being able to link patients across their care journey, making sure that underrepresented populations are not lost in the cracks, and ensuring compatibility when using several data sets for the same purpose is a pillar of why the Data Linkage Enhancement team exists. - -## **Work we do** -### Quality Assurance Framework -If we want to achieve a consistent and high quality approach to linking data, which allows for robust, transparent and auditable results, we also need a framework to operate within. Hence, this workstream aims at creating, testing and implementing in the business process a Quality Assurance Framework for Data Linkage. 
- -### Better Matching Algorithm -We're currently working on implementing a [probabilistic linkage model](https://www.bristol.ac.uk/media-library/sites/cmm/migrated/documents/problinkage.pdf) using [Splink](https://moj-analytical-services.github.io/splink/index.html), in order to improve linkage outcomes, and by extension, patient outcomes. - -### Data Linkage as a service -This is the umbrella covering everything else we do. This stream of work encompasses: - -- identifying points of collaboration with other government departments -- mapping the stakeholders involved in data linkage - both internal and external -- feeding user needs into an overall Data Linkage vision - -Part of this stream of work was also [documenting how the Person_ID is generated via the Master Person Service](https://digital.nhs.uk/services/personal-demographics-service/master-person-service/the-person_id-handbook), to make the current process of linking data in the NHS more transparent and easy to understand. - -[comment]: <> (The below header stops the title from being rendered (as mkdocs adds it to the page from the "title" attribute) - this way we can add it in the main.html, along with the summary.) 
-# - -|Output | Link| -|---|---| -MPS Diagnostics|[Github](https://github.com/NHSDigital/mps_diagnostics) -Person_ID Handbook | [NHS England Website](https://digital.nhs.uk/services/personal-demographics-service/master-person-service/the-person_id-handbook) - Quality Assurance Framework | [Work in Progress Link](https://musical-journey-mzj2woo.pages.github.io/) \ No newline at end of file diff --git a/mkdocs.yml b/mkdocs.yml index 38068538..fc20cf72 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -29,7 +29,7 @@ nav: - Past/Current Projects: - Current Projects: - AI Dictionary: our_work/ai-dictionary.md - - Data Linkage Enhancement: our_work/linkage.md + - Data Linkage Hub: our_work/data-linkage-hub/ - NHS.UK Automatic Moderation of Ratings & Reviews: our_work/ratings-and-reviews.md - Reproducible Analytical Pipelines Squad: our_work/ds218_rap_community_of_practice.md - Tool to Asses Privacy Risk of Text Data - Extended: our_work/ds255_privacyfp.md @@ -88,7 +88,10 @@ nav: - Renal Health Prediction: our_work/renal-health-prediction.md - A&E Forecasting Tool: our_work/a_and_e_forecasting_tool.md - Data Science for Linked/Longitudinal Data: - - Data Linkage Enhancement: our_work/linkage.md + - Data Linkage Hub: + - our_work/data-linkage-hub/index.md + # - ... | flat | our_work/data-linkage-hub/*.md + - ... | flat | our_work/data-linkage-hub/linkage-projects/*.md - Inequalities in Diabetes from PHM Data: our_work/p32_phmdiabetes.md - Natural Language Processing Products: - AI Models for Shortlisting Interview Candidates: our_work/casestudy-recruitment-shortlisting.md