diff --git a/docs/images/dl_prob_mod_scheme.png b/docs/images/dl_prob_mod_scheme.png deleted file mode 100644 index 45624dbd..00000000 Binary files a/docs/images/dl_prob_mod_scheme.png and /dev/null differ diff --git a/docs/images/splink_diagram.png b/docs/images/splink_diagram.png new file mode 100644 index 00000000..df079b13 Binary files /dev/null and b/docs/images/splink_diagram.png differ diff --git a/docs/our_work/Publications.md b/docs/our_work/Publications.md index 760df664..45da9cca 100644 --- a/docs/our_work/Publications.md +++ b/docs/our_work/Publications.md @@ -8,6 +8,26 @@ tags: ['PUBLICATIONS'] List of pre-releases and publications connected to our work +[8] [https://ieeexplore.ieee.org/document/10635870](https://ieeexplore.ieee.org/document/10635870) + +**Medisure: Towards Assuring Machine Learning-Based Medical Image Classifiers Using Mixup Boundary Analysis** + +Adam Byfield, **William Poulett**, **Ben Wallace**, Anusha Jose, Shatakshi Tyagi, Smita Shembekar, Adnan Qayyum, Junaid Qadir, Muhammad Bilal + +*Machine learning (ML) models are becoming integral in healthcare technologies, necessitating formal assurance methods to ensure their safety, fairness, robustness, and trustworthiness. However, these models are inherently error-prone, posing risks to patient health and potentially causing irreparable harm when deployed in clinics. Traditional software assurance techniques, designed for fixed code, are not directly applicable to ML models, which adapt and learn from curated datasets during training. Thus, there is an urgent need to adapt established software assurance principles such as boundary testing with synthetic data. To bridge this gap and enable objective assessment of ML models in real-world clinical settings, we propose Mix-Up Boundary Analysis (MUBA), a novel technique facilitating the evaluation of image classifiers in terms of prediction fairness. 
We evaluated MUBA using brain tumour and breast cancer classification tasks and achieved promising results. This research underscores the importance of adapting traditional assurance principles to assess ML models, ultimately enhancing the safety and reliability of healthcare technologies. Our code is available at [https://github.com/willpoulett/MUBA_pipeline](https://github.com/willpoulett/MUBA_pipeline).* + +--- + +[7] [https://publichealth.jmir.org/2024/1/e46485](https://publichealth.jmir.org/2024/1/e46485) + +**The Use of Online Consultation Systems or Remote Consulting in England Characterized Through the Primary Care Health Records of 53 Million People in the OpenSAFELY Platform: Retrospective Cohort Study** + +**Martina Fonseca**, Brian MacKenna, Amir Mehrkar, The OpenSAFELY Collaborative, Caroline E Walters, George Hickman, **Jonathan Pearson**, Louis Fisher, Peter Inglesby, Seb Bacon, Simon Davy, William Hulme, Ben Goldacre, Ofra Koffman, Minal Bakhai + +*We aimed to explore general practice coding activity associated with the use of Online Consultations (OC) systems in terms of trends, COVID-19 effect, variation, and quality. The OpenSAFELY platform was used to query and analyze the in situ electronic health records of suppliers The Phoenix Partnership (TPP) and Egton Medical Information Systems, covering >53 million patients in >6400 practices, mainly in 2019-2020. We successfully queried general practice coding activity relevant to the use of OC systems, showing increased adoption and key areas of variation during the pandemic at both sociodemographic and clinical levels. The work can be expanded to support monitoring of coding quality and underlying activity. 
This study suggests that large-scale impact evaluation studies can be implemented within the OpenSAFELY platform, namely looking at patient outcomes.* + +--- + [6] [https://arxiv.org/abs/2403.19802](https://arxiv.org/abs/2403.19802) **Developing Healthcare Language Model Embedding Spaces** diff --git a/docs/our_work/data-linkage-hub/linkage-projects/better-matching.md b/docs/our_work/data-linkage-hub/linkage-projects/better-matching.md index 96e23333..62a8b421 100644 --- a/docs/our_work/data-linkage-hub/linkage-projects/better-matching.md +++ b/docs/our_work/data-linkage-hub/linkage-projects/better-matching.md @@ -1,12 +1,13 @@ --- title: 'Probabilistic Linkage Model' -summary: 'This project is creating a probabilistic linkage model using Splink, in order to improve linkage outcomes, and by extension, patient outcomes.' +summary: 'This project is creating a probabilistic linkage model using Splink, in order to improve linkage outcomes, and by extension, patient outcomes. The aim is for this to be used to link data in a range of NHS datasets.' category: 'Projects' origin: 'NHSD' tags: ['LINKAGE', 'PYTHON', 'PROBABILISTIC MODEL'] --- + ## Crafting a model that suits NHS England data linkage needs -This project aims at developing an alternative data linkage model to [MPS (Master Person Service)](./mps-handbook.md) by creating a [probabilistic linkage model](https://www.bristol.ac.uk/media-library/sites/cmm/migrated/documents/problinkage.pdf) using the package called [Splink](https://moj-analytical-services.github.io/splink/index.html), which was developed by Ministry of Justice (MoJ). 
+This project aims at developing an alternative data linkage model to [MPS (Master Person Service)](./mps-handbook.md) by creating a [probabilistic linkage model](https://www.bristol.ac.uk/media-library/sites/cmm/migrated/documents/problinkage.pdf) using the package called [Splink](https://moj-analytical-services.github.io/splink/index.html), which was developed by the Ministry of Justice (MoJ) Data Science Team. The linkage pipeline consists of a few steps: @@ -19,8 +20,12 @@ The linkage pipeline consists of a few steps: Each of these steps requires research into linkage best practice, testing on samples of our data, feasibility studies of computational power required, and then thorough evaluation. We are working with an incremental improvement plan and a series of iterative MVPs to ensure that the pipeline has the highest quality we can achieve within our computational limits. +We have also added configuration to the pipeline to support a deduplication task, which aims to identify possible duplicate records in the [Personal Demographics Service (PDS)](https://digital.nhs.uk/services/personal-demographics-service). + Here is an overview of how our pipeline currently looks. -![Splink linkage pipeline schema, shows the flow of the file system for the pipeline.](../../../images/dl_prob_mod_scheme.png) + +![Splink linkage pipeline schema, shows the flow of the file system for the pipeline.](../../../images/splink_diagram.png) + ## Building a model with transparency in mind Users of linked data have to rely on the accuracy of the process created by others as often the process of linking data is not under their control. That is why one of the main focus of the model we are building is transparency of the methods and explainability of the results. 
diff --git a/docs/our_work/ds251_RAG.md b/docs/our_work/ds251_RAG.md index 307470bd..c1390831 100644 --- a/docs/our_work/ds251_RAG.md +++ b/docs/our_work/ds251_RAG.md @@ -5,7 +5,18 @@ origin: '' tags: ['NLP','LLM','GENAI'] --- -## PLACEHOLDER PAGE - CONTENT COMING SOON! + + +![Stop lying to me Chat GPT Haca Poster](https://github.com/user-attachments/assets/52cc653c-db42-42de-a4bd-76c0fd42c83f) + +See our RAG Demos and discussions [here](https://github.com/nhsengland/ds_251_RAG) + + +## Headlines: +- LLMs produce more relevant and accurate content when given key information in the context window. +- All RAG techniques seek to utilise this strength of LLMs by maximising the use of the context window. +- Modern LLM systems will involve many RAG techniques, often coming as standard in AI development architectures. + [comment]: <> (The below header stops the title from being rendered (as mkdocs adds it to the page from the "title" attribute) - this way we can add it in the main.html, along with the summary.) -# \ No newline at end of file +#