Merge pull request #29 from nhsengland/fixing-images
fixing images and captions
amaiaita authored Jan 3, 2024
2 parents 3587aac + f5b070b commit d5b07a5
Showing 28 changed files with 36 additions and 206 deletions.
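The recurring change in the files below replaces a centred HTML `<img>` block and its `<em>` caption with a Markdown image and a `<figcaption>`. The following is a minimal Python sketch of that rewrite, assuming the old block always has the shape seen in these diffs. `convert_figure` and the regular expression are hypothetical illustrations, not how the commit was actually produced, and the sketch emits the `<figure markdown>` variant used in `c338_poud.md` and `p11_synpathdiabetes.md`.

```python
import re

# Hypothetical pattern for the old figure markup: a centred <img> paragraph
# followed by a left-aligned <em> caption paragraph.
OLD_FIGURE = re.compile(
    r'<p align="center">\s*'
    r'<img src="assets/img/(?P<file>[^"]+)" alt="(?P<alt>[^"]*)"[^>]*/>\s*'
    r'</p>\s*'
    r'<p align="left">\s*'
    r'<em>(?P<caption>.*?)</em>\s*'
    r'</p>',
    re.DOTALL,
)


def convert_figure(markdown_text):
    """Rewrite old HTML figure blocks as Markdown images with a figcaption."""

    def _replace(match):
        caption = match.group("caption").strip()
        return (
            "<figure markdown>\n"
            f"![{match.group('alt')}](../images/{match.group('file')})\n"
            f"<figcaption>{caption}</figcaption>\n"
            "</figure>"
        )

    # Text outside a matched figure block is returned unchanged.
    return OLD_FIGURE.sub(_replace, markdown_text)
```

In Python-Markdown the `markdown` attribute on `<figure>` is only honoured when the `md_in_html` extension is enabled; whether this site's configuration enables it is not shown in the diff.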
Binary file added docs/images/c245fig1.png
Binary file added docs/images/c250fig1.png
Binary file added docs/images/c338fig1.png
Binary file added docs/images/ds218_rap_community_of_practice.png
Binary file added docs/images/p11fig1.png
Binary file added docs/images/p22fig1.png
Binary file added docs/images/p31fig1.png
Binary file added docs/images/p32fig1.png
13 changes: 3 additions & 10 deletions docs/our_work/c245_synpath.md
@@ -6,12 +6,8 @@ permalink: c245_synpath.html

> | "Developing an agent-based simulation for generating synthetic patient pathways and scenario modelling for healthcare specific implementations."
-<p align="center">
-<img src="assets/img/c245fig1.png" alt="Overview of data model" width="100%"/>
-</p>
-<p align="left">
-<em>Figure 1: Overview of the Synpath data model</em>
-</p>
+![Overview of data model](../images/c245fig1.png)
+<figcaption>Figure 1: Overview of the Synpath data model</figcaption>

A data model (“Patient Agent”) was developed for fake patients to be defined in the simulation. The patient is then assigned a health record (conditions, medications, ..) with optional additional attributes.

@@ -29,7 +25,4 @@ Efficient object communication and concurrency were also highlighted needing sig
| ---- | ---- |
| Open Source Code & Documentation | [Github](https://github.com/nhsx/SynPath) |
| Case Study | Awaiting Sign-Off |
-| Technical report | [Here](https://github.com/nhsx/SynPath/blob/master/reports/REDACTED_C245%20ABM%20Patient%20Pathways_Final%20Report_V3_28042021.cleaned.pdf) |
-
-|:-|:-|:-|
-|<img src="assets/img/simulation_badge_S.png" alt width="80"/>|<img src="assets/img/Synthetic.png" alt width="80"/>|<img src="assets/img/data_science_badge_S.png" alt width="80"/>|
+| Technical report | [Here](https://github.com/nhsx/SynPath/blob/master/reports/REDACTED_C245%20ABM%20Patient%20Pathways_Final%20Report_V3_28042021.cleaned.pdf) |
12 changes: 2 additions & 10 deletions docs/our_work/c250_nhscorpus.md
@@ -6,12 +6,8 @@ permalink: c250_nhscorpus.html

> | "What are the available tools that could be used to build an NHS-focussed collection of texts which could help developers build better NLP tools for the healthcare system."
-<p align="center">
-<img src="assets/img/c250fig1.png" alt="Ingest, Enrich, Share" width="100%"/>
-</p>
-<p align="left">
-<em>Figure 1: Open source tools used in each functional setting</em>
-</p>
+![Ingest, Enrich, Share](../images/c250fig1.png)
+<figcaption>Figure 1: Open source tools used in each functional setting</figcaption>

We aimed to explore how to build an Open, Representative, Extensible and Useful set of tools to curate, enrich and share sources of healthcare text data in an appropriate manner.

@@ -25,7 +21,3 @@ Whilst a tool stack was developed which achieved many of our objectives, the key
| Open Source Code & Documentation | [Github](https://github.com/nhsx/language-corpus-tools) |
| Case Study | n/a |
| Blog | [Here](https://nhsx.github.io/AnalyticsUnit/languagecorpusdiscovery.html) |
-
-|:-|:-|:-|
-|<img src="assets/img/machine_learning_badge_S.png" alt width="80"/>|<img src="assets/img/data_science_badge_S.png" alt width="80"/>|
-
15 changes: 4 additions & 11 deletions docs/our_work/c338_poud.md
@@ -6,12 +6,10 @@ permalink: c338_poud.html

> | "What are the privacy considerations that need to be addressed when dealing with unstructured healthcare text data "
-<p align="center">
-<img src="assets/img/c338fig1.png" alt="" width="100%"/>
-</p>
-<p align="left">
-<em>Figure 1: Figure 4 from Al-Fedaghi, Sabah. (2012). Experimentation with Personal Identifiable Information. Showing an example PII sphere from different perspectives (compund, singleton and multitude personal identifiable information) </em>
-</p>
+<figure markdown>
+![](../images/c338fig1.png)
+<figcaption>Figure 1: Figure 4 from Al-Fedaghi, Sabah. (2012). Experimentation with Personal Identifiable Information. Showing an example PII sphere from different perspectives (compund, singleton and multitude personal identifiable information)</figcaption>
+</figure>

Unstructured data (e.g. text, image, audio) makes up a significant quantity of NHS data but is comparatively underused as an evidence source for analysis. This is often due to the privacy concerns restricting the sharing and use of these data.

@@ -57,8 +55,3 @@ The main output specified was for a list of key qualities that could feed a tool
| Open Source Code & Documentation | n/a |
| Case Study | Awaiting Sign-Off |
| Technical report | [Here](https://github.com/nhsx/PrivacyFingerprint/blob/main/reports/PrivacyOfUnstructuredDataReport_Nov2022.pdf) |
-
-|:-|:-|:-|
-|<img src="assets/img/machine_learning_badge_S.png" alt width="80"/>|<img src="assets/img/data_science_badge_S.png" alt width="80"/>|
-
-
12 changes: 1 addition & 11 deletions docs/our_work/c339_sas.md
@@ -6,13 +6,6 @@ permalink: c339_sas.html

> | "Can the privacy of a generated dataset be assessed through downstream adversarial attacks to highlight the risk of reidentificiation "
-<p align="center">
-<img src="assets/img/c339fig1.png" alt="" width="100%"/>
-</p>
-<p align="left">
-<em>Figure 1: Attack diagrams for the currently incorporated scenarios. Scenario 1: Access to the synthetic dataset and a description of the generative model’s architecture and training procedure. Scenario 2: Access to a black box model that can provide unlimited synthetic data, with data realistic of the training distribution gathered by the attacker, which may be an example synthetic dataset released by the researchers.</em>
-</p>
-
An extensible code was developed to apply a suite of adversarial attacks to synthetically generated single table tabular data in order to assess the likely success of attacks and act as a privacy indicator for the dataset. Using this information then informs the generation and information governance process to ensure the safety of our data.

## Results
@@ -23,7 +16,4 @@ The code-base was successfully developed with code injection points for extensib
| ---- | ---- |
| Open Source Code & Documentation | restricted |
| Case Study | Awaiting Sign-Off |
-| Technical report | [Blod](https://nhsx.github.io/AnalyticsUnit/SynthAdvSuite.html) |
-
-|:-|:-|:-|
-|<img src="assets/img/Synthetic.png" alt width="80"/>|<img src="assets/img/statistics_badge_S.png" alt width="80"/>|
+| Technical report | [Blod](https://nhsx.github.io/AnalyticsUnit/SynthAdvSuite.html) |
12 changes: 1 addition & 11 deletions docs/our_work/c399_privfinger.md
@@ -6,13 +6,6 @@ permalink: c399_privfinger.html

> | "Can we generate usable privacy scores for text data to support understanding of privacy concerns and the anonymisation process "
-<p align="center">
-<img src="assets/img/c399fig1.png" alt="" width="100%"/>
-</p>
-<p align="left">
-<em>Figure 1: The methodology works in the following way: generated structured data for an individual record, transform this into unstructured medical notes, encode identifiers through named entitiy recognition, evaluate privacy, perform deidentification, repeat process.</em>
-</p>
-
This work was undertaken as an external commission aiming to build a pipeline of components which firstly generated unstructured medical notes using a structured output from [Synthea:tm:](https://github.com/synthetichealth/synthea) and then running these through [GPT-3.5](https://platform.openai.com/docs/models/gpt-3-5) models to transform these into human readable notes.

These notes were then processed using named entitiy recognition to extract pre-defined identifiers and store these in a structured form. The alogrithm [pycorrect match](https://github.com/computationalprivacy/pycorrectmatch) was then implemented to give a privacy risk score of reidentification from the identifiers.
@@ -34,7 +27,4 @@ This is an ongoing piece of work.
| ---- | ---- |
| Open Source Code & Documentation | Coming Soon |
| Case Study | Coming |
-| Technical report | Coming Soon |
-
-|:-|:-|:-|:-|
-|<img src="assets/img/machine_learning_badge_S.png" alt width="80"/>|<img src="assets/img/Synthetic.png" alt width="80"/>|<img src="assets/img/data_science_badge_S.png" alt width="80"/>|<img src="assets/img/statistics_badge_S.png" alt width="80"/>|
+| Technical report | Coming Soon |
12 changes: 6 additions & 6 deletions docs/our_work/ds218_rap_community_of_practice.md
@@ -6,12 +6,12 @@ permalink: ds218_rap_community_of_practice.html

> | "Our roving squad of RAP champions have helped a number of teams not only transform their pipelines, but also tought them how to train others and produced guidance which is used by many other organisations."
-<p align="center">
-<a href="https://nhsdigital.github.io/rap-community-of-practice/"><img src="assets/img/ds218_rap_community_of_practice.png" alt="An image displaying the front page of the NHS RAP Community of Practice website." width="100%"/></a>
-</p>
-<p align="left">
-<em>Figure 1: The front page of the RAP Community of Practice website. </em>
-</p>
+<a href = "https://nhsdigital.github.io/rap-community-of-practice/">
+<figure markdown >
+![An image displaying the front page of the NHS RAP Community of Practice website.](../images/ds218_rap_community_of_practice.png) </a>
+<figcaption>Figure 1: The front page of the RAP Community of Practice website.</figcaption>
+</figure>
+

Our Squad of RAP Champions has supported the rollout of RAP across the Analytics area within NHE England. This has involved taking existing guidance on RAP found elsewhere, and interpreting it in the local context of NHSE, making guidance specific to our systems and the problems faced by our analysts. We also put together a program for how to learn RAP, transform pipelines (through the "[thin slice approach](https://nhsdigital.github.io/rap-community-of-practice/our_RAP_service/thin-slice-strategy/)") and [become a RAP champion yourself](https://nhsdigital.github.io/rap-community-of-practice/our_RAP_service/building_team_capability/).

15 changes: 5 additions & 10 deletions docs/our_work/p11_synpathdiabetes.md
@@ -6,12 +6,10 @@ permalink: p11_synpathdiabetes.html

> | "Exploration work into incorporating learning into a pathway simulator for diabetes. This work has fed our current SynPathGo project to create synthetic patient pathways and a foundation for agent based modelling in the NHS."
-<p align="center">
-<img src="assets/img/p11fig1.png" alt="" width="100%"/>
-</p>
-<p align="left">
-<em>Figure 1: Table of learning algorithms considered for the simulation inteligence layer </em>
-</p>
+<figure markdown>
+![](../images/p11fig1.png)
+<figcaption>Figure 1: Table of learning algorithms considered for the simulation inteligence layer </figcaption>
+</figure>

Using the SynPath framework we created a diabetes simulation for 800 patients. These patients could interact within a fictional local area with hospitals providing outpatient and inpatient services, GP practices and community healthcare services.

@@ -25,7 +23,4 @@ Future collaboration around validation and how to apply learning algorithms are
| ---- | ---- |
| Open Source Code & Documentation | [Github](https://github.com/nhsx/SynPath_Diabetes) |
| Case Study | Awaiting Sign-Off |
-| Technical report | [Here](https://github.com/nhsx/SynPath_Diabetes/blob/main/t2dm/reports/Technical%20Report%20(SynPath%20Diabetes)%20v1.pdf) |
-
-|:-|:-|:-|
-|<img src="assets/img/simulation_badge_S.png" alt width="80"/>|<img src="assets/img/Synthetic.png" alt width="80"/>|<img src="assets/img/machine_learning_badge_S.png" alt width="80"/>|
+| Technical report | [Here](https://github.com/nhsx/SynPath_Diabetes/blob/main/t2dm/reports/Technical%20Report%20(SynPath%20Diabetes)%20v1.pdf) |
9 changes: 0 additions & 9 deletions docs/our_work/p12_synthvae.md
@@ -6,13 +6,6 @@ permalink: p12_synthvae.html

> | "The initial creation of a variational autoencoder with differential privacy for generating single table tabular gaussian data. This work demonstrated the feasibility of this approach for healthcare and fed into further interactions of the code base."
-<p align="center">
-<img src="assets/img/p12fig1.png" alt="" width="100%"/>
-</p>
-<p align="left">
-<em>Figure 1: Correlation plots highlighting the difference between the variable relationships in the real and synthetic data across four models.</em>
-</p>
-
This project investigates the potential suitability of Variational Autoencoders (VAEs) as a synthetic data generation tool in the context of the NHS. To effectively address this direction, this work focussed on four key aspects: quality, privacy, ease of use, and interpretability.

We evaluate the performance of the VAE approach alongside five alternative methods available in July/August 2021, namely Gaussian Copula, CTGAN, CopulaGAN, SDV’s TVAE and Independent (a model which assumes independence across variables). Evaluating this set of models provides context to the performance of the VAE with respect to both basic (e.g. Independent) and complex (e.g. CTGAN) approaches.
@@ -30,5 +23,3 @@ As the privacy budget increases, we see the quality decrease as expected. Howev
| Case Study | Awaiting Sign-Off |
| Technical report | [Here](https://github.com/nhsx/SynthVAE/blob/main/reports/report.pdf) |

-|:-|:-|:-|
-|<img src="assets/img/Synthetic.png" alt width="80"/>|<img src="assets/img/data_science_badge_S.png" alt width="80"/>|<img src="assets/img/pets_badge_S.png" alt width="80"/>|
12 changes: 1 addition & 11 deletions docs/our_work/p14_mcr.md
@@ -6,13 +6,6 @@ permalink: p14_mcr.html

> | "How to asses the value that commercial sales data of over-the-counter prescriptions has on respiratory death predictions"
-<p align="center">
-<img src="assets/img/p14fig1.png" alt="" width="100%"/>
-</p>
-<p align="left">
-<em>Figure 1: Schematic of the difference between other variable importance tools and the Model Class Reliance approach to explaining the value of a sinlge input variable in a prediction</em>
-</p>
-
The primary aim of the project was to apply the novel variable importance technique, [model class reliance](https://papers.nips.cc/paper/2020/hash/fd512441a1a791770a6fa573d688bff5-Abstract.html), to machine learning models which could predict registered respiratory deaths in the UK. The objective was to assess the value of commercial health data in healthcare predictions compared to other available datasets.
## Results

@@ -36,7 +29,4 @@ The addition of commercial data show a significant increase in predictive power.
| ---- | ---- |
| Open Source Code & Documentation | [Github](https://github.com/nhsx/commercial-data-healthcare-predictions) |
| Case Study | Awaiting Sign-Off |
-| Technical report | [Here](https://github.com/nhsx/commercial-data-healthcare-predictions/blob/main/report/NHSX%20Report_ValueofCommercialProductSalesDatainHealthcarePrediction_V2.pdf) |
-
-|:-|:-|:-|
-|<img src="assets/img/statistics_badge_S.png" alt width="80"/>|<img src="assets/img/data_science_badge_S.png" alt width="80"/>|<img src="assets/img/forecasting_badge_S.png" alt width="80"/>|
+| Technical report | [Here](https://github.com/nhsx/commercial-data-healthcare-predictions/blob/main/report/NHSX%20Report_ValueofCommercialProductSalesDatainHealthcarePrediction_V2.pdf) |
12 changes: 1 addition & 11 deletions docs/our_work/p21_synthvae.md
@@ -6,13 +6,6 @@ permalink: p21_synthvae2.html

> | "Improving our variational autoencoder to consider fairness and to run on non-gaussian distributions"
-<p align="center">
-<img src="assets/img/p21fig1.png" alt="" width="100%"/>
-</p>
-<p align="left">
-<em>Figure 1: Directed Acylic Graph (DAG) of a network of variables highlighting their relationships. The highlighted blue I node is the variable of interest whilst the other highlighted blue node is the sensitive vraiable that maybe inducing bias.</em>
-</p>
-
Continuation of the previous development of our variational autoencoder (VAE) to correct for an error discovered since the last project finished. This error appears when trying to generate data for continuous variables which follow non-Gaussian distributions. Previously, standard scaling had been used to normalise these variables which was causing the non-gaussian variables to be synthesised poorly. This was replaced with a Guassian mixture model from the RDT python library to scale and transform these variables into ones with a Gaussian distribution.

The second phase of this worked focussed on understanding the different ways of measuring and implementing fairness within the synthetic data.
@@ -28,7 +21,4 @@ Further work will expore the adaption of direct acylic graphs to control for fai
| ---- | ---- |
| Open Source Code & Documentation | [Github](https://github.com/nhsx/SynthVAE) |
| Case Study | Awaiting Sign-Off |
-| Technical report | [Here](https://github.com/nhsx/SynthVAE/blob/main/reports/NHSXSynthVAE%20(2).pdf) |
-
-|:-|:-|:-|
-|<img src="assets/img/Synthetic.png" alt width="80"/>|<img src="assets/img/data_science_badge_S.png" alt width="80"/>|<img src="assets/img/pets_badge_S.png" alt width="80"/>|
+| Technical report | [Here](https://github.com/nhsx/SynthVAE/blob/main/reports/NHSXSynthVAE%20(2).pdf) |
16 changes: 2 additions & 14 deletions docs/our_work/p22_txtrayalign.md
@@ -6,9 +6,7 @@ permalink: p22_txtrayalign.html

> | "Generating descriptive text from X-Ray images using contrastive learning on multi-modal data"
-<p align="center">
-<img src="assets/img/p22fig1.png" alt="" width="100%"/>
-</p>
+![](../images/p22fig1.png)
<p align="left">
<em>Figure 1: A contrastive retrieval mechanism. A query image is encoded and compared with the embeddings of a corpus of reference reports. The report with the greatest cosine similarity in the shared embedding space is returned as the output.</em>
</p>
@@ -19,13 +17,6 @@ TxtRayAlign exploits contrastive training to learn similarities between text and

We observe that even the best performing model (ResNet50-DeCLUTR) only retrieves anything of relevance for 62% of queries. The retrieved sentences tend to contain findings that are not relevant for the query, as indicated by the relatively poor precision. Further, the query image contains findings that are only poorly covered by the retrieved sentences, as indicated by the low recall.

-<p align="center">
-<img src="assets/img/p22fig2.png" alt="" width="100%"/>
-</p>
-<p align="left">
-<em>Figure 2: Two example reports generated by ResNet50-DeCLUTR (trained on 5%). Highlighted text corresponds to matches of the CheXpert sentence label between the ground truth and generated report. Ground truth report partially redacted for privacy.</em>
-</p>
-
The results of our investigation indicate that this approach can help generate reasonably grammatical and clinically meaningful sentences, yet falls short in achieving this with sufficient accuracy. While improvements to the model could be made, our findings are corroborated by others in literature. Besides improving
performance, future work could develop other applications of TxtRayAlign for other downstream tasks, such as image-to-image or text-to-image retrieval.

@@ -34,7 +25,4 @@ performance, future work could develop other applications of TxtRayAlign for oth
| ---- | ---- |
| Open Source Code & Documentation | [Github](https://github.com/nhsx/txt-ray-align) |
| Case Study | Awaiting Sign-Off |
-| Technical report | [Here](https://github.com/nhsx/txt-ray-align/blob/main/report/TxtRayAlign_Report_DZ.pdf) |
-
-|:-|:-|:-|
-|<img src="assets/img/machine_learning_badge_S.png" alt width="80"/>|<img src="assets/img/data_science_badge_S.png" alt width="80"/>|
+| Technical report | [Here](https://github.com/nhsx/txt-ray-align/blob/main/report/TxtRayAlign_Report_DZ.pdf) |
14 changes: 1 addition & 13 deletions docs/our_work/p23_stm.md
@@ -4,15 +4,6 @@ title: Text Analysis using Structural Topic Modelling
permalink: p23_stm.html
---

-> | "Using metadata to support better topic modelling of free text responses"
-<p align="center">
-<img src="assets/img/p23fig1.png" alt="" width="100%"/>
-</p>
-<p align="left">
-<em>Figure 1: Example visualisations using toLDAvis (Top) and stminsights (Bottom)</em>
-</p>
-
The development of an R code for investigating the topics found in free text survey data using a technique that monitors both the content of the responses but also the metadata (e.g. when the response was made, which organisation the response relates to) in order to support the construction of these topics.

## Results
@@ -23,7 +14,4 @@ The code base has been developed as an open reusable code and being used interna
| ---- | ---- |
| Open Source Code & Documentation | [Github](https://github.com/nhsx/stm-survey-text) |
| Case Study | Awaiting Sign-Off |
-| Technical report | [Here](https://github.com/nhsx/stm-survey-text/blob/main/reports/report_stm.pdf) |
-
-|:-|:-|:-|
-|<img src="assets/img/machine_learning_badge_S.png" alt width="80"/>|<img src="assets/img/data_science_badge_S.png" alt width="80"/>|
+| Technical report | [Here](https://github.com/nhsx/stm-survey-text/blob/main/reports/report_stm.pdf) |