diff --git a/CONTRIBUTE.md b/CONTRIBUTE.md
index 5cff8798..c3fd1202 100644
--- a/CONTRIBUTE.md
+++ b/CONTRIBUTE.md
@@ -24,7 +24,8 @@ To increase the likelihood of your pull request being accepted:
- If you are making visual changes, include a screenshot of what the affected element looks like, both before and after.
- Follow the [style guide][style].
-- Keep your change as focussed as possible. If there are multiple changes you would like to make that are not dependent upon each other, consider submitting them as separate pull requests.
+- Follow the [accessibility guidance](https://nhsd-confluence.digital.nhs.uk/pages/viewpage.action?pageId=902212969). The most important aspects are: include alt text for images that convey meaning (and null alt text for decorative images); do not rely on colour alone to convey meaning; use descriptive headings and labels; avoid using images as text (if an image conveys textual meaning, it should be an SVG); and give every link descriptive text, not just "click here" or "link".
+- Keep your change as focused as possible. If there are multiple changes you would like to make that are not dependent upon each other, consider submitting them as separate pull requests.
- Write [good commit messages](http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html).

## Contribute to NHS England Data Science Website

diff --git a/docs/about.md b/docs/about.md
index 1f00ca03..1c339e2f 100644
--- a/docs/about.md
+++ b/docs/about.md
@@ -1,9 +1,7 @@
-# Data Science in NHS England
+# About the Data Science Team in NHS England

-
- -![Image title](images/DS_team_photo_smaller.jpeg){ width="450" alt-tex="Picture of the Data Science team stood on some steps in London." align=right } +![Data science team photo, all stood on some stairs outdoors.](images/DS_team_photo_smaller.jpeg){ width="450" align=right} We are the [NHS England](https://www.england.nhs.uk/) Data Science Team. @@ -18,8 +16,6 @@ We are passionate about getting the most value out of the data collected by NHS [Contact Us (datascience@nhs.net)](mailto:datascience@nhs.net){ .md-button .md-button--primary } -
- ## Teams In NHSE data scientists are concentrated in the central team but also embedded across a number of other areas. @@ -60,8 +56,6 @@ In NHSE data scientists are concentrated in the central team but also embedded a -
- ## Learn about Data Science in Healthcare To support knowledge share of data science in healthcare we've put together a **monthly newsletter** with valuable **insights**, **training opportunities** and **events**. @@ -83,7 +77,7 @@ We also support the [NHS Data Science Community](https://data-science-community. ## Our Members ??? "Our Members" - +
- + 

diff --git a/docs/articles/posts/20230105_rap.md b/docs/articles/posts/20230105_rap.md
index 546a1a89..f7e2d55a 100644
--- a/docs/articles/posts/20230105_rap.md
+++ b/docs/articles/posts/20230105_rap.md
@@ -19,11 +19,11 @@ Over the past year, we’ve been going through a change programme and adopting R

-![The authors Alistair Bullward and Sam Hollings.](https://digital.nhs.uk/binaries/content/gallery/website/data-points-blog/rap-blog-lead-image.jpg/rap-blog-lead-image.jpg/website%3AnewsPostImageLarge2x)
+![Picture of the authors Alistair Bullward (left) and Sam Hollings (right).](https://digital.nhs.uk/binaries/content/gallery/website/data-points-blog/rap-blog-lead-image.jpg/rap-blog-lead-image.jpg/website%3AnewsPostImageLarge2x)

This is about analytics and data, but knowledge of RAP isn’t just for those cutting code day-to-day. It’s crucial that senior colleagues understand the levels and benefits of RAP and get involved in promoting this new way of working and planning how we implement it. This improves the lives of our data analysts and the quality of our work.

-[Read the whole article **HERE**](https://digital.nhs.uk/blog/data-points-blog/2023/why-were-getting-our-data-teams-to-rap){ .md-button .md-button--primary }
+[Read the whole article **HERE** (opens in new tab)](https://digital.nhs.uk/blog/data-points-blog/2023/why-were-getting-our-data-teams-to-rap){ .md-button .md-button--primary }

diff --git a/docs/articles/posts/20240411_privlm.md b/docs/articles/posts/20240411_privlm.md
index 3fc414ea..51d7fda4 100644
--- a/docs/articles/posts/20240411_privlm.md
+++ b/docs/articles/posts/20240411_privlm.md
@@ -22,8 +22,8 @@ description: >

### LMs can memorize their Training Data
-![xkcd - Predictive Models](https://imgs.xkcd.com/comics/predictive_models.png) -
Figure 1: xkcd 2169 - Predictive Models
+![Cartoon of a stick figure sat at a desk. The caption says "When you train predictive models on input from your users, it can leak information in unexpected ways". On the computer screen it says "Long live the revolution. Our next meeting will be at", with a greyed-out autofill suggestion of "the docks at midnight on June 28". The stick figure is saying "Aha, found them!"](https://imgs.xkcd.com/comics/predictive_models.png)
+
Figure 1: xkcd 2169 - Predictive Models (opens in new tab)
Studies have shown that LMs can inadvertently memorise and disclose information verbatim from their training data when prompted in certain ways, a phenomenon referred to as training data leakage. This leakage can violate the privacy assumptions under which datasets were collected and can make diverse information more easily searchable. diff --git a/docs/articles/posts/20240807_annotation_tools.md b/docs/articles/posts/20240807_annotation_tools.md index 9b7d9bbf..014e1ec0 100644 --- a/docs/articles/posts/20240807_annotation_tools.md +++ b/docs/articles/posts/20240807_annotation_tools.md @@ -22,19 +22,19 @@ description: > ## Introduction -We have been building a proof-of-concept tool that scores the privacy risk of free text healthcare data called [Privacy Fingerprint](https://nhsengland.github.io/datascience/our_work/ds255_privacyfp/). +We have been building a proof-of-concept tool that scores the privacy risk of free text healthcare data called [Privacy Fingerprint (opens in new tab)](https://nhsengland.github.io/datascience/our_work/ds255_privacyfp/). Named Entity Recognition (NER) is a particularly important part of our pipeline. It is the task of identifying, categorizing and labelling specific pieces of information, known as entities, within a given piece of text. These entities can include the names of people, dates of birth, or even unique identifiers like NHS Numbers. As of the time of writing, there are two NER models fully integrated within the Privacy Fingerprint pipeline used to identify entities which may contribute towards a privacy risk. These are: -- [UniversalNER](https://universal-ner.github.io/): A prompted-based NER Model, where a language model has been finetuned with a conversation-style prompt to output a list containing all entities in the text corresponding to an input entity type. 
-- [GLiNER](https://github.com/urchade/GLiNER): A BERT-like bidirectional transformer encoder with a key benefit over UniversalNER in that it is a smaller model in terms of memory size.
+- [UniversalNER (opens in new tab)](https://universal-ner.github.io/): A prompt-based NER model, where a language model has been fine-tuned with a conversation-style prompt to output a list containing all entities in the text corresponding to an input entity type.
+- [GLiNER (opens in new tab)](https://github.com/urchade/GLiNER): A BERT-like bidirectional transformer encoder with a key benefit over UniversalNER in that it is a smaller model in terms of memory size.

-Both NER models in our pipeline need to be fed a list of entities to extract. This is true for many NER models, although some like [Stanza](https://stanfordnlp.github.io/stanza/) from [Stanford NLP Group](https://stanfordnlp.github.io/) and [BERT](https://huggingface.co/docs/transformers/tasks/token_classification) token classifiers do not need an initial entity list for extraction. For our privacy tool to be effective, we want our list of entities to be representative of the real entities in the data, and not miss any important information.
+Both NER models in our pipeline need to be fed a list of entities to extract. This is true for many NER models, although some like [Stanza (opens in new tab)](https://stanfordnlp.github.io/stanza/) from [Stanford NLP Group (opens in new tab)](https://stanfordnlp.github.io/) and [BERT (opens in new tab)](https://huggingface.co/docs/transformers/tasks/token_classification) token classifiers do not need an initial entity list for extraction. For our privacy tool to be effective, we want our list of entities to be representative of the real entities in the data, and not miss any important information.
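To make the "list of entity labels in, labelled spans out" interface concrete, here is a minimal, purely illustrative sketch. The `extract_entities` function and its toy regex patterns are invented for this example; they stand in for a real model such as UniversalNER or GLiNER, which generalise far beyond fixed patterns:

```python
import re

# Toy patterns standing in for a real NER model -- purely illustrative.
PATTERNS = {
    "nhs number": r"\b\d{3} \d{3} \d{4}\b",
    "date": r"\b\d{2}/\d{2}/\d{4}\b",
}

def extract_entities(text: str, entity_types: list[str]) -> list[dict]:
    """Return labelled spans for the requested entity types only."""
    results = []
    for label in entity_types:
        pattern = PATTERNS.get(label)
        if pattern is None:
            continue  # unknown label: a real model would try to generalise instead
        for match in re.finditer(pattern, text):
            results.append({"text": match.group(), "label": label,
                            "start": match.start(), "end": match.end()})
    return results

note = "Seen on 01/02/2024, NHS number 123 456 7890."
print(extract_entities(note, ["nhs number"]))
```

The key point is the shape of the interface: the caller controls which entity types are requested, so anything missing from that list is never extracted, which is exactly why the entity list must be representative of the data.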
-![Cartoon of man trying to extract entities. He looks confused and frustrated](../../images/annotation_tools_blog/entity_extraction_cartoon.jpg)
+![Cartoon of a man trying to extract entities. He looks confused and frustrated. He has a speech bubble saying "Extract an entity? What does that mean?"](../../images/annotation_tools_blog/entity_extraction_cartoon.jpg)
Figure 1: A frustrated user trying to extract entities!
@@ -73,7 +73,7 @@ There were two approaches we took to develop an annotation tool.
Figure 2: An example of the ipyWidgets and DisplaCy labelling application. All clinical notes are synthetic.
-First, we used [DisplaCy](https://spacy.io/usage/visualizers/), [ipyWidgets](https://github.com/jupyter-widgets/ipywidgets/blob/main/docs/source/examples/Index.ipynb), and a NER model of choice to generate an interactive tool that works inside Jupyter notebooks. DisplaCy is a visualiser integrated into the SpaCy library which allows you to easily visualise labels. Alongside ipyWidgets, a tool that allows you to create interactive widgets such as buttons, we created an interface which allowed a user to go through reviews and add new entities.
+First, we used [DisplaCy (opens in new tab)](https://spacy.io/usage/visualizers/), [ipyWidgets (opens in new tab)](https://github.com/jupyter-widgets/ipywidgets/blob/main/docs/source/examples/Index.ipynb), and a NER model of choice to generate an interactive tool that works inside Jupyter notebooks. DisplaCy is a visualiser integrated into the SpaCy library which allows you to easily visualise labels. Alongside ipyWidgets, a tool that allows you to create interactive widgets such as buttons, we created an interface which allowed a user to go through reviews and add new entities.

One of the main advantages of this method is that everything is inside a Jupyter notebook. The entity names you want to extract come straight from the experiment parameters, so if you used this in the same notebook as the rest of your pipeline the entity names could be updated automatically from the labelling tool. This would allow easy integration into a user workflow.

@@ -88,7 +88,7 @@ This approach was simple and resulted in a fully working example. However, highl
Figure 3: An example of the Streamlit labelling application. All clinical notes are synthetic.
-We explored a second option using [Streamlit](https://streamlit.io/). Streamlit is a python framework that allows you to build simple web apps. We can use it alongside a package called [Streamlit Annotation Tools](https://github.com/rmarquet21/streamlit-annotation-tools) to generate a more interactive user interface. As an example, a user can now use their cursor to highlight particular words and assign them an entity type which is more hands-on and engaging. Unlike our ipyWidgets example, users can select different labels to be displayed which makes the tool less cluttered, and you can easily navigate using a slider to separate reviews. Like the previous widget, there is a button which uses a NER model to label the text and give live feedback. Including this, the tool is more synergistic, easier to use and more immersive than the ipyWidgets alternative.
+We explored a second option using [Streamlit (opens in new tab)](https://streamlit.io/). Streamlit is a Python framework that allows you to build simple web apps. We can use it alongside a package called [Streamlit Annotation Tools (opens in new tab)](https://github.com/rmarquet21/streamlit-annotation-tools) to generate a more interactive user interface. As an example, a user can now use their cursor to highlight particular words and assign them an entity type, which is more hands-on and engaging. Unlike our ipyWidgets example, users can select different labels to be displayed, which makes the tool less cluttered, and you can easily navigate using a slider to separate reviews. Like the previous widget, there is a button which uses a NER model to label the text and give live feedback. Overall, the tool is more cohesive, easier to use and more immersive than the ipyWidgets alternative.

However, there were still a few teething issues when developing the Streamlit app.
Firstly, Streamlit Annotation Tools is unable to display `\n` as a new line and instead prints the characters `\n` literally, resulting in the structure of the text being lost. This is a Streamlit issue and we haven’t yet found a way to keep the structure of the text intact. There was an easy fix in which each `\n` was replaced with two spaces (this means the start and end character count for each labelled entity remains consistent with the original structured text), but the structure of the text is still lost, which may cause issues for some future users.

@@ -100,4 +100,4 @@ Both labelling tools we have identified have key advantages. DisplaCy and ipyWid

Following the research and development of these two tools, we believe the ability to interactively annotate, explore and extract entities from your data greatly improves the user experience when using our privacy risk scorer pipeline.

-We will publish working examples of annotation using both ipyWidgets and Streamlit, such that a future user can build on them or use them to improve their workflow. The code is available on our [github](https://github.com/nhsengland/privfp-experiments).
+We will publish working examples of annotation using both ipyWidgets and Streamlit, such that a future user can build on them or use them to improve their workflow. The code is available on our [GitHub (opens in new tab)](https://github.com/nhsengland/privfp-experiments).
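The two-space workaround relies on a small but important detail: each literal `\n` in the raw text is two characters (a backslash and an `n`), so replacing it with exactly two spaces leaves every start and end character count unchanged. A minimal illustrative sketch (synthetic text, not our pipeline code):

```python
# Each literal "\n" is two characters (backslash + "n"), so swapping it for
# two spaces preserves the text length and every labelled entity offset in it.
raw = "Patient seen today.\\nBP 120/80.\\nNo concerns."
flat = raw.replace("\\n", "  ")

assert len(flat) == len(raw)  # offsets of labelled entities are unchanged

# An entity labelled on the raw text keeps the same span in the flattened text.
start = raw.index("BP 120/80")
print(flat[start:start + len("BP 120/80")])  # -> BP 120/80
```

Any replacement of a different length (for example, a single space) would shift every subsequent entity span and break the labels.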
diff --git a/docs/images/LIME-workflow.png b/docs/images/LIME-workflow.png new file mode 100644 index 00000000..bfefd73b Binary files /dev/null and b/docs/images/LIME-workflow.png differ diff --git a/docs/images/ai-deep-dive.jpg b/docs/images/ai-deep-dive.jpg new file mode 100644 index 00000000..f1f7d8f2 Binary files /dev/null and b/docs/images/ai-deep-dive.jpg differ diff --git a/docs/images/ai-skunkworks.png b/docs/images/ai-skunkworks.png new file mode 100644 index 00000000..97eac44a Binary files /dev/null and b/docs/images/ai-skunkworks.png differ diff --git a/docs/images/dag_job_opportunity.png b/docs/images/dag_job_opportunity.png new file mode 100644 index 00000000..21ea867e Binary files /dev/null and b/docs/images/dag_job_opportunity.png differ diff --git a/docs/images/example_report_output.png b/docs/images/example_report_output.png new file mode 100644 index 00000000..56e4b877 Binary files /dev/null and b/docs/images/example_report_output.png differ diff --git a/docs/images/nhs-resolution.jpg b/docs/images/nhs-resolution.jpg new file mode 100644 index 00000000..5baaba2b Binary files /dev/null and b/docs/images/nhs-resolution.jpg differ diff --git a/docs/images/sas.png b/docs/images/sas.png new file mode 100644 index 00000000..f417dd28 Binary files /dev/null and b/docs/images/sas.png differ diff --git a/docs/images/stminsights_lowquality.png b/docs/images/stminsights_lowquality.png new file mode 100644 index 00000000..04a9cb0f Binary files /dev/null and b/docs/images/stminsights_lowquality.png differ diff --git a/docs/images/vae.png b/docs/images/vae.png new file mode 100644 index 00000000..b3249bf8 Binary files /dev/null and b/docs/images/vae.png differ diff --git a/docs/our_work/adrenal-lesions.md b/docs/our_work/adrenal-lesions.md index 2330fdad..5cd68d0b 100644 --- a/docs/our_work/adrenal-lesions.md +++ b/docs/our_work/adrenal-lesions.md @@ -7,7 +7,7 @@ tags: ['CLASSIFICATION','LESION DETECTION','COMPUTER VISION','AI'] ---
-![Adrenal flow of transfer](../images/Flow_of_transfer.width-800.png)
+![Flow of work for the adrenal lesion project. Starts with 2.5D images on the left, with an arrow to "Pre-trained deep learning model", then to "model training". This then flows into "Model for this use case", which has two arrows to "Normal" and "Abnormal". The core of the diagram is also labelled as "2D neural network".](../images/Flow_of_transfer.width-800.png)
Many cases of adrenal lesions, known as adrenal incidentalomas, are discovered incidentally on CT scans performed for other medical conditions. These lesions can be malignant, and so early detection is crucial for patients to receive the correct treatment and allow the public health system to target resources efficiently. Traditionally, the detection of adrenal lesions on CT scans relies on manual analysis by radiologists, which can be time-consuming and unsystematic.

@@ -47,7 +47,7 @@ Due to the intrinsic nature of CT scans (e.g., a high operating cost, limited nu

To overcome some of the disadvantages of training a 3D deep learning model, we took a 2.5D deep learning model approach in this case study. Training the model using 2.5D images enables our deep learning model to still learn from the 3D features of the CT scans, while increasing the number of training and testing data points in this study. Moreover, we can apply 2D deep learning models to the set of 2.5D images, which allows us to apply transfer learning to train our own model further based on the knowledge learned by other deep learning applications (e.g., ImageNet, and the NHS AI Lab’s National COVID-19 Chest Imaging Database).

-![Adrenal flow of transfer](../images/Flow_of_transfer.width-800.png)
+![Same image as at the top of the page: Flow of work for the adrenal lesion project. Starts with 2.5D images on the left, with an arrow to "Pre-trained deep learning model", then to "model training". This then flows into "Model for this use case", which has two arrows to "Normal" and "Abnormal". The core of the diagram is also labelled as "2D neural network".](../images/Flow_of_transfer.width-800.png)

#### Classification of 3D CT scans

@@ -59,7 +59,7 @@ To connect the classification prediction results from the 2.5D images to the CT

To prepare the CT scans for this case study (region of interest focused on the adrenal glands), we also developed a manual 3D cropping tool for CT scans.
This cropping was applied in all three dimensions, including a 1D cropping to select the appropriate axial slices and a 2D cropping on each axial slice. The final cropped 3D image covered the whole adrenal gland on both sides with some extra margin on each side.

-![Adrenal cropping](../images/Cropping_process.width-800.png)
+![Diagram of how the images are cropped to focus on the adrenal glands.](../images/Cropping_process.width-800.png)

### Outcomes and lessons learned

diff --git a/docs/our_work/ai-deep-dive.md b/docs/our_work/ai-deep-dive.md
index d05c7c24..8722796b 100644
--- a/docs/our_work/ai-deep-dive.md
+++ b/docs/our_work/ai-deep-dive.md
@@ -1,12 +1,14 @@
---
-title: 'AI Deep Dive'
+title: 'AI Deep Dive Workshops'
summary: 'The NHS AI Lab Skunkworks team have developed and delivered a series of workshops to improve confidence working with AI.'
category: 'Playbooks'
origin: 'Skunkworks'
tags : ['AI', 'GUIDANCE', 'BEST PRACTICE']
---

-# Case Study
+
+![](../images/ai-deep-dive.jpg) +
## Info diff --git a/docs/our_work/ai-dictionary.md b/docs/our_work/ai-dictionary.md index 707f7131..163efb83 100644 --- a/docs/our_work/ai-dictionary.md +++ b/docs/our_work/ai-dictionary.md @@ -6,7 +6,7 @@ origin: 'Skunkworks' tags : ['AI', 'DICTIONARY', 'JAVASCRIPT', 'REACT'] --- -[![AI Dictionary](../images/ai-dictionary.png)](https://nhsx.github.io/ai-dictionary) +[![Image of a browser showing the AI dictionary.](../images/ai-dictionary.png)](https://nhsx.github.io/ai-dictionary) AI is full of acronyms and a common understanding of technical terms is often lacking. diff --git a/docs/our_work/ai-skunkworks.md b/docs/our_work/ai-skunkworks.md index 921b948b..43b39e91 100644 --- a/docs/our_work/ai-skunkworks.md +++ b/docs/our_work/ai-skunkworks.md @@ -6,6 +6,8 @@ origin: 'Skunkworks' tags: ['CLASSIFICATION','LESION DETECTION','AI', 'PYTHON'] --- +![AI Skunkworks website homepage](../images/ai-skunkworks.png) + !!! info Welcome to the technical website of the NHS AI Lab Skunkworks team. For our general public-facing website, please visit the [AI Skunkworks programme](https://www.nhsx.nhs.uk/ai-lab/ai-lab-programmes/skunkworks/) diff --git a/docs/our_work/ambulance-delay-predictor.md b/docs/our_work/ambulance-delay-predictor.md index e2c42b64..f7d42f2d 100644 --- a/docs/our_work/ambulance-delay-predictor.md +++ b/docs/our_work/ambulance-delay-predictor.md @@ -6,7 +6,7 @@ origin: 'Skunkworks' tags: ['AMBULANCE','PREDICTION','RANDOM FOREST', 'CLASSIFICATION', 'TIME SERIES', 'PYTHON'] --- -![Ambulance Handover Delay Predictor screenshot](../images/ambulance-delay-predictor.png) +![Ambulance Handover Delay Predictor screenshot showing the handover times expected for different hospitals, with the high times highlighted in orange.](../images/ambulance-delay-predictor.png) Ambulance Handover Delay Predictor was selected as a project in Q2 2022 following a successful pitch to the AI Skunkworks problem-sourcing programme. 
diff --git a/docs/our_work/bed-allocation.md b/docs/our_work/bed-allocation.md index 42036d72..8859551d 100644 --- a/docs/our_work/bed-allocation.md +++ b/docs/our_work/bed-allocation.md @@ -6,7 +6,7 @@ origin: 'Skunkworks' tags: ['HOSPITAL','BAYESIAN FORECASTING','MONTE CARLO','GREEDY ALLOCATION', 'PYTHON'] --- -![Bed allocation screenshot](../images/bed-allocation.png) +![Browser showing the dashboard for Kettering General Hospital that shows the forecasting of their bed occupancy.](../images/bed-allocation.png) Bed allocation was identified as a suitable opportunity for the AI Skunkworks programme in May 2021. diff --git a/docs/our_work/c245_synpath.md b/docs/our_work/c245_synpath.md index fbf56584..f5ae8d90 100644 --- a/docs/our_work/c245_synpath.md +++ b/docs/our_work/c245_synpath.md @@ -1,12 +1,12 @@ --- -title: Building the Foundations for a Generic Patient Simulator +title: Building the Foundations for a Generic Patient Simulator (SynPath) summary: Developing an agent-based simulation for generating synthetic patient pathways and scenario modelling for healthcare specific implementations. category: Projects permalink: c245_synpath.html tags: ['SYNTHETIC DATA', 'PATHWAYS','SIMULATION'] --- -![Overview of data model](../images/c245fig1.png) +![](../images/c245fig1.png)
Figure 1: Overview of the SynPath data model
A data model (“Patient Agent”) was developed for fake patients to be defined in the simulation. The patient is then assigned a health record (conditions, medications, ..) with optional additional attributes. diff --git a/docs/our_work/c250_nhscorpus.md b/docs/our_work/c250_nhscorpus.md index 498e9f6d..a23d9e7b 100644 --- a/docs/our_work/c250_nhscorpus.md +++ b/docs/our_work/c250_nhscorpus.md @@ -5,7 +5,7 @@ permalink: c250_nhscorpus.html tags: ['NLP'] --- -![Ingest, Enrich, Share](../images/c250fig1.png) +![Ingest box containing the logo for scrapy and a screenshot of the NHS.uk website, Enrich box including logos for Helin, brat, and doccan, Share box including huggingface, database. Under the boxes there are the docker, SQLPad, elasticsearch and caddy logos.](../images/c250fig1.png)
Figure 1: Open source tools used in each functional setting
We aimed to explore how to build an Open, Representative, Extensible and Useful set of tools to curate, enrich and share sources of healthcare text data in an appropriate manner. diff --git a/docs/our_work/c338_poud.md b/docs/our_work/c338_poud.md index 768c335e..e02f72b2 100644 --- a/docs/our_work/c338_poud.md +++ b/docs/our_work/c338_poud.md @@ -7,8 +7,8 @@ tags: ['UNSTRUCTURED DATA', 'PRIVACY', 'PII', 'BEST PRACTICE']
![](../images/c338fig1.png) -
Figure 1: Figure 4 from Al-Fedaghi, Sabah. (2012). Experimentation with Personal Identifiable Information. Showing an example PII sphere from different perspectives (compound, singleton and multitude personal identifiable information)
+
Figure 1: Figure 4 from Al-Fedaghi, Sabah. (2012). Experimentation with Personal Identifiable Information. Showing an example PII sphere from different perspectives (compound, singleton and multitude personal identifiable information).
Unstructured data (e.g. text, image, audio) makes up a significant quantity of NHS data but is comparatively underused as an evidence source for analysis. This is often due to the privacy concerns restricting the sharing and use of these data.

diff --git a/docs/our_work/c339_sas.md b/docs/our_work/c339_sas.md
index 39ed585b..ec04a58b 100644
--- a/docs/our_work/c339_sas.md
+++ b/docs/our_work/c339_sas.md
@@ -5,6 +5,10 @@ permalink: c339_sas.html
tags: ['SYNTHETIC DATA', 'GAN','TABULAR DATA']
---

+![](../images/sas.png)
+
+*Figure 1: Attack diagrams for the currently incorporated scenarios. Scenario 1: Access to the synthetic dataset and a description of the generative model’s architecture and training procedure. Scenario 2: Access to a black-box model that can provide unlimited synthetic data, with data realistic of the training distribution gathered by the attacker, which may be an example synthetic dataset released by the researchers.*
+
An extensible codebase was developed to apply a suite of adversarial attacks to synthetically generated single-table tabular data, in order to assess the likely success of attacks and act as a privacy indicator for the dataset. This information then informs the generation and information governance process to ensure the safety of our data.
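As a purely illustrative sketch of the kind of signal such an attack suite can compute (the records, fields and metric below are invented for this example, and are deliberately simpler than the project's actual attacks), consider a distance-to-closest-record check available to a Scenario 1 attacker who holds the released synthetic dataset:

```python
import math

# Toy distance-to-closest-record (DCR) check -- illustrative values only.
synthetic = [(34, 120.0), (51, 140.0), (29, 115.0)]  # (age, systolic BP)

def distance_to_closest(record, dataset):
    """Euclidean distance from a candidate record to its nearest synthetic row."""
    return min(math.dist(record, row) for row in dataset)

target = (34, 119.5)     # a record the attacker suspects was in the training data
unrelated = (70, 180.0)  # a record with no plausible training counterpart

# A very small distance suggests the synthetic data sits suspiciously close to
# a real training record, i.e. a potential privacy leak worth investigating.
print(distance_to_closest(target, synthetic) < distance_to_closest(unrelated, synthetic))  # -> True
```

Aggregating a signal like this over many candidate records is one simple way an attack can be turned into a single privacy indicator for a dataset.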
## Results diff --git a/docs/our_work/casestudy-recruitment-shortlisting.md b/docs/our_work/casestudy-recruitment-shortlisting.md index f0049179..9d1a7ee2 100644 --- a/docs/our_work/casestudy-recruitment-shortlisting.md +++ b/docs/our_work/casestudy-recruitment-shortlisting.md @@ -30,11 +30,9 @@ When talking about bias by the predictive model, the model was determined to hav Bias can also be identified by looking at integrity of the source data (looking at factors such as the way it was collected) or sufficiency (see [here](https://en.wikipedia.org/wiki/Fairness_(machine_learning)#:~:text=of%20a%20model.%22-,Sufficiency,-%5Bedit%5D)) of the data -![Bed allocation screenshot](../images/Recruitment_graph.width-800.png) -> **Figure 1**: An example of the synthetic staining process. a) the original slide, containing the α-syn proteins stained in a brownish colour b) a processed version of the original slide, filtered for the brownish colour c) the synthetically stained image after the algorithm has been applied to it. The α-syn proteins are now highlighted in a greenish colour. - -> I regularly hear that a bed is a bed and I know it’s not ... But when you have those front door pressures, you can’t get ambulances offloaded and I have beds in the wrong place - this is the time I need the real support, real time data, an automatic risk assessment that is generated for each patient. -– Member of bed management staff, Kettering General Hospital +![Graph of the counts and proportions of candidates shortlisted or not shortlisted, by grade and ethnicity.](../images/Recruitment_graph.width-800.png) + +***Figure 1**: Graph of the counts and proportions of candidates shortlisted or not shortlisted, by grade and ethnicity.* [comment]: <> (The below header stops the title from being rendered (as mkdocs adds it to the page from the "title" attribute) - this way we can add it in the main.html, along with the summary.) 
# diff --git a/docs/our_work/casestudy-synthetic-data-pipeline.md b/docs/our_work/casestudy-synthetic-data-pipeline.md index a07d9c75..f2e96b55 100644 --- a/docs/our_work/casestudy-synthetic-data-pipeline.md +++ b/docs/our_work/casestudy-synthetic-data-pipeline.md @@ -6,6 +6,8 @@ origin: 'Skunkworks' tags: ['SYNTHETIC DATA','VAE','PRIVACY','QUALITY','UTILITY','AI', 'PYTHON'] --- +![Example graphs studying the fidelity of the synthetic data to the artificial data.](../images/example_report_output.png) + ## Info This is a backup of the case study published [here](https://transform.england.nhs.uk/ai-lab/explore-all-resources/develop-ai/exploring-how-to-create-mock-patient-data-synthetic-data-from-real-patient-data/) on the NHS England Transformation Directorate website. diff --git a/docs/our_work/ct-alignment.md b/docs/our_work/ct-alignment.md index 35e51df3..dedfb201 100644 --- a/docs/our_work/ct-alignment.md +++ b/docs/our_work/ct-alignment.md @@ -6,7 +6,7 @@ origin: 'Skunkworks' tags: ['CT','COMPUTER VISION','IMAGE REGISTRATION','PYTHON'] --- -![CT Alignment and Lesion Detection screenshot](../images/ct-alignment.png) +![CT Alignment and Lesion Detection screenshot of the interface for identifying lesions.](../images/ct-alignment.png) As the successful candidate from the AI Skunkworks problem-sourcing programme, CT Alignment and Lesion Detection was first picked as a pilot project for the AI Skunkworks team in April 2021. 
diff --git a/docs/our_work/data-lens.md b/docs/our_work/data-lens.md
index 21fe7664..814ec689 100644
--- a/docs/our_work/data-lens.md
+++ b/docs/our_work/data-lens.md
@@ -6,7 +6,7 @@ origin: 'Skunkworks'
tags: ['NLP', 'SEMANTIC SEARCH', 'SCRAPING','JAVASCRIPT','PYTHON']
---

-![Data Lens screenshot](../images/data-lens.png)
+![Image of a browser showing the Data Lens search front end.](../images/data-lens.png)

As the successful candidate from a Dragons’ Den-style project pitch, Data Lens was first picked as a pilot project for the NHS AI (Artificial Intelligence) Lab Skunkworks team in September 2020.

diff --git a/docs/our_work/data-linkage-hub/linkage-projects/better-matching.md b/docs/our_work/data-linkage-hub/linkage-projects/better-matching.md
index 0f005153..62a8b421 100644
--- a/docs/our_work/data-linkage-hub/linkage-projects/better-matching.md
+++ b/docs/our_work/data-linkage-hub/linkage-projects/better-matching.md
@@ -23,7 +23,9 @@ Each of these steps requires research into linkage best practice, testing on sam

We have also added additional configuration to the pipeline to allow for a deduplication task. This is in order to try and identify possible duplicate records in the [Personal Demographics Service (PDS)](https://digital.nhs.uk/services/personal-demographics-service). Here is an overview of how our pipeline currently looks.

-![Splink linkage pipeline scheme](../../../images/splink_diagram.png)
+
+![Splink linkage pipeline schema, showing the flow of the pipeline’s file system.](../../../images/splink_diagram.png)
+

## Building a model with transparency in mind

Users of linked data have to rely on the accuracy of the process created by others, as often the process of linking data is not under their control. That is why one of the main focuses of the model we are building is transparency of the methods and explainability of the results.
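Splink, used in the pipeline above, implements the Fellegi-Sunter model of probabilistic linkage, and much of its transparency comes from each comparison field contributing an inspectable log-likelihood match weight. A toy sketch of that idea, with invented m/u probabilities rather than the pipeline's actual parameters:

```python
import math

# Toy Fellegi-Sunter style match weights -- illustrative values only.
# m = P(field agrees | records are the same person)
# u = P(field agrees | records are different people)
comparisons = {
    "surname":       {"m": 0.95, "u": 0.01},
    "date_of_birth": {"m": 0.99, "u": 0.003},
    "postcode":      {"m": 0.90, "u": 0.05},
}

def match_weight(agreements: dict) -> float:
    """Sum of log2 likelihood ratios; each field's contribution is inspectable."""
    total = 0.0
    for field, params in comparisons.items():
        m, u = params["m"], params["u"]
        if agreements[field]:
            total += math.log2(m / u)              # evidence for a match
        else:
            total += math.log2((1 - m) / (1 - u))  # evidence against a match
    return total

# A pair agreeing on every field vs. a pair agreeing only on postcode:
strong = match_weight({"surname": True, "date_of_birth": True, "postcode": True})
weak = match_weight({"surname": False, "date_of_birth": False, "postcode": True})
print(strong > 0 > weak)  # -> True
```

Because the final score is a sum of per-field terms, a reviewer can see exactly which fields drove any linkage decision, which is the transparency property described above.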
diff --git a/docs/our_work/data-linkage-hub/linkage-projects/cop.md b/docs/our_work/data-linkage-hub/linkage-projects/cop.md
index bd25fb9f..2e29ffdc 100644
--- a/docs/our_work/data-linkage-hub/linkage-projects/cop.md
+++ b/docs/our_work/data-linkage-hub/linkage-projects/cop.md
@@ -11,7 +11,7 @@ In NHS England data linkage occurs at various stages of the data lifecycle, invo

The Community of Practice wants to support Data Linkage stakeholders in NHS England to share their expertise and best practices with colleagues across the organisation. This is also in response to the Data Linkage Survey in which colleagues expressed a clear interest in cultivating a collaboration space.

-![Results from the Data Linkage Survey Community of Practice question](../../../images/copdl.png)
+![Results from the Data Linkage Survey Community of Practice question: 41 people said they want a COP, 21 said no and 53 said maybe. On the right is a bar graph of the different activities wanted in the COP: Consulting experts or peers, Share tools, Professional Development, Cross-government initiatives, Offering my expertise, Showcasing my work, and Other.](../../../images/copdl.png)

## Data Linkage Community of Practice: Mission

The mission of our community of practice is to **facilitate collaboration and an exchange of knowledge, tools and innovative solutions** among data linkage stakeholders within NHS England, and with an outlook onto other government and research institutions, enabling members to share and adopt effective practices.
diff --git a/docs/our_work/data-linkage-hub/linkage-projects/qaf.md b/docs/our_work/data-linkage-hub/linkage-projects/qaf.md index 8c950f5b..456e84fc 100644 --- a/docs/our_work/data-linkage-hub/linkage-projects/qaf.md +++ b/docs/our_work/data-linkage-hub/linkage-projects/qaf.md @@ -10,7 +10,7 @@ Data Linkage is a business-critical process within many government organisations However, too often data linkage is seen as an exclusive software development and data engineering exercise instead of a modelling challenge, and there is not an appropriate level of quality assurance applied at the different stages of the process. This is why we have worked on the [**Quality Assurance Framework for Data Linkage**](https://nhsengland.github.io/quality-assurance-framework-for-data-linkage/), which is a tool for data linkage practitioners **to determine the necessary quality assurance levels at every stage of the data linkage process**: -![Quality Assurance Framework for Data Linkage screenshot](../../../images/qafdl_overview.png) +![Quality Assurance Framework for Data Linkage screenshot. Consists of two tables. Top table has three columns: Data Preparation (contains profiling, assessment, and enrichment), Implementation (contains techniques and tools, configuration of linkage parameters/settings, and version control), and Evaluation (contains verification and validity, quality of linkage, and speed/Computational resources). The second table has a heading of "Overall Considerations" and contains: Uncertainty management, Communication of changes, Safety, Ethics and fairness, Information Governance, Community Engagement, Knowledge Management, and Continuous improvement and maintenance. ](../../../images/qafdl_overview.png) The required level of quality assurance varies by project and is determined by the data linker and data users. The triage questions in the framework provide a structured approach to deciding the minimum expected levels by type of project. 
diff --git a/docs/our_work/ds218_rap_community_of_practice.md b/docs/our_work/ds218_rap_community_of_practice.md index 9fcec962..bbd6e8e7 100644 --- a/docs/our_work/ds218_rap_community_of_practice.md +++ b/docs/our_work/ds218_rap_community_of_practice.md @@ -8,9 +8,9 @@ tags: ['RAP','BEST PRACTICE','PYTHON','R']
-![Multiple examples of how the team has marketed Reproducible Analytical Pipelines as a way of working, on LinkedIn, Teams, through drop-in sessions, seminars and conference talks.](../images/RAP - marketing examples.png) -
Figure 1: Multiple examples of how the team has marketed Reproducible Analytical Pipelines as a way of working. Click the image to visit our website.
+![](../images/RAP - marketing examples.png)
+*Figure 1: Multiple examples of how the team has marketed Reproducible Analytical Pipelines as a way of working, on LinkedIn, Teams, through drop-in sessions, seminars and conference talks. Click the image to visit our website.* Reproducible Analytical Pipelines (RAP) is a way of working [promoted across the Civil Service](https://analysisfunction.civilservice.gov.uk/policy-store/reproducible-analytical-pipelines-strategy/), which promises **faster**, more **efficient**, more **robust** and **transparent** analysis and data pipelines.
diff --git a/docs/our_work/ds255_privacyfp.md b/docs/our_work/ds255_privacyfp.md index d9920704..cfd79631 100644 --- a/docs/our_work/ds255_privacyfp.md +++ b/docs/our_work/ds255_privacyfp.md @@ -9,7 +9,7 @@ tags: ['TEXT DATA', 'LLM','PYTHON', 'PRIVACY', 'NAMED ENTITY RECOGNITION', 'UNST This codebase is a proof of concept and is under constant development so should only be used for demonstration purposes within a controlled environment.
-![High-level overview of Privacy Fingerprint using open-source models](../images/privfp_diagram.png) +![](../images/privfp_diagram.png)
Figure 1: Diagram of the high-level overview of Privacy Fingerprint using open-source models.
diff --git a/docs/our_work/index.md b/docs/our_work/index.md index fca860d3..9ef852f1 100644 --- a/docs/our_work/index.md +++ b/docs/our_work/index.md @@ -2,7 +2,7 @@

Explore our comprehensive portfolio of ongoing and completed projects that harness the power of data to drive insight.

| Name | Role | Team | Github |
|------|------|------|--------|
| Sarah Culkin | Deputy Director | Central Data Science Team | SCulkin-code |
| Rupert Chaplin | Assistant Director | Central Data Science Team | rupchap |
| Jonathan Hope | Data Science Lead | Central Data Science Team | JonathanHope42 |