Our internships are aimed at current PhD students looking for an industrial placement of around five months with the right to work in the UK. The projects are focussed on innovation, in particular around getting the most value out of NHS data.
+The projects often have a focus on emerging data science techniques and so we advertise mainly to data science programmes, however previous interns have come from other disciplines such as clinical, mathematics, computer science and bioinformatics, which have added huge value through the range of approaches and knowledge.
+For more information and details on how to apply see the Scheme Overview page on the microsite
+For details on open projects see the Projects page on the microsite
+Available outputs from previous projects can also be seen at Previous Projects on the microsite
Currently our interns are working on the following projects in two waves. These are the original briefs they applied to and their work and outputs will be available on our organisation GitHub.

| Wave 6 | February - July 2024 |
|---|---|
| | NHS Language Corpus Extension |
| | Understanding Fairness and Explainability in Multi-modal Approaches within Healthcare |

| Wave 7 | July - December 2024 |
|---|---|
| | Evaluating NER-focussed models and LLMs for identifying key entities in histopathology reports – working with GOSH DRIVE |
| | Investigating Privacy Concerns and Mitigations for Healthcare Language and Foundation Models |
We are the NHS England Data Science Team.
+We are passionate about getting the most value out of the data collected by NHS England and the wider NHS through applying innovative techniques in appropriate and well-considered ways.
+Our vision is:
> Embed ambitious yet accessible data science in health and care to help people live healthier longer lives
+
Contact Us (datascience@nhs.net)
+In NHSE data scientists are concentrated in the central team but also embedded across a number of other areas.
+Data Linkage
The Data Linkage Hub aims to provide a unified, high-quality solution to the data linkage needs of NHS England. Data science is central to achieving this objective, and it covers many aspects: from the mathematical models of entity resolution and record linkage, to identifying and correcting linkage errors, assessing their impact on downstream applications, and ensuring quality.
+See the Data Linkage Hub.
+Central Data Science Team
+We develop and deploy data science products to make a positive impact on NHS patients and workforce. We investigate applying novel techniques that increase the insight we get from health-related data. We prioritise code-first ways of working, transparency and promoting best practice. We champion quality, safety and ethics in the application of methods and use of data. We have the remit to be open and collaborative and have the aim of sharing products with the wider healthcare community.
+See our Projects.
+National SDE Team
We work with customer researchers and analysts to identify how they can carry out their research, overcome and rectify data issues, and use the platform and data to their fullest. We also create products and tools that facilitate research in the environment, such as data quality and completeness visualisations and example analysis and machine learning code, as well as continuously improving and automating the processes that move data both into the SDE and out through output checking.
+See the SDE website.
+Other Embedded Data Scientists
Across the organisation, individual data scientists are embedded within specific teams (including Workforce, Training and Education (WT&E); Medicines; Patient Safety; AI Lab; Digital Channels; and others).
+We come together through the data science assembly to align our professional development and standards.
To support the sharing of data science knowledge in healthcare, we've put together a monthly newsletter with valuable insights, training opportunities and events.
+Note
The newsletter is targeted towards members of the NHS England Data Science team, so some links may only be accessible to those with the necessary login credentials; however, the newsletter and its archive are available to all at the link above.
+We also support the NHS Data Science Community hosted in AnalystX, which is the home of spreading data science knowledge within the NHS. You can also learn a lot about data science from the other communities we support:
+ +Name | Role | Team | Github |
---|---|---|---|
Sarah Culkin | Deputy Director | Central Data Science Team | SCulkin-code |
Rupert Chaplin | Assistant Director | Central Data Science Team | rupchap |
Jonathan Hope | Data Science Lead | Central Data Science Team | JonathanHope42 |
Jonathan Pearson | Data Science Lead | Central Data Science Team | JRPearson500 |
Achut Manandhar | Data Science Lead | Central Data Science Team | achutman |
Simone Chung | Principal Data Scientist (Section Head) | Central Data Science Team | simonechung |
Efrosini Serakis | Data Science Lead | Central Data Science Team | efrosini-s |
Daniel Schofield | Principal Data Scientist (Section Head) | Central Data Science Team | danjscho |
Eladia Valles Carrera | Principal Data Scientist | Central Data Science Team | lilianavalles |
Elizabeth Johnstone | Principal Data Scientist (Section Head) | Central Data Science Team | LiziJohnstone |
Nicholas Groves-Kirkby | Principal Data Scientist (Section Head) | Central Data Science Team | ngk009 |
Divya Balasubramanian | Data Science Lead | Central Data Science Team | divyabala09 |
Giulia Mantovani | Data Science Lead | Data Linkage Hub | GiuliaMantovani1 |
Angeliki Antonarou | Principal Data Scientist | National SDE Data Science Team | AnelikiA |
Kevin Fasusi | Principal Data Scientist | National SDE Data Science Team | KevinFasusi |
Jonny Laidler | Senior Data Scientist | Central Data Science Team | JonathanLaidler |
Mia Noonan | Senior Data Scientist | Central Data Science Team | amelianoonan1-nhs |
Sean Aller | Senior Data Scientist | Central Data Science Team | seanaller |
Hadi Modarres | Senior Data Scientist | Central Data Science Team | hadimodarres1 |
Michael Spence | Senior Data Scientist | Central Data Science Team | mspence-nhs |
Harriet Sands | Senior Data Scientist | Central Data Science Team | harrietrs |
Alice Tapper | Senior Data Scientist | Central Data Science Team | alicetapper1 |
Ben Wallace | Senior Data Scientist | Central Data Science Team | |
Jane Kirkpatrick | Senior Data Scientist | Central Data Science Team | |
Kenneth Quan | Senior Data Scientist | Central Data Science Team | quan14 |
Shoaib Ali Ajaib | Senior Data Scientist | National SDE Team | |
Marek Salamon | Senior Data Scientist | National SDE Team | |
Adam Hollings | Senior Data Scientist | National SDE Team | AdamHollings |
Oluwadamiloju Makinde | Senior Data Scientist | National SDE Team | |
Joseph Wilson | Senior Data Scientist | National SDE Team | josephwilson8-nhs |
Nickie Wareing | Senior Data Scientist | National SDE Team | nickiewareing |
Helen Richardson | Senior Data Scientist | National SDE Team | helrich |
Humaira Hussein | Senior Data Scientist | National SDE Team | humairahussein1 |
Jake Kasan | Senior Data Wrangler (contract) | National SDE Team | |
Lucy Harris | Senior Data Scientist | Meds Team | |
Vithursan Vijayachandrah | Senior Data Scientist | Workforce, Training & Education Team | VithurshanVijayachandranNHSE |
Warren Davies | Data Scientist | Central Data Science Team | warren-davies4 |
Sami Sultan | Data Scientist | Workforce, Training & Education Team | SamiSultanNHSE |
Chaeyoon Kim | Data Scientist | Workforce, Training & Education Team | ChaeyoonKimNHSE |
Ilja Lomkov | Data Scientist | Workforce, Training & Education Team | IljaLomkovNHSE |
Thomas Bouchard | Data Science Officer | Central Data Science Team | tom-bouchard |
Catherine Sadler | Data Science Officer | Central Data Science Team | CatherineSadler |
William Poulett | Data Science Officer | Central Data Science Team | willpoulett |
Amaia Imaz Blanco | Data Science Officer | Central Data Science Team | amaiaita |
Xiyao Zhuang | Data Science Officer | Central Data Science Team | xiyaozhuang |
Scarlett Kynoch | Data Science Officer | Central Data Science Team | scarlett-k-nhs |
Jennifer Struthers | Data Science Officer | Central Data Science Team | jenniferstruthers1-nhs |
Matthew Taylor | Data Science Officer | Central Data Science Team | mtaylor57 |
Elizabeth Kelly | Data Science Officer | National SDE Team | ejkcode |
Sam Hollings | (former) Principal Data Scientist | Central Data Science Team | SamHollings |
Alistair Jones | (former) Senior Data Scientist | National SDE Team | alistair-jones |
Daniel Goldwater | (former) Senior Data Scientist | Central Data Science Team | DanGoldwater1 |
Jennifer Hall | (former) Data Science Lead | Data Linking Hub | |
Paul Carroll | (former) Principal Data Scientist (Section Head) | Central Data Science Team | pauldcarroll |
++Reproducible analytical pipelines (RAP) help ensure all published statistics meet the highest standards of transparency and reproducibility. Sam Hollings and Alistair Bullward share their insights on adopting RAP and give advice to those starting out.
+
Reproducible analytical pipelines (RAP) are automated statistical and analytical processes that apply to data analysis. It’s a key part of national strategy and widely used in the civil service.
+Over the past year, we’ve been going through a change programme and adopting RAP in our Data Services directorate. We’re still in the early stages of our journey, but already we’ve accomplished a lot and had some hard-learnt lessons.
+ + + +This is about analytics and data, but knowledge of RAP isn’t just for those cutting code day-to-day. It’s crucial that senior colleagues understand the levels and benefits of RAP and get involved in promoting this new way of working and planning how we implement it.
+This improves the lives of our data analysts and the quality of our work.
+ + + + + + + + + + + + + + + + + + + +++ + +Over recent years, larger, more data-intensive Language Models (LMs) with greatly enhanced performance have been developed. The enhanced functionality has driven widespread interest in adoption of LMs in Healthcare, owing to the large amounts of unstructured text data generated within healthcare pathways.
+However, with this heightened interest, it becomes critical to comprehend the inherent privacy risks associated with these LMs, given the sensitive nature of Healthcare data. This PhD Internship project sought to understand more about the Privacy-Risk Landscape for healthcare LMs through a literature review and exploration of some technical applications.
+
Studies have shown that LMs can inadvertently memorise and disclose information verbatim from their training data when prompted in certain ways, a phenomenon referred to as training data leakage. This leakage can violate the privacy assumptions under which datasets were collected and can make diverse information more easily searchable.
As LMs have grown, their ability to memorise training data has increased, leading to substantial privacy concerns. The amount of duplicated text in the training data also correlates with memorisation in LMs. This is especially relevant in healthcare due to the highly duplicated text in Electronic Healthcare Records (EHRs).
+If LMs have been trained on private data and are subsequently accessible to users who lack direct access to the original training data, the model could leak this sensitive information. This is a concern even if the user has no malicious intent.
A malicious user can deliberately stage a privacy attack on an LM to extract information about its training data. Researchers can also use these attacks to measure memorisation in LMs. There are several different attack types with distinct attacker objectives.
One of the most well-known attacks is the membership inference attack (MIA). MIAs determine whether a data point was included in the training data of the targeted model. Such attacks can result in various privacy breaches; for instance, discerning that a text sequence generated by a clinical LM (trained on EHRs) originated from the training data can disclose sensitive patient information.
+At the simplest level, MIAs use the confidence of the target model on a target data instance to predict membership. A threshold is set against the confidence of the model to ascertain membership status. For a specific example, if the confidence is greater than the threshold then the attacker assumes the target is a member of the training data, as the model is "unsurprised" to see this example, indicating it has likely seen this example before during training. Currently, the most successful MIAs use reference models. This refers to a second model trained on a dataset similar to the training data of the target model. The reference model filters out uninteresting common examples, which will also be "unsurprising" to the reference model.
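To make the thresholding idea concrete, below is a minimal, hypothetical sketch of a loss-based membership inference attack calibrated with a reference model. The model names and threshold are illustrative placeholders, not the models or settings used in this project.

```python
# Hypothetical sketch: loss-based membership inference with a reference model.
# Model names and the threshold are illustrative, not those used in this project.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
target = AutoModelForCausalLM.from_pretrained("gpt2")           # model under attack
reference = AutoModelForCausalLM.from_pretrained("distilgpt2")  # trained on similar, non-private data

def avg_token_loss(model, text: str) -> float:
    """Average next-token cross-entropy: low loss means the model is 'unsurprised' by the text."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return out.loss.item()

def predict_membership(text: str, threshold: float = 0.5) -> bool:
    # Calibrate against the reference model: genuinely common text is unsurprising
    # to both models, whereas memorised training text is unusually unsurprising
    # to the target alone.
    score = avg_token_loss(target, text) - avg_token_loss(reference, text)
    return score < -threshold  # much lower target loss -> predict "training member"
```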
There are three primary approaches to mitigate privacy risks in LMs:

- sanitising or deduplicating the training data before training;
- privacy-preserving training techniques;
- editing or unlearning approaches that remove specific information from a trained model.
+In this project, we sought to understand more about the Privacy-Risk Landscape for Healthcare LMs and conduct a practical investigation of some existing privacy attacks and defensive methods.
Initially, we conducted a thorough literature search to understand the privacy risk landscape. Our first applied work package explored data deduplication before model training as a mitigation to reduce memorisation and evaluated the approach with Membership Inference Attacks. We showed that RoBERTa models trained on patient notes are highly vulnerable to MIAs, even when only trained for a single epoch. We investigated data deduplication as a mitigation strategy but found that these models were just as vulnerable to MIAs. Further investigation of models trained for multiple epochs is needed to confirm these results. In the future, semantic deduplication could be a promising avenue for medical notes.
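As an illustration of the kind of pre-training mitigation investigated here, the snippet below sketches a simple exact-match deduplication of notes before fine-tuning. The normalisation rules are assumptions for the example; the project's actual deduplication (and the semantic deduplication mentioned above) would be more involved.

```python
# Illustrative exact-match deduplication of free-text notes before training.
# The normalisation applied here (lower-casing, collapsing whitespace) is an
# assumption for the example, not the project's exact preprocessing.
import hashlib
import re

def normalise(note: str) -> str:
    return re.sub(r"\s+", " ", note.lower()).strip()

def deduplicate(notes: list[str]) -> list[str]:
    seen, unique = set(), []
    for note in notes:
        digest = hashlib.sha256(normalise(note).encode("utf-8")).hexdigest()
        if digest not in seen:  # keep only the first occurrence of each normalised note
            seen.add(digest)
            unique.append(note)
    return unique

notes = ["Patient seen today. BP stable.", "patient seen today.  BP stable.", "New referral received."]
print(deduplicate(notes))  # the near-identical second note is dropped
```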
+Our second applied work package explored editing/unlearning approaches for healthcare LMs. Unlearning in LMs is poised to become increasingly relevant, especially in light of the growing awareness surrounding training data leakage and the 'Right to be Forgotten'. We found that many repositories for performing such approaches were not adapted for all LM types, and some are still not mature enough to be easy to use as packages. Exploring a Locate-then-Edit approach to Knowledge Neurons, we found this was not well suited to the erasure of information we needed in medical notes. Our findings suggest that the focus from a privacy perspective on these methods should be on those which allow the erasure of specific training data instances instead of relational facts.
This work primarily explored privacy in pre-trained Masked Language Models. The growing adoption of generative LMs underscores the importance of expanding this work to decoder-only and encoder-decoder models like the GPT family and T5. Also, due to the common practice of freezing parameters and tuning the last layer of an LM on a private dataset, it is critical to expand investigations of privacy risks to LMs fine-tuned on healthcare data.
+Within the scope of this exploration, the field of Machine Unlearning/Editing applied to LMs was in its infancy, but it is gaining momentum. As this field matures, comparing the efficacy of different methods becomes crucial. Furthermore, it is important to explore the effect of removing the influence of a set of data points. A holistic examination of the effectiveness, privacy implications, and broader impacts of Machine Unlearning/Editing methods on healthcare LMs is essential to inform the development of robust and privacy-conscious LMs in the NHS.
+When considering explainability of models, this often involves generating explanations or counterfactuals alongside the decisions made by the LM. However, integrating explanations into the output of LMs can introduce vulnerabilities related to training data leakage and privacy attacks. Additionally, efforts to enhance privacy, such as employing Privacy-preserving training techniques, can inadvertently impact fairness, particularly in datasets lacking diversity. In healthcare, all three elements are paramount, so investigating the privacy-explainability-fairness trade-off is crucial for developing private, robust and ethically sound LMs.
Finally, privacy concerns in several emerging trends for LMs need to be understood in healthcare scenarios. Incorporating external knowledge bases to enhance LMs, known as retrieval augmentation, could make LMs more likely to leak private information. Further, Multimodal Large Language Models (MLLMs), LM-based models that can take in and reason over the multimodal information common in healthcare, could be susceptible to leakage from one input modality through another output modality.
We have been building a proof-of-concept tool that scores the privacy risk of free text healthcare data. To use our tool effectively, users need a basic understanding of the entities within their dataset which may contribute to privacy risk.
+There are various tools for annotating and exploring free text data. The author explores some of these tools and discusses his experiences.
+
We have been building a proof-of-concept tool that scores the privacy risk of free text healthcare data called Privacy Fingerprint (opens in new tab).
+Named Entity Recognition (NER) is a particularly important part of our pipeline. It is the task of identifying, categorizing and labelling specific pieces of information, known as entities, within a given piece of text. These entities can include the names of people, dates of birth, or even unique identifiers like NHS Numbers.
+As of the time of writing, there are two NER models fully integrated within the Privacy Fingerprint pipeline used to identify entities which may contribute towards a privacy risk. These are:
+Both NER models in our pipeline need to be fed a list of entities to extract. This is true for many NER models, although some like Stanza (opens in new tab) from Stanford NLP Group (opens in new tab) and BERT (opens in new tab) token classifiers do not need an initial entity list for extraction. For our privacy tool to be effective, we want our list of entities to be representative of the real entities in the data, and not miss any important information.
Let's consider a new user who wants to investigate the privacy risk of a large unstructured dataset. Maybe they want to use this data to train a new generative healthcare model and don't want any identifiable information to leak into the training data. Or maybe this dataset is a large list of outputs from a similar model and they want to ensure that no identifiable information has found its way into the data. They may ask:
+What does my data look like?
+What entities within my data have a high privacy risk?
+Wait a second, what even is an entity?
+We want to offer an easy and interactive starting point for new users of our tool, where they can easily explore their data, understand the role of NER and identify what risks lie in their data. If they miss certain entities, this could have large implications on the scoring aspect of our pipeline.
+Of course, we want people to use our tool efficiently and effectively! So we asked:
+How can a new user efficiently explore their data to understand what entities exist within the data, and in particular, which entities may contribute to a privacy risk?
+Interactive annotation tools offer a solution to the above problem. If we can include a tool which allows a user to manually label their dataset, alongside live feedback from the NER model, it would allow a user to very quickly understand the entities in their data.
+Further to this, some NER models can be surprisingly affected by the wording of entities. The entity titled 'name' may extract both the name of an individual and the name of a hospital. The entity 'person' might only extract the name of the person. We have found that changing the entity 'person' to 'name' in UniversalNER reduced how often names were picked up by the model. If a user gets live feedback from a model whilst labelling, this will help them both finetune which entity names work best, alongside picking out which entities to use at all.
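As a toy illustration of how much the choice and wording of entity labels matters, the sketch below compares two label sets against the same text. The `extract_entities` function is a stand-in for whichever NER model you use (UniversalNER, a BERT token classifier, and so on), not part of the Privacy Fingerprint pipeline.

```python
# Hypothetical illustration of how the wording of entity labels changes what is
# extracted. `extract_entities` is a toy stand-in for a real NER model.
def extract_entities(text: str, labels: list[str]) -> list[dict]:
    """Toy extractor: pretend 'name' also catches organisation names, while 'person' does not."""
    found = []
    if "name" in labels:
        found += [{"text": "John Smith", "label": "name"},
                  {"text": "St Mary's Hospital", "label": "name"}]
    if "person" in labels:
        found += [{"text": "John Smith", "label": "person"}]
    return found

text = "John Smith was admitted to St Mary's Hospital on 01/02/2024."
for label_set in (["name", "date"], ["person", "date"]):
    print(label_set, "->", [e["text"] for e in extract_entities(text, label_set)])
```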
+We want a tool that:
+There were two approaches we took to develop an annotation tool.
+First, we used DisplaCy (opens in new tab), ipyWidgets (opens in new tab), and a NER model of choice to generate an interactive tool that works inside Jupyter notebooks. DisplaCy is a visualiser integrated into the SpaCy library which allows you to easily visualise labels. Alongside ipyWidgets, a tool that allows you to create interactive widgets such as buttons, we created an interface which allowed a user to go through reviews and add new entities.
One of the main advantages of this method is that everything is inside a Jupyter notebook. The entity names you want to extract come straight from the experiment parameters, so if you used this in the same notebook as the rest of your pipeline the entity names could be updated automatically from the labelling tool. This would allow easy integration into a user workflow.
There is also a button which allows for live feedback from the NER model, which is useful given our previous comment on different entity names having different effects on the NER model.
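A minimal sketch of this kind of in-notebook widget is shown below. It assumes a `predict_entities(text)` helper wrapping whichever NER model is in use (defined here as a trivial stub, not part of the published pipeline); displaCy's manual mode renders the returned spans and an ipyWidgets button steps through the reviews with live model feedback.

```python
# Minimal sketch of an in-notebook annotation viewer using displaCy and ipyWidgets.
import ipywidgets as widgets
from IPython.display import display
from spacy import displacy

def predict_entities(text: str) -> list[dict]:
    """Placeholder for your NER model; return [{'start': ..., 'end': ..., 'label': ...}, ...]."""
    return [{"start": 0, "end": 9, "label": "name"}] if text.startswith("Mrs Smith") else []

reviews = ["Mrs Smith attended St Thomas' Hospital on 3 May.", "Follow-up booked for John."]
index, output = 0, widgets.Output()

def render(i: int) -> None:
    ents = predict_entities(reviews[i])  # live feedback from the NER model
    doc = {"text": reviews[i], "ents": ents, "title": f"Review {i + 1}"}
    with output:
        output.clear_output()
        displacy.render(doc, style="ent", manual=True, jupyter=True)

next_button = widgets.Button(description="Next review")

def on_next(_):
    global index
    index = (index + 1) % len(reviews)
    render(index)

next_button.on_click(on_next)
display(next_button, output)
render(index)
```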
+This approach was simple and resulted in a fully working example. However, highlighting entities manually was not possible, and this meant it was hard to correct predictions that the model got wrong. You are fully reliant on the labels given by the model, and can't add your own.
+We explored a second option using Streamlit (opens in new tab). Streamlit is a python framework that allows you to build simple web apps. We can use it alongside a package called Streamlit Annotation Tools (opens in new tab) to generate a more interactive user interface. As an example, a user can now use their cursor to highlight particular words and assign them an entity type which is more hands-on and engaging. Unlike our ipyWidgets example, users can select different labels to be displayed which makes the tool less cluttered, and you can easily navigate using a slider to separate reviews. Like the previous widget, there is a button which uses a NER model to label the text and give live feedback. Including this, the tool is more synergistic, easier to use and more immersive than the ipyWidgets alternative.
However, there were still a few teething issues when developing the Streamlit app. Firstly, Streamlit Annotation Tools is unable to display `\n` as a new line and instead prints `\n` literally, resulting in the structure of the text being lost. This is a Streamlit issue and we haven't yet found a way to keep the structure of the text intact. There was an easy fix in which each `\n` was replaced with two spaces (this means the start and end character count for each labelled entity remains consistent with the original structured text), but the structure of the text is still lost, which may cause issues for some future users.
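The fix itself is a one-liner: because the text stores the two-character literal `\n` rather than a real newline, swapping it for two spaces keeps the string length, and therefore every entity's start and end offsets, unchanged.

```python
# Replace each literal two-character "\n" with two spaces so the string length,
# and therefore every labelled entity's start/end offsets, stays the same.
raw_note = "Seen in clinic.\\nBP 120/80.\\nReview in 2 weeks."
display_note = raw_note.replace("\\n", "  ")
assert len(display_note) == len(raw_note)
```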
Secondly, Streamlit involves a little bit more set up than ipyWidgets. Rather than interacting with the reviews in your notebook you run the app on a local port and access it through your browser. This also makes it harder to retrieve back into your pipeline the list of entities you have labelled. Whilst there is benefit to running all your analysis in one jupyter notebook, the Streamlit app gives a better user experience.
+Both labelling tools we have identified have key advantages. DisplaCy and ipyWidgets fit well into your workflow, whilst Streamlit offers a nicer user experience. ipyWidgets and Streamlit are both versatile tools, and so users can edit the annotation tools in the future to fit their own use case.
+Following the research and development of these two tools, we believe the ability to interactively annotate, explore and extract entities from your data greatly improves the user experience when using our privacy risk scorer pipeline.
+We will publish working examples of annotation using both ipyWidgets and Streamlit, such that a future user can build on them or use them to improve their workflow. The code is available on our github (opens in new tab).
+ + + + + + + + + + + + + + + + + + +++ + +The NHS England Data Science team, as well as a range of other analysts from across the organisation, attended an AI Hackathon at Microsoft, organised by the Data Science Team together with Microsoft and Kainos, with the key stakeholders being the NHS Websites Services Team. In this article, the author shares her experiences at the event.
+
Last week, the long-awaited Hack for Health, hosted by Microsoft and NHSE at the beautiful Story Club in Paddington, London, finally happened! At the hackathon, every team had either a Kainos or Microsoft representative, as well as a range of participants from across NHS England, including a strong Data Science Team presence, with several of our team members in each group. And unlike a usual hackathon, this one had stakeholders and well-defined use cases: we were all working towards one common goal.
+The use cases were outlined as:
The aim was to develop generative AI solutions for the Websites Services team, who could adapt and improve them for their own use cases. Having use cases meant that it was so much easier to split the work up and get our heads down, with everyone having something to actively work on. It also meant we had a structured approach, making the experience so much more fulfilling. I was delighted by the creativity and range of approaches that the different teams took when it came to presenting at the end of the two days. We ranged from teams that had taken a thin-slice approach to all three use cases, to teams that had gone really in depth on just one of them, to teams that had taken use case number three and made it into a tool that could be used by the websites team to improve the websites on the backend, removing conflict and duplication.
+Overall, I was left in awe by the creativity and technical skills of our team, as well as of all the other attendees. Hopefully the work gets used in the future by the websites team, and I hope that any hackathons I attend in the future are of this high quality! (Of course it did help that my team won)
+ +The final results were:
First Place: A thin-slice approach to all three use cases, a project which included four of our very own data scientists: Sean Aller, Sudeshna Mallik, Xiyao Zhuang, and myself, as well as Rob Mansfield, Veta Ngammekchay, and our wonderful Kainos helper Peter Bodnar.
Second Place: Data Scientists Chaeyoon Kim and Warren Davies, with Mary Amanuel, Piyali Dutta, and Farwah Kazmi from elsewhere in the NHS, together with Microsoft's Dan Watkinson and Josh Mercurio, developed an AI career coach that was able to draw from and cite relevant information from the HEE website, and was easily customisable to improve communication towards different user personas.
Third Place: Contradiction Finder, by Data Science's Ben Wallace, Matt Taylor, and Jenny Chim, as well as Andrew Walker and Microsoft's Hannah Howell and Hanna Riaz. It focused on use case three, making a usable tool for the websites team to find contradictions.
+ +Quotes from some of the attendees about their experience:
+Bashir Abubakar
+++The hackathon was an incredible experience that not only allowed me to learn how to use Azure AI Foundry but also deepened my understanding of how large language models (LLMs) can transform healthcare, particularly within the NHS. My fascination with transformers began when I first read Attention is All You Need paper, which revolutionised the NLP space with its groundbreaking approach to self-attention. Seeing this theory in action, from research papers to practical applications, has been nothing short of inspiring. +The hackathon felt like a full-circle moment, as it opened new pathways for applying LLMs in healthcare, a vision I’ve long held for the future of AI professionals (Industry 4.0). It also reinforced the transformative role of prompt engineering, a skill I believe is pivotal in unlocking the potential of AI in creating meaningful solutions.
+
Will Poulett:
+++ + + + + + + + + + + + + + + + + + +The variety of professions within each team was great, it's not often that GP's and data scientists can work together using generative AI. The solutions developed by each team were varied and interesting, I'm looking forward to seeing how they are implemented in the future!
+
We have built a proof-of-concept tool which will help assurers, data scientists and clinicians to evaluate AI classifiers. We call this the RISE tool; it utilises LLMs, AI image generators and an interactive plot to allow users to easily evaluate image classifiers. We carried out careful experimentation to ensure its effectiveness, and plan to continue this research in the future.
+
Within NHS England, testers and assurers are increasingly being asked to assure AI models and systems, including AI classifiers. For assurers who are used to deterministic code and functional testing, this can be quite the challenge. F1 scores, AUC-ROC curves and aptly named confusion matrices are all used by data scientists to evaluate these AI models. These metrics can be hard to understand and, for multi-class models, can easily trip up anyone – assurer or data scientist. As increasingly complex models are developed, it's important we make it easier for assurers to evaluate AI systems, bridging the gap that currently exists between data scientists and technical assurers.
+Bridging this gap is the aim of the AI Quality Community of Practice - a group of both data scientists and technical assurers. Alongside upskilling technical assurers with training and offering guidance on AI assurance, we have also spent time developing new tools to improve the testing and assurance of AI models – such as using mixup images to try and identify a model’s decision boundary (see our paper here!).
In this article we present our preliminary study of a new tool - RISE. It is a pipeline leveraging generative AI that aims to make evaluating AI classifiers quicker, easier, and less reliant on data science technical knowledge. It will support assurers, data scientists and even clinicians. We don't intend for this to replace other AI evaluation methods, but rather to complement them. We believe this tool can help identify potential biases that can't be found via other techniques, making it incredibly useful throughout the AI development lifecycle.
RISE stands for Risk-Informed Synthetic Embeddings. At a very high level it follows these steps (don't worry – we'll go into more detail later!):
+This tool has a lot of moving parts. Whilst one level of success would be to simply make a working prototype, we want to ensure that this work helps real assurers within NHS England. To do this we created an experiment with the aim of answering the following 4 questions:
Figure 1 demonstrates how we structured and ran our experiment. It is split into four steps:

- `Traditional AI Training + Evaluation` refers to training and evaluating an AI classifier using typical data science techniques.
- `Risk Informed Image Generation` is the first step of the RISE pipeline, where LLMs first expand the list of scenarios and then use it to generate image prompts. These are fed into an image generator to create our synthetic test dataset.
- `The Human Labelling` stage used 14 volunteers to label our synthetic dataset.
- `The Interactive Tool` stage of the experiment completes the RISE pipeline. Predictions are made by the AI classifier and, using dimension reduction techniques, we plot the model embeddings on an interactive scatter plot.

Throughout the experiment, we noted results from our evaluations and gained feedback from labellers and end users. At the end of this article, we will refer to these four questions and assess just how successful the experiment was.
+To train a model, we first needed an image dataset. If you had a keen eye, you may have noticed some pictures of dogs and cats in Figure 1, and indeed we used the Animal Faces Dataset to train and evaluate our AI classifier. This may seem like an interesting choice for NHS England where we treat humans rather than pets, but there were various reasons behind this choice.
+It goes without saying that we intend to use this tool on medical datasets in the future, with guidance from clinicians as to how realistic and useful AI generated medical images are. If you want to read ahead, our exact plans on future research can be found at the end of this article.
+Once our dataset was selected, it was time to train a model. Our model was trained using transfer learning on top of the EfficientNetV2S model, with ImageNet weights. The model performed exceptionally well on a test dataset containing 1467 images (493, 491 and 483 images for cats, dogs and wildlife respectively) with 99.8% accuracy. The precision for the cat and dog class was perfect. The only incorrect classifications were three images, all predicted as a wild animal when their label was either a cat or a dog.
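A minimal sketch of this kind of transfer-learning setup is shown below, assuming TensorFlow/Keras. The layer sizes match the 32-neuron hidden layer and three-class output described later in this article, but other details (optimiser, image size, data pipeline) are illustrative assumptions rather than the exact configuration used here.

```python
# Illustrative transfer-learning setup on EfficientNetV2S with ImageNet weights.
# Layer sizes mirror the model described in this article; other settings are assumptions.
import tensorflow as tf

base = tf.keras.applications.EfficientNetV2S(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3), pooling="avg"
)
base.trainable = False  # freeze the pre-trained backbone

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(32, activation="relu"),    # the 'final hidden layer' visualised later
    tf.keras.layers.Dense(3, activation="softmax"),  # cat / dog / wildlife
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)  # train_ds/val_ds: your tf.data pipelines
```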
+The three images that were incorrectly classified are shown in Figure 3. The left-most image stands out most due to a possible instance of label noise. The image is likely a clouded leopard – wildlife, yet has a ground truth label of a cat. This means the model probably got the prediction right! If we have identified possible label noise in the test dataset, we can assume there are probably instances of it in the training dataset.
There are certainly improvements to the model training process that we could have used when running this experiment, such as screening for label noise and bias within the training dataset. This is something we would undoubtedly do as data scientists working on NHS England projects; however, for this experiment having a non-perfect model has some advantages. It means we can expect some areas of poor performance in the model, and can then ensure end users are able to spot these errors when trialling the tool.
+ +Once the model was trained, it was time to develop the first stage of the RISE tool – turning a list of scenarios into a synthetic image dataset. This involved the use of both an LLM and an AI Image Generator.
We wanted a list of scenarios that tested both likely and non-likely scenarios. Likely scenarios are those that are likely to have appeared in the training data - simple images of dogs and cats. Non-likely scenarios are those which won't have appeared often in the training dataset but may still occur in the future. We may also wish to make an initial guess as to what sort of scenarios may trip the model up. For example, a cat holding a tennis ball may be mistaken as a dog, given this is typically a dog-like behaviour.
Our LLM of choice for this step was Llama 3.1 8B. Whilst not the most powerful of LLMs, its main advantage was that it could be run locally on a laptop. For future iterations of this tool using medical datasets, this means possible sensitive data never has to leave your computer or data platform. DallE3 was used for image generation. This cannot be run locally, but we found its generation capabilities to be much better than smaller models such as Stable Diffusion v1 which we trialled locally. We expect that higher quality image generation models will be able to run locally in the near future, so we were happy to use DallE3 for this experiment.
An assurance colleague was given the initial evaluation results alongside a description of the dataset and generated an initial list of 14 scenarios for us to test. These included 'domestic dogs that look like wild dogs' and 'multiple animals in one picture'. We asked Llama 3.1 8B with a temperature of 0.7 to generate an additional list of scenarios, and it did so, generating a list of 44 new scenarios. We then asked it again to consider its previous risks and generate some more, this time adding 15 new scenarios. Whenever we used an LLM we followed good prompt guidance; this included asking the model to adopt a persona, asking the model if it missed anything on previous passes, and giving examples.
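The prompting pattern described here can be sketched roughly as below, using the Hugging Face transformers chat interface to run a local instruction-tuned Llama 3.1 8B model. The exact wording of our prompts differed, so treat the messages and model identifier as illustrative assumptions (the model is gated and needs suitable access and hardware).

```python
# Rough sketch of asking a locally run Llama 3.1 8B Instruct model to expand a
# list of risk scenarios. The prompt wording here is illustrative only.
from transformers import pipeline

generator = pipeline("text-generation", model="meta-llama/Llama-3.1-8B-Instruct")

seed_scenarios = ["domestic dogs that look like wild dogs", "multiple animals in one picture"]
messages = [
    {"role": "system", "content": "You are an experienced AI test assurer."},  # adopt a persona
    {"role": "user", "content": (
        "Here are scenarios that might trip up a cat/dog/wildlife image classifier:\n- "
        + "\n- ".join(seed_scenarios)
        + "\nSuggest further distinct scenarios we should test, one per line."
    )},
]
result = generator(messages, max_new_tokens=400, do_sample=True, temperature=0.7)
print(result[0]["generated_text"][-1]["content"])  # the assistant's list of new scenarios
```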
+Compiling all of these risks together we ended up with 20 high quality scenarios, including new scenarios not considered in our initial list. New scenarios included 'unusual or creative use of colour' and 'dogs and cats with medical injuries'. It was clear that an LLM was helpful for generating and considering new scenarios.
+We then used Llama 3.1 8B to generate five image prompts for each scenario, again following good prompt guidance. This was successful, although there were a few interesting errors we experienced when generating LLM responses.
+Here are two examples:
+I've created six detailed prompts to generate synthetic images. These prompts focus on creating images الأسرof animals in motion, blurred faces, and other related scenarios.
2 закрытA dog with a fluffy, cream-colored coat and black markings that resemble a panda's distinctive fur pattern, sitting in a serene garden surrounded by blooming flowers and a tranquil pond. The dog's eyes are closed, and its paws are tucked under its body as it enjoys the peaceful atmosphere.
In both of the above examples, apparently random foreign words appeared in the output. The rest of the image prompt seems fine! We still aren’t sure why this occurred, but it did mean we had to manually review all prompts before using them to generate images.
+We used DallE3, accessed via Bing's Copilot to generate our images using the generated prompts. This was an inefficient step, and in future we will either use a local model or use an API call. However, this method was free and still provided sufficiently high quality images for this piece of work.
+Some example images are shown in Figure 4, including 'edge-case' images.
+ +We wanted to pay particular attention to edge-case image examples, as this mirrors boundary analysis in traditional software testing. Boundary analysis is the testing of values very close to a decision boundary. Some of our scenarios already included edge-case examples. This included 'Crossbreed or hybrid animals', which we hoped would lie closer to the model's decision boundary and would help us identify where the model changes its mind, and which features in an image correspond to this.
+Additionally, we used GPT4o and DallE3 to generate 40 more image prompts for edge case scenarios – in particular hybrid animals. These were animals that had features from multiple classes, and to a human were hard to classify. In a medical dataset, this may be a certain disease with symptoms similar to an alternative disease.
We used 14 volunteers to label our dataset. Our total dataset was 288 images, of which we considered 148 as 'hard' to classify. We hoped that making our dataset smaller would result in higher quality labels, as volunteers wouldn't get 'button fatigue' – losing engagement in the tool as they did more and more labelling.
+To gather labels we put these 148 images into a new dataset where they were resized to the size used by the model. Model predictions were gathered, and each image was randomly assigned a number of 1 or 0, splitting the dataset randomly in two.
We then created an image labelling tool using ipywidgets. For each image, users were asked to select whether the image was of a domestic cat, a domestic dog or wildlife. There was a 50% chance the user would be told the model's prediction. As the dataset was randomly split in half, we ensured that for each image there would be seven occasions when the prediction was given, and seven without. This allowed us to explore the effect of a user being told a model's prediction on their classification.
We decided to keep the labelling tool simplistic to ensure that volunteers did not get bored, and thus gave us high quality labels. This meant removing possible features such as an 'other' button or a 'multiple classes' button. Even with this, we did see button fatigue: when some users got on a roll they made mistakes. If generating a similar label tool in the future, we may wish to consider adding additional features such as a timer which records how long it takes the user to make a decision, and possibly a back button.
+ + +For each image, we assigned a label based on the most common vote. If an image had six votes as a domestic cat, five as a domestic dog and three as wildlife, we would label it as a domestic cat. We defined confidence as the number of votes for that class divided by the total number of votes. For this example, that would be 6 / 14 which is approximately 43%. For each image, we also assigned a label based on if the users were shown a prediction when classifying the image, or if they were not. When all labels are considered, there were five images where the two most common classes had an equal number of votes. These images are shown in Figure 6.
+For this proof-of-concept piece, the argmax function built into numpy assigned each of these images a label. However, in future iterations of the tool we should handle these occurrences in a more sophisticated way. What this does demonstrate is that we really did generate some edge case images, ones even humans struggle to classify.
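The vote-counting logic amounts to a few lines; a sketch is below, with the tie-handling behaviour (the first of the tied classes wins, much like numpy's argmax on the counts) made explicit.

```python
# Majority-vote label and confidence for one image's 14 votes.
# Ties fall back to the first class encountered, mirroring an argmax over counts.
from collections import Counter

def label_and_confidence(votes: list[str]) -> tuple[str, float]:
    counts = Counter(votes)
    label, top = counts.most_common(1)[0]
    return label, top / len(votes)

votes = ["cat"] * 6 + ["dog"] * 5 + ["wildlife"] * 3
print(label_and_confidence(votes))  # ('cat', 0.4285...) -> roughly 43% confidence
```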
| Labels | Domestic Cat | Domestic Dog | Wildlife |
|---|---|---|---|
| Predictions shown | 49 | 38 | 61 |
| Predictions not shown | 49 | 36 | 63 |
| All labels | 53 | 37 | 58 |
The above table shows the different label counts when users were shown model predictions, were not shown model predictions and the combination of both. The differences are small. We knew that some edge case images were hard to classify, these are the ones that received equal numbers of votes for multiple classes. We also knew that some users made mistakes whilst labelling these images with the label tool. This might have explained some of the small differences we see in the above table. The prediction being shown doesn't seem to have had a significant impact.
+There were nine occasions where the labels changed depending on whether or not predictions were shown, this number excluded the five images that were shown in Figure 6. All of these images had a confidence of no greater than 64% across all 14 votes. Given the confidence was low and the images are edge-case, it is not surprising that the label changed. The nature of these images is more likely to explain the changing label, as opposed to users being shown the prediction.
+Across the whole dataset and considering all 14 voters, 82% of labels agreed with model predictions and 39% of labels had 100% confidence in their label.
+There are plenty of ways to improve this section of the experiment for future studies. We have already mentioned the use of a timer, but if we are to move on to a medical dataset and let clinicians use the tool, we may also want a back button or an 'I don't know' button, alongside using a larger cohort of labellers to try and get statistically significant results.
+Let's finally talk about the interactive tool. We had synthetic images, we had labels, all that remained was to create a clear way of plotting and interacting with them.
+To create our scatter plots, we needed a way to turn model predictions into a set of two-dimensional coordinates. This was achieved using hidden layer activations and dimension reduction techniques.
+Our image classifier was a type of neural network. Essentially, neural networks are made up of layers, with each layer containing a number of neurons. When a model makes a prediction on an image, each layer influences the next, using patterns and rules it learned during training. The final layer makes the prediction and in our case contained three neurons, each corresponding to a class: dogs, cats, and wildlife.
+Just before this is the 'final hidden layer', which in our model contained 32 neurons. When making a prediction, each of these neurons produced a number that the final layer used to decide how to classify the image. We could examine the values of these 32 neurons for each image in our dataset. Dimension reduction techniques compressed these 32 values into two dimensions and displayed them as scatter plots. At this stage in the neural network, the model had already identified patterns and similarities between classes, which we could visualise as clusters in the plot, with similar images appearing closer together.
+We picked out five different dimension reduction techniques for our tool: TSNE, PCA, Feature Agglomeration, Isomap and Umap. Each one contains a link to some documentation if you'd like to learn more about how they work. Each technique was given all of the long list of 'hidden layer activations' and compressed these down into two dimensions.
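The sketch below shows roughly how those activations can be pulled out of a Keras model and compressed to two dimensions. The layer indexing, the `images` batch and the t-SNE settings are assumptions for illustration, not the exact code behind the tool.

```python
# Illustrative extraction of final-hidden-layer activations and 2-D projection.
# Assumes `model` is the Keras classifier sketched earlier and `images` is a batch
# of preprocessed images; layer indexing and t-SNE settings are illustrative.
import tensorflow as tf
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

feature_extractor = tf.keras.Model(
    inputs=model.input, outputs=model.layers[-2].output  # the 32-neuron hidden layer
)
activations = feature_extractor.predict(images)          # shape: (n_images, 32)

coords_pca = PCA(n_components=2).fit_transform(activations)
coords_tsne = TSNE(n_components=2, perplexity=30).fit_transform(activations)
# Each row of coords_* gives the (x, y) position of one image on the scatter plot.
```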
+If an assurer were to inspect these clusters, they may find occurrences where a certain group of similar images are misclassified. They might even have similar features or themes which can then be used to identify risks within the model. With dogs and cats, perhaps a cluster of images of cats holding tennis balls are all misclassified as dogs. In a clinical chest X-ray dataset, perhaps a chest X-ray with a broken rib is instead classified as having a tumour.
We used Bokeh to create our tool that can be accessed within a Jupyter Notebook. The tool was essentially an interactive scatter plot, where you could use a slider to navigate between different dimension reduction techniques.

Points were coloured based on the model's prediction, and there was the ability to change the shape of each point based on the human assigned labels. When you hovered over a point the image was shown. If you looked at the right hand side, you could see a selection of images based on the cluster you highlighted.
+This tool is available on GitHub, and you can try it out yourself here.
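For readers who want a starting point, a pared-down version of such a plot can be put together with Bokeh as sketched below. The hover tooltip and colour mapping are the core of it; the slider between techniques and the thumbnail panel in our tool involve more plumbing, and the data used here is illustrative.

```python
# Pared-down Bokeh scatter plot with image thumbnails on hover.
# Coordinates, predictions and image paths below are illustrative inputs only.
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.palettes import Category10
from bokeh.plotting import figure, show
from bokeh.transform import factor_cmap

classes = ["cat", "dog", "wildlife"]
source = ColumnDataSource(data={
    "x": [1.2, -0.4, 3.1],
    "y": [0.7, 2.2, -1.5],
    "prediction": ["cat", "dog", "wildlife"],
    "image": ["thumbs/img_001.png", "thumbs/img_002.png", "thumbs/img_003.png"],
})

p = figure(title="Model embeddings (2-D projection)", tools="pan,wheel_zoom,reset")
p.scatter("x", "y", source=source, size=10,
          color=factor_cmap("prediction", Category10[3], classes))
p.add_tools(HoverTool(tooltips='<img src="@image" width="96"><br>predicted: @prediction'))
show(p)
```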
+Once our tool was created, we asked a group of assurers to trial it. Two respondents used human assigned labels within the tool, whilst three did not. Each assurer was asked to identify potential risks with the AI classifier, flag which images demonstrated said risks, asked whether they would want to generate additional test data, and also comment on the usability of the tool. We'll summarise the responses from each user below.
+Colleagues using the tool with and without labels were able to identify potential risks. When no labels were included, images of wildlife were misclassified, the image of a cat holding a tennis ball was misclassified, and some users found occasions where images containing no animals were predicted as dogs. One user pointed out the two images seen in Figure 8, which were predicted differently despite looking very similar.
+ +When labels were included, users were similarly successful, finding many of the same examples of those mentioned above.
+When labels were not included, users wanted to see more kittens, puppies and cubs, animals with their eyes closed, animals in action and more wildlife images. One user specified it would be useful to find images that are close to identical but with key features changed. This might be easier to implement now due to the recent release of 'Add it' - a tool using generative AI to easily add new features to existing images. With labels included, users wanted to look further into domestic cats.
The headline for this section: all users found this tool to be very useful!
+That being said, we received various suggestions for improvements. Some of these improvements included:
+Make the prediction of thumbnail images on the right clearer.
+Include a way to flag the image as 'questionable'.
+Add the capability to generate and add new images within the tool.
+Overall, the tool appears to have been used successfully, with users enjoying the experience and identifying risks and misclassifications within the tool. The feedback we received from users was mixed in length and quality, which means directly comparing whether the tool is more useful with or without labels is hard. What is clear is that both cohorts were similarly successful in using the tool.
+I did notice that within the feedback there was some confusion regarding whether some behaviour was part of the tool or part of the AI classifier. One user suggested that the tool should have another class called 'other'. Whilst this is a good insight, it is in fact the model which would need another class. Additionally, some users mixed up terms such as efficiency and accuracy.
This highlights the need for thorough staff training when using this tool, and many suggestions regarding the tool's usability should be acted upon before the next iteration of this work.
+Additionally, none of the feedback referenced any edge-case (hybrid animal) images, instead pointing out occurrences when easily identifiable animals were performing certain behaviours or contained certain features. This is interesting, and may imply that the edge-case images we used were not found to be very helpful by the assurers. Alternatively, our guess of what an edge-case image was might have been completely wrong – the model interpreted images in a completely different way to human users.
+Let's look back at the key questions identified earlier in this article and assess how well we can answer them following the experiment.
+Can we use generative AI to turn scenarios into test data?
+Simply, yes. We were successful in using generative AI to turn a list of scenarios into an image dataset. However, there were lots of manual steps involved in doing so. Ideally, this tool can be turned into a semi-autonomous pipeline where humans check intermediate steps but have less of a need to edit them or clean LLM outputs.
+We don't want users to have to remove random tokens written in another language every time they use the tool!
+Does an interactive tool make evaluating AI classifiers easier?
+This again seems to be successful. Assurers without much knowledge of AI systems were able to identify images and risks, whilst suggesting additional images they’d like to generate and evaluate.
+Do image labels improve the tool?
+This is harder to measure. An ideal answer would be no, as this tool could be used without the time-consuming labelling step, whilst still being useful for evaluations. Whilst there are indications of this, we don’t have enough information to come to a strong conclusion.
+Does knowing the model’s prediction change how evaluators interpret the results?
Again, this is hard to measure. The differences between seeing the model prediction and not seeing the model prediction when using the labelling tool are small. We likely need to run a larger experiment in order to get more statistically significant results.
+Adding an additional comment, we also gained very little evidence of assurers using hybrid animal pictures to make conclusions about the behaviour of the model. This doesn’t suggest that AI generated images aren’t useful for evaluation, more that our 'guess' of what an 'edge-case' image is, doesn’t line up with the model’s decision boundary.
So, what next? Whilst working with images of cats and dogs is fun, the entire aim of this work has been to transition to real, clinical datasets, helping assurers, data scientists and testers to evaluate real-world systems. We hope to soon engage with clinicians to find an image dataset that an expert can interpret, and run a similar experiment with them. Figure 9 partially explains why we think this tool will be successful on a clinical dataset: we know that this tool can identify key issues in an animal classifier, so why not a cancer detection model using chest X-rays?
In the meantime, there are plenty of other improvements that can be made to the tool. New and improved open-source models are being released all the time; these can increase the reliability of the tool and possibly allow high quality image generation to be performed locally. We also received plenty of user feedback on the usability of the interactive tool, all of which should be considered for future iterations.
+The success of this work means that hopefully this is just the beginning. Keep your eyes peeled for more work, articles and research in this area.
+ + + + + + + + + + + + + + + + + + + +++Reproducible analytical pipelines (RAP) help ensure all published statistics meet the highest standards of transparency and reproducibility. Sam Hollings and Alistair Bullward share their insights on adopting RAP and give advice to those starting out.
+
Reproducible analytical pipelines (RAP) are automated statistical and analytical processes that apply to data analysis. It’s a key part of national strategy and widely used in the civil service.
+Over the past year, we’ve been going through a change programme and adopting RAP in our Data Services directorate. We’re still in the early stages of our journey, but already we’ve accomplished a lot and had some hard-learnt lessons.
+ + +++ + +We have built a proof-of-concept tool which will help assurers, data scientists and clinicians to evaluate AI classifiers. We call this the RISE tool, it utilises LLM's, AI Image Generators and an interactive plot to allow users to easily evaluate image classifiers. We carried out careful experimentation to ensure its effectiveness, and plan to continue this research in the future.
+
++ + +The NHS England Data Science team, as well as a range of other analysts from across the organisation, attended an AI Hackathon at Microsoft, organised by the Data Science Team together with Microsoft and Kainos, with the key stakeholders being the NHS Websites Services Team. In this article, the author shares her experiences at the event.
+
++ + +We have been building a proof-of-concept tool that scores the privacy risk of free text healthcare data. To use our tool effectivly, users need a basic understanding of the entities within their dataset which may contribute to privacy risk.
+There are various tools for annotating and exploring free text data. The author explores some of these tools and discusses his experiences.
+
++ + +Over recent years, larger, more data-intensive Language Models (LMs) with greatly enhanced performance have been developed. The enhanced functionality has driven widespread interest in adoption of LMs in Healthcare, owing to the large amounts of unstructured text data generated within healthcare pathways.
+However, with this heightened interest, it becomes critical to comprehend the inherent privacy risks associated with these LMs, given the sensitive nature of Healthcare data. This PhD Internship project sought to understand more about the Privacy-Risk Landscape for healthcare LMs through a literature review and exploration of some technical applications.
+
++Reproducible analytical pipelines (RAP) help ensure all published statistics meet the highest standards of transparency and reproducibility. Sam Hollings and Alistair Bullward share their insights on adopting RAP and give advice to those starting out.
+
+Reproducible analytical pipelines (RAP) are automated statistical and analytical processes that apply to data analysis. They are a key part of national strategy and are widely used across the civil service.
+Over the past year, we’ve been going through a change programme and adopting RAP in our Data Services directorate. We’re still in the early stages of our journey, but already we’ve accomplished a lot and had some hard-learnt lessons.
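+To make "automated statistical and analytical processes" concrete, here is a minimal sketch of what a RAP-style pipeline can look like in Python; the file names, fields and aggregation rules are illustrative assumptions rather than code from our own pipelines.

```python
# Illustrative RAP-style pipeline: each stage is a plain function so the whole
# publication can be re-run end to end from a single entry point.
import pandas as pd

def extract(path: str) -> pd.DataFrame:
    """Read the raw input data from a versioned, documented location."""
    return pd.read_csv(path)

def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the documented cleaning and aggregation rules."""
    df = df.dropna(subset=["region", "attendances"])
    return df.groupby("region", as_index=False)["attendances"].sum()

def publish(df: pd.DataFrame, path: str) -> None:
    """Write the publication-ready table to a fixed output location."""
    df.to_csv(path, index=False)

def main() -> None:
    publish(transform(extract("raw_activity.csv")), "published_activity.csv")

if __name__ == "__main__":
    main()
```

+Because each stage is a plain, testable function with fixed inputs and outputs, anyone with access to the raw data can re-run the publication from start to finish and get the same results.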
+ + +