GitHub - SamyakSheth/AI-Powered-Medical-Literature-Review: Master Thesis 2024

Hello! This README describes the GitHub repository for AI-powered medical literature review using Large Language Models (LLMs).

It contains code for the first screening (titles and abstracts), for full-text screening using a Retrieval-Augmented Generation (RAG) pipeline, and for web scrapers that automatically extract titles and abstracts and download PDFs.

IMPORTANT

This project uses Large Language Models run through Ollama, so make sure Ollama is installed and the models you want to use have already been pulled.

For the list of available Ollama models, see https://ollama.com/library
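Before running the notebooks, it can help to confirm that the required models are actually pulled. The check below is a minimal sketch, not part of the repository: it assumes Ollama's default local server at http://localhost:11434 and its /api/tags endpoint, which lists the locally available models. The helper names fetch_local_models and missing_models are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_HOST = "http://localhost:11434"


def fetch_local_models(host: str = OLLAMA_HOST) -> list[str]:
    """Return the names of the models already pulled into the local Ollama server."""
    with urllib.request.urlopen(f"{host}/api/tags") as resp:
        payload = json.load(resp)
    return [m["name"] for m in payload.get("models", [])]


def _with_tag(name: str) -> str:
    """Ollama treats a bare name such as "llama3" as "llama3:latest"."""
    return name if ":" in name else f"{name}:latest"


def missing_models(required: list[str], available: list[str]) -> list[str]:
    """Return the required models that are not pulled yet, in the given order."""
    pulled = {_with_tag(name) for name in available}
    return [name for name in required if _with_tag(name) not in pulled]
```

For example, missing_models(["llama3", "mistral"], fetch_local_models()) returns the names still to fetch with `ollama pull <name>`. The model names here are placeholders, not necessarily the three models used in the experiments.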

First Screening

first_screening.ipynb : Code for the title and abstract screening of papers using three models.
first_screening_inference.ipynb : Code to check the outputs, convert them into binary (0/1) classification labels based on the criteria, and evaluate the results with confusion matrices and classification reports.
first_screening_outputs : Stores the outputs of all first screening experiments.
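The first screening workflow (prompt a model with each title and abstract, map its answer to a 0/1 label, then compare the labels with the ground truth) can be sketched as follows. This is a hypothetical illustration, not the code in first_screening.ipynb: the prompt wording, the INCLUDE/EXCLUDE answer format, and all helper names are assumptions. It calls Ollama's /api/generate REST endpoint on the default local port.

```python
import json
import re
import urllib.request
from typing import Optional

# Hypothetical screening prompt; the thesis' actual criteria and wording may differ.
PROMPT_TEMPLATE = (
    "You are screening papers for a systematic literature review.\n"
    "Inclusion criteria: {criteria}\n\n"
    "Title: {title}\nAbstract: {abstract}\n\n"
    "Answer with exactly one word: INCLUDE or EXCLUDE."
)


def screen(title: str, abstract: str, criteria: str, model: str,
           host: str = "http://localhost:11434") -> str:
    """Send one title/abstract pair to a local Ollama model; return its raw answer."""
    body = json.dumps({
        "model": model,
        "prompt": PROMPT_TEMPLATE.format(criteria=criteria, title=title, abstract=abstract),
        "stream": False,                # ask for one JSON object instead of a stream
        "options": {"temperature": 0},  # reduce run-to-run variation
    }).encode("utf-8")
    request = urllib.request.Request(
        f"{host}/api/generate", data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)["response"]


def to_label(answer: str) -> Optional[int]:
    """Map a free-text answer to 1 (include), 0 (exclude) or None (unparseable)."""
    words = re.findall(r"[a-z]+", answer.lower())
    if "include" in words and "exclude" not in words:
        return 1
    if "exclude" in words and "include" not in words:
        return 0
    return None


def confusion_counts(y_true: list[int], y_pred: list[Optional[int]]) -> dict[str, int]:
    """Count true/false positives and negatives, with 1 = include."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for t, p in zip(y_true, y_pred):
        p = 1 if p is None else p  # assumption: unparseable answers count as include
        key = ("t" if t == p else "f") + ("p" if p == 1 else "n")
        counts[key] += 1
    return counts
```

Counting unparseable answers as includes is one conservative option for screening, since a missed relevant paper is usually costlier than an extra paper passed on to full-text review; the notebooks may handle such answers differently. scikit-learn's confusion_matrix and classification_report yield the same counts plus precision, recall, and F1.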

Full-Text Screening

full_text_screening.ipynb : Code for the full-text screening experiments using a RAG-based pipeline. It also contains the confusion matrices and classification reports.
rag_llama.py : Class definition of the RAG-based pipeline. Each component is implemented as a function, so the components can be called one by one when executing the pipeline.
full_text_screening_outputs : Stores the outputs of both full-text screening experiments.
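A RAG pipeline whose components are separate functions typically chunks the document, embeds the chunks, retrieves the chunks most relevant to a question, and assembles a prompt from them. The dependency-free sketch below illustrates those steps under stated assumptions; it is not the contents of rag_llama.py. A bag-of-words vector stands in for a real embedding model, and a linear scan stands in for the vector index stored under vectorstores/db_faiss (presumably FAISS, judging by the folder name). All function names are hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase word counts instead of a neural model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query (linear scan, no index)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble a grounded prompt from the retrieved chunks."""
    joined = "\n---\n".join(context)
    return (f"Use only the context below to answer.\n\nContext:\n{joined}\n\n"
            f"Question: {question}\nAnswer:")
```

The assembled prompt would then be sent to an Ollama model for the screening decision. Replacing embed with a real embedding model and the sorted scan with a vector index changes retrieval quality and speed, not the order of the steps.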

Web Scrapers

scrapper_pubmed.py : A Selenium-based web scraper that takes a paper's citation as input and searches for the paper on the PubMed website. It then extracts the title, abstract, and DOI of that paper.
scrapper_scihub.py : A Selenium-based web scraper that takes a DOI as input, looks up the paper on Sci-Hub, and downloads it to the specified download directory.
scrapper_pubMed.ipynb : This notebook takes the UVH first screening results as the ground truth. It first runs scrapper_pubmed to extract the titles, abstracts, and DOIs, then runs scrapper_scihub to download all the required papers. UVH- Screening (First stage)(ground truth).xlsx lists many papers, but this project only uses those whose source is PubMed.
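The scrapers themselves drive a browser with Selenium. The pure-Python helpers below sketch only the bookkeeping steps of the kind described for scrapper_pubMed.ipynb, under stated assumptions: keeping the ground-truth rows whose source is PubMed, building a PubMed search URL from a citation, and pulling a DOI out of scraped text. They are not the repository's code; the column names "Source" and "Citation" and all function names are hypothetical and would need to match the real spreadsheet.

```python
import re
from typing import Optional
from urllib.parse import quote_plus

# Assumption: rows come from the ground-truth sheet as a list of dicts,
# e.g. via pandas' DataFrame.to_dict("records"); column names are hypothetical.

# Loose DOI heuristic: "10." + registrant code + "/" + suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def pubmed_rows(rows: list[dict], source_col: str = "Source") -> list[dict]:
    """Keep only the rows whose source is PubMed (case-insensitive)."""
    return [r for r in rows
            if str(r.get(source_col, "")).strip().lower() == "pubmed"]


def pubmed_search_url(citation: str) -> str:
    """Build a PubMed web search URL for a free-text citation."""
    return "https://pubmed.ncbi.nlm.nih.gov/?term=" + quote_plus(citation)


def extract_doi(text: str) -> Optional[str]:
    """Return the first DOI-like string in text, minus trailing punctuation."""
    match = DOI_PATTERN.search(text)
    return match.group(0).rstrip(".,;)") if match else None
```

A Selenium driver could then open pubmed_search_url(row["Citation"]) for each filtered row. The DOI pattern is a heuristic and may still capture stray characters from unusually formatted pages.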

Repository Contents

__pycache__
first_screening_outputs
full_text_screening_outputs
pdfs
vectorstores/db_faiss
.DS_Store
README.md
UVH- Screening (First stage)(ground truth).xlsx
comparison_plot.ipynb
data_distribution.png
experiments.png
first_screening.ipynb
first_screening_inference.ipynb
full_text_screening.ipynb
pubmed_abstracts.csv
pubmed_pdfs.csv
rag_llama.py
requirements.txt
scrapper_pubMed.ipynb
scrapper_pubmed.py
scrapper_scihub.py
