Project Description

In this study, we replicated and extended the analysis of an influential breast cancer single-cell RNA sequencing (scRNA-seq) paper, "Single-cell RNA-seq enables comprehensive tumour and immune cell profiling in primary breast cancer", available at this link. Our results confirm that tumor and immune cells can be distinguished at the single-cell level. We then characterize both populations using targeted gene sets and unsupervised machine-learning techniques, an approach with potential value for personalized medicine in breast cancer. To support this analysis, we developed a preprocessing pipeline tailored to our research goals. At each stage of cell separation we applied a distinct clustering approach and evaluated it, with the aim of making the separation accurate and reliable. We also used UMAP and t-SNE visualizations, which helped reveal the structure of the data and the relationships between cell populations.
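
As an illustration of what an evaluated clustering step can look like, the sketch below fits Gaussian mixture models with scikit-learn for several candidate cluster counts and keeps the one with the lowest BIC (Bayesian Information Criterion). This is an assumed, generic technique chosen for illustration; the function name select_gmm and its parameters are hypothetical and are not taken from the project's notebooks.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def select_gmm(x, k_range=range(1, 7), random_state=0):
    """Hypothetical helper: fit one GMM per candidate k and keep the lowest-BIC model.

    x: array of shape (n_cells, n_features), e.g. principal components.
    Returns the best fitted model and a dict mapping k -> BIC.
    """
    best_model, best_bic = None, np.inf
    bics = {}
    for k in k_range:
        gmm = GaussianMixture(
            n_components=k,
            covariance_type="full",
            n_init=3,  # several EM restarts to reduce sensitivity to initialization
            random_state=random_state,
        ).fit(x)
        bics[k] = gmm.bic(x)  # lower BIC = better fit/complexity trade-off
        if bics[k] < best_bic:
            best_model, best_bic = gmm, bics[k]
    return best_model, bics
```

The returned model's predict method assigns each cell to a cluster. BIC penalizes extra components, so it tends to avoid splitting a single population into several clusters.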

A comprehensive report of our findings and analysis can be found in the Project_Report.pdf file.

Selected figures from Project_Report.pdf are reproduced below for easy access:

Clusters of carcinoma vs. non-carcinoma cells, visualized with t-SNE, UMAP, and PCA on the Transcripts Per Million (TPM) expression matrix:

t-SNE and UMAP visualizations of the Transcripts Per Million (TPM) expression matrix by cell type:
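
Embeddings like these are typically computed by log-transforming TPM values, reducing them with PCA, and running t-SNE on the principal components. The sketch below shows that sequence with scikit-learn, assuming a cells x genes TPM array (transpose a genes x cells table first). The function embed_tpm and its defaults are hypothetical; UMAP is omitted because it needs an extra package (for example scanpy's scanpy.tl.umap), and the project's notebooks may use different parameters.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE


def embed_tpm(tpm, n_pcs=50, perplexity=30.0, random_state=0):
    """Hypothetical helper: log1p(TPM) -> PCA -> 2-D t-SNE.

    tpm: array-like of shape (n_cells, n_genes) with TPM values.
    Returns (principal components, t-SNE coordinates).
    """
    x = np.log1p(np.asarray(tpm, dtype=float))  # compress the TPM dynamic range
    n_pcs = min(n_pcs, *x.shape)  # PCA cannot return more components than min(n_cells, n_genes)
    pcs = PCA(n_components=n_pcs, random_state=random_state).fit_transform(x)
    # t-SNE requires perplexity < n_samples; keep it well below for small inputs
    perplexity = min(perplexity, (x.shape[0] - 1) / 3)
    coords = TSNE(
        n_components=2, perplexity=perplexity, init="pca", random_state=random_state
    ).fit_transform(pcs)
    return pcs, coords
```

Running t-SNE on principal components rather than on all genes reduces noise and runtime. The resulting coordinates can then be colored by cell type or by carcinoma status.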

Expression of ER+, HER2+, and TNBC marker genes in tumor cells and bulk tumors:
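
A comparison like this can start from log2(TPM + 1) values of receptor genes, averaged per group (for example single tumor cells vs. bulk samples). The sketch below assumes a pandas genes x samples TPM table indexed by gene symbol. The gene list (ESR1 for estrogen receptor, PGR for progesterone receptor, ERBB2 for HER2) is an illustrative assumption and not the project's gene sets; the helper names are hypothetical.

```python
import numpy as np
import pandas as pd

# Illustrative receptor genes (assumption): ESR1 = ER, PGR = PR, ERBB2 = HER2.
RECEPTOR_GENES = ["ESR1", "PGR", "ERBB2"]


def receptor_expression(tpm: pd.DataFrame, genes=RECEPTOR_GENES) -> pd.DataFrame:
    """Hypothetical helper: log2(TPM + 1) for the genes present; returns samples x genes."""
    present = [g for g in genes if g in tpm.index]  # silently skip genes missing from the table
    return np.log2(tpm.loc[present] + 1).T


def mean_by_group(expr: pd.DataFrame, groups: pd.Series) -> pd.DataFrame:
    """Average expression per group; `groups` maps each sample name to a label."""
    return expr.groupby(groups).mean()
```

Usage: build a Series such as pd.Series({"cell_1": "tumor cell", "bulk_1": "bulk"}) and pass it with the output of receptor_expression to get one row per group.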

Data Availability

Both the publicly available datasets from the reference paper and our own data are available in this Google Drive folder.

Instructions

If you would like to run the code yourself, please follow the instructions below.

Library dependencies

kneed
matplotlib
numpy
pandas
scanpy
anndata
seaborn
scikit-learn (imported as sklearn)
optuna
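
Before running the notebooks, it can help to confirm that every library above is importable. The short script below is a sketch that checks the import names with the standard library's importlib.util.find_spec. The list mirrors the dependencies above; the function name missing_packages is hypothetical.

```python
import importlib.util

# Import names of the dependencies listed above (scikit-learn is imported as sklearn).
REQUIRED = ["kneed", "matplotlib", "numpy", "pandas", "scanpy",
            "anndata", "seaborn", "sklearn", "optuna"]


def missing_packages(names=REQUIRED):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies found.")
```

If anything is reported missing, the pip package names are kneed, matplotlib, numpy, pandas, scanpy, anndata, seaborn, scikit-learn, and optuna.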

Creating the workspace

1. Clone the repository.
2. Download the reference paper data, as well as our data, from the Google Drive folder and place them in a folder named datasets.
3. Run the notebooks in the following order:
   1. preprocessing.ipynb
   2. dimensionalityred.ipynb
   3. cellseperation.ipynb
   4. R_genefu_results.ipynb
   5. cancercellsanalysis.ipynb
   6. immunecellanalysis.ipynb
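
The notebook order above can also be scripted. The sketch below assumes Jupyter's nbconvert is installed and uses its `--execute --inplace` options to run each notebook in sequence. R_genefu_results.ipynb presumably needs an R kernel (with the genefu package) to be available. The helper names are hypothetical, and dry_run=True only prints the commands.

```python
import subprocess
from pathlib import Path

# Execution order from the instructions above.
NOTEBOOKS = [
    "preprocessing.ipynb",
    "dimensionalityred.ipynb",
    "cellseperation.ipynb",
    "R_genefu_results.ipynb",
    "cancercellsanalysis.ipynb",
    "immunecellanalysis.ipynb",
]


def build_commands(repo_dir="."):
    """Return one `jupyter nbconvert` command per notebook, in pipeline order."""
    repo = Path(repo_dir)
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", str(repo / nb)]
        for nb in NOTEBOOKS
    ]


def run_pipeline(repo_dir=".", dry_run=True):
    """Check for the datasets folder, then print (dry run) or execute each command."""
    repo = Path(repo_dir)
    if not (repo / "datasets").is_dir():
        raise FileNotFoundError(f"Expected a 'datasets' folder in {repo.resolve()}")
    commands = build_commands(repo)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the pipeline at the first notebook that fails
            subprocess.run(cmd, check=True)
    return commands
```

Usage: from the repository root, call run_pipeline(".") to review the commands, then run_pipeline(".", dry_run=False) to execute them. Running the notebooks in place keeps their outputs saved alongside the code.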

KyriakosPsa/BreastCancer-SingleCell