From 95ec6cc6bd4d027a981278d488e7f2b74d0383d7 Mon Sep 17 00:00:00 2001
From: NadiaBlostein <33006815+NadiaBlostein@users.noreply.github.com>
Date: Mon, 9 May 2022 09:00:08 -0400
Subject: [PATCH] Update README.md

---
 README.md | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/README.md b/README.md
index 5f11dcc..c718c2c 100644
--- a/README.md
+++ b/README.md
@@ -1,11 +1,9 @@
 # Project-Prep-Series-02-Data-Preparation
-Data preparation is a critical prerequisite to any downstream analysis, especially within the context of machine learning appliactions. The purpose of the following workshop is to familiarize students with some data preprocessing basics in Python 3+, using .csv and .png files.
+Data preparation is a critical prerequisite to any downstream analysis, especially within the context of machine learning applications. The purpose of the following workshop is to familiarize students with some data preprocessing basics in Python 3+, using .csv and .png files.\
+Part 1 will focus on .csv data preparation and we will be walking through the `CSV_preparation.ipynb` notebook. This notebook contains a mini assignment, the answers of which can be found in `CSV_preparation_Mini_Assignment_Answers.ipynb`.\
+Part 2 will focus on .png (2D image) data preparation and we will be walking through the `CSV_preparation.ipynb` notebook. This notebook contains a mini assignment, the answers of which can be found in `CSV_preparation_Mini_Assignment_Answers.ipynb`.
 
-## Part 1: CSV data preprocessing
-* Notebook with content + mini assignemnt: `PNG_preprocessing.ipynb`
-* Notebook with mini assignment answers: PNG_preprocessing_Mini_Assignment_Answers.ipynb
-* THE DATA
-All of the csv data that you are provided today comes from the S1200 release of the [Human Connectome Project](http://www.humanconnectomeproject.org/data/) (HCP). This is an open-source initiative containing demographic, behavioural and high-quality neuroimaging data on healthy young adult twin and non-twin siblings.
+## Part 1: About the CSV Data
 
 ### 1.1 Behavioral data: `data_csv/unrestricted_HCP_behavioral.csv`
 
@@ -17,7 +15,7 @@ Structural MRI data was acquired from the WU-Minn HCP S1200 Release ([Van Essen
 
 Total brain volume (TBV) as well as the volume and total surface area of 6 other structures (left striatum, right striatum, left thalamus, right thalamus, left globus pallidus, right globus pallidus) were obtained by Nadia Blostein and colleagues from the [Computational Brain Anatomy (CoBrA) Laboratory](https://cobralab.ca/) (Cerebral Imaging. Center, Douglas Mental Health University Institute) under the supervision of Dr. Mallar Chakravarty. Images were processed and volume and surface area measures extracted using a standard lab pipeline that involved the publicly available [minc-bpipe library](https://github.com/CoBrALab/minc-bpipe-library) and [MAGeTbrain segmentation algorithm](https://github.com/CobraLab/MAGeTbrain). More thorough details on image processing and volume obtention can be found [here](https://www.biorxiv.org/content/10.1101/2022.04.11.487874v1).
 
-## Part 2: PNG data preprocessing
-* Notebook with content + mini assignemnt: `PNG_preprocessing.ipynb`
-* Notebook with mini assignment answers: PNG_preprocessing_Mini_Assignment_Answers.ipynb
-* THE DATA
+## Part 2: About the PNG Data
+
+We will be using 20 2D chest X-ray images (.png files) from the 500 images available in the open-access [Pulmonary Chest X-Ray Abnormalities](https://www.kaggle.com/kmader/pulmonary-chest-xray-abnormalities) Kaggle dataset. This data was collected by the National Library of Medicine (USA) in collaboration with Shenzhen No.3 People’s Hospital, Guangdong Medical College (China). More info can be found in `data_png/NLM-ChinaCXRSet-ReadMe.docx`. `chest_xrays_pngs` contains the 20 .png files, 10 of which represent a normal lung (`CHNCXR_ID_0.png`) and 10 of which represent an abnormal lung (`CHNCXR_ID_1.png`).
+