fixed requirements and added a few instructions
NadiaBlostein committed Jul 21, 2024
1 parent c0b9466 commit f5f5f5d
Showing 2 changed files with 11 additions and 5 deletions.
13 changes: 10 additions & 3 deletions README.md
@@ -1,11 +1,18 @@
# Project-Prep-Series-02-Data-Preparation
Data preparation is a critical prerequisite to any data analysis or machine learning application. The purpose of the following workshop is to familiarize students with some data preparation (i.e., preprocessing) basics in Python 3+, using .csv and .png files.

Part 1 will focus on .csv data preparation and we will be going through the `CSV_preparation.ipynb` notebook. This notebook contains a mini assignment, the answers of which can be found in `CSV_preparation_mini_assignment_answers.ipynb`.

Part 2 will focus on .png (2D image) data preparation and we will be going through the `CSV_preparation.ipynb` notebook. This notebook also contains a mini assignment, the answers of which can be found in `CSV_preparation_mini_assignment_answers.ipynb`.

-## Part 1: About the CSV Data
+# Setting-Up
+Make sure you have Python `3.10.14` or above. To check, type `python -V` from your command-line.
+Next, run the following from your command-line:
+```
+conda create -n WorkshopEnv python=3.10.14
+conda activate WorkshopEnv
+```
+
+# Part 1: About the CSV Data

### 1.1 Behavioral data: `data_csv/unrestricted_HCP_behavioral.csv`

@@ -17,7 +24,7 @@ Structural MRI data was acquired from the WU-Minn HCP S1200 Release ([Van Essen

Total brain volume (TBV) as well as the volume and total surface area of 6 other structures (left striatum, right striatum, left thalamus, right thalamus, left globus pallidus, right globus pallidus) were obtained by Nadia Blostein and colleagues from the [Computational Brain Anatomy (CoBrA) Laboratory](https://cobralab.ca/) (Cerebral Imaging Center, Douglas Mental Health University Institute) under the supervision of Dr. Mallar Chakravarty. Images were processed, and volume and surface area measures extracted, using a standard lab pipeline that involved the publicly available [minc-bpipe library](https://github.com/CoBrALab/minc-bpipe-library) and [MAGeTbrain segmentation algorithm](https://github.com/CobraLab/MAGeTbrain). Further details on image processing and volume extraction can be found [here](https://www.biorxiv.org/content/10.1101/2022.04.11.487874v1).

-## Part 2: About the PNG Data
+# Part 2: About the PNG Data

We will be using 20 2D chest X-ray images (.png files) from the 500 images available in the open-access [Pulmonary Chest X-Ray Abnormalities](https://www.kaggle.com/kmader/pulmonary-chest-xray-abnormalities) Kaggle dataset. This data was collected by the National Library of Medicine (USA) in collaboration with Shenzhen No.3 People’s Hospital, Guangdong Medical College (China). More info can be found in `data_png/NLM-ChinaCXRSet-ReadMe.docx`. `chest_xrays_pngs` contains the 20 .png files, 10 of which represent a normal lung (`CHNCXR_ID_0.png`) and 10 of which represent an abnormal lung (`CHNCXR_ID_1.png`).
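
The file naming described above (a trailing `_0` for a normal lung, `_1` for an abnormal one) can be turned into labels directly. What follows is only a minimal sketch, assuming every file name follows the `CHNCXR_<ID>_<label>.png` pattern; the helper `label_from_filename` and the example file names are hypothetical and not part of this repository.

```python
from pathlib import Path


def label_from_filename(path):
    """Return 0 (normal) or 1 (abnormal) from a name like CHNCXR_0001_0.png.

    Hypothetical helper: assumes the label is the last underscore-separated
    token of the file name, as the README describes.
    """
    stem = Path(path).stem             # drop directory and ".png"
    suffix = stem.rsplit("_", 1)[-1]   # text after the last underscore
    if suffix not in {"0", "1"}:
        raise ValueError(f"unexpected file name: {path}")
    return int(suffix)


# Made-up example names, used only to illustrate the pattern.
examples = [
    "chest_xrays_pngs/CHNCXR_0001_0.png",
    "chest_xrays_pngs/CHNCXR_0327_1.png",
]
labels = {Path(p).name: label_from_filename(p) for p in examples}
print(labels)
```

Raising on an unexpected suffix, rather than silently defaulting to a class, makes a misnamed file visible before it can skew the normal/abnormal split.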

3 changes: 1 addition & 2 deletions requirements.txt
@@ -6,5 +6,4 @@ protobuf==4.22.4
scikit_image==0.19.3
scikit_learn==1.2.2
scipy==1.10.1
-seaborn==0.12.2
-skimage==0.0
+seaborn==0.12.2
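
Removing the `skimage` pin does not change how the library is imported: the `scikit_image` distribution is what provides the `skimage` module, and, as far as I know, the separate `skimage` name on PyPI is only a placeholder that does not install the library (this is presumably the motivation for the change, though the commit does not say so). A rough sketch for checking an environment against the pins shown in this hunk is below. It assumes that exact-match `name==version` lines are all that needs handling, and the helper names `parse_pins` and `check_pins` are hypothetical.

```python
import re
from importlib import metadata


def parse_pins(text):
    """Parse 'name==version' lines into {normalized_name: version}."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        # Normalize separators so scikit_image becomes scikit-image.
        normalized = re.sub(r"[-_.]+", "-", name.strip()).lower()
        pins[normalized] = version.strip()
    return pins


def check_pins(pins):
    """Compare pinned versions with what is installed in this environment."""
    report = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if found == wanted else f"installed {found}, pinned {wanted}"
    return report


# Only the lines visible in this hunk; the rest of requirements.txt is not shown.
requirements_tail = """
scikit_image==0.19.3
scikit_learn==1.2.2
scipy==1.10.1
seaborn==0.12.2
"""

for name, status in check_pins(parse_pins(requirements_tail)).items():
    print(f"{name}: {status}")
```

The output depends on the environment it is run in, so no expected result is shown here.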
