
Removed Dataset and uploaded code files #65

Closed
wants to merge 12 commits

1,246 changes: 1,246 additions & 0 deletions Data Science/Parkinsons Disease/Code/EDA and Data Preprocessing.ipynb

Large diffs are not rendered by default.

130 changes: 130 additions & 0 deletions Data Science/Parkinsons Disease/Code/Feature Selection.ipynb

Large diffs are not rendered by default.

82 changes: 82 additions & 0 deletions Data Science/Parkinsons Disease/Code/Merging Data.ipynb
@@ -0,0 +1,82 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 24,
"id": "e1c31b44-419d-4547-9d26-fec1bceb737d",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "4f5943d5-30b4-4623-820b-73b423c249e7",
"metadata": {},
"outputs": [],
"source": [
"##pip install openpyxl"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "5c64e683-8a95-4f69-b0ff-a67d969e4e9e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"All CSV files merged successfully into CSV/PD.csv\n"
]
}
],
"source": [
"# folder path containing the CSV files\n",
"folder_path = 'CSV/PD'\n",
"\n",
"# Initialize an empty DataFrame to store all the data\n",
"PD_subjects = pd.DataFrame()\n",
"\n",
"# Loop through all the CSV files in the folder\n",
"for filename in os.listdir(folder_path):\n",
" if filename.endswith('.csv'):\n",
" file_path = os.path.join(folder_path, filename)\n",
" # Read the CSV file\n",
" data = pd.read_csv(file_path)\n",
" # Append the data to the merged DataFrame\n",
" PD_subjects = pd.concat([PD_subjects, data], ignore_index=True)\n",
"\n",
"# Export the merged DataFrame to an Excel file\n",
"output_path = 'CSV/PD.csv'\n",
"PD_subjects.to_csv(output_path, index=False)\n",
"print(f\"All CSV files merged successfully into {output_path}\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
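
Note on the merge above: concatenating every per-subject CSV into one file drops any record of which file a row came from. A minimal sketch of one possible refinement, assuming subject identity matters downstream (the source_file column is hypothetical, not part of this PR):

import os
import pandas as pd

folder_path = 'CSV/PD'  # same folder as in the notebook above

frames = []
for filename in os.listdir(folder_path):
    if filename.endswith('.csv'):
        df = pd.read_csv(os.path.join(folder_path, filename))
        df['source_file'] = filename  # hypothetical column: keeps each row traceable
        frames.append(df)

pd.concat(frames, ignore_index=True).to_csv('CSV/PD.csv', index=False)

The output is identical apart from the one extra column.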
80 changes: 80 additions & 0 deletions Data Science/Parkinsons Disease/Code/txt_to_csv.ipynb
@@ -0,0 +1,80 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "1ac3f519-e490-4269-8afd-6a7ea704ede6",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "304ed7d1-9057-43e0-b8c2-babeb6e967bd",
"metadata": {},
"outputs": [
{
"ename": "ParserError",
"evalue": "Error tokenizing data. C error: Expected 26 fields in line 161, saw 30\n",
"output_type": "error",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[1;31mParserError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[1;32mIn[9], line 10\u001b[0m\n\u001b[0;32m 7\u001b[0m input_file_path \u001b[38;5;241m=\u001b[39m os\u001b[38;5;241m.\u001b[39mpath\u001b[38;5;241m.\u001b[39mjoin(folder_path, txt_file)\n\u001b[0;32m 8\u001b[0m output_file_path \u001b[38;5;241m=\u001b[39m os\u001b[38;5;241m.\u001b[39mpath\u001b[38;5;241m.\u001b[39mjoin(output_folder_path, txt_file\u001b[38;5;241m.\u001b[39mreplace(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124m.txt\u001b[39m\u001b[38;5;124m'\u001b[39m, \u001b[38;5;124m'\u001b[39m\u001b[38;5;124m.csv\u001b[39m\u001b[38;5;124m'\u001b[39m))\n\u001b[1;32m---> 10\u001b[0m df \u001b[38;5;241m=\u001b[39m \u001b[43mpd\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mread_csv\u001b[49m\u001b[43m(\u001b[49m\u001b[43minput_file_path\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdelimiter\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;130;43;01m\\t\u001b[39;49;00m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m)\u001b[49m \n\u001b[0;32m 11\u001b[0m df\u001b[38;5;241m.\u001b[39mto_csv(output_file_path, index\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mFalse\u001b[39;00m)\n\u001b[0;32m 12\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mConverted \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mtxt_file\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m to \u001b[39m\u001b[38;5;132;01m{\u001b[39;00moutput_file_path\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n",
"File \u001b[1;32m~\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\pandas\\io\\parsers\\readers.py:1026\u001b[0m, in \u001b[0;36mread_csv\u001b[1;34m(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, date_format, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options, dtype_backend)\u001b[0m\n\u001b[0;32m 1013\u001b[0m kwds_defaults \u001b[38;5;241m=\u001b[39m _refine_defaults_read(\n\u001b[0;32m 1014\u001b[0m dialect,\n\u001b[0;32m 1015\u001b[0m delimiter,\n\u001b[1;32m (...)\u001b[0m\n\u001b[0;32m 1022\u001b[0m dtype_backend\u001b[38;5;241m=\u001b[39mdtype_backend,\n\u001b[0;32m 1023\u001b[0m )\n\u001b[0;32m 1024\u001b[0m kwds\u001b[38;5;241m.\u001b[39mupdate(kwds_defaults)\n\u001b[1;32m-> 1026\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43m_read\u001b[49m\u001b[43m(\u001b[49m\u001b[43mfilepath_or_buffer\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mkwds\u001b[49m\u001b[43m)\u001b[49m\n",
"File \u001b[1;32m~\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\pandas\\io\\parsers\\readers.py:626\u001b[0m, in \u001b[0;36m_read\u001b[1;34m(filepath_or_buffer, kwds)\u001b[0m\n\u001b[0;32m 623\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m parser\n\u001b[0;32m 625\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m parser:\n\u001b[1;32m--> 626\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mparser\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mread\u001b[49m\u001b[43m(\u001b[49m\u001b[43mnrows\u001b[49m\u001b[43m)\u001b[49m\n",
"File \u001b[1;32m~\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\pandas\\io\\parsers\\readers.py:1923\u001b[0m, in \u001b[0;36mTextFileReader.read\u001b[1;34m(self, nrows)\u001b[0m\n\u001b[0;32m 1916\u001b[0m nrows \u001b[38;5;241m=\u001b[39m validate_integer(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mnrows\u001b[39m\u001b[38;5;124m\"\u001b[39m, nrows)\n\u001b[0;32m 1917\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m 1918\u001b[0m \u001b[38;5;66;03m# error: \"ParserBase\" has no attribute \"read\"\u001b[39;00m\n\u001b[0;32m 1919\u001b[0m (\n\u001b[0;32m 1920\u001b[0m index,\n\u001b[0;32m 1921\u001b[0m columns,\n\u001b[0;32m 1922\u001b[0m col_dict,\n\u001b[1;32m-> 1923\u001b[0m ) \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_engine\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mread\u001b[49m\u001b[43m(\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;66;43;03m# type: ignore[attr-defined]\u001b[39;49;00m\n\u001b[0;32m 1924\u001b[0m \u001b[43m \u001b[49m\u001b[43mnrows\u001b[49m\n\u001b[0;32m 1925\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 1926\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mException\u001b[39;00m:\n\u001b[0;32m 1927\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mclose()\n",
"File \u001b[1;32m~\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\pandas\\io\\parsers\\c_parser_wrapper.py:234\u001b[0m, in \u001b[0;36mCParserWrapper.read\u001b[1;34m(self, nrows)\u001b[0m\n\u001b[0;32m 232\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m 233\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mlow_memory:\n\u001b[1;32m--> 234\u001b[0m chunks \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_reader\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mread_low_memory\u001b[49m\u001b[43m(\u001b[49m\u001b[43mnrows\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 235\u001b[0m \u001b[38;5;66;03m# destructive to chunks\u001b[39;00m\n\u001b[0;32m 236\u001b[0m data \u001b[38;5;241m=\u001b[39m _concatenate_chunks(chunks)\n",
"File \u001b[1;32mparsers.pyx:838\u001b[0m, in \u001b[0;36mpandas._libs.parsers.TextReader.read_low_memory\u001b[1;34m()\u001b[0m\n",
"File \u001b[1;32mparsers.pyx:905\u001b[0m, in \u001b[0;36mpandas._libs.parsers.TextReader._read_rows\u001b[1;34m()\u001b[0m\n",
"File \u001b[1;32mparsers.pyx:874\u001b[0m, in \u001b[0;36mpandas._libs.parsers.TextReader._tokenize_rows\u001b[1;34m()\u001b[0m\n",
"File \u001b[1;32mparsers.pyx:891\u001b[0m, in \u001b[0;36mpandas._libs.parsers.TextReader._check_tokenize_status\u001b[1;34m()\u001b[0m\n",
"File \u001b[1;32mparsers.pyx:2061\u001b[0m, in \u001b[0;36mpandas._libs.parsers.raise_parser_error\u001b[1;34m()\u001b[0m\n",
"\u001b[1;31mParserError\u001b[0m: Error tokenizing data. C error: Expected 26 fields in line 161, saw 30\n"
]
}
],
"source": [
"folder_path = 'Dataset/'\n",
"output_folder_path = 'CSV files/'\n",
"\n",
"txt_files = [f for f in os.listdir(folder_path) if f.endswith('.txt')]\n",
"\n",
"for txt_file in txt_files:\n",
" input_file_path = os.path.join(folder_path, txt_file)\n",
" output_file_path = os.path.join(output_folder_path, txt_file.replace('.txt', '.csv'))\n",
" \n",
" df = pd.read_csv(input_file_path, delimiter=' ') \n",
" df.to_csv(output_file_path, index=False)\n",
" print(f\"Converted {txt_file} to {output_file_path}\")\n",
"\n",
"print(\"All files converted successfully!\")\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
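
About the ParserError recorded above: pandas infers the expected field count from the first line it parses (26 here), and line 161 of one file splits into 30 fields, so the tab parse aborts. Since Data Format.txt below says each line has only 19 columns, the tab delimiter itself looks wrong for at least some files. A minimal sketch of one possible workaround, assuming whitespace-delimited files and that any still-malformed rows are safe to drop (both are assumptions, not something this PR establishes):

import os
import pandas as pd

folder_path = 'Dataset/'            # paths as in the notebook above
output_folder_path = 'CSV files/'

for txt_file in (f for f in os.listdir(folder_path) if f.endswith('.txt')):
    input_file_path = os.path.join(folder_path, txt_file)
    output_file_path = os.path.join(output_folder_path, txt_file.replace('.txt', '.csv'))
    # sep=r'\s+' tolerates mixed runs of tabs and spaces; header=None because
    # the raw files carry no header row; on_bad_lines='skip' drops malformed
    # rows instead of raising (assumes they can be lost)
    df = pd.read_csv(input_file_path, sep=r'\s+', header=None, on_bad_lines='skip')
    df.to_csv(output_file_path, index=False)
    print(f'Converted {txt_file} to {output_file_path}')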
71 changes: 71 additions & 0 deletions Data Science/Parkinsons Disease/Data Format.txt
@@ -0,0 +1,71 @@
Data format:
------------

Each line contains 19 columns:

Column 1: Time (in seconds)
Columns 2-9: Vertical ground reaction force (VGRF, in Newton) on each of 8
sensors located under the left foot
Columns 10-17: VGRF on each of the 8 sensors located under the right foot
Column 18: Total force under the left foot
Column 19: Total force under the right foot.

When a person is comfortably standing with both legs parallel to each
other, sensor locations inside the insole can be described (according
to the Infotronic website; http://www.infotronic.nl/) as lying
approximately at the following (X,Y) coordinates, assuming that the
origin (0,0) is just between the legs and the person is facing towards
the positive side of the Y axis:

Sensor X Y
----------------------------
L1 -500 -800
L2 -700 -400
L3 -300 -400
L4 -700 0
L5 -300 0
L6 -700 400
L7 -300 400
L8 -500 800

R1 500 -800
R2 700 -400
R3 300 -400
R4 700 0
R5 300 0
R6 700 400
R7 300 400
R8 500 800

The X and Y numbers are in an arbitrary coordinate system reflecting
the relative (arbitrarily scaled) positions of the sensors within each
insole. During walking, the sensors inside each insole remain at the
same relative position, but the two feet are no longer parallel to
each other. Thus, this coordinate system enables a calculation of a
proxy for the location of the center of pressure (COP) under each
foot.
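
As a concrete illustration, the COP proxy for one foot is the force-weighted
average of its sensor coordinates. A minimal sketch for the left foot (the
function name is illustrative, not part of the dataset):

import numpy as np

# (X, Y) insole coordinates of left-foot sensors L1..L8, from the table above
LEFT_XY = np.array([(-500, -800), (-700, -400), (-300, -400), (-700, 0),
                    (-300, 0), (-700, 400), (-300, 400), (-500, 800)], dtype=float)

def left_cop_proxy(vgrf):
    # vgrf: the 8 left-foot VGRF values (columns 2-9) for one time sample
    f = np.asarray(vgrf, dtype=float)
    total = f.sum()
    if total == 0:                  # swing phase: no load, COP undefined
        return None
    return (f @ LEFT_XY) / total    # force-weighted mean of sensor positions

The right-foot version is identical with the R1..R8 coordinates.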


Data file names:
----------------
These follow a common convention, e.g., GaCo01_02.txt or JuPt03_06.txt,
where

Ga, Ju or Si - indicate the study from which the data originated:
Ga - Galit Yogev et al (dual tasking in PD; Eur J Neuro, 2005)
Ju - Hausdorff et al (RAS in PD; Eur J Neuro, 2007)
Si - Silvi Frenkel-Toledo et al (Treadmill walking in PD; Mov Disorders, 2005)

Co or Pt: Control subject or PD patient

01: Subject number in the group

A walk number of 10 (for the "Ga" study) indicates dual-task walking,
where the subject was engaged in serial-7 subtraction while walking.

A walk number of 01 refers to a usual, normal walk.

.txt: file name extension

The sampling rate was 100 Hz.
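
For example, a file name can be decoded programmatically. A small sketch
(the helper is hypothetical, not part of the dataset):

import re

def parse_gait_filename(name):
    # e.g. 'GaCo01_02.txt' -> ('Ga', 'Co', 1, 2)
    m = re.fullmatch(r'(Ga|Ju|Si)(Co|Pt)(\d{2})_(\d{2})\.txt', name)
    if m is None:
        raise ValueError(f'unexpected file name: {name}')
    study, group, subject, walk = m.groups()
    return study, group, int(subject), int(walk)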