Python Lists Medical Insurance Estimation Project

AxelHolst · Jul 8, 2024 · 495406a
1 parent 06ce182
commit 495406a
Showing 6 changed files with 2,238 additions and 0 deletions.
diff --git a/.ipynb_checkpoints/README-checkpoint.md b/.ipynb_checkpoints/README-checkpoint.md
@@ -0,0 +1 @@
+# Machine Learning Codecademy Project
diff --git a/... Estimation/.ipynb_checkpoints/Python Lists Medical Insurance Estimation-checkpoint.ipynb b/... Estimation/.ipynb_checkpoints/Python Lists Medical Insurance Estimation-checkpoint.ipynb
@@ -0,0 +1,373 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "79f9fe31",
+   "metadata": {},
+   "source": [
+    "# Python Lists: Medical Insurance Estimation Project"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5bfa2012",
+   "metadata": {},
+   "source": [
+    "In this project, you will examine how factors such as age, sex, BMI, number of children, and smoking status contribute to medical insurance costs.\n",
+    "\n",
+    "You will apply your new knowledge of Python Lists to store insurance cost data in a list as well as compare **estimated** insurance costs to **actual** insurance costs.\n",
+    "\n",
+    "Let's get started!"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "454063d9",
+   "metadata": {},
+   "source": [
+    "## Creating a List"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5eb6fdde",
+   "metadata": {},
+   "source": [
+    "1. First, take a look at the code in the code block below.\n",
+    "\n",
+    "   The function `estimate_insurance_cost()` estimates the medical insurance cost for an individual, based on five variables:\n",
+    "   - `age`: age of the individual in years\n",
+    "   - `sex`: 0 for female, 1 for male\n",
+    "   - `bmi`: individual's body mass index\n",
+    "   - `num_of_children`: number of children the individual has\n",
+    "   - `smoker`: 0 for a non-smoker, 1 for a smoker\n",
+    "   \n",
+    "   These variables are used in the following formula to estimate an individual's insurance cost (in USD):\n",
+    "   \n",
+    "   $$\n",
+    "   insurance\\_cost = 250*age - 128*sex + 370*bmi + 425*num\\_of\\_children + 24000*smoker - 12500\n",
+    "   $$\n",
+    "   \n",
+    "   Below the code, observe the estimated insurance costs for three individuals: Maria, Rohan, and Valentina."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "a6e98fae",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Maria's Estimated Insurance Cost: 4222.0 dollars.\n",
+      "Rohan's Estimated Insurance Cost: 5442.0 dollars.\n",
+      "Valentina's Estimated Insurance Cost: 36368.0 dollars.\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Function to estimate insurance cost:\n",
+    "def estimate_insurance_cost(name, age, sex, bmi, num_of_children, smoker):\n",
+    "    estimated_cost = 250*age - 128*sex + 370*bmi + 425*num_of_children + 24000*smoker - 12500\n",
+    "    print(name + \"'s Estimated Insurance Cost: \" + str(estimated_cost) + \" dollars.\")\n",
+    "    return estimated_cost\n",
+    "\n",
+    "# Estimate Maria's insurance cost\n",
+    "maria_insurance_cost = estimate_insurance_cost(name = \"Maria\", age = 31, sex = 0, bmi = 23.1, num_of_children = 1, smoker = 0)\n",
+    "\n",
+    "# Estimate Rohan's insurance cost\n",
+    "rohan_insurance_cost = estimate_insurance_cost(name = \"Rohan\", age = 25, sex = 1, bmi = 28.5, num_of_children = 3, smoker = 0)\n",
+    "\n",
+    "# Estimate Valentina's insurance cost\n",
+    "valentina_insurance_cost = estimate_insurance_cost(name = \"Valentina\", age = 53, sex = 0, bmi = 31.4, num_of_children = 0, smoker = 1)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "87f5d6d5",
+   "metadata": {},
+   "source": [
+    "2. We want to compare the estimated insurance costs (as calculated by our function) to the actual amounts that Maria, Rohan, and Valentina paid.\n",
+    "\n",
+    "   Create a list called `names` and fill it with the names of individuals you are estimating insurance costs for:\n",
+    "   - `\"Maria\"`\n",
+    "   - `\"Rohan\"`\n",
+    "   - `\"Valentina\"`"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6e4218d8",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7aad0105",
+   "metadata": {},
+   "source": [
+    "3. Next, create a list called `insurance_costs` and fill it with the actual amounts that Maria, Rohan, and Valentina paid for insurance:\n",
+    "   - `4150.0`\n",
+    "   - `5320.0`\n",
+    "   - `35210.0`"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "93fc21ce",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "752b283b",
+   "metadata": {},
+   "source": [
+    "## Combining Lists"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bbc0c97c",
+   "metadata": {},
+   "source": [
+    "4. Currently the `names` and `insurance_costs` lists are separate, but we want each name to be paired with an insurance cost.\n",
+    "\n",
+    "   Create a new variable called `insurance_data` that combines `names` and `insurance_costs` using the `zip()` function.\n",
+    "   \n",
+    "   Print this new variable."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "42f299ef",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f1492a2d",
+   "metadata": {},
+   "source": [
+    "5. The output should look something like:\n",
+    "\n",
+    "   ```\n",
+    "   <zip object at 0x7f1631e86b48>\n",
+    "   ```\n",
+    "   \n",
+    "   A `zip` object is a lazy iterator, so printing it shows only its type and memory address rather than its contents. To see the name and cost pairs, convert the `zip` object to a list:\n",
+    "   \n",
+    "   ```\n",
+    "   list(zip(____, ____))\n",
+    "   ```\n",
+    "   \n",
+    "   Convert the `insurance_data` object to a list using this method. Run the code to see the result - you should now see a list of names and insurance costs."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ef0311a1",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ff1c4435",
+   "metadata": {},
+   "source": [
+    "## Appending to a List"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0ea69efb",
+   "metadata": {},
+   "source": [
+    "6. Next, create an empty list called `estimated_insurance_data`.\n",
+    "\n",
+    "   This is the list we'll use to store the estimated insurance costs for our three individuals."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "02025200",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7eb29c6b",
+   "metadata": {},
+   "source": [
+    "7. We want to add our estimated insurance data for Maria, Rohan, and Valentina to the `estimated_insurance_data` list.\n",
+    "\n",
+    "   Use `.append()` to add `(\"Maria\", maria_insurance_cost)` to `estimated_insurance_data`. Do the same for Rohan and Valentina."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f0177c39",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "99fb3c71",
+   "metadata": {},
+   "source": [
+    "8. Print `estimated_insurance_data`.\n",
+    "\n",
+    "   Make sure the output is what you expected."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d7066b0c",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "21da68be",
+   "metadata": {},
+   "source": [
+    "## Inspecting the data"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c41d4c11",
+   "metadata": {},
+   "source": [
+    "9. In the output of steps 5 and 8, you should see two lists. The first one represents the **actual** insurance cost data and the second one represents the **estimated** insurance cost data.\n",
+    "\n",
+    "   However, it's difficult to know this just by looking at the output. As a data scientist, you want to make sure that your data is clean and easy to understand.\n",
+    "   \n",
+    "   Add to the print statement for `insurance_data` so that it's clear what the list contains. The output of the print statement should look like:\n",
+    "   \n",
+    "   ```\n",
+    "   Here is the actual insurance cost data: [...list output...]\n",
+    "   ```"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0da6c2e5",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "48c95b49",
+   "metadata": {},
+   "source": [
+    "10. Do the same for the print statement that prints `estimated_insurance_data`. The output should look like:\n",
+    "\n",
+    "    ```\n",
+    "    Here is the estimated insurance cost data: [...list output...]\n",
+    "    ```"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "dc701f7b",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "34e7aaa4",
+   "metadata": {},
+   "source": [
+    "11. See the results from both tasks above.\n",
+    "\n",
+    "    It should be much clearer from the output what each of the two lists represents, helping you better understand the data you're working with.\n",
+    "    \n",
+    "    You may notice that there are differences between the actual insurance costs and estimated insurance costs. This means that our `estimate_insurance_cost()` function does not calculate insurance costs with 100% accuracy.\n",
+    "    \n",
+    "    Compare the estimated insurance data to the actual insurance data. Do the estimated insurance costs seem to be overestimated or underestimated?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3bce2ea2",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5341142c",
+   "metadata": {},
+   "source": [
+    "## Extra"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "761c4808",
+   "metadata": {},
+   "source": [
+    "12. Congratulations! In this project, you used Python lists to store **estimated** insurance cost data and then compare that data to **actual** insurance cost data.\n",
+    "\n",
+    "    As you've seen, lists are data structures in Python that can contain multiple pieces of data in a single object. As a data scientist, you'll find yourself working with this data structure quite often. You now have a solid foundation to move forward in your data science journey!\n",
+    "    \n",
+    "    If you'd like additional practice on lists, here are some ways you might extend this project:\n",
+    "    - Calculate the difference between the actual insurance cost data and the estimated insurance cost data for each individual, and store the results in a list called `insurance_cost_dif`.\n",
+    "    - Estimate the insurance cost for a new individual, Akira, who is a 19-year-old male non-smoker with no children and a BMI of 27.1. Make sure to append his name to `names` and his actual insurance cost, `2930.0`, to `insurance_costs`.\n",
+    "    \n",
+    "    Happy coding!"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8f50b08c",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.11"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
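Note on the notebook above: the code cells for steps 2 through 11 are committed empty. As a rough guide to what they are asking for, here is a minimal sketch of one way to fill them in. It is not the author's solution. The function and the input values are copied from the notebook's first code cell and task text. The choice to compute `insurance_cost_dif` as estimated minus actual, and the use of a list comprehension for it, are assumptions made for illustration.

```python
# Same estimator as the notebook's first code cell.
def estimate_insurance_cost(name, age, sex, bmi, num_of_children, smoker):
    estimated_cost = (250*age - 128*sex + 370*bmi
                      + 425*num_of_children + 24000*smoker - 12500)
    print(name + "'s Estimated Insurance Cost: " + str(estimated_cost) + " dollars.")
    return estimated_cost

maria_insurance_cost = estimate_insurance_cost("Maria", 31, 0, 23.1, 1, 0)
rohan_insurance_cost = estimate_insurance_cost("Rohan", 25, 1, 28.5, 3, 0)
valentina_insurance_cost = estimate_insurance_cost("Valentina", 53, 0, 31.4, 0, 1)

# Steps 2 and 3: names and the actual amounts paid, in matching order.
names = ["Maria", "Rohan", "Valentina"]
insurance_costs = [4150.0, 5320.0, 35210.0]

# Steps 4 and 5: zip() pairs the lists lazily; list() materializes the pairs.
# Wrapping immediately avoids reusing an already-consumed zip iterator.
insurance_data = list(zip(names, insurance_costs))

# Steps 6 and 7: start empty, then append one (name, estimate) tuple each.
estimated_insurance_data = []
estimated_insurance_data.append(("Maria", maria_insurance_cost))
estimated_insurance_data.append(("Rohan", rohan_insurance_cost))
estimated_insurance_data.append(("Valentina", valentina_insurance_cost))

# Steps 9 and 10: label each list so the output explains itself.
print("Here is the actual insurance cost data: " + str(insurance_data))
print("Here is the estimated insurance cost data: " + str(estimated_insurance_data))

# Step 11 and the first "Extra" idea (assumed sign: estimated minus actual).
insurance_cost_dif = [
    estimated - actual
    for (_, estimated), (_, actual) in zip(estimated_insurance_data, insurance_data)
]
print("Estimated minus actual: " + str(insurance_cost_dif))
```

With these inputs every entry of `insurance_cost_dif` is positive, so the formula overestimates the cost for all three individuals, which answers the question posed in step 11.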