diff --git a/.ipynb_checkpoints/README-checkpoint.md b/.ipynb_checkpoints/README-checkpoint.md new file mode 100644 index 0000000..a680558 --- /dev/null +++ b/.ipynb_checkpoints/README-checkpoint.md @@ -0,0 +1 @@ +# Machine Learning Codecademy Project diff --git a/Python Lists Medical Insurance Estimation/.ipynb_checkpoints/Python Lists Medical Insurance Estimation-checkpoint.ipynb b/Python Lists Medical Insurance Estimation/.ipynb_checkpoints/Python Lists Medical Insurance Estimation-checkpoint.ipynb new file mode 100644 index 0000000..40c3e0d --- /dev/null +++ b/Python Lists Medical Insurance Estimation/.ipynb_checkpoints/Python Lists Medical Insurance Estimation-checkpoint.ipynb @@ -0,0 +1,373 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "79f9fe31", + "metadata": {}, + "source": [ + "# Python Lists: Medical Insurance Estimation Project" + ] + }, + { + "cell_type": "markdown", + "id": "5bfa2012", + "metadata": {}, + "source": [ + "In this project, you will examine how factors such as age, sex, BMI, number of children, and smoking status contribute to medical insurance costs.\n", + "\n", + "You will apply your new knowledge of Python Lists to store insurance cost data in a list as well as compare **estimated** insurance costs to **actual** insurance costs.\n", + "\n", + "Let's get started!" + ] + }, + { + "cell_type": "markdown", + "id": "454063d9", + "metadata": {}, + "source": [ + "## Creating a List" + ] + }, + { + "cell_type": "markdown", + "id": "5eb6fdde", + "metadata": {}, + "source": [ + "1. 
First, take a look at the code in the code block below.\n", + "\n", + " The function `estimate_insurance_cost()` estimates the medical insurance cost for an individual, based on five variables:\n", + " - `age`: age of the individual in years\n", + " - `sex`: 0 for female, 1 for male\n", + " - `bmi`: individual's body mass index\n", + " - `num_of_children`: number of children the individual has\n", + " - `smoker`: 0 for a non-smoker, 1 for a smoker\n", + " \n", + " These variables are used in the following formula to estimate an individual's insurance cost (in USD):\n", + " \n", + " $$\n", + " insurance\\_cost = 250*age - 128*sex + 370*bmi + 425*num\\_of\\_children + 24000*smoker - 12500\n", + " $$\n", + " \n", + " Observe below the code the estimated insurance costs for three individuals - Maria, Rohan, and Valentina." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "a6e98fae", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Maria's Estimated Insurance Cost: 4222.0 dollars.\n", + "Rohan's Estimated Insurance Cost: 5442.0 dollars.\n", + "Valentina's Estimated Insurance Cost: 36368.0 dollars.\n" + ] + } + ], + "source": [ + "# Function to estimate insurance cost:\n", + "def estimate_insurance_cost(name, age, sex, bmi, num_of_children, smoker):\n", + " estimated_cost = 250*age - 128*sex + 370*bmi + 425*num_of_children + 24000*smoker - 12500\n", + " print(name + \"'s Estimated Insurance Cost: \" + str(estimated_cost) + \" dollars.\")\n", + " return estimated_cost\n", + "\n", + "# Estimate Maria's insurance cost\n", + "maria_insurance_cost = estimate_insurance_cost(name = \"Maria\", age = 31, sex = 0, bmi = 23.1, num_of_children = 1, smoker = 0)\n", + "\n", + "# Estimate Rohan's insurance cost\n", + "rohan_insurance_cost = estimate_insurance_cost(name = \"Rohan\", age = 25, sex = 1, bmi = 28.5, num_of_children = 3, smoker = 0)\n", + "\n", + "# Estimate Valentina's insurance cost\n", + 
"valentina_insurance_cost = estimate_insurance_cost(name = \"Valentina\", age = 53, sex = 0, bmi = 31.4, num_of_children = 0, smoker = 1)" + ] + }, + { + "cell_type": "markdown", + "id": "87f5d6d5", + "metadata": {}, + "source": [ + "2. We want to compare the estimated insurance costs (as calculated by our function) to the actual amounts that Maria, Rohan, and Valentina paid.\n", + "\n", + " Create a list called `names` and fill it with the names of individuals you are estimating insurance costs for:\n", + " - `\"Maria\"`\n", + " - `\"Rohan\"`\n", + " - `\"Valentina\"`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6e4218d8", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "7aad0105", + "metadata": {}, + "source": [ + "3. Next, create a list called `insurance_costs` and fill it with the actual amounts that Maria, Rohan, and Valentina paid for insurance:\n", + " - `4150.0`\n", + " - `5320.0`\n", + " - `35210.0`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "93fc21ce", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "752b283b", + "metadata": {}, + "source": [ + "## Combining Lists" + ] + }, + { + "cell_type": "markdown", + "id": "bbc0c97c", + "metadata": {}, + "source": [ + "4. Currently the `names` and `insurance_costs` lists are separate, but we want each name to be paired with an insurance cost.\n", + "\n", + " Create a new variable called `insurance_data` that combines `names` and `insurance_costs` using the `zip()` function.\n", + " \n", + " Print this new variable." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "42f299ef", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "f1492a2d", + "metadata": {}, + "source": [ + "5. 
The output should look something like:\n", + "\n", + " ```\n", + " \n", + " ```\n", + " \n", + " This output does not mean much to us. To change it to a format we can actually understand, we must convert the `zip` object to a list by doing the following:\n", + " \n", + " ```\n", + " list(zip(____, ____))\n", + " ```\n", + " \n", + " Convert the `insurance_data` object to a list using this method. Run the code to see the result - you should now see a list of names and insurance costs." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ef0311a1", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "ff1c4435", + "metadata": {}, + "source": [ + "## Appending to a List" + ] + }, + { + "cell_type": "markdown", + "id": "0ea69efb", + "metadata": {}, + "source": [ + "6. Next, create an empty list called `estimated_insurance_data`.\n", + "\n", + " This is the list we'll use to store the estimated insurance costs for our three individuals." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "02025200", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "7eb29c6b", + "metadata": {}, + "source": [ + "7. We want to add our estimated insurance data for Maria, Rohan, and Valentina to the `estimated_insurance_data` list.\n", + "\n", + " Use `.append()` to add `(\"Maria\", maria_insurance_cost)` to `estimated_insurance_data`. Do the same for Rohan and Valentina." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f0177c39", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "99fb3c71", + "metadata": {}, + "source": [ + "8. Print `estimated_insurance_data`.\n", + "\n", + " Make sure the output is what you expected." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d7066b0c", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "21da68be", + "metadata": {}, + "source": [ + "## Inspecting the data" + ] + }, + { + "cell_type": "markdown", + "id": "c41d4c11", + "metadata": {}, + "source": [ + "9. In the output, you should see two lists. The first one represents the **actual** insurance cost data and the second one represents the **estimated** insurance cost data.\n", + "\n", + " However, it's difficult to know this just by looking at the output. As a data scientist, you want to make sure that your data is clean and easy to understand.\n", + " \n", + " Add to the print statement for `insurance_data` so that it's clear what the list contains. The output of the print statement should look like:\n", + " \n", + " ```\n", + " Here is the actual insurance cost data: [...list output...]\n", + " ```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0da6c2e5", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "48c95b49", + "metadata": {}, + "source": [ + "10. Do the same for the print statement that prints `estimated_insurance_data`. The output should look like:\n", + "\n", + " ```\n", + " Here is the estimated insurance cost data: [...list output...]\n", + " ```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dc701f7b", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "34e7aaa4", + "metadata": {}, + "source": [ + "11. See the results from both tasks above.\n", + "\n", + " It should be much more clear from the output what each of the two lists represents, helping you better understand the data you're working with.\n", + " \n", + " You may notice that there are differences between the actual insurance costs and estimated insurance costs. 
This means that our `estimate_insurance_cost()` function does not calculate insurance costs with 100% accuracy.\n", + " \n", + " Compare the estimated insurance data to the actual insurance data. Do the estimated insurance costs seem to be overestimated or underestimated?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3bce2ea2", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "5341142c", + "metadata": {}, + "source": [ + "## Extra" + ] + }, + { + "cell_type": "markdown", + "id": "761c4808", + "metadata": {}, + "source": [ + "12. Congratulations! In this project, you used Python lists to store **estimated** insurance cost data and then compare that data to **actual** insurance cost data.\n", + "\n", + " As you've seen, lists are data structures in Python that can contain multiple pieces of data in a single object. As a data scientist, you'll find yourself working with this data structure quite often. You now have a solid foundation to move forward in your data science journey!\n", + " \n", + " If you'd like additional practice on lists, here are some ways you might extend this project:\n", + " - Calculate the difference between the actual insurance cost data and the estimated insurance cost data for each individual, and store the results in a list called `insurance_cost_dif`.\n", + " - Estimate the insurance cost for a new individual, Akira, who is a 19-year-old male non-smoker with no children and a BMI of 27.1. Make sure to append his name to `names` and his actual insurance cost, `2930.0`, to `insurance_costs`.\n", + " \n", + " Happy coding!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8f50b08c", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.11" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/Python Lists Medical Insurance Estimation/.ipynb_checkpoints/Python Lists Medical Insurance Estimation_Solution-checkpoint.ipynb b/Python Lists Medical Insurance Estimation/.ipynb_checkpoints/Python Lists Medical Insurance Estimation_Solution-checkpoint.ipynb new file mode 100644 index 0000000..fa47357 --- /dev/null +++ b/Python Lists Medical Insurance Estimation/.ipynb_checkpoints/Python Lists Medical Insurance Estimation_Solution-checkpoint.ipynb @@ -0,0 +1,435 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "79f9fe31", + "metadata": {}, + "source": [ + "# Python Lists: Medical Insurance Estimation Project" + ] + }, + { + "cell_type": "markdown", + "id": "5bfa2012", + "metadata": {}, + "source": [ + "In this project, you will examine how factors such as age, sex, BMI, number of children, and smoking status contribute to medical insurance costs.\n", + "\n", + "You will apply your new knowledge of Python Lists to store insurance cost data in a list as well as compare **estimated** insurance costs to **actual** insurance costs.\n", + "\n", + "Let's get started!" + ] + }, + { + "cell_type": "markdown", + "id": "454063d9", + "metadata": {}, + "source": [ + "## Creating a List" + ] + }, + { + "cell_type": "markdown", + "id": "5eb6fdde", + "metadata": {}, + "source": [ + "1. 
First, take a look at the code in the code block below.\n", + "\n", + " The function `estimate_insurance_cost()` estimates the medical insurance cost for an individual, based on five variables:\n", + " - `age`: age of the individual in years\n", + " - `sex`: 0 for female, 1 for male\n", + " - `bmi`: individual's body mass index\n", + " - `num_of_children`: number of children the individual has\n", + " - `smoker`: 0 for a non-smoker, 1 for a smoker\n", + " \n", + " These variables are used in the following formula to estimate an individual's insurance cost (in USD):\n", + " \n", + " $$\n", + " insurance\\_cost = 250*age - 128*sex + 370*bmi + 425*num\\_of\\_children + 24000*smoker - 12500\n", + " $$\n", + " \n", + " Observe below the code the estimated insurance costs for three individuals - Maria, Rohan, and Valentina." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "a6e98fae", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Maria's Estimated Insurance Cost: 4222.0 dollars.\n", + "Rohan's Estimated Insurance Cost: 5442.0 dollars.\n", + "Valentina's Estimated Insurance Cost: 36368.0 dollars.\n" + ] + } + ], + "source": [ + "# Function to estimate insurance cost:\n", + "def estimate_insurance_cost(name, age, sex, bmi, num_of_children, smoker):\n", + " estimated_cost = 250*age - 128*sex + 370*bmi + 425*num_of_children + 24000*smoker - 12500\n", + " print(name + \"'s Estimated Insurance Cost: \" + str(estimated_cost) + \" dollars.\")\n", + " return estimated_cost\n", + "\n", + "# Estimate Maria's insurance cost\n", + "maria_insurance_cost = estimate_insurance_cost(name = \"Maria\", age = 31, sex = 0, bmi = 23.1, num_of_children = 1, smoker = 0)\n", + "\n", + "# Estimate Rohan's insurance cost\n", + "rohan_insurance_cost = estimate_insurance_cost(name = \"Rohan\", age = 25, sex = 1, bmi = 28.5, num_of_children = 3, smoker = 0)\n", + "\n", + "# Estimate Valentina's insurance cost\n", + 
"valentina_insurance_cost = estimate_insurance_cost(name = \"Valentina\", age = 53, sex = 0, bmi = 31.4, num_of_children = 0, smoker = 1)" + ] + }, + { + "cell_type": "markdown", + "id": "87f5d6d5", + "metadata": {}, + "source": [ + "2. We want to compare the estimated insurance costs (as calculated by our function) to the actual amounts that Maria, Rohan, and Valentina paid.\n", + "\n", + " Create a list called `names` and fill it with the names of individuals you are estimating insurance costs for:\n", + " - `\"Maria\"`\n", + " - `\"Rohan\"`\n", + " - `\"Valentina\"`" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "6e4218d8", + "metadata": {}, + "outputs": [], + "source": [ + "names = [\"Maria\", \"Rohan\", \"Valentina\"]" + ] + }, + { + "cell_type": "markdown", + "id": "7aad0105", + "metadata": {}, + "source": [ + "3. Next, create a list called `insurance_costs` and fill it with the actual amounts that Maria, Rohan, and Valentina paid for insurance:\n", + " - `4150.0`\n", + " - `5320.0`\n", + " - `35210.0`" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "93fc21ce", + "metadata": {}, + "outputs": [], + "source": [ + "insurance_costs = [4150.0, 5320.0, 35210.0]" + ] + }, + { + "cell_type": "markdown", + "id": "752b283b", + "metadata": {}, + "source": [ + "## Combining Lists" + ] + }, + { + "cell_type": "markdown", + "id": "bbc0c97c", + "metadata": {}, + "source": [ + "4. Currently the `names` and `insurance_costs` lists are separate, but we want each name to be paired with an insurance cost.\n", + "\n", + " Create a new variable called `insurance_data` that combines `names` and `insurance_costs` using the `zip()` function.\n", + " \n", + " Print this new variable." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "42f299ef", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + } + ], + "source": [ + "insurance_data = zip(names, insurance_costs)\n", + "print(insurance_data)" + ] + }, + { + "cell_type": "markdown", + "id": "f1492a2d", + "metadata": {}, + "source": [ + "5. The output should look something like:\n", + "\n", + " ```\n", + " \n", + " ```\n", + " \n", + " This output does not mean much to us. To change it to a format we can actually understand, we must convert the `zip` object to a list by doing the following:\n", + " \n", + " ```\n", + " list(zip(____, ____))\n", + " ```\n", + " \n", + " Convert the `insurance_data` object to a list using this method. Run the code to see the result - you should now see a list of names and insurance costs." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "ef0311a1", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[('Maria', 4150.0), ('Rohan', 5320.0), ('Valentina', 35210.0)]\n" + ] + } + ], + "source": [ + "insurance_data = list(zip(names, insurance_costs))\n", + "print(insurance_data)" + ] + }, + { + "cell_type": "markdown", + "id": "4c11b497", + "metadata": {}, + "source": [ + "## Appending to a List" + ] + }, + { + "cell_type": "markdown", + "id": "d0c41244", + "metadata": {}, + "source": [ + "6. Next, create an empty list called `estimated_insurance_data`.\n", + "\n", + " This is the list we'll use to store the estimated insurance costs for our three individuals." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "51434ee9", + "metadata": {}, + "outputs": [], + "source": [ + "estimated_insurance_data = []" + ] + }, + { + "cell_type": "markdown", + "id": "a75c64a3", + "metadata": {}, + "source": [ + "7. 
We want to add our estimated insurance data for Maria, Rohan, and Valentina to the `estimated_insurance_data` list.\n", + "\n", + " Use `.append()` to add `(\"Maria\", maria_insurance_cost)` to `estimated_insurance_data`. Do the same for Rohan and Valentina." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "d05e539e", + "metadata": {}, + "outputs": [], + "source": [ + "estimated_insurance_data.append((\"Maria\", maria_insurance_cost))\n", + "estimated_insurance_data.append((\"Rohan\", rohan_insurance_cost))\n", + "estimated_insurance_data.append((\"Valentina\", valentina_insurance_cost))" + ] + }, + { + "cell_type": "markdown", + "id": "0df5c983", + "metadata": {}, + "source": [ + "8. Print `estimated_insurance_data`.\n", + "\n", + " Make sure the output is what you expected." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "b70678b5", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[('Maria', 4222.0), ('Rohan', 5442.0), ('Valentina', 36368.0)]\n" + ] + } + ], + "source": [ + "print(estimated_insurance_data)" + ] + }, + { + "cell_type": "markdown", + "id": "1a7ef98a", + "metadata": {}, + "source": [ + "## Inspecting the data" + ] + }, + { + "cell_type": "markdown", + "id": "71158a3a", + "metadata": {}, + "source": [ + "9. In the output, you should see two lists. The first one represents the **actual** insurance cost data and the second one represents the **estimated** insurance cost data.\n", + "\n", + " However, it's difficult to know this just by looking at the output. As a data scientist, you want to make sure that your data is clean and easy to understand.\n", + " \n", + " Add to the print statement for `insurance_data` so that it's clear what the list contains. 
The output of the print statement should look like:\n", + " \n", + " ```\n", + " Here is the actual insurance cost data: [...list output...]\n", + " ```" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "8fcf3709", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Here is the actual insurance cost data: [('Maria', 4150.0), ('Rohan', 5320.0), ('Valentina', 35210.0)]\n" + ] + } + ], + "source": [ + "print(\"Here is the actual insurance cost data: \" + str(insurance_data))" + ] + }, + { + "cell_type": "markdown", + "id": "c7a0802a", + "metadata": {}, + "source": [ + "10. Do the same for the print statement that prints `estimated_insurance_data`. The output should look like:\n", + "\n", + " ```\n", + " Here is the estimated insurance cost data: [...list output...]\n", + " ```" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "609e2b65", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Here is the estimated insurance cost data: [('Maria', 4222.0), ('Rohan', 5442.0), ('Valentina', 36368.0)]\n" + ] + } + ], + "source": [ + "print(\"Here is the estimated insurance cost data: \" + str(estimated_insurance_data))" + ] + }, + { + "cell_type": "markdown", + "id": "b9742315", + "metadata": {}, + "source": [ + "11. See the results from both tasks above.\n", + "\n", + " It should be much more clear from the output what each of the two lists represents, helping you better understand the data you're working with.\n", + " \n", + " You may notice that there are differences between the actual insurance costs and estimated insurance costs. This means that our `estimate_insurance_cost()` function does not calculate insurance costs with 100% accuracy.\n", + " \n", + " Compare the estimated insurance data to the actual insurance data. Do the estimated insurance costs seem to be overestimated or underestimated?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "761203b1", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "9c0014e8", + "metadata": {}, + "source": [ + "## Extra" + ] + }, + { + "cell_type": "markdown", + "id": "6c213aec", + "metadata": {}, + "source": [ + "12. Congratulations! In this project, you used Python lists to store **estimated** insurance cost data and then compare that data to **actual** insurance cost data.\n", + "\n", + " As you've seen, lists are data structures in Python that can contain multiple pieces of data in a single object. As a data scientist, you'll find yourself working with this data structure quite often. You now have a solid foundation to move forward in your data science journey!\n", + " \n", + " If you'd like additional practice on lists, here are some ways you might extend this project:\n", + " - Calculate the difference between the actual insurance cost data and the estimated insurance cost data for each individual, and store the results in a list called `insurance_cost_dif`.\n", + " - Estimate the insurance cost for a new individual, Akira, who is a 19-year-old male non-smoker with no children and a BMI of 27.1. Make sure to append his name to `names` and his actual insurance cost, `2930.0`, to `insurance_costs`.\n", + " \n", + " Happy coding!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "20d300f8", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.11" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/Python Lists Medical Insurance Estimation/Python Lists Medical Insurance Estimation.ipynb b/Python Lists Medical Insurance Estimation/Python Lists Medical Insurance Estimation.ipynb new file mode 100644 index 0000000..2d79978 --- /dev/null +++ b/Python Lists Medical Insurance Estimation/Python Lists Medical Insurance Estimation.ipynb @@ -0,0 +1,498 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "79f9fe31", + "metadata": {}, + "source": [ + "# Python Lists: Medical Insurance Estimation Project" + ] + }, + { + "cell_type": "markdown", + "id": "5bfa2012", + "metadata": {}, + "source": [ + "In this project, you will examine how factors such as age, sex, BMI, number of children, and smoking status contribute to medical insurance costs.\n", + "\n", + "You will apply your new knowledge of Python Lists to store insurance cost data in a list as well as compare **estimated** insurance costs to **actual** insurance costs.\n", + "\n", + "Let's get started!" + ] + }, + { + "cell_type": "markdown", + "id": "454063d9", + "metadata": {}, + "source": [ + "## Creating a List" + ] + }, + { + "cell_type": "markdown", + "id": "5eb6fdde", + "metadata": {}, + "source": [ + "1. 
First, take a look at the code in the code block below.\n", + "\n", + " The function `estimate_insurance_cost()` estimates the medical insurance cost for an individual, based on five variables:\n", + " - `age`: age of the individual in years\n", + " - `sex`: 0 for female, 1 for male\n", + " - `bmi`: individual's body mass index\n", + " - `num_of_children`: number of children the individual has\n", + " - `smoker`: 0 for a non-smoker, 1 for a smoker\n", + " \n", + " These variables are used in the following formula to estimate an individual's insurance cost (in USD):\n", + " \n", + " $$\n", + " insurance\\_cost = 250*age - 128*sex + 370*bmi + 425*num\\_of\\_children + 24000*smoker - 12500\n", + " $$\n", + " \n", + " Observe below the code the estimated insurance costs for three individuals - Maria, Rohan, and Valentina." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "a6e98fae", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Maria's Estimated Insurance Cost: 4222.0 dollars.\n", + "Rohan's Estimated Insurance Cost: 5442.0 dollars.\n", + "Valentina's Estimated Insurance Cost: 36368.0 dollars.\n" + ] + } + ], + "source": [ + "# Function to estimate insurance cost:\n", + "def estimate_insurance_cost(name, age, sex, bmi, num_of_children, smoker):\n", + " estimated_cost = 250*age - 128*sex + 370*bmi + 425*num_of_children + 24000*smoker - 12500\n", + " print(name + \"'s Estimated Insurance Cost: \" + str(estimated_cost) + \" dollars.\")\n", + " return estimated_cost\n", + "\n", + "# Estimate Maria's insurance cost\n", + "maria_insurance_cost = estimate_insurance_cost(name = \"Maria\", age = 31, sex = 0, bmi = 23.1, num_of_children = 1, smoker = 0)\n", + "\n", + "# Estimate Rohan's insurance cost\n", + "rohan_insurance_cost = estimate_insurance_cost(name = \"Rohan\", age = 25, sex = 1, bmi = 28.5, num_of_children = 3, smoker = 0)\n", + "\n", + "# Estimate Valentina's insurance cost\n", + 
"valentina_insurance_cost = estimate_insurance_cost(name = \"Valentina\", age = 53, sex = 0, bmi = 31.4, num_of_children = 0, smoker = 1)" + ] + }, + { + "cell_type": "markdown", + "id": "87f5d6d5", + "metadata": {}, + "source": [ + "2. We want to compare the estimated insurance costs (as calculated by our function) to the actual amounts that Maria, Rohan, and Valentina paid.\n", + "\n", + " Create a list called `names` and fill it with the names of individuals you are estimating insurance costs for:\n", + " - `\"Maria\"`\n", + " - `\"Rohan\"`\n", + " - `\"Valentina\"`" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "6e4218d8", + "metadata": {}, + "outputs": [], + "source": [ + "names = [\"Maria\", \"Rohan\", \"Valentina\"]\n" + ] + }, + { + "cell_type": "markdown", + "id": "7aad0105", + "metadata": {}, + "source": [ + "3. Next, create a list called `insurance_costs` and fill it with the actual amounts that Maria, Rohan, and Valentina paid for insurance:\n", + " - `4150.0`\n", + " - `5320.0`\n", + " - `35210.0`" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "93fc21ce", + "metadata": {}, + "outputs": [], + "source": [ + "insurance_costs = [4150.0, 5320.0, 35210.0]\n" + ] + }, + { + "cell_type": "markdown", + "id": "752b283b", + "metadata": {}, + "source": [ + "## Combining Lists" + ] + }, + { + "cell_type": "markdown", + "id": "bbc0c97c", + "metadata": {}, + "source": [ + "4. Currently the `names` and `insurance_costs` lists are separate, but we want each name to be paired with an insurance cost.\n", + "\n", + " Create a new variable called `insurance_data` that combines `names` and `insurance_costs` using the `zip()` function.\n", + " \n", + " Print this new variable." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "42f299ef", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + } + ], + "source": [ + "insurance_data = zip(names, insurance_costs)\n", + "print(insurance_data)" + ] + }, + { + "cell_type": "markdown", + "id": "f1492a2d", + "metadata": {}, + "source": [ + "5. The output should look something like:\n", + "\n", + " ```\n", + " \n", + " ```\n", + " \n", + " This output does not mean much to us. To change it to a format we can actually understand, we must convert the `zip` object to a list by doing the following:\n", + " \n", + " ```\n", + " list(zip(____, ____))\n", + " ```\n", + " \n", + " Convert the `insurance_data` object to a list using this method. Run the code to see the result - you should now see a list of names and insurance costs." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "ef0311a1", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[('Maria', 4150.0), ('Rohan', 5320.0), ('Valentina', 35210.0)]\n" + ] + } + ], + "source": [ + "insurance_data = list(zip(names, insurance_costs))\n", + "print(insurance_data)\n" + ] + }, + { + "cell_type": "markdown", + "id": "ff1c4435", + "metadata": {}, + "source": [ + "## Appending to a List" + ] + }, + { + "cell_type": "markdown", + "id": "0ea69efb", + "metadata": {}, + "source": [ + "6. Next, create an empty list called `estimated_insurance_data`.\n", + "\n", + " This is the list we'll use to store the estimated insurance costs for our three individuals." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "02025200", + "metadata": {}, + "outputs": [], + "source": [ + "estimated_insurance_data = []\n" + ] + }, + { + "cell_type": "markdown", + "id": "7eb29c6b", + "metadata": {}, + "source": [ + "7. 
We want to add our estimated insurance data for Maria, Rohan, and Valentina to the `estimated_insurance_data` list.\n", + "\n", + " Use `.append()` to add `(\"Maria\", maria_insurance_cost)` to `estimated_insurance_data`. Do the same for Rohan and Valentina." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "f0177c39", + "metadata": {}, + "outputs": [], + "source": [ + "estimated_insurance_data.append((\"Maria\", maria_insurance_cost))\n", + "estimated_insurance_data.append((\"Rohan\", rohan_insurance_cost))\n", + "estimated_insurance_data.append((\"Valentina\", valentina_insurance_cost))\n" + ] + }, + { + "cell_type": "markdown", + "id": "99fb3c71", + "metadata": {}, + "source": [ + "8. Print `estimated_insurance_data`.\n", + "\n", + " Make sure the output is what you expected." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "d7066b0c", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[('Maria', 4222.0), ('Rohan', 5442.0), ('Valentina', 36368.0)]\n" + ] + } + ], + "source": [ + "print(estimated_insurance_data)\n" + ] + }, + { + "cell_type": "markdown", + "id": "21da68be", + "metadata": {}, + "source": [ + "## Inspecting the data" + ] + }, + { + "cell_type": "markdown", + "id": "c41d4c11", + "metadata": {}, + "source": [ + "9. In the output, you should see two lists. The first one represents the **actual** insurance cost data and the second one represents the **estimated** insurance cost data.\n", + "\n", + " However, it's difficult to know this just by looking at the output. As a data scientist, you want to make sure that your data is clean and easy to understand.\n", + " \n", + " Add to the print statement for `insurance_data` so that it's clear what the list contains. 
The output of the print statement should look like:\n", + " \n", + " ```\n", + " Here is the actual insurance cost data: [...list output...]\n", + " ```" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "0da6c2e5", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Here is the actual insurance cost data: [('Maria', 4150.0), ('Rohan', 5320.0), ('Valentina', 35210.0)]\n" + ] + } + ], + "source": [ + "print(\"Here is the actual insurance cost data: \" + str(insurance_data))\n" + ] + }, + { + "cell_type": "markdown", + "id": "48c95b49", + "metadata": {}, + "source": [ + "10. Do the same for the print statement that prints `estimated_insurance_data`. The output should look like:\n", + "\n", + " ```\n", + " Here is the estimated insurance cost data: [...list output...]\n", + " ```" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "dc701f7b", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Here is the estimated insurance cost data: [('Maria', 4222.0), ('Rohan', 5442.0), ('Valentina', 36368.0)]\n" + ] + } + ], + "source": [ + "print(\"Here is the estimated insurance cost data: \" + str(estimated_insurance_data))\n" + ] + }, + { + "cell_type": "markdown", + "id": "34e7aaa4", + "metadata": {}, + "source": [ + "11. See the results from both tasks above.\n", + "\n", + " It should be much clearer from the output what each of the two lists represents, helping you better understand the data you're working with.\n", + " \n", + " You may notice that there are differences between the actual insurance costs and estimated insurance costs. This means that our `estimate_insurance_cost()` function does not calculate insurance costs with 100% accuracy.\n", + " \n", + " Compare the estimated insurance data to the actual insurance data. Do the estimated insurance costs seem to be overestimated or underestimated?"
+ ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "3bce2ea2", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'From the output, we can observe the differences between the actual and estimated insurance costs. The estimated costs are slightly higher than the actual costs, which indicates that our formula may overestimate the insurance costs.'" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "\"\"\"From the output, we can observe the differences between the actual and estimated insurance costs. The estimated costs are slightly higher than the actual costs, which indicates that our formula may overestimate the insurance costs.\"\"\"" + ] + }, + { + "cell_type": "markdown", + "id": "5341142c", + "metadata": {}, + "source": [ + "## Extra" + ] + }, + { + "cell_type": "markdown", + "id": "761c4808", + "metadata": { + "jupyter": { + "source_hidden": true + } + }, + "source": [ + "12. Congratulations! In this project, you used Python lists to store **estimated** insurance cost data and then compare that data to **actual** insurance cost data.\n", + "\n", + " As you've seen, lists are data structures in Python that can contain multiple pieces of data in a single object. As a data scientist, you'll find yourself working with this data structure quite often. You now have a solid foundation to move forward in your data science journey!\n", + " \n", + " If you'd like additional practice on lists, here are some ways you might extend this project:\n", + " - Calculate the difference between the actual insurance cost data and the estimated insurance cost data for each individual, and store the results in a list called `insurance_cost_dif`.\n", + " - Estimate the insurance cost for a new individual, Akira, who is a 19-year-old male non-smoker with no children and a BMI of 27.1. 
Make sure to append his name to `names` and his actual insurance cost, `2930.0`, to `insurance_costs`.\n", + " \n", + " Happy coding!" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "8f50b08c", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Differences between estimated and actual costs: [('Maria', 72.0), ('Rohan', 122.0), ('Valentina', 1158.0)]\n" + ] + } + ], + "source": [ + "insurance_cost_dif = []\n", + "for i in range(len(names)):\n", + " dif = estimated_insurance_data[i][1] - insurance_data[i][1]\n", + " insurance_cost_dif.append((names[i], dif))\n", + "\n", + "print(\"Differences between estimated and actual costs: \" + str(insurance_cost_dif))\n" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "001e94de-b2b5-468c-8dcb-632a4f7ba99a", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Akira's Estimated Insurance Cost: 2149.0 dollars.\n", + "Updated actual insurance cost data: [('Maria', 4150.0), ('Rohan', 5320.0), ('Valentina', 35210.0), ('Akira', 2930.0)]\n", + "Updated estimated insurance cost data: [('Maria', 4222.0), ('Rohan', 5442.0), ('Valentina', 36368.0), ('Akira', 2149.0)]\n" + ] + } + ], + "source": [ + "# Estimate Akira's insurance cost\n", + "akira_insurance_cost = estimate_insurance_cost(name=\"Akira\", age=19, sex=1, bmi=27.1, num_of_children=0, smoker=0)\n", + "\n", + "# Add Akira's data to the lists\n", + "names.append(\"Akira\")\n", + "insurance_costs.append(2930.0)\n", + "estimated_insurance_data.append((\"Akira\", akira_insurance_cost))\n", + "\n", + "# Print updated lists\n", + "insurance_data = list(zip(names, insurance_costs))\n", + "print(\"Updated actual insurance cost data: \" + str(insurance_data))\n", + "print(\"Updated estimated insurance cost data: \" + str(estimated_insurance_data))\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": 
"python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.6" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/Python Lists Medical Insurance Estimation/Python Lists Medical Insurance Estimation_Solution.ipynb b/Python Lists Medical Insurance Estimation/Python Lists Medical Insurance Estimation_Solution.ipynb new file mode 100644 index 0000000..6aa1127 --- /dev/null +++ b/Python Lists Medical Insurance Estimation/Python Lists Medical Insurance Estimation_Solution.ipynb @@ -0,0 +1,435 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "79f9fe31", + "metadata": {}, + "source": [ + "# Python Lists: Medical Insurance Estimation Project" + ] + }, + { + "cell_type": "markdown", + "id": "5bfa2012", + "metadata": {}, + "source": [ + "In this project, you will examine how factors such as age, sex, BMI, number of children, and smoking status contribute to medical insurance costs.\n", + "\n", + "You will apply your new knowledge of Python Lists to store insurance cost data in a list as well as compare **estimated** insurance costs to **actual** insurance costs.\n", + "\n", + "Let's get started!" + ] + }, + { + "cell_type": "markdown", + "id": "454063d9", + "metadata": {}, + "source": [ + "## Creating a List" + ] + }, + { + "cell_type": "markdown", + "id": "5eb6fdde", + "metadata": {}, + "source": [ + "1. 
First, take a look at the code in the code block below.\n", + "\n", + " The function `estimate_insurance_cost()` estimates the medical insurance cost for an individual, based on five variables:\n", + " - `age`: age of the individual in years\n", + " - `sex`: 0 for female, 1 for male\n", + " - `bmi`: individual's body mass index\n", + " - `num_of_children`: number of children the individual has\n", + " - `smoker`: 0 for a non-smoker, 1 for a smoker\n", + " \n", + " These variables are used in the following formula to estimate an individual's insurance cost (in USD):\n", + " \n", + " $$\n", + " insurance\\_cost = 250*age - 128*sex + 370*bmi + 425*num\\_of\\_children + 24000*smoker - 12500\n", + " $$\n", + " \n", + " Observe below the code the estimated insurance costs for three individuals - Maria, Rohan, and Valentina." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "a6e98fae", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Maria's Estimated Insurance Cost: 4222.0 dollars.\n", + "Rohan's Estimated Insurance Cost: 5442.0 dollars.\n", + "Valentina's Estimated Insurance Cost: 36368.0 dollars.\n" + ] + } + ], + "source": [ + "# Function to estimate insurance cost:\n", + "def estimate_insurance_cost(name, age, sex, bmi, num_of_children, smoker):\n", + " estimated_cost = 250*age - 128*sex + 370*bmi + 425*num_of_children + 24000*smoker - 12500\n", + " print(name + \"'s Estimated Insurance Cost: \" + str(estimated_cost) + \" dollars.\")\n", + " return estimated_cost\n", + "\n", + "# Estimate Maria's insurance cost\n", + "maria_insurance_cost = estimate_insurance_cost(name = \"Maria\", age = 31, sex = 0, bmi = 23.1, num_of_children = 1, smoker = 0)\n", + "\n", + "# Estimate Rohan's insurance cost\n", + "rohan_insurance_cost = estimate_insurance_cost(name = \"Rohan\", age = 25, sex = 1, bmi = 28.5, num_of_children = 3, smoker = 0)\n", + "\n", + "# Estimate Valentina's insurance cost\n", + 
"valentina_insurance_cost = estimate_insurance_cost(name = \"Valentina\", age = 53, sex = 0, bmi = 31.4, num_of_children = 0, smoker = 1)" + ] + }, + { + "cell_type": "markdown", + "id": "87f5d6d5", + "metadata": {}, + "source": [ + "2. We want to compare the estimated insurance costs (as calculated by our function) to the actual amounts that Maria, Rohan, and Valentina paid.\n", + "\n", + " Create a list called `names` and fill it with the names of individuals you are estimating insurance costs for:\n", + " - `\"Maria\"`\n", + " - `\"Rohan\"`\n", + " - `\"Valentina\"`" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "6e4218d8", + "metadata": {}, + "outputs": [], + "source": [ + "names = [\"Maria\", \"Rohan\", \"Valentina\"]" + ] + }, + { + "cell_type": "markdown", + "id": "7aad0105", + "metadata": {}, + "source": [ + "3. Next, create a list called `insurance_costs` and fill it with the actual amounts that Maria, Rohan, and Valentina paid for insurance:\n", + " - `4150.0`\n", + " - `5320.0`\n", + " - `35210.0`" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "93fc21ce", + "metadata": {}, + "outputs": [], + "source": [ + "insurance_costs = [4150.0, 5320.0, 35210.0]" + ] + }, + { + "cell_type": "markdown", + "id": "752b283b", + "metadata": {}, + "source": [ + "## Combining Lists" + ] + }, + { + "cell_type": "markdown", + "id": "bbc0c97c", + "metadata": {}, + "source": [ + "4. Currently the `names` and `insurance_costs` lists are separate, but we want each name to be paired with an insurance cost.\n", + "\n", + " Create a new variable called `insurance_data` that combines `names` and `insurance_costs` using the `zip()` function.\n", + " \n", + " Print this new variable." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "42f299ef", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + } + ], + "source": [ + "insurance_data = zip(names, insurance_costs)\n", + "print(insurance_data)" + ] + }, + { + "cell_type": "markdown", + "id": "f1492a2d", + "metadata": {}, + "source": [ + "5. The output should look something like:\n", + "\n", + " ```\n", + " \n", + " ```\n", + " \n", + " This output does not mean much to us. To change it to a format we can actually understand, we must convert the `zip` object to a list by doing the following:\n", + " \n", + " ```\n", + " list(zip(____, ____))\n", + " ```\n", + " \n", + " Convert the `insurance_data` object to a list using this method. Run the code to see the result - you should now see a list of names and insurance costs." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "ef0311a1", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[('Maria', 4150.0), ('Rohan', 5320.0), ('Valentina', 35210.0)]\n" + ] + } + ], + "source": [ + "insurance_data = list(zip(names, insurance_costs))\n", + "print(insurance_data)" + ] + }, + { + "cell_type": "markdown", + "id": "4c11b497", + "metadata": {}, + "source": [ + "## Appending to a List" + ] + }, + { + "cell_type": "markdown", + "id": "d0c41244", + "metadata": {}, + "source": [ + "6. Next, create an empty list called `estimated_insurance_data`.\n", + "\n", + " This is the list we'll use to store the estimated insurance costs for our three individuals." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "51434ee9", + "metadata": {}, + "outputs": [], + "source": [ + "estimated_insurance_data = []" + ] + }, + { + "cell_type": "markdown", + "id": "a75c64a3", + "metadata": {}, + "source": [ + "7. 
We want to add our estimated insurance data for Maria, Rohan, and Valentina to the `estimated_insurance_data` list.\n", + "\n", + " Use `.append()` to add `(\"Maria\", maria_insurance_cost)` to `estimated_insurance_data`. Do the same for Rohan and Valentina." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "d05e539e", + "metadata": {}, + "outputs": [], + "source": [ + "estimated_insurance_data.append((\"Maria\", maria_insurance_cost))\n", + "estimated_insurance_data.append((\"Rohan\", rohan_insurance_cost))\n", + "estimated_insurance_data.append((\"Valentina\", valentina_insurance_cost))" + ] + }, + { + "cell_type": "markdown", + "id": "0df5c983", + "metadata": {}, + "source": [ + "8. Print `estimated_insurance_data`.\n", + "\n", + " Make sure the output is what you expected." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "b70678b5", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[('Maria', 4222.0), ('Rohan', 5442.0), ('Valentina', 36368.0)]\n" + ] + } + ], + "source": [ + "print(estimated_insurance_data)" + ] + }, + { + "cell_type": "markdown", + "id": "1a7ef98a", + "metadata": {}, + "source": [ + "## Inspecting the data" + ] + }, + { + "cell_type": "markdown", + "id": "71158a3a", + "metadata": {}, + "source": [ + "9. In the output, you should see two lists. The first one represents the **actual** insurance cost data and the second one represents the **estimated** insurance cost data.\n", + "\n", + " However, it's difficult to know this just by looking at the output. As a data scientist, you want to make sure that your data is clean and easy to understand.\n", + " \n", + " Add to the print statement for `insurance_data` so that it's clear what the list contains. 
The output of the print statement should look like:\n", + " \n", + " ```\n", + " Here is the actual insurance cost data: [...list output...]\n", + " ```" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "8fcf3709", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Here is the actual insurance cost data: [('Maria', 4150.0), ('Rohan', 5320.0), ('Valentina', 35210.0)]\n" + ] + } + ], + "source": [ + "print(\"Here is the actual insurance cost data: \" + str(insurance_data))" + ] + }, + { + "cell_type": "markdown", + "id": "c7a0802a", + "metadata": {}, + "source": [ + "10. Do the same for the print statement that prints `estimated_insurance_data`. The output should look like:\n", + "\n", + " ```\n", + " Here is the estimated insurance cost data: [...list output...]\n", + " ```" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "609e2b65", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Here is the estimated insurance cost data: [('Maria', 4222.0), ('Rohan', 5442.0), ('Valentina', 36368.0)]\n" + ] + } + ], + "source": [ + "print(\"Here is the estimated insurance cost data: \" + str(estimated_insurance_data))" + ] + }, + { + "cell_type": "markdown", + "id": "b9742315", + "metadata": {}, + "source": [ + "11. See the results from both tasks above.\n", + "\n", + " It should be much more clear from the output what each of the two lists represents, helping you better understand the data you're working with.\n", + " \n", + " You may notice that there are differences between the actual insurance costs and estimated insurance costs. This means that our `estimate_insurance_cost()` function does not calculate insurance costs with 100% accuracy.\n", + " \n", + " Compare the estimated insurance data to the actual insurance data. Do the estimated insurance costs seem to be overestimated or underestimated?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "761203b1", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "9c0014e8", + "metadata": {}, + "source": [ + "## Extra" + ] + }, + { + "cell_type": "markdown", + "id": "6c213aec", + "metadata": {}, + "source": [ + "12. Congratulations! In this project, you used Python lists to store **estimated** insurance cost data and then compare that data to **actual** insurance cost data.\n", + "\n", + " As you've seen, lists are data structures in Python that can contain multiple pieces of data in a single object. As a data scientist, you'll find yourself working with this data structure quite often. You now have a solid foundation to move forward in your data science journey!\n", + " \n", + " If you'd like additional practice on lists, here are some ways you might extend this project:\n", + " - Calculate the difference between the actual insurance cost data and the estimated insurance cost data for each individual, and store the results in a list called `insurance_cost_dif`.\n", + " - Estimate the insurance cost for a new individual, Akira, who is a 19-year-old male non-smoker with no children and a BMI of 27.1. Make sure to append his name to `names` and his actual insurance cost, `2930.0`, to `insurance_costs`.\n", + " \n", + " Happy coding!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "20d300f8", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.6" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/Python Syntax Medical Insurance Project/.ipynb_checkpoints/Python Syntax Medical Insurance Project-checkpoint.ipynb b/Python Syntax Medical Insurance Project/.ipynb_checkpoints/Python Syntax Medical Insurance Project-checkpoint.ipynb new file mode 100644 index 0000000..9b3bae3 --- /dev/null +++ b/Python Syntax Medical Insurance Project/.ipynb_checkpoints/Python Syntax Medical Insurance Project-checkpoint.ipynb @@ -0,0 +1,496 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "7ac921d1", + "metadata": {}, + "source": [ + "# Python Syntax: Medical Insurance Project" + ] + }, + { + "cell_type": "markdown", + "id": "84c02393", + "metadata": {}, + "source": [ + "Suppose you are a medical professional curious about how certain factors contribute to medical insurance costs. Using a formula that estimates a person's yearly insurance costs, you will investigate how different factors such as age, sex, BMI, etc. affect the prediction." + ] + }, + { + "cell_type": "markdown", + "id": "05392afb", + "metadata": {}, + "source": [ + "## Setting up Factors" + ] + }, + { + "cell_type": "markdown", + "id": "e433bd9f", + "metadata": {}, + "source": [ + "1. 
Our first step is to create the variables for each factor we will consider when estimating medical insurance costs.\n", + "\n", + " These are the variables we will need to create:\n", + " - `age`: age of the individual in years\n", + " - `sex`: 0 for female, 1 for male*\n", + " - `bmi`: individual's body mass index\n", + " - `num_of_children`: number of children the individual has\n", + " - `smoker`: 0 for a non-smoker, 1 for a smoker\n", + " \n", + " In the code block below, create the following variables for a **28**-year-old, **nonsmoking woman** who has **three children** and a **BMI** of **26.2**.\n", + " \n", + " **Note**: We are using this [medical insurance dataset](https://www.kaggle.com/mirichoi0218/insurance) as a guide, which unfortunately does not include data for non-binary individuals." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "7f311028", + "metadata": {}, + "outputs": [], + "source": [ + "# create the initial variables below\n", + "# Initializing variables\n", + "age = 28\n", + "smoker = 0 # 1 if smoker, 0 if not a smoker\n", + "sex = 0 # 0 if female, 1 if male\n", + "num_of_children = 3\n", + "bmi = 26.2\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "2b7c2cbc", + "metadata": {}, + "source": [ + "## Working with the Formula" + ] + }, + { + "cell_type": "markdown", + "id": "6db974a5", + "metadata": {}, + "source": [ + "2. 
After the declaration of the variables, create a variable called `insurance_cost` that utilizes the following formula:\n", + "\n", + " $$\n", + " \\begin{aligned}\n", + " insurance\\_cost = 250*age - 128*sex \\\\\n", + " + 370*bmi + 425*num\\_of\\_children \\\\\n", + " + 24000*smoker - 12500 \\\\\n", + " \\end{aligned}\n", + " $$" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "91f86188", + "metadata": {}, + "outputs": [], + "source": [ + "# Add insurance estimate formula below\n", + "# Insurance formula\n", + "insurance_cost = 250 * age - 128 * sex + 370 * bmi + 425 * num_of_children + 24000 * smoker - 12500\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "90e6bbe0", + "metadata": {}, + "source": [ + "3. Let's display this value in an informative way. Print out the following string in the kernel:\n", + "\n", + " ```\n", + " This person's insurance cost is {insurance_cost} dollars.\n", + " ```\n", + " \n", + " You will need to use string concatenation, including the `str()` function to print out the `insurance_cost`." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "cf6d3790", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "This person's insurance cost is 5469.0 dollars.\n" + ] + } + ], + "source": [ + "# Print out the insurance cost\n", + "print(\"This person's insurance cost is \" + str(insurance_cost) + \" dollars.\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "203e7e61", + "metadata": {}, + "source": [ + "## Looking at Age Factor" + ] + }, + { + "cell_type": "markdown", + "id": "5c2f65d1", + "metadata": {}, + "source": [ + "4. We have seen how our formula can estimate costs for one individual. Now let's play with some individual factors to see what role each one plays in our estimation!\n", + "\n", + " Let's start with the `age` factor. Using a plus-equal operator, add 4 years to our `age` variable." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "eb23f0c0", + "metadata": {}, + "outputs": [], + "source": [ + "# Add 4 years to age\n", + "age += 4\n" + ] + }, + { + "cell_type": "markdown", + "id": "a18a8926", + "metadata": {}, + "source": [ + "5. Now that we have changed our `age` value, we want to recalculate our insurance cost. Declare a new variable called `new_insurance_cost` in the code block below.\n", + "\n", + " Make sure you leave the `insurance_cost` variable the same as in Task 2. We will use it later in our program!" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "50f6a44c", + "metadata": {}, + "outputs": [], + "source": [ + "# Calculate the new insurance cost\n", + "new_insurance_cost = 250 * age - 128 * sex + 370 * bmi + 425 * num_of_children + 24000 * smoker - 12500\n" + ] + }, + { + "cell_type": "markdown", + "id": "d6c2393a", + "metadata": {}, + "source": [ + "6. Next, we want to find the difference between our `new_insurance_cost` and `insurance_cost`. To do this, let's create a new variable called `change_in_insurance_cost` and set it equal to the difference between `new_insurance_cost` and `insurance_cost`.\n", + "\n", + " Note: depending on the order in which we subtract (e.g., `new_insurance_cost - insurance_cost` vs. `insurance_cost - new_insurance_cost`), we'll get a positive or negative version of the same number. To make this difference interpretable, let's calculate `new_insurance_cost - insurance_cost`. Then we can say, \"people who are four years older have estimated insurance costs that are `change_in_insurance_cost` dollars different, where the sign of `change_in_insurance_cost` tells us whether the cost is higher or lower\"."
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "6b72279c", + "metadata": {}, + "outputs": [], + "source": [ + "# Calculate change between the new insurance cost and original insurance cost\n", + "change_in_insurance_cost = new_insurance_cost - insurance_cost\n" + ] + }, + { + "cell_type": "markdown", + "id": "2e6df15c", + "metadata": {}, + "source": [ + "7. We want to display this information in an informative way similar to the output from instruction 3. In the code block below, print the following string, where `XXX` is replaced by the value of `change_in_insurance_cost`:\n", + "\n", + " ```\n", + " The change in cost of insurance after increasing the age by 4 years is XXX dollars.\n", + " ```\n", + " \n", + " Doing this will tell us how 4 years in age affects medical insurance cost estimates assuming that all other variables remain the same.\n", + " \n", + " You will need to concatenate strings and use the `str()` function." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "0a999d40", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The change in cost of insurance after increasing the age by 4 years is 1000.0 dollars.\n" + ] + } + ], + "source": [ + "print(\"The change in cost of insurance after increasing the age by 4 years is \" + str(change_in_insurance_cost) + \" dollars.\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "23bc90cf", + "metadata": {}, + "source": [ + "## Looking at BMI Factor" + ] + }, + { + "cell_type": "markdown", + "id": "81747d10", + "metadata": {}, + "source": [ + "8. Now that you have looked at the age factor, let's move on to another one: BMI. First, we have to redefine our `age` variable to be its original value.\n", + "\n", + " Set `age` to `28`. This will reset its value and allow us to focus on just the change in the BMI factor moving forward.\n", + " \n", + " On the next line, using the plus-equal operator, add `3.1` to our `bmi` variable."
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "53ea7df3", + "metadata": {}, + "outputs": [], + "source": [ + "# Reset age to 28\n", + "age = 28\n", + "\n", + "# Add 3.1 to BMI\n", + "bmi += 3.1\n" + ] + }, + { + "cell_type": "markdown", + "id": "febe2c07", + "metadata": {}, + "source": [ + "9. Now let's find out how a change in BMI affects insurance costs. Our next steps are pretty much the same as we have done before when looking at `age`.\n", + " 1. Below the line where `bmi` was increased by `3.1`, rewrite the insurance cost formula and assign it to the variable name `new_insurance_cost`.\n", + " 2. Save the difference between `new_insurance_cost` and `insurance_cost` in a variable called `change_in_insurance_cost`.\n", + " 3. Display the following string in the output terminal, where `XXX` is replaced by the value of `change_in_insurance_cost`:\n", + " \n", + " ```\n", + " The change in estimated insurance cost after increasing BMI by 3.1 is XXX dollars.\n", + " ```" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "19d121c8", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The change in estimated insurance cost after increasing BMI by 3.1 is 1147.0 dollars.\n" + ] + } + ], + "source": [ + "# Calculate the new insurance cost\n", + "new_insurance_cost = 250 * age - 128 * sex + 370 * bmi + 425 * num_of_children + 24000 * smoker - 12500\n", + "\n", + "# Calculate change between the new insurance cost and original insurance cost\n", + "change_in_insurance_cost = new_insurance_cost - insurance_cost\n", + "\n", + "print(\"The change in estimated insurance cost after increasing BMI by 3.1 is \" + str(change_in_insurance_cost) + \" dollars.\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "26c3c473", + "metadata": {}, + "source": [ + "## Looking at Male vs. Female Factor" + ] + }, + { + "cell_type": "markdown", + "id": "e0c96041", + "metadata": {}, + "source": [ + "10. 
Let's look at the effect sex has on medical insurance costs. Before we make any additional changes, first reassign your `bmi` variable back to its original value of `26.2`.\n", + "\n", + " On a new line of code in the code block below, reassign the value of `sex` to `1`. A reminder that `1` identifies male individuals and `0` identifies female individuals." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "ae87bfec", + "metadata": {}, + "outputs": [], + "source": [ + "# Reset BMI to original value\n", + "bmi = 26.2\n", + "\n", + "# Change sex to male\n", + "sex = 1\n" + ] + }, + { + "cell_type": "markdown", + "id": "da20c656", + "metadata": {}, + "source": [ + "11. Perform the steps below!\n", + " 1. Rewrite the insurance cost formula and assign it to the variable name `new_insurance_cost`.\n", + " 2. Save the difference between `new_insurance_cost` and `insurance_cost` in a variable called `change_in_insurance_cost`.\n", + " 3. Display the following string, where `XXX` is replaced by the value of `change_in_insurance_cost`:\n", + " ```\n", + " The change in estimated cost for being male instead of female is XXX dollars.\n", + " ```" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "ce2da0e8", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The change in estimated cost for being male instead of female is -128.0 dollars.\n" + ] + } + ], + "source": [ + "# Calculate the new insurance cost\n", + "new_insurance_cost = 250 * age - 128 * sex + 370 * bmi + 425 * num_of_children + 24000 * smoker - 12500\n", + "\n", + "# Calculate change between the new insurance cost and original insurance cost\n", + "change_in_insurance_cost = new_insurance_cost - insurance_cost\n", + "\n", + "print(\"The change in estimated cost for being male instead of female is \" + str(change_in_insurance_cost) + \" dollars.\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "f652d964", + 
"metadata": {}, + "source": [ + "12. Notice that this time you got a negative value for `change_in_insurance_cost`. Let's think about what that means. We changed the sex variable from `0` (female) to `1` (male) and it decreased the estimated insurance costs.\n", + "\n", + " This means that, with all other factors held equal, the formula estimates lower medical insurance costs for men than for women. Reflect on the other findings you have dug up from this investigation so far." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1190661d", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "deae95f4", + "metadata": {}, + "source": [ + "## Extra Practice" + ] + }, + { + "cell_type": "markdown", + "id": "2e44bb53", + "metadata": {}, + "source": [ + "13. Great job on the project!!!\n", + "\n", + " So far we have looked at 3 of the 5 factors in the insurance costs formula. The two remaining are `smoker` and `num_of_children`. If you want to keep challenging yourself, spend some time investigating these factors!\n", + " 1. Rewrite the insurance cost formula and assign it to the variable name `new_insurance_cost`.\n", + " 2. Save the difference between `new_insurance_cost` and `insurance_cost` in a variable called `change_in_insurance_cost`.\n", + " 3. Display the information below!"
+ ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "ee321873", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The change in estimated cost for being a smoker is 24000.0 dollars.\n" + ] + } + ], + "source": [ + "# Reset sex to its original value (0), then change smoker status to 1 (smoker)\n", + "sex = 0\n", + "smoker = 1\n", + "# Calculate the new insurance cost\n", + "new_insurance_cost = 250 * age - 128 * sex + 370 * bmi + 425 * num_of_children + 24000 * smoker - 12500\n", + "\n", + "# Calculate change between the new insurance cost and original insurance cost\n", + "change_in_insurance_cost = new_insurance_cost - insurance_cost\n", + "\n", + "print(\"The change in estimated cost for being a smoker is \" + str(change_in_insurance_cost) + \" dollars.\")\n" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "de87f356", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The change in estimated cost for having 5 children is 850.0 dollars.\n" + ] + } + ], + "source": [ + "# Reset smoker status to its original value (0), then change number of children to 5\n", + "smoker = 0\n", + "num_of_children = 5\n", + "# Calculate the new insurance cost\n", + "new_insurance_cost = 250 * age - 128 * sex + 370 * bmi + 425 * num_of_children + 24000 * smoker - 12500\n", + "\n", + "# Calculate change between the new insurance cost and original insurance cost\n", + "change_in_insurance_cost = new_insurance_cost - insurance_cost\n", + "\n", + "print(\"The change in estimated cost for having 5 children is \" + str(change_in_insurance_cost) + \" dollars.\")\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.6" + } + }, + "nbformat": 
4, + "nbformat_minor": 5 +}