update handbook docs

PovertyAction · Jun 12, 2024 · b1a0a17
1 parent 53a532d
commit b1a0a17
Showing 21 changed files with 724 additions and 76 deletions.
diff --git a/Justfile b/Justfile
@@ -58,16 +58,16 @@ fmt-all: lint-py fmt-py lint-sql fmt-markdown
 
 [windows]
 pre-install:
-    winget install Casey.Just Rye.Rye Posit.Quarto
+    winget install Casey.Just Rye.Rye GitHub.cli Posit.Quarto
 
 [linux]
 pre-install:
     quarto_version :=  "1.4.554"
-    brew install just rye
+    brew install just rye gh
     curl -sfL https://github.com/quarto-dev/quarto-cli/releases/download/v{{quarto_version}}/quarto-{{quarto_version}}-linux-amd64.deb  | sudo apt install ./quarto-{{quarto_version}}-linux-amd64.deb
     rm quarto-{{quarto_version}}-linux-amd64.deb
 
 [macos]
 pre-install:
-    brew install just rye
+    brew install just rye gh
     brew install --cask quarto
diff --git a/_freeze/docs/software/python/execute-results/html.json b/_freeze/docs/software/python/execute-results/html.json
@@ -1,12 +1,16 @@
 {
-  "hash": "4b63122f49e879066f27419fa2410bbd",
+  "hash": "d0290ef074488257c8582cf93d3a1c81",
   "result": {
     "engine": "jupyter",
-    "markdown": "---\ntitle: Python\nexecute:\n  eval: true\n---\n\n## What is Python?\n\nPython is a high-level, general-purpose programming language that is widely used in data\nscience, machine learning, and web development. It has a large standard library and a\nvibrant community that provides a wide range of libraries and tools for various\napplications. As such, Python provides a general-purpose ecosystem that can be used for\na wide range of applications.\n\n## How to install Python?\n\nThere are many ways to install Python. We recommend using Python in a virtual\nenvironment to avoid conflicts with other Python installations on your system.\n\nWe recommend using a [rye](https://rye.astral.sh/) or [pixi](https://pixi.sh/latest/).\nBoth of these tools provide a simple way to create and manage Python virtual\nenvironments.\n\nIn both cases, you can manage the python packages that are installed in the virtual\nenvironment using a `pyproject.toml` file. See the pyproject.toml example in this\nrepository for an example of how to manage Python packages. To add package dependencies\nto the virtual environment, using `rye`, you can run:\n\nWe'll use `rye` to demonstrate how to manage a Python virtual environment. 
Watch the\nfollowing video for a quick introduction to `rye`:\n\n\n{{< video https://youtu.be/q99TYA7LnuA >}}\n\n\n\nFirst, install `rye` using `winget` (Windows) or `brew` (MacOS/Linux):\n\n| Platform | Commands             |\n| -------- | -------------------- |\n| Windows  | `winget install rye` |\n| MacOS    | `brew install rye`   |\n| Linux    | `brew install rye`   |\n\nAdd libraries to the virtual environment using `rye add ...`:\n\n::: {#17aaecd9 .cell execution_count=1}\n``` {.python .cell-code}\n> rye add jupyterlab pandas matplotlib seaborn\n```\n:::\n\n\n## Coding Conventions\n\nSee the\n[GitLab Data Team's Python Guide](https://handbook.gitlab.com/handbook/business-technology/data-team/platform/python-guide/)\n\n## Learning Resources\n\nDownload data from the a URL:\n\n::: {#3cb20ef0 .cell execution_count=2}\n``` {.python .cell-code}\nfrom urllib.request import urlretrieve\n\n_ = urlretrieve(\n    \"https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv\",\n    \"../assets/data/penguins.csv\",\n)\n```\n:::\n\n\n::: {#c593971b .cell execution_count=3}\n``` {.python .cell-code}\nimport pandas as pd\n\ndf = pd.read_csv(\"../assets/data/penguins.csv\")\n```\n:::\n\n\n::: {#a5d84efe .cell execution_count=4}\n``` {.python .cell-code}\nimport matplotlib.pyplot as plt  # noqa: E402\nimport seaborn as sns  # noqa: E402\n\ng = sns.FacetGrid(df, hue=\"species\", height=3, aspect=1.5)\ng.map(plt.scatter, \"bill_length_mm\", \"bill_depth_mm\").add_legend()\n```\n\n::: {.cell-output .cell-output-display}\n![](python_files/figure-html/cell-5-output-1.png){width=527 height=278}\n:::\n:::\n\n\n## Learning Resources\n\n- [The Python Tutorial](https://docs.python.org/3.12/tutorial/index.html)\n- [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/)\n- [Efficient Python for Data Scientists](https://khuyentran1401.github.io/Efficient_Python_tricks_and_tools_for_data_scientists/README.html)\n- [The Hitchhiker's Guide to 
Python](https://docs.python-guide.org/)\n\n",
+    "markdown": "---\ntitle: Python\nexecute:\n  eval: true\n---\n\n## What is Python?\n\nPython is a high-level, general-purpose programming language that is widely used in data\nscience, machine learning, and web development. It has a large standard library and a\nvibrant community that provides a wide range of libraries and tools for various\napplications. As such, Python provides a general-purpose ecosystem that can be used for\na wide range of applications.\n\n## How to install Python?\n\nThere are many ways to install Python. We recommend using Python in a virtual\nenvironment to avoid conflicts with other Python installations on your system.\n\nWe recommend using a [rye](https://rye.astral.sh/) or [pixi](https://pixi.sh/latest/).\nBoth of these tools provide a simple way to create and manage Python virtual\nenvironments.\n\nIn both cases, you can manage the python packages that are installed in the virtual\nenvironment using a `pyproject.toml` file. See the pyproject.toml example in this\nrepository for an example of how to manage Python packages. To add package dependencies\nto the virtual environment, using `rye`, you can run:\n\nWe'll use `rye` to demonstrate how to manage a Python virtual environment. Watch the\nfollowing video for a quick introduction to `rye`:\n\n{{\\< video https://youtu.be/q99TYA7LnuA >}}\n\nFirst, install `rye` using `winget` (Windows) or `brew` (MacOS/Linux):\n\n| Platform | Commands             |\n| -------- | -------------------- |\n| Windows  | `winget install rye` |\n| MacOS    | `brew install rye`   |\n| Linux    | `brew install rye`   |\n\nAdd libraries to the virtual environment using `rye add ...`:\n\n::: {#b90c8a05 .cell execution_count=1}\n``` {.python .cell-code}\n> rye add jupyterlab pandas matplotlib seaborn\n```\n:::\n\n\n## Coding Conventions\n\nWe highly recommend working with a [virtual environment](../guides/venv.md) to manage\nPython dependencies. 
The `pyproject.toml` is the preferred way to keep track of python\ndependencies as well as project-specific python conventions.\n\nWe recommend using [Ruff](https://docs.astral.sh/ruff/) to enforce\n[linting](<https://en.wikipedia.org/wiki/Lint_(software)>) and formatting rules. In most\ncases you can use the default linting and formatting rules provided by `ruff`. However,\nyou can customize the rules by modifying the `[tool.ruff]` section of the\n`pyproject.toml` file in the root of your project. for more about the configuration\noptions, see the [Ruff documentation](https://docs.astral.sh/ruff/configuration/).\n\nIf you are working in a virtual environment created by `rye`, you automatically have\naccess to `Ruff via  `rye lint`and`rye fmt\\` commands to lint and format your code.\n\nFor more inspiration, see the\n[GitLab Data Team's Python Guide](https://handbook.gitlab.com/handbook/business-technology/data-team/platform/python-guide/)\nand [Google's Python Style Guide](https://google.github.io/styleguide/pyguide.html).\n\n## Example Usage\n\nLet's load an example World Bank data via [Gapminder](https://www.gapminder.org/) using\nthe [causaldata](https://github.com/NickCH-K/causaldata) package.\n\n::: {#faeff2b5 .cell execution_count=2}\n``` {.python .cell-code}\nimport pandas as pd\nimport matplotlib.pyplot as plt\nimport seaborn as sns\nimport statsmodels.formula.api as sm\nfrom causaldata import gapminder\n```\n:::\n\n\nLoad the Gapminder data as a pandas DataFrame:\n\n::: {#51d3d753 .cell execution_count=3}\n``` {.python .cell-code}\ndf = gapminder.load_pandas().data\n```\n:::\n\n\nWe can check the dimensions of the DataFrame using `df.info()`:\n\n::: {#fc6a5331 .cell execution_count=4}\n``` {.python .cell-code}\ndf.info()\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n<class 'pandas.core.frame.DataFrame'>\nRangeIndex: 1704 entries, 0 to 1703\nData columns (total 6 columns):\n #   Column     Non-Null Count  Dtype  \n---  ------     --------------  
-----  \n 0   country    1704 non-null   object \n 1   continent  1704 non-null   object \n 2   year       1704 non-null   int64  \n 3   lifeExp    1704 non-null   float64\n 4   pop        1704 non-null   int64  \n 5   gdpPercap  1704 non-null   float64\ndtypes: float64(2), int64(2), object(2)\nmemory usage: 80.0+ KB\n```\n:::\n:::\n\n\nLet's take a look at the first few rows of the DataFrame using `df.head()`:\n\n::: {#d2897874 .cell execution_count=5}\n``` {.python .cell-code}\ndf.head()\n```\n\n::: {.cell-output .cell-output-display execution_count=16}\n```{=html}\n<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>country</th>\n      <th>continent</th>\n      <th>year</th>\n      <th>lifeExp</th>\n      <th>pop</th>\n      <th>gdpPercap</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>Afghanistan</td>\n      <td>Asia</td>\n      <td>1952</td>\n      <td>28.801</td>\n      <td>8425333</td>\n      <td>779.445314</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>Afghanistan</td>\n      <td>Asia</td>\n      <td>1957</td>\n      <td>30.332</td>\n      <td>9240934</td>\n      <td>820.853030</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>Afghanistan</td>\n      <td>Asia</td>\n      <td>1962</td>\n      <td>31.997</td>\n      <td>10267083</td>\n      <td>853.100710</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>Afghanistan</td>\n      <td>Asia</td>\n      <td>1967</td>\n      <td>34.020</td>\n      <td>11537966</td>\n      <td>836.197138</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>Afghanistan</td>\n      <td>Asia</td>\n      <td>1972</td>\n      <td>36.088</td>\n      <td>13079460</td>\n      
<td>739.981106</td>\n    </tr>\n  </tbody>\n</table>\n</div>\n```\n:::\n:::\n\n\nTake a look at the relationship between GDP per Capita and Life Expectancy:\n\n::: {#1c03acc5 .cell execution_count=6}\n``` {.python .cell-code}\nsns.scatterplot(x=\"gdpPercap\", y=\"lifeExp\", hue=\"continent\", data=df).set(\n    xscale=\"log\", ylabel=\"Life Expectancy\", xlabel=\"GDP per Capita\"\n)\n```\n\n::: {.cell-output .cell-output-display}\n![](python_files/figure-html/cell-7-output-1.png){width=585 height=431}\n:::\n:::\n\n\nSeparate the data by year, focusing on 1957 and 2007:\n\n::: {#67ef360d .cell execution_count=7}\n``` {.python .cell-code}\nsns.relplot(\n    data=df.where(df[\"year\"].isin([1957, 2007])),\n    x=\"gdpPercap\",\n    y=\"lifeExp\",\n    col=\"year\",\n    hue=\"continent\",\n    col_wrap=1,\n    kind=\"scatter\",\n    palette=\"muted\",\n).set(xscale=\"log\", ylabel=\"Life Expectancy\", xlabel=\"GDP per Capita\")\n```\n\n::: {.cell-output .cell-output-display}\n![](python_files/figure-html/cell-8-output-1.png){width=575 height=952}\n:::\n:::\n\n\n## Learning Resources\n\n- [The Python Tutorial](https://docs.python.org/3.12/tutorial/index.html)\n- [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/)\n- [Efficient Python for Data Scientists](https://khuyentran1401.github.io/Efficient_Python_tricks_and_tools_for_data_scientists/README.html)\n- [The Hitchhiker's Guide to Python](https://docs.python-guide.org/)\n\n",
     "supporting": [
       "python_files"
     ],
     "filters": [],
-    "includes": {}
+    "includes": {
+      "include-in-header": [
+        "<script src=\"https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.6/require.min.js\" integrity=\"sha512-c3Nl8+7g4LMSTdrm621y7kf9v3SDPnhxLNhcjFJbKECVnmZHTdo+IRO05sNLTH/D3vA6u1X32ehoLC7WFVdheg==\" crossorigin=\"anonymous\"></script>\n<script src=\"https://cdnjs.cloudflare.com/ajax/libs/jquery/3.5.1/jquery.min.js\" integrity=\"sha512-bLT0Qm9VnAYZDflyKcBaQ2gg0hSYNQrJ8RilYldYQ1FxQYoCLtUjuuRuZo+fjqhx/qtq/1itJ0C2ejDxltZVFg==\" crossorigin=\"anonymous\" data-relocate-top=\"true\"></script>\n<script type=\"application/javascript\">define('jquery', [],function() {return window.jQuery;})</script>\n"
+      ]
+    }
   }
 }
diff --git a/_freeze/docs/software/python/figure-html/cell-5-output-1.png b/_freeze/docs/software/python/figure-html/cell-5-output-1.png
diff --git a/_freeze/docs/software/python/figure-html/cell-6-output-1.png b/_freeze/docs/software/python/figure-html/cell-6-output-1.png
diff --git a/_freeze/docs/software/python/figure-html/cell-7-output-1.png b/_freeze/docs/software/python/figure-html/cell-7-output-1.png
diff --git a/_freeze/docs/software/python/figure-html/cell-8-output-1.png b/_freeze/docs/software/python/figure-html/cell-8-output-1.png
diff --git a/_freeze/docs/software/python/figure-html/cell-9-output-1.png b/_freeze/docs/software/python/figure-html/cell-9-output-1.png
diff --git a/_freeze/docs/software/stata/execute-results/html.json b/_freeze/docs/software/stata/execute-results/html.json
@@ -0,0 +1,12 @@
+{
+  "hash": "54c4717e3d26189de834c3bdb445e4a8",
+  "result": {
+    "engine": "jupyter",
+    "markdown": "---\ntitle: \"Stata\"\nexecute:\n    eval: true\n---\n\n## What is Stata?\n\nStata is a statistical software package that is commonly used in the social sciences and economics.\nIt is widely used at IPA for data analysis and management. It offers a comprehensive\nlibrary of methods for data cleaning, descriptive statistics, and econometric analysis.\nStata is very well suited for research data workflows and research design tasks,\nincluding power calculations, sample design adjustments, panel data analysis,\ntime series analysis, etc. See [Stata Features](https://www.stata.com/features/)\nfor a full list of what Stata makes available.\n\n## How to install Stata?\n\nIPA staff can download and install the relevant version (`.exe` for Windows,\n`.dmg` for MacOS, or  `.tar.gz` for Linux) from IPA on the Box\n[installation packages](https://ipastorage.app.box.com/folder/129276324764?v=install-stata).\n\n## Coding Conventions\n\nSee the following resources for coding conventions in Stata:\n\n- [DIME Analytics Stata Style Guide](https://worldbank.github.io/dime-data-handbook/coding.html#the-dime-analytics-stata-style-guide)\n- [Sean Higgins's Stata guide](https://github.com/skhiggins/Stata_guide)\non GitHub.\n- [Coding and Data for the Social Sciences](https://web.stanford.edu/~gentzkow/research/CodeAndData.xhtml#magicparlabel-1248),\nby Matthew Gentzkow and Jesse Shapiro\n\n## Using Stata from Python\n\nWithin a Python script or Jupyter Notebook, you can call Stata using [pystata](https://www.stata.com/python/pystata18/notebook/Quick%20Start0.html).\n\n::: {#437f602b .cell execution_count=1}\n``` {.python .cell-code}\nimport stata_setup\n\n# set configuration to the path where Stata is installed and the flavor of Stata\n# in the case below, we're using Stata 18 SE\nstata_setup.config(\"C:/Program Files/Stata18/\", \"se\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n\n  ___  ____  ____  ____  ____ ®\n /__    /   ____/   /   ____/      Stata 
18.0\n___/   /   /___/   /   /___/       SE—Standard Edition\n\n Statistics and Data Science       Copyright 1985-2023 StataCorp LLC\n                                   StataCorp\n                                   4905 Lakeway Drive\n                                   College Station, Texas 77845 USA\n                                   800-782-8272        https://www.stata.com\n                                   979-696-4600        [email protected]\n\nStata license: Unlimited-user network, expiring 22 Jan 2025\nSerial number: 401809300803\n  Licensed to: Niall Keleher\n               Innovations for Poverty Action\n\nNotes:\n      1. Unicode is supported; see help unicode_advice.\n      2. Maximum number of variables is set to 5,000 but can be increased;\n          see help set_maxvar.\n```\n:::\n:::\n\n\n### Call Stata using pystata API functions\n\n::: {#621f5a24 .cell execution_count=2}\n``` {.python .cell-code}\nfrom pystata import stata\n```\n:::\n\n\n::: {#c71c757b .cell execution_count=3}\n``` {.python .cell-code}\nstata.run(\n    \"\"\"\n    sysuse auto, clear\n    reg mpg price i.foreign\n    \"\"\"\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n\n. \n.     sysuse auto, clear\n(1978 automobile data)\n\n.     reg mpg price i.foreign\n\n      Source |       SS           df       MS      Number of obs   =        74\n-------------+----------------------------------   F(2, 71)        =     23.01\n       Model |  960.866305         2  480.433152   Prob > F        =    0.0000\n    Residual |  1482.59315        71  20.8815937   R-squared       =    0.3932\n-------------+----------------------------------   Adj R-squared   =    0.3761\n       Total |  2443.45946        73  33.4720474   Root MSE        =    4.5696\n\n------------------------------------------------------------------------------\n         mpg | Coefficient  Std. err.      t    P>|t|     [95% conf. 
interval]\n-------------+----------------------------------------------------------------\n       price |   -.000959   .0001815    -5.28   0.000     -.001321    -.000597\n             |\n     foreign |\n    Foreign  |   5.245271   1.163592     4.51   0.000     2.925135    7.565407\n       _cons |   25.65058   1.271581    20.17   0.000     23.11512    28.18605\n------------------------------------------------------------------------------\n\n.     \n. \n```\n:::\n:::\n\n\nOr use IPython magic commands to run Stata code in a Jupyter Notebook.\n\n::: {#24c9e231 .cell execution_count=4}\n``` {.python .cell-code}\n%%stata\nsysuse auto, clear\ndescribe\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n\n. sysuse auto, clear\n(1978 automobile data)\n\n. describe\n\nContains data from C:\\Program Files\\Stata18/ado\\base/a/auto.dta\n Observations:            74                  1978 automobile data\n    Variables:            12                  13 Apr 2022 17:45\n                                              (_dta has notes)\n-------------------------------------------------------------------------------\nVariable      Storage   Display    Value\n    name         type    format    label      Variable label\n-------------------------------------------------------------------------------\nmake            str18   %-18s                 Make and model\nprice           int     %8.0gc                Price\nmpg             int     %8.0g                 Mileage (mpg)\nrep78           int     %8.0g                 Repair record 1978\nheadroom        float   %6.1f                 Headroom (in.)\ntrunk           int     %8.0g                 Trunk space (cu. ft.)\nweight          int     %8.0gc                Weight (lbs.)\nlength          int     %8.0g                 Length (in.)\nturn            int     %8.0g                 Turn circle (ft.)\ndisplacement    int     %8.0g                 Displacement (cu. 
in.)\ngear_ratio      float   %6.2f                 Gear ratio\nforeign         byte    %8.0g      origin     Car origin\n-------------------------------------------------------------------------------\nSorted by: foreign\n\n. \n```\n:::\n:::\n\n\n::: {#ec57624a .cell execution_count=5}\n``` {.python .cell-code}\n%stata scatter mpg price\n```\n\n::: {.cell-output .cell-output-display}\n![](stata_files/figure-html/cell-6-output-1.svg){}\n:::\n:::\n\n\n## Data Visualization\n\nConsider installing the [ipaplots](https://github.com/PovertyAction/ipaplots) for the IPA graph schema in Stata.\n\n## Learning References\n\nFor more information on learning and using Stata, see the\n[IPA-Stata-Trainings](https://github.com/PovertyAction/IPA-Stata-Trainings) repository on GitHub.\n\n- [Stata Video Tutorials](https://www.stata.com/links/video-tutorials/)\n\n",
+    "supporting": [
+      "stata_files\\figure-html"
+    ],
+    "filters": [],
+    "includes": {}
+  }
+}