update handbook docs
NKeleher committed Jun 12, 2024
1 parent 53a532d commit b1a0a17
Showing 21 changed files with 724 additions and 76 deletions.
6 changes: 3 additions & 3 deletions Justfile
@@ -58,16 +58,16 @@ fmt-all: lint-py fmt-py lint-sql fmt-markdown

 [windows]
 pre-install:
-winget install Casey.Just Rye.Rye Posit.Quarto
+winget install Casey.Just Rye.Rye GitHub.cli Posit.Quarto

 [linux]
 pre-install:
 quarto_version := "1.4.554"
-brew install just rye
+brew install just rye gh
 curl -sfL https://github.com/quarto-dev/quarto-cli/releases/download/v{{quarto_version}}/quarto-{{quarto_version}}-linux-amd64.deb | sudo apt install ./quarto-{{quarto_version}}-linux-amd64.deb
 rm quarto-{{quarto_version}}-linux-amd64.deb

 [macos]
 pre-install:
-brew install just rye
+brew install just rye gh
 brew install --cask quarto
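
Read together, the three `pre-install` recipes in this hunk pick a package-manager command per operating system. A minimal Python sketch of the same dispatch, using only the post-change commands from the hunk; the helper name `pre_install_commands` is hypothetical and is not part of the repository:

```python
import platform

# Commands copied from the new side of the Justfile hunk.
# The Linux Quarto download step is left out to keep the sketch short.
PRE_INSTALL = {
    "Windows": ["winget install Casey.Just Rye.Rye GitHub.cli Posit.Quarto"],
    "Linux": ["brew install just rye gh"],
    "Darwin": ["brew install just rye gh", "brew install --cask quarto"],
}


def pre_install_commands(system=None):
    """Return the install commands for a platform name (hypothetical helper)."""
    # platform.system() reports "Windows", "Linux", or "Darwin" (macOS).
    system = system or platform.system()
    try:
        return PRE_INSTALL[system]
    except KeyError:
        raise ValueError(f"no pre-install recipe for {system!r}") from None


if __name__ == "__main__":
    for command in pre_install_commands():
        print(command)
```

The sketch only prints the commands; running them is left to `just pre-install`, which presumably remains the supported entry point.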
10 changes: 7 additions & 3 deletions _freeze/docs/software/python/execute-results/html.json
@@ -1,12 +1,16 @@
{
"hash": "4b63122f49e879066f27419fa2410bbd",
"hash": "d0290ef074488257c8582cf93d3a1c81",
"result": {
"engine": "jupyter",
"markdown": "---\ntitle: Python\nexecute:\n eval: true\n---\n\n## What is Python?\n\nPython is a high-level, general-purpose programming language that is widely used in data\nscience, machine learning, and web development. It has a large standard library and a\nvibrant community that provides a wide range of libraries and tools for various\napplications. As such, Python provides a general-purpose ecosystem that can be used for\na wide range of applications.\n\n## How to install Python?\n\nThere are many ways to install Python. We recommend using Python in a virtual\nenvironment to avoid conflicts with other Python installations on your system.\n\nWe recommend using a [rye](https://rye.astral.sh/) or [pixi](https://pixi.sh/latest/).\nBoth of these tools provide a simple way to create and manage Python virtual\nenvironments.\n\nIn both cases, you can manage the python packages that are installed in the virtual\nenvironment using a `pyproject.toml` file. See the pyproject.toml example in this\nrepository for an example of how to manage Python packages. To add package dependencies\nto the virtual environment, using `rye`, you can run:\n\nWe'll use `rye` to demonstrate how to manage a Python virtual environment. 
Watch the\nfollowing video for a quick introduction to `rye`:\n\n\n{{< video https://youtu.be/q99TYA7LnuA >}}\n\n\n\nFirst, install `rye` using `winget` (Windows) or `brew` (MacOS/Linux):\n\n| Platform | Commands |\n| -------- | -------------------- |\n| Windows | `winget install rye` |\n| MacOS | `brew install rye` |\n| Linux | `brew install rye` |\n\nAdd libraries to the virtual environment using `rye add ...`:\n\n::: {#17aaecd9 .cell execution_count=1}\n``` {.python .cell-code}\n> rye add jupyterlab pandas matplotlib seaborn\n```\n:::\n\n\n## Coding Conventions\n\nSee the\n[GitLab Data Team's Python Guide](https://handbook.gitlab.com/handbook/business-technology/data-team/platform/python-guide/)\n\n## Learning Resources\n\nDownload data from the a URL:\n\n::: {#3cb20ef0 .cell execution_count=2}\n``` {.python .cell-code}\nfrom urllib.request import urlretrieve\n\n_ = urlretrieve(\n \"https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv\",\n \"../assets/data/penguins.csv\",\n)\n```\n:::\n\n\n::: {#c593971b .cell execution_count=3}\n``` {.python .cell-code}\nimport pandas as pd\n\ndf = pd.read_csv(\"../assets/data/penguins.csv\")\n```\n:::\n\n\n::: {#a5d84efe .cell execution_count=4}\n``` {.python .cell-code}\nimport matplotlib.pyplot as plt # noqa: E402\nimport seaborn as sns # noqa: E402\n\ng = sns.FacetGrid(df, hue=\"species\", height=3, aspect=1.5)\ng.map(plt.scatter, \"bill_length_mm\", \"bill_depth_mm\").add_legend()\n```\n\n::: {.cell-output .cell-output-display}\n![](python_files/figure-html/cell-5-output-1.png){width=527 height=278}\n:::\n:::\n\n\n## Learning Resources\n\n- [The Python Tutorial](https://docs.python.org/3.12/tutorial/index.html)\n- [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/)\n- [Efficient Python for Data Scientists](https://khuyentran1401.github.io/Efficient_Python_tricks_and_tools_for_data_scientists/README.html)\n- [The Hitchhiker's Guide to 
Python](https://docs.python-guide.org/)\n\n",
"markdown": "---\ntitle: Python\nexecute:\n eval: true\n---\n\n## What is Python?\n\nPython is a high-level, general-purpose programming language that is widely used in data\nscience, machine learning, and web development. It has a large standard library and a\nvibrant community that provides a wide range of libraries and tools for various\napplications. As such, Python provides a general-purpose ecosystem that can be used for\na wide range of applications.\n\n## How to install Python?\n\nThere are many ways to install Python. We recommend using Python in a virtual\nenvironment to avoid conflicts with other Python installations on your system.\n\nWe recommend using a [rye](https://rye.astral.sh/) or [pixi](https://pixi.sh/latest/).\nBoth of these tools provide a simple way to create and manage Python virtual\nenvironments.\n\nIn both cases, you can manage the python packages that are installed in the virtual\nenvironment using a `pyproject.toml` file. See the pyproject.toml example in this\nrepository for an example of how to manage Python packages. To add package dependencies\nto the virtual environment, using `rye`, you can run:\n\nWe'll use `rye` to demonstrate how to manage a Python virtual environment. Watch the\nfollowing video for a quick introduction to `rye`:\n\n{{\\< video https://youtu.be/q99TYA7LnuA >}}\n\nFirst, install `rye` using `winget` (Windows) or `brew` (MacOS/Linux):\n\n| Platform | Commands |\n| -------- | -------------------- |\n| Windows | `winget install rye` |\n| MacOS | `brew install rye` |\n| Linux | `brew install rye` |\n\nAdd libraries to the virtual environment using `rye add ...`:\n\n::: {#b90c8a05 .cell execution_count=1}\n``` {.python .cell-code}\n> rye add jupyterlab pandas matplotlib seaborn\n```\n:::\n\n\n## Coding Conventions\n\nWe highly recommend working with a [virtual environment](../guides/venv.md) to manage\nPython dependencies. 
The `pyproject.toml` is the preferred way to keep track of python\ndependencies as well as project-specific python conventions.\n\nWe recommend using [Ruff](https://docs.astral.sh/ruff/) to enforce\n[linting](<https://en.wikipedia.org/wiki/Lint_(software)>) and formatting rules. In most\ncases you can use the default linting and formatting rules provided by `ruff`. However,\nyou can customize the rules by modifying the `[tool.ruff]` section of the\n`pyproject.toml` file in the root of your project. for more about the configuration\noptions, see the [Ruff documentation](https://docs.astral.sh/ruff/configuration/).\n\nIf you are working in a virtual environment created by `rye`, you automatically have\naccess to `Ruff via `rye lint`and`rye fmt\\` commands to lint and format your code.\n\nFor more inspiration, see the\n[GitLab Data Team's Python Guide](https://handbook.gitlab.com/handbook/business-technology/data-team/platform/python-guide/)\nand [Google's Python Style Guide](https://google.github.io/styleguide/pyguide.html).\n\n## Example Usage\n\nLet's load an example World Bank data via [Gapminder](https://www.gapminder.org/) using\nthe [causaldata](https://github.com/NickCH-K/causaldata) package.\n\n::: {#faeff2b5 .cell execution_count=2}\n``` {.python .cell-code}\nimport pandas as pd\nimport matplotlib.pyplot as plt\nimport seaborn as sns\nimport statsmodels.formula.api as sm\nfrom causaldata import gapminder\n```\n:::\n\n\nLoad the Gapminder data as a pandas DataFrame:\n\n::: {#51d3d753 .cell execution_count=3}\n``` {.python .cell-code}\ndf = gapminder.load_pandas().data\n```\n:::\n\n\nWe can check the dimensions of the DataFrame using `df.info()`:\n\n::: {#fc6a5331 .cell execution_count=4}\n``` {.python .cell-code}\ndf.info()\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n<class 'pandas.core.frame.DataFrame'>\nRangeIndex: 1704 entries, 0 to 1703\nData columns (total 6 columns):\n # Column Non-Null Count Dtype \n--- ------ -------------- ----- \n 0 country 
1704 non-null object \n 1 continent 1704 non-null object \n 2 year 1704 non-null int64 \n 3 lifeExp 1704 non-null float64\n 4 pop 1704 non-null int64 \n 5 gdpPercap 1704 non-null float64\ndtypes: float64(2), int64(2), object(2)\nmemory usage: 80.0+ KB\n```\n:::\n:::\n\n\nLet's take a look at the first few rows of the DataFrame using `df.head()`:\n\n::: {#d2897874 .cell execution_count=5}\n``` {.python .cell-code}\ndf.head()\n```\n\n::: {.cell-output .cell-output-display execution_count=16}\n```{=html}\n<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>country</th>\n <th>continent</th>\n <th>year</th>\n <th>lifeExp</th>\n <th>pop</th>\n <th>gdpPercap</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>Afghanistan</td>\n <td>Asia</td>\n <td>1952</td>\n <td>28.801</td>\n <td>8425333</td>\n <td>779.445314</td>\n </tr>\n <tr>\n <th>1</th>\n <td>Afghanistan</td>\n <td>Asia</td>\n <td>1957</td>\n <td>30.332</td>\n <td>9240934</td>\n <td>820.853030</td>\n </tr>\n <tr>\n <th>2</th>\n <td>Afghanistan</td>\n <td>Asia</td>\n <td>1962</td>\n <td>31.997</td>\n <td>10267083</td>\n <td>853.100710</td>\n </tr>\n <tr>\n <th>3</th>\n <td>Afghanistan</td>\n <td>Asia</td>\n <td>1967</td>\n <td>34.020</td>\n <td>11537966</td>\n <td>836.197138</td>\n </tr>\n <tr>\n <th>4</th>\n <td>Afghanistan</td>\n <td>Asia</td>\n <td>1972</td>\n <td>36.088</td>\n <td>13079460</td>\n <td>739.981106</td>\n </tr>\n </tbody>\n</table>\n</div>\n```\n:::\n:::\n\n\nTake a look at the relationship between GDP per Capita and Life Expectancy:\n\n::: {#1c03acc5 .cell execution_count=6}\n``` {.python .cell-code}\nsns.scatterplot(x=\"gdpPercap\", y=\"lifeExp\", hue=\"continent\", data=df).set(\n xscale=\"log\", ylabel=\"Life Expectancy\", 
xlabel=\"GDP per Capita\"\n)\n```\n\n::: {.cell-output .cell-output-display}\n![](python_files/figure-html/cell-7-output-1.png){width=585 height=431}\n:::\n:::\n\n\nSeparate the data by year, focusing on 1957 and 2007:\n\n::: {#67ef360d .cell execution_count=7}\n``` {.python .cell-code}\nsns.relplot(\n data=df.where(df[\"year\"].isin([1957, 2007])),\n x=\"gdpPercap\",\n y=\"lifeExp\",\n col=\"year\",\n hue=\"continent\",\n col_wrap=1,\n kind=\"scatter\",\n palette=\"muted\",\n).set(xscale=\"log\", ylabel=\"Life Expectancy\", xlabel=\"GDP per Capita\")\n```\n\n::: {.cell-output .cell-output-display}\n![](python_files/figure-html/cell-8-output-1.png){width=575 height=952}\n:::\n:::\n\n\n## Learning Resources\n\n- [The Python Tutorial](https://docs.python.org/3.12/tutorial/index.html)\n- [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/)\n- [Efficient Python for Data Scientists](https://khuyentran1401.github.io/Efficient_Python_tricks_and_tools_for_data_scientists/README.html)\n- [The Hitchhiker's Guide to Python](https://docs.python-guide.org/)\n\n",
"supporting": [
"python_files"
],
"filters": [],
"includes": {}
"includes": {
"include-in-header": [
"<script src=\"https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.6/require.min.js\" integrity=\"sha512-c3Nl8+7g4LMSTdrm621y7kf9v3SDPnhxLNhcjFJbKECVnmZHTdo+IRO05sNLTH/D3vA6u1X32ehoLC7WFVdheg==\" crossorigin=\"anonymous\"></script>\n<script src=\"https://cdnjs.cloudflare.com/ajax/libs/jquery/3.5.1/jquery.min.js\" integrity=\"sha512-bLT0Qm9VnAYZDflyKcBaQ2gg0hSYNQrJ8RilYldYQ1FxQYoCLtUjuuRuZo+fjqhx/qtq/1itJ0C2ejDxltZVFg==\" crossorigin=\"anonymous\" data-relocate-top=\"true\"></script>\n<script type=\"application/javascript\">define('jquery', [],function() {return window.jQuery;})</script>\n"
]
}
}
}
Binary file modified _freeze/docs/software/python/figure-html/cell-5-output-1.png
12 changes: 12 additions & 0 deletions _freeze/docs/software/stata/execute-results/html.json
@@ -0,0 +1,12 @@
{
"hash": "54c4717e3d26189de834c3bdb445e4a8",
"result": {
"engine": "jupyter",
"markdown": "---\ntitle: \"Stata\"\nexecute:\n eval: true\n---\n\n## What is Stata?\n\nStata is a statistical software package that is commonly used in the social sciences and economics.\nIt is widely used at IPA for data analysis and management. It offers a comprehensive\nlibrary of methods for data cleaning, descriptive statistics, and econometric analysis.\nStata is very well suited for research data workflows and research design tasks,\nincluding power calculations, sample design adjustments, panel data analysis,\ntime series analysis, etc. See [Stata Features](https://www.stata.com/features/)\nfor a full list of what Stata makes available.\n\n## How to install Stata?\n\nIPA staff can download and install the relevant version (`.exe` for Windows,\n`.dmg` for MacOS, or `.tar.gz` for Linux) from IPA on the Box\n[installation packages](https://ipastorage.app.box.com/folder/129276324764?v=install-stata).\n\n## Coding Conventions\n\nSee the following resources for coding conventions in Stata:\n\n- [DIME Analytics Stata Style Guide](https://worldbank.github.io/dime-data-handbook/coding.html#the-dime-analytics-stata-style-guide)\n- [Sean Higgins's Stata guide](https://github.com/skhiggins/Stata_guide)\non GitHub.\n- [Coding and Data for the Social Sciences](https://web.stanford.edu/~gentzkow/research/CodeAndData.xhtml#magicparlabel-1248),\nby Matthew Gentzkow and Jesse Shapiro\n\n## Using Stata from Python\n\nWithin a Python script or Jupyter Notebook, you can call Stata using [pystata](https://www.stata.com/python/pystata18/notebook/Quick%20Start0.html).\n\n::: {#437f602b .cell execution_count=1}\n``` {.python .cell-code}\nimport stata_setup\n\n# set configuration to the path where Stata is installed and the flavor of Stata\n# in the case below, we're using Stata 18 SE\nstata_setup.config(\"C:/Program Files/Stata18/\", \"se\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n\n ___ ____ ____ ____ ____ ®\n /__ / ____/ / ____/ Stata 18.0\n___/ / /___/ / /___/ 
SE—Standard Edition\n\n Statistics and Data Science Copyright 1985-2023 StataCorp LLC\n StataCorp\n 4905 Lakeway Drive\n College Station, Texas 77845 USA\n 800-782-8272 https://www.stata.com\n 979-696-4600 [email protected]\n\nStata license: Unlimited-user network, expiring 22 Jan 2025\nSerial number: 401809300803\n Licensed to: Niall Keleher\n Innovations for Poverty Action\n\nNotes:\n 1. Unicode is supported; see help unicode_advice.\n 2. Maximum number of variables is set to 5,000 but can be increased;\n see help set_maxvar.\n```\n:::\n:::\n\n\n### Call Stata using pystata API functions\n\n::: {#621f5a24 .cell execution_count=2}\n``` {.python .cell-code}\nfrom pystata import stata\n```\n:::\n\n\n::: {#c71c757b .cell execution_count=3}\n``` {.python .cell-code}\nstata.run(\n \"\"\"\n sysuse auto, clear\n reg mpg price i.foreign\n \"\"\"\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n\n. \n. sysuse auto, clear\n(1978 automobile data)\n\n. reg mpg price i.foreign\n\n Source | SS df MS Number of obs = 74\n-------------+---------------------------------- F(2, 71) = 23.01\n Model | 960.866305 2 480.433152 Prob > F = 0.0000\n Residual | 1482.59315 71 20.8815937 R-squared = 0.3932\n-------------+---------------------------------- Adj R-squared = 0.3761\n Total | 2443.45946 73 33.4720474 Root MSE = 4.5696\n\n------------------------------------------------------------------------------\n mpg | Coefficient Std. err. t P>|t| [95% conf. interval]\n-------------+----------------------------------------------------------------\n price | -.000959 .0001815 -5.28 0.000 -.001321 -.000597\n |\n foreign |\n Foreign | 5.245271 1.163592 4.51 0.000 2.925135 7.565407\n _cons | 25.65058 1.271581 20.17 0.000 23.11512 28.18605\n------------------------------------------------------------------------------\n\n. \n. 
\n```\n:::\n:::\n\n\nOr use IPython magic commands to run Stata code in a Jupyter Notebook.\n\n::: {#24c9e231 .cell execution_count=4}\n``` {.python .cell-code}\n%%stata\nsysuse auto, clear\ndescribe\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n\n. sysuse auto, clear\n(1978 automobile data)\n\n. describe\n\nContains data from C:\\Program Files\\Stata18/ado\\base/a/auto.dta\n Observations: 74 1978 automobile data\n Variables: 12 13 Apr 2022 17:45\n (_dta has notes)\n-------------------------------------------------------------------------------\nVariable Storage Display Value\n name type format label Variable label\n-------------------------------------------------------------------------------\nmake str18 %-18s Make and model\nprice int %8.0gc Price\nmpg int %8.0g Mileage (mpg)\nrep78 int %8.0g Repair record 1978\nheadroom float %6.1f Headroom (in.)\ntrunk int %8.0g Trunk space (cu. ft.)\nweight int %8.0gc Weight (lbs.)\nlength int %8.0g Length (in.)\nturn int %8.0g Turn circle (ft.)\ndisplacement int %8.0g Displacement (cu. in.)\ngear_ratio float %6.2f Gear ratio\nforeign byte %8.0g origin Car origin\n-------------------------------------------------------------------------------\nSorted by: foreign\n\n. \n```\n:::\n:::\n\n\n::: {#ec57624a .cell execution_count=5}\n``` {.python .cell-code}\n%stata scatter mpg price\n```\n\n::: {.cell-output .cell-output-display}\n![](stata_files/figure-html/cell-6-output-1.svg){}\n:::\n:::\n\n\n## Data Visualization\n\nConsider installing the [ipaplots](https://github.com/PovertyAction/ipaplots) for the IPA graph schema in Stata.\n\n## Learning References\n\nFor more information on learning and using Stata, see the\n[IPA-Stata-Trainings](https://github.com/PovertyAction/IPA-Stata-Trainings) repository on GitHub.\n\n- [Stata Video Tutorials](https://www.stata.com/links/video-tutorials/)\n\n",
"supporting": [
"stata_files\\figure-html"
],
"filters": [],
"includes": {}
}
}
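
Both `html.json` files in this commit share one layout: a top-level `hash` plus a `result` object holding `engine`, `markdown`, `supporting`, `filters`, and `includes`. A small sketch that checks a freeze file for those keys, assuming only the field names visible in the diff; the function name `missing_freeze_keys` and the sample document are illustrative, and other Quarto versions may write different fields:

```python
import json

# Keys observed in the two html.json files in this diff (an assumption,
# not a documented schema).
RESULT_KEYS = ("engine", "markdown", "supporting", "filters", "includes")


def missing_freeze_keys(text):
    """Return the expected keys that are absent from a freeze JSON string."""
    doc = json.loads(text)
    missing = [] if "hash" in doc else ["hash"]
    result = doc.get("result", {})
    missing += [f"result.{key}" for key in RESULT_KEYS if key not in result]
    return missing


# Trimmed-down stand-in for the Stata freeze file added in this commit.
sample = json.dumps(
    {
        "hash": "54c4717e3d26189de834c3bdb445e4a8",
        "result": {
            "engine": "jupyter",
            "markdown": "---\ntitle: \"Stata\"\n---\n",
            "supporting": ["stata_files\\figure-html"],
            "filters": [],
            "includes": {},
        },
    }
)
print(missing_freeze_keys(sample))
```

A check like this could run before committing regenerated `_freeze` output, since a freeze file that fails to parse or lacks these keys would be a sign that rendering did not complete.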