Commit

flesh out install steps and other clean up on driver the browser robot NB
zstumgoren committed Apr 8, 2024
1 parent 560c325 commit d750b24
Showing 1 changed file with 17 additions and 5 deletions.
22 changes: 17 additions & 5 deletions content/web_scraping/drive_the_browser_robot.ipynb
@@ -55,22 +55,34 @@
  "source": [
  "## Setting up the Environment\n",
  "\n",
- "Install `playwright` using `pip` or `pipenv`.\n",
+ "If you haven't already done so, clone the [data-journalism-notebooks GitHub repository](https://github.com/stanfordjournalism/data-journalism-notebooks), either using VS Code or the command line:\n",
+ "\n",
+ "```bash\n",
+ "# Here's how to use plain old git to clone on the command line\n",
+ "git clone git@github.com:stanfordjournalism/data-journalism-notebooks.git\n",
+ "```\n",
+ "\n",
+ "You can install `playwright` using `pip` or `pipenv`. We've included `playwright` in the `Pipfile` for this repo, so plain-old `pipenv install` should have you covered.\n",
  "\n",
  "```bash\n",
  "# On the command line, navigate to the code repo,\n",
  "# replacing \"path/to\" with a real path on your machine.\n",
  "cd path/to/data-journalism-notebooks\n",
- "pipenv install playwright\n",
+ "pipenv install # will include playwright and some other goodies\n",
  "```\n",
  "\n",
- "Make sure you have the Google Chromium installed on your machine, though note you can also use Firefox and other browsers with `playwright`.\n",
+ "Make sure you have the Google Chromium browser installed on your machine, though note you can also use Firefox and other browsers with `playwright`.\n",
  "\n",
  "`playwright` requires browser drivers to interact with web browsers installed on your machine.\n",
  "\n",
  "Run the following command to install the browser drivers:\n",
  " \n",
  "```bash\n",
+ "# On the command line, navigate to our repo\n",
+ "cd path/to/data-journalism-notebooks\n",
+ "# Activate the virtual environment (where playwright was installed)\n",
+ "pipenv shell\n",
+ "# Install the drivers\n",
  "playwright install\n",
  "```\n",
  "\n",
@@ -116,9 +128,9 @@
  "\n",
  "Because we're working in Jupyter Lab, which has its own [\"event loop\"](https://medium.com/@dpzhcmy/running-asynchronous-code-in-jupyter-notebooks-managing-event-loops-b9696a596ce4), we have to use Playwright in so-called `async` mode.\n",
  "\n",
- "In practical terms, that means you have to prepend the `await` keyword on all invocations of `playwright`. \n",
+ "In practical terms, that means you have to prepend the `await` keyword on most invocations of `playwright` classes, methods, etc.\n",
  "\n",
- "> *See [Async IO in Python: A Complete Walkthrough](https://realpython.com/async-io-python/) for background.*"
+ "> *See [Hidden Life of Objects](../classes_and_oop/hidden_life_of_objects.ipynb) if you're unfamiliar with terms such as classes and methods. Check out [Async IO in Python: A Complete Walkthrough](https://realpython.com/async-io-python/) for background on asynchronous Python.*"
  ]
  },
  {

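The `await` requirement described in the revised notebook text can be sketched roughly as follows. This is a minimal illustration, not code from the commit: the helper name `get_page_title` and the example URL are hypothetical, and it assumes `playwright` and its browser drivers are installed as in the setup steps. The import sits inside the function only so that the definition loads even where `playwright` is not yet installed.

```python
import asyncio


async def get_page_title(url: str) -> str:
    """Open a page in headless Chromium and return its title (hypothetical helper)."""
    # Imported here so this sketch can be defined even if playwright is missing.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()  # launching the browser is awaited
        page = await browser.new_page()      # so is opening a new page
        await page.goto(url)                 # and navigating to the URL
        title = await page.title()           # and reading the page title
        await browser.close()
        return title


# Calling the function WITHOUT await does not run it; it only creates a coroutine object.
pending = get_page_title("https://example.com")
print(asyncio.iscoroutine(pending))
pending.close()  # discard the un-awaited coroutine to avoid a "never awaited" warning

# In a Jupyter cell, which already runs an event loop, you would instead write:
#     title = await get_page_title("https://example.com")
# In a plain script you would use:
#     asyncio.run(get_page_title("https://example.com"))
```

Not every call is awaited, which fits the change from "all" to "most" in this commit: `page.locator(...)`, for instance, returns a locator object immediately, and only the actions performed on it (such as `click()`) need `await`.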