From 9b987feed128fe07fa3e5d8e41aef6225463f6bf Mon Sep 17 00:00:00 2001
From: Serdar Tumgoren
Date: Sun, 7 Apr 2024 15:28:29 -0700
Subject: [PATCH] web scraping tweaks

---
 content/web_scraping/skip_scraping_cheat.ipynb | 4 ++--
 content/web_scraping/wysiwyg_scraping.ipynb    | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/content/web_scraping/skip_scraping_cheat.ipynb b/content/web_scraping/skip_scraping_cheat.ipynb
index c01c459..bac98f5 100644
--- a/content/web_scraping/skip_scraping_cheat.ipynb
+++ b/content/web_scraping/skip_scraping_cheat.ipynb
@@ -78,7 +78,7 @@
     "- Clicking on the web request for the API call\n",
     "- Heading over to the `Headers` tab for the web request\n",
     "\n",
-    "In the information panel, you should see a downright awful URL. It contains a boatload of URL parameters after the `?` in the form `key=value` pairs, separated by ampersands (`&`). These are variables of sorts that instruct the API on what data to return. Normally, these parameters are configured by a web form filled out by a human visiting the website.\n",
+    "In the information panel, you should see a downright awful URL. It contains a boatload of URL parameters after the `?` in the form of `key=value` pairs, separated by ampersands (`&`). These are variables of sorts that instruct the API on what data to return. Normally, these parameters are configured by a web form filled out by a human visiting the website.\n",
     "\n",
     "If you look close, you may notice that the URL parameters include one particularly interesting morsel: `pageSize=20`\n",
     "\n",
@@ -103,7 +103,7 @@
     "\n",
     "There was no need to scrape the search page, fill out a form, get the results back, and then page through the search results, extracting data points from HTML along the way. If that sounds painful and error-prone, you have good instincts. It's a workable solution, but in this case it's total overkill.\n",
     "\n",
-    "Instead, we gave the site a phsyical exam (sorry, had to sneak one more in...) and realized that we could skip the scraping entirely and just grab the data.\n",
+    "Instead, we gave the site a [physical exam](dissecting_websites.ipynb) and realized that we could skip the scraping entirely and just grab the data.\n",
     "\n",
     "If you've never dissected a website like this before, all of the above likely seems like magic. It might even feel like this process would take just as long as writing a web scraper. But you'd be wrong. As you gain comfort with dissecting websites, the techniques described here will take you minutes -- perhaps even seconds -- on many sites.\n",
     "\n",
diff --git a/content/web_scraping/wysiwyg_scraping.ipynb b/content/web_scraping/wysiwyg_scraping.ipynb
index 2d71733..3b72e53 100644
--- a/content/web_scraping/wysiwyg_scraping.ipynb
+++ b/content/web_scraping/wysiwyg_scraping.ipynb
@@ -15,7 +15,7 @@
     "\n",
     "Why bring this up?\n",
     "\n",
-    "Because it's a useful analogy for web pages. Back in the days of yore, many (perhaps most?) websites followed the WYSIWYG principle. These were simpler times, when the the content displayed on a web page closely matched the HTML in the underlying document for a page.\n",
+    "Because it's a useful analogy for web pages. Back in the days of yore, many (perhaps most?) websites followed the WYSIWYG principle. These were simpler times, when the content displayed on a web page closely matched the HTML in the underlying document for a page.\n",
     "\n",
     "If your web browser showed a table of data, it was quite likely that you'd find a `<table>` element somewhere in the page's HTML.\n",
     "\n",
@@ -424,7 +424,7 @@
     "        fields[6].text.strip()\n",
     "    ]\n",
     "    # Mash up the headers with the field values into a dictionary\n",
-    "    # - zip creates pairs each column header with the corresponding field in a two-element list\n",
+    "    # - zip pairs each column header with the corresponding field in a two-element tuple\n",
     "    # - dict transforms the list of column/value pairs into a dictionary\n",
     "    bank_data = dict(zip(column_names, field_values))\n",
     "    all_banks.append(bank_data)\n",
@@ -456,7 +456,7 @@
    "id": "d57695a8-f62b-4c6e-9615-0afdd73bc3c3",
    "metadata": {},
    "source": [
-    "Does that number match the count on the FDIC site? And in their download CSV?"
+    "Does that number match the count on the FDIC site? And in their downloadable CSV?"
    ]
   },
   {
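The first hunk describes the API URL as carrying `key=value` parameters after the `?`, separated by `&`, including the `pageSize=20` morsel. As a rough sketch of how one might inspect and tweak such a URL with Python's standard library, the helper below parses the query string and rewrites a single parameter. The URL, the `set_query_param` name, and the value `500` are hypothetical placeholders, not the endpoint from the notebook, and whether a given API honors a larger `pageSize` is something to verify against the site itself.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def set_query_param(url: str, key: str, value: str) -> str:
    """Return a copy of `url` with one query parameter set to `value`.

    Hypothetical helper for illustration; other parameters keep their order.
    """
    parts = urlsplit(url)
    # parse_qs turns "a=1&b=2" into {"a": ["1"], "b": ["2"]}
    params = parse_qs(parts.query, keep_blank_values=True)
    # Overwrite (or add) the one parameter we care about
    params[key] = [value]
    # doseq=True expands each list value back into key=value pairs
    new_query = urlencode(params, doseq=True)
    return urlunsplit(parts._replace(query=new_query))


if __name__ == "__main__":
    # Placeholder URL; the real API endpoint is not shown in this patch.
    api_url = "https://example.com/api/search?state=CA&pageSize=20&page=1"
    print(set_query_param(api_url, "pageSize", "500"))
```

Because the query string is decoded and then re-encoded, some characters may come back encoded differently (a space as `+` rather than `%20`, for example), which is usually harmless but worth checking against the original request.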
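The code hunk relies on `dict(zip(column_names, field_values))` to turn one table row into a dictionary keyed by column header. A stripped-down sketch of that pattern follows; the column names and values are invented placeholders rather than real FDIC data, and `row_to_dict` is a hypothetical helper name. `zip` yields two-element tuples, which `dict` accepts directly as key/value pairs.

```python
def row_to_dict(column_names: list[str], field_values: list[str]) -> dict[str, str]:
    """Pair each header with the value in the same position (hypothetical helper)."""
    # zip yields (header, value) tuples, e.g. ("bank_name", "Example Bank")
    pairs = zip(column_names, field_values)
    # dict consumes those two-element tuples as key/value pairs
    return dict(pairs)


if __name__ == "__main__":
    # Placeholder headers and values, not actual FDIC records
    column_names = ["bank_name", "city", "state"]
    all_banks = []
    for field_values in [["Example Bank", "Springfield", "IL"],
                         ["Sample Savings", "Riverside", "CA"]]:
        all_banks.append(row_to_dict(column_names, field_values))
    print(all_banks)
```

One caveat with this design: `zip` stops at the shorter of its inputs, so a row with a missing cell would silently drop trailing columns. On Python 3.10+, `zip(..., strict=True)` raises a `ValueError` instead, which can surface malformed rows early.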