
Commit

mention captchas and ip blocking on dissecting a website
zstumgoren committed Apr 8, 2024
1 parent 7a34d7d commit 78ef99d
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions content/web_scraping/dissecting_websites.ipynb
@@ -68,6 +68,7 @@
"1. Does the site require a user to log in?\n",
"1. Is the site using sessions/cookies to manage client connections?\n",
"1. Is the target data in the source HTML or is it dynamically generated by JavaScript after the page has loaded in the browser? (*see [Website Personalities](website_personalities.ipynb) and [Drive the Browser, Robot](drive_the_browser_robot.ipynb)*)\n",
"1. Does the site use CAPTCHAs or block IP addresses that issue too many requests? *Note: You'll often discover these roadblocks only while testing or running a scraper. See [Website Personalities](website_personalities.ipynb) for more background.*\n",
"\n",
"[pagination]: https://en.wikipedia.org/wiki/Pagination#Pagination_on_UI"
]
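
The CAPTCHA and IP-blocking question added in this commit usually gets answered while testing, so it can help to label each response as it comes back. The following is a minimal sketch, not part of the notebook: the function name `diagnose_response`, the hint strings, and the choice of status codes are all assumptions for illustration, and real sites signal blocking in many other ways.

```python
# Hypothetical marker strings; real sites vary, so treat this list as an assumption.
CAPTCHA_HINTS = ("captcha", "are you a robot", "unusual traffic")

# 429 is "Too Many Requests"; 403 ("Forbidden") is often, but not always,
# returned when a site has blocked a client. Treating 403 as a block is an assumption.
BLOCK_STATUS_CODES = {403, 429}


def diagnose_response(status_code, body):
    """Return a rough label for a response: 'blocked', 'captcha', or 'ok'."""
    if status_code in BLOCK_STATUS_CODES:
        return "blocked"
    # A CAPTCHA page frequently arrives with a normal 200 status,
    # so the body text has to be inspected as well.
    text = body.lower()
    if any(hint in text for hint in CAPTCHA_HINTS):
        return "captcha"
    return "ok"


if __name__ == "__main__":
    print(diagnose_response(429, ""))
    print(diagnose_response(200, "<html>Please complete the CAPTCHA</html>"))
    print(diagnose_response(200, "<table><tr><td>data</td></tr></table>"))
```

The function takes a status code and body text rather than a live response object, so it can be tried against saved pages before any requests are sent to the target site.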