diff --git a/build-a-web-scraper/01_inspect.ipynb b/build-a-web-scraper/01_inspect.ipynb
index c52866c54a..ca75fbe87a 100644
--- a/build-a-web-scraper/01_inspect.ipynb
+++ b/build-a-web-scraper/01_inspect.ipynb
@@ -11,6 +11,19 @@
"- Inspect the Site Using Developer Tools"
]
},
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## ⚠️ Durability Warning ⚠️\n",
+ "\n",
+ "As [mentioned in the course](https://realpython.com/lessons/challenge-of-durability/), websites change frequently. Unfortunately, since the course was recorded, indeed.com, the job board you'll see in the lessons, has started blocking attempts to scrape its site.\n",
+ "\n",
+ "As in the associated written tutorial on [web scraping with Beautiful Soup](https://realpython.com/beautiful-soup-web-scraper-python/#scrape-the-fake-python-job-site), you can use [Real Python's fake jobs site](https://realpython.github.io/fake-jobs/) instead to practice scraping a static website.\n",
+ "\n",
+ "All the concepts discussed in the course lessons are still accurate. Applying what you see to a different website is a good learning opportunity, because you'll have to synthesize the information and put it into practice."
+ ]
+ },
{
"cell_type": "markdown",
"metadata": {},
@@ -22,1728 +35,22 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "In this course, you will work with [indeed.com](https://www.indeed.com/worldwide)"
+ "In this course, you'll see the instructor work with [indeed.com](https://www.indeed.com/worldwide). Apply the concepts shown there to [Real Python's fake jobs site](https://realpython.github.io/fake-jobs/) instead."
]
},
{
"cell_type": "code",
- "execution_count": 4,
+ "execution_count": null,
"metadata": {
- "collapsed": true,
"jupyter": {
"outputs_hidden": true
}
},
- "outputs": [
- {
- "data": {
- "text/html": [
- "\n",
- "\n",
- "Displayed here are Job Ads that match your query. Indeed may be compensated by these employers, helping keep Indeed free for jobseekers. Indeed ranks Job Ads based on a combination of employer bids and relevance, such as your search terms and other activity on Indeed. For more information, see the Indeed Terms of Service\n",
- "Be the first to see new python jobs in new york\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "By creating a job alert, you agree to our Terms. You can change your consent settings at any time by unsubscribing or as detailed in our terms.\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n"
- ],
- "text/plain": [
- ""
- ]
- },
- "execution_count": 4,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
+ "outputs": [],
"source": [
"from IPython.display import HTML\n",
"\n",
- "HTML(\"https://www.indeed.com/jobs?q=python&l=new+york\")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Decipher the Information in URLs\n",
- "\n",
- "`https://www.indeed.com/jobs?q=python&l=new+york`\n",
- "\n",
- "- **Base URL**\n",
- " - `https://www.indeed.com/jobs`\n",
- "- **Query Parameters**\n",
- " - Start & Separators: `?`, `&`\n",
- " - Information: `q=python`, `l=new+york`"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Every page is different. E.g., here is a weird thing that happens on this site: When typing a city name into the URL directly rather than running the search, the URL changes to `https://www.indeed.com/q-python-l-new-york-jobs.html`. The results are the same and one refers to the other, so it doesn't matter in this case.\n",
- "\n",
- "Always spend time getting to know the page you want to scrape before you start writing code. It saves a lot of time and effort in the long run."
+ "HTML(\"https://realpython.github.io/fake-jobs/\")"
]
},
{
@@ -1752,13 +59,15 @@
"source": [
"## Inspect the Site Using Developer Tools\n",
"\n",
- "Let's head back over to your [example search results](https://www.indeed.com/jobs?q=python&l=new+york) and inspect it using Developer Tools."
+ "Always spend time getting to know the page you want to scrape before you start writing code. It saves a lot of time and effort in the long run.\n",
+ "\n",
+ "Let's head back over to [Real Python's fake jobs site](https://realpython.github.io/fake-jobs/) and inspect it using Developer Tools."
]
}
],
"metadata": {
"kernelspec": {
- "display_name": "Python 3",
+ "display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -1772,7 +81,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.8.0"
+ "version": "3.11.0"
}
},
"nbformat": 4,
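The "Decipher the Information in URLs" cell removed from `01_inspect.ipynb` split the Indeed search URL into a base URL and query parameters by hand. If you want to do the same breakdown in code, here's a minimal sketch using the standard library's `urllib.parse`. The helper name `split_search_url` is hypothetical and not part of the course notebooks. The fake jobs site has no query string, so this mostly matters when you scrape a real search page.

```python
from urllib.parse import parse_qs, urlparse


def split_search_url(url: str) -> tuple[str, dict[str, list[str]]]:
    """Split a URL into its base URL and its decoded query parameters.

    Hypothetical helper for illustration; not part of the course notebooks.
    """
    parts = urlparse(url)
    # Base URL: scheme, host, and path, i.e. everything before the "?".
    base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # parse_qs splits on "&", decodes "+" to a space,
    # and maps each key to a list of values.
    params = parse_qs(parts.query)
    return base_url, params


base, params = split_search_url("https://www.indeed.com/jobs?q=python&l=new+york")
print(base)    # https://www.indeed.com/jobs
print(params)  # {'q': ['python'], 'l': ['new york']}
```

Going the other way, `urllib.parse.urlencode({"q": "python", "l": "new york"})` builds the query string `q=python&l=new+york`, which is handy when you generate search URLs in a loop.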
diff --git a/build-a-web-scraper/02_scrape.ipynb b/build-a-web-scraper/02_scrape.ipynb
index ff3de1b243..72bb338e78 100644
--- a/build-a-web-scraper/02_scrape.ipynb
+++ b/build-a-web-scraper/02_scrape.ipynb
@@ -13,6 +13,19 @@
"In this course, you will work with a static website. You will also get a high-level overview about the challenges of scraping dynamically generated information and data behind logins."
]
},
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## ⚠️ Durability Warning ⚠️\n",
+ "\n",
+ "As [mentioned in the course](https://realpython.com/lessons/challenge-of-durability/), websites change frequently. Unfortunately, since the course was recorded, indeed.com, the job board you'll see in the lessons, has started blocking attempts to scrape its site.\n",
+ "\n",
+ "As in the associated written tutorial on [web scraping with Beautiful Soup](https://realpython.com/beautiful-soup-web-scraper-python/#scrape-the-fake-python-job-site), you can use [Real Python's fake jobs site](https://realpython.github.io/fake-jobs/) instead to practice scraping a static website.\n",
+ "\n",
+ "All the concepts discussed in the course lessons are still accurate. Applying what you see to a different website is a good learning opportunity, because you'll have to synthesize the information and put it into practice."
+ ]
+ },
{
"cell_type": "markdown",
"metadata": {},
@@ -22,7 +35,7 @@
},
{
"cell_type": "code",
- "execution_count": 1,
+ "execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -35,7 +48,7 @@
"metadata": {},
"outputs": [],
"source": [
- "url = \"https://www.indeed.com/jobs?q=python&l=new+york\"\n",
+ "url = \"https://realpython.github.io/fake-jobs/\"\n",
"response = requests.get(url)"
]
},
@@ -72,7 +85,7 @@
"metadata": {},
"outputs": [],
"source": [
- "response.content[loc - 10 : loc + 10]"
+ "response.content[loc - 50 : loc + 50]"
]
},
{
@@ -145,7 +158,7 @@
},
{
"cell_type": "code",
- "execution_count": 2,
+ "execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -154,40 +167,18 @@
},
{
"cell_type": "code",
- "execution_count": 3,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "200"
- ]
- },
- "execution_count": 3,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
"source": [
"res.status_code # the code says all is fine..."
]
},
{
"cell_type": "code",
- "execution_count": 4,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "b'\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n \\n
\\n\\n\\n\\n\\n\\n\\n\\n\\n'"
- ]
- },
- "execution_count": 4,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
"source": [
"res.content # ... but the content doesn't contain what you're looking for!"
]
@@ -202,7 +193,7 @@
],
"metadata": {
"kernelspec": {
- "display_name": "Python 3",
+ "display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -216,7 +207,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.8.0"
+ "version": "3.11.0"
}
},
"nbformat": 4,
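The updated `02_scrape.ipynb` widens the peek at the raw response to `response.content[loc - 50 : loc + 50]`. The cell that defines `loc` isn't part of this diff, so the sketch below assumes it comes from `bytes.find()`, which returns `-1` when the term is missing. Under that assumption, the bare slice has two quiet failure modes on a response longer than a few dozen bytes: a match within the first 50 bytes makes `loc - 50` negative, which Python counts from the end, and a missing match gives `[-51:49]`. On a typical page, both return an empty slice that looks like "nothing here". The hypothetical helper `context_window()` clamps the start index and handles the missing case explicitly:

```python
def context_window(content: bytes, keyword: bytes, width: int = 50) -> bytes:
    """Return up to `width` bytes on each side of the first `keyword` match.

    Hypothetical helper for illustration. Returns b"" if `keyword` is absent.
    """
    loc = content.find(keyword)  # bytes.find() returns -1 when there's no match
    if loc == -1:
        return b""
    # Clamp at 0: a negative start index would count from the END of `content`.
    start = max(loc - width, 0)
    # Slicing past the end of a bytes object is safe, so no clamp is needed here.
    end = loc + len(keyword) + width
    return content[start:end]


print(context_window(b"abcPythonxyz", b"Python", width=2))  # b'bcPythonxy'
```

In the notebook you would pass `response.content` and a `bytes` keyword such as `b"Python"`, because `response.content` is `bytes`, not `str`.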
diff --git a/build-a-web-scraper/03_parse.ipynb b/build-a-web-scraper/03_parse.ipynb
index 4fd6988c0e..4db03a1dea 100644
--- a/build-a-web-scraper/03_parse.ipynb
+++ b/build-a-web-scraper/03_parse.ipynb
@@ -12,16 +12,29 @@
"- Extract Attributes From HTML Elements"
]
},
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## ⚠️ Durability Warning ⚠️\n",
+ "\n",
+ "As [mentioned in the course](https://realpython.com/lessons/challenge-of-durability/), websites change frequently. Unfortunately, since the course was recorded, indeed.com, the job board you'll see in the lessons, has started blocking attempts to scrape its site.\n",
+ "\n",
+ "As in the associated written tutorial on [web scraping with Beautiful Soup](https://realpython.com/beautiful-soup-web-scraper-python/#scrape-the-fake-python-job-site), you can use [Real Python's fake jobs site](https://realpython.github.io/fake-jobs/) instead to practice scraping a static website.\n",
+ "\n",
+ "All the concepts discussed in the course lessons are still accurate. Applying what you see to a different website is a good learning opportunity, because you'll have to synthesize the information and put it into practice."
+ ]
+ },
{
"cell_type": "code",
- "execution_count": 1,
+ "execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# scrape the site\n",
"import requests\n",
"\n",
- "url = \"https://www.indeed.com/jobs?q=python&l=new+york\"\n",
+ "url = \"https://realpython.github.io/fake-jobs/\"\n",
"response = requests.get(url)"
]
},
@@ -34,7 +47,7 @@
},
{
"cell_type": "code",
- "execution_count": 2,
+ "execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -43,7 +56,7 @@
},
{
"cell_type": "code",
- "execution_count": 3,
+ "execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -52,1308 +65,13 @@
},
{
"cell_type": "code",
- "execution_count": 4,
+ "execution_count": null,
"metadata": {
- "collapsed": true,
"jupyter": {
"outputs_hidden": true
}
},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "Python Jobs, Employment in New York State | Indeed.com\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "Displayed here are Job Ads that match your query. Indeed may be compensated by these employers, helping keep Indeed free for jobseekers. Indeed ranks Job Ads based on a combination of employer bids and relevance, such as your search terms and other activity on Indeed. For more information, see the Indeed Terms of Service\n",
- "\n",
- "New York State\n",
- "•\n",
- "Remote work available\n",
- "\n",
- "\n",
- "\n",
- "You will help drive business results through building a robust data engine to build business-critical, scalable, and robust data pipelines and intuitive data…\n",
- "\n",
- "New York, NY 10020 (Midtown area)\n",
- "\n",
- "\n",
- "\n",
- "Perform research and identify unique alternative datasets relevant to financial services applications; perform data validation further assessing validity,…\n",
- "\n",
- "New York, NY 10003 (Greenwich Village area)\n",
- "\n",
- "\n",
- "\n",
- "The Data Technician role is an entry-level position suitable for someone looking to break into the startup/big data world and gain experience working with top…\n",
- "\n",
- "New York, NY 10005 (Financial District area)\n",
- "\n",
- "\n",
- "\n",
- "We are looking for an up-and-coming developer who loves coding, enjoys taking on challenging problems, and wants to make an immediate and tangible impact.\n",
- "\n",
- "New York State\n",
- "•\n",
- "Remote work available\n",
- "\n",
- "\n",
- "\n",
- "The Subject Matter Expert provides the Codecademy Curriculum team with specialized, up-to-date, nuanced insight into the field of database engineering and how…\n",
- "\n",
- "New York State\n",
- "•\n",
- "Remote work available\n",
- "\n",
- "\n",
- "\n",
- "The Subject Matter Expert provides the Codecademy Curriculum team with specialized, up-to-date, nuanced insight into the field of database engineering and how…\n",
- "\n",
- "New York, NY\n",
- "•\n",
- "Remote work available\n",
- "\n",
- "\n",
- "\n",
- "Systems Thinking is an increasingly popular method of analysis and problem solving, widely applicable in industry, government, not-for-profit, education,…\n",
- "Be the first to see new python jobs in new york\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "By creating a job alert, you agree to our Terms. You can change your consent settings at any time by unsubscribing or as detailed in our terms.\n",
- "Displayed here are Job Ads that match your query. Indeed may be compensated by these employers, helping keep Indeed free for jobseekers. Indeed ranks Job Ads based on a combination of employer bids and relevance, such as your search terms and other activity on Indeed. For more information, see the Indeed Terms of Service\n",
- "\n",
- "New York State\n",
- "•\n",
- "Remote work available\n",
- "\n",
- "\n",
- "\n",
- "You will help drive business results through building a robust data engine to build business-critical, scalable, and robust data pipelines and intuitive data…\n",
- "\n",
- "New York, NY 10020 (Midtown area)\n",
- "\n",
- "\n",
- "\n",
- "Perform research and identify unique alternative datasets relevant to financial services applications; perform data validation further assessing validity,…\n",
- "\n",
- "New York, NY 10003 (Greenwich Village area)\n",
- "\n",
- "\n",
- "\n",
- "The Data Technician role is an entry-level position suitable for someone looking to break into the startup/big data world and gain experience working with top…\n",
- "\n",
- "New York, NY 10005 (Financial District area)\n",
- "\n",
- "\n",
- "\n",
- "We are looking for an up-and-coming developer who loves coding, enjoys taking on challenging problems, and wants to make an immediate and tangible impact.\n",
- "\n",
- "New York State\n",
- "•\n",
- "Remote work available\n",
- "\n",
- "\n",
- "\n",
- "The Subject Matter Expert provides the Codecademy Curriculum team with specialized, up-to-date, nuanced insight into the field of database engineering and how…\n",
- "\n",
- "New York State\n",
- "•\n",
- "Remote work available\n",
- "\n",
- "\n",
- "\n",
- "The Subject Matter Expert provides the Codecademy Curriculum team with specialized, up-to-date, nuanced insight into the field of database engineering and how…\n",
- "\n",
- "New York, NY\n",
- "•\n",
- "Remote work available\n",
- "\n",
- "\n",
- "\n",
- "
Systems Thinking is an increasingly popular method of analysis and problem solving, widely applicable in industry, government, not-for-profit, education,…