diff --git a/build-a-web-scraper/01_inspect.ipynb b/build-a-web-scraper/01_inspect.ipynb index c52866c54a..ca75fbe87a 100644 --- a/build-a-web-scraper/01_inspect.ipynb +++ b/build-a-web-scraper/01_inspect.ipynb @@ -11,6 +11,19 @@ "- Inspect the Site Using Developer Tools" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## ⚠️ Durabilty Warning ⚠️\n", + "\n", + "Like [mentioned in the course](https://realpython.com/lessons/challenge-of-durability/), websites frequently change. Unfortunately the job board that you'll see in the course, indeed.com, has started to block scraping of their site since the recording of the course.\n", + "\n", + "Just like in the associated written tutorial on [web scraping with beautiful soup](https://realpython.com/beautiful-soup-web-scraper-python/#scrape-the-fake-python-job-site), you can instead use [Real Python's fake jobs site](https://realpython.github.io/fake-jobs/) to practice scraping a static website.\n", + "\n", + "All the concepts discussed in the course lessons are still accurate. Translating what you see onto a different website will be a good learning opportunity where you'll have to synthesize the information and apply it practically." + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -22,1728 +35,22 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "In this course, you will work with [indeed.com](https://www.indeed.com/worldwide)" + "In this course, you will see the instructor work with [indeed.com](https://www.indeed.com/worldwide). You should instead apply the shown concepts to [Real Python's fake jobs site](https://realpython.github.io/fake-jobs/)." ] }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "metadata": { - "collapsed": true, "jupyter": { "outputs_hidden": true } }, - "outputs": [ - { - "data": { - "text/html": [ - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "Python Jobs, Employment in New York State | Indeed.com\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "
[... removed cell output: the rendered indeed.com search results page for "python jobs in New York State" (ten job cards, site navigation, and footer) ...]
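Embedding the live indeed.com page produced the huge output above. A minimal sketch of previewing the fake jobs site inline instead, assuming `requests` (which the course uses in the scraping notebook anyway):

```python
import requests
from IPython.display import HTML

# Fetch the static practice site and render it inline. Unlike indeed.com,
# this page is built for scraping practice and won't block the request.
response = requests.get("https://realpython.github.io/fake-jobs/")
HTML(response.text)
```

Rendering `response.text` keeps the fetch explicit, which makes it easier to see what went wrong when a request fails.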
\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n" - ], - "text/plain": [ - "" - ] - }, - "execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "from IPython.display import HTML\n", "\n", - "HTML(\"https://www.indeed.com/jobs?q=python&l=new+york\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Decipher the Information in URLs\n", - "\n", - "`https://www.indeed.com/jobs?q=python&l=new+york`\n", - "\n", - "- **Base URL**\n", - " - `https://www.indeed.com/jobs`\n", - "- **Query Parameters**\n", - " - Start & Separators: `?`, `&`\n", - " - Information: `q=python`, `l=new+york`" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Every page is different. E.g., here is a weird thing that happens on this site: When typing a city name into the URL directly rather than running the search, the URL changes to `https://www.indeed.com/q-python-l-new-york-jobs.html`. The results are the same and one refers to the other, so it doesn't matter in this case.\n", - "\n", - "Always spend time getting to know the page you want to scrape before you start writing code. It saves a lot of time and effort in the long run." + "HTML(\"https://realpython.github.io/fake-jobs/\")" ] }, { @@ -1752,13 +59,15 @@ "source": [ "## Inspect the Site Using Developer Tools\n", "\n", - "Let's head back over to your [example search results](https://www.indeed.com/jobs?q=python&l=new+york) and inspect it using Developer Tools." + "Always spend time getting to know the page you want to scrape before you start writing code. It saves a lot of time and effort in the long run.\n", + "\n", + "Let's head back over to [Real Python's fake jobs site](https://realpython.github.io/fake-jobs/) and inspect it using Developer Tools." ] } ], "metadata": { "kernelspec": { - "display_name": "Python 3", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -1772,7 +81,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.0" + "version": "3.11.0" } }, "nbformat": 4, diff --git a/build-a-web-scraper/02_scrape.ipynb b/build-a-web-scraper/02_scrape.ipynb index ff3de1b243..72bb338e78 100644 --- a/build-a-web-scraper/02_scrape.ipynb +++ b/build-a-web-scraper/02_scrape.ipynb @@ -13,6 +13,19 @@ "In this course, you will work with a static website. You will also get a high-level overview about the challenges of scraping dynamically generated information and data behind logins." ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## ⚠️ Durabilty Warning ⚠️\n", + "\n", + "Like [mentioned in the course](https://realpython.com/lessons/challenge-of-durability/), websites frequently change. Unfortunately the job board that you'll see in the course, indeed.com, has started to block scraping of their site since the recording of the course.\n", + "\n", + "Just like in the associated written tutorial on [web scraping with beautiful soup](https://realpython.com/beautiful-soup-web-scraper-python/#scrape-the-fake-python-job-site), you can instead use [Real Python's fake jobs site](https://realpython.github.io/fake-jobs/) to practice scraping a static website.\n", + "\n", + "All the concepts discussed in the course lessons are still accurate. 
Translating what you see onto a different website will be a good learning opportunity where you'll have to synthesize the information and apply it practically." + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -22,7 +35,7 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -35,7 +48,7 @@ "metadata": {}, "outputs": [], "source": [ - "url = \"https://www.indeed.com/jobs?q=python&l=new+york\"\n", + "url = \"https://realpython.github.io/fake-jobs/\"\n", "response = requests.get(url)" ] }, @@ -72,7 +85,7 @@ "metadata": {}, "outputs": [], "source": [ - "response.content[loc - 10 : loc + 10]" + "response.content[loc - 50 : loc + 50]" ] }, { @@ -145,7 +158,7 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -154,40 +167,18 @@ }, { "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "200" - ] - }, - "execution_count": 3, - "metadata": {}, - "output_type": "execute_result" - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "res.status_code # the code says all is fine..." ] }, { "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "b'\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n \\n
Something went wrong, but don\\xe2\\x80\\x99t fret \\xe2\\x80\\x94 let\\xe2\\x80\\x99s give it another shot.
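The block page above is exactly why a `200` status code can be misleading. A small illustrative check, not part of the original notebook:

```python
import requests

response = requests.get("https://www.indeed.com/jobs?q=python&l=new+york")

# The status code alone doesn't prove you received real results; also
# confirm that the body mentions your search term before parsing it.
if response.ok and b"python" in response.content.lower():
    print("Response looks like genuine search results")
else:
    print("Blocked or unexpected page -- inspect response.content manually")
```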
\\n\\n\\n\\n\\n\\n\\n\\n\\n'" - ] - }, - "execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "res.content # ... but the content doesn't contain what you're looking for!" ] @@ -202,7 +193,7 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -216,7 +207,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.0" + "version": "3.11.0" } }, "nbformat": 4, diff --git a/build-a-web-scraper/03_parse.ipynb b/build-a-web-scraper/03_parse.ipynb index 4fd6988c0e..4db03a1dea 100644 --- a/build-a-web-scraper/03_parse.ipynb +++ b/build-a-web-scraper/03_parse.ipynb @@ -12,16 +12,29 @@ "- Extract Attributes From HTML Elements" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## ⚠️ Durabilty Warning ⚠️\n", + "\n", + "Like [mentioned in the course](https://realpython.com/lessons/challenge-of-durability/), websites frequently change. Unfortunately the job board that you'll see in the course, indeed.com, has started to block scraping of their site since the recording of the course.\n", + "\n", + "Just like in the associated written tutorial on [web scraping with beautiful soup](https://realpython.com/beautiful-soup-web-scraper-python/#scrape-the-fake-python-job-site), you can instead use [Real Python's fake jobs site](https://realpython.github.io/fake-jobs/) to practice scraping a static website.\n", + "\n", + "All the concepts discussed in the course lessons are still accurate. Translating what you see onto a different website will be a good learning opportunity where you'll have to synthesize the information and apply it practically." + ] + }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# scrape the site\n", "import requests\n", "\n", - "url = \"https://www.indeed.com/jobs?q=python&l=new+york\"\n", + "url = \"https://realpython.github.io/fake-jobs/\"\n", "response = requests.get(url)" ] }, @@ -34,7 +47,7 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -43,7 +56,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -52,1308 +65,13 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "metadata": { - "collapsed": true, "jupyter": { "outputs_hidden": true } }, - "outputs": [ - { - "data": { - "text/plain": [ - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "Python Jobs, Employment in New York State | Indeed.com\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "
[... removed cell output: the BeautifulSoup repr of the full indeed.com search results page ...]
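Dumping the entire soup, as above, is unwieldy. A hedged alternative sketch that previews only the beginning of the parse tree:

```python
import requests
from bs4 import BeautifulSoup

response = requests.get("https://realpython.github.io/fake-jobs/")
soup = BeautifulSoup(response.content, "html.parser")

# prettify() renders the tree with indentation; slicing the string keeps
# the notebook output short enough to read.
print(soup.prettify()[:500])
```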
\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n" - ] - }, - "execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "soup" ] @@ -1376,554 +94,22 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ - "results = soup.find(id=\"resultsCol\")" + "results = soup.find(id=\"ResultsContainer\")" ] }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "metadata": { - "collapsed": true, "jupyter": { "outputs_hidden": true } }, - "outputs": [ - { - "data": { - "text/plain": [ - "\n", - "
[... removed cell output: the repr of the element with id "resultsCol", holding the ten job cards ...]
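One detail worth spelling out while narrowing down the soup: `find()` returns the first match or `None`, while `find_all()` returns a list. Guarding against `None` turns a silent layout change into a loud error. A sketch, assuming the `soup` object built earlier in this notebook:

```python
results = soup.find(id="ResultsContainer")
if results is None:
    raise RuntimeError("No #ResultsContainer found -- did the page change?")

# find_all() always returns a list (possibly empty), so len() is safe.
jobs = results.find_all("div", class_="card-content")
print(f"Found {len(jobs)} job cards")
```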
" - ] - }, - "execution_count": 6, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "results" ] @@ -1946,80 +132,27 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ - "jobs = results.find_all(\"div\", class_=\"result\")" + "jobs = results.find_all(\"div\", class_=\"card-content\")" ] }, { "cell_type": "code", - "execution_count": 8, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "10" - ] - }, - "execution_count": 8, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "len(jobs) # how many?" ] }, { "cell_type": "code", - "execution_count": 9, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "
[... removed cell output: the repr of the first job card, a "Penetration Testing Trainee (Remote USA)" posting by BreachLock in Florida, NY ...]
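Each card holds more than the title. Judging from the associated written tutorial, the company and location sit in `h3` and `p` elements with telling class names. A hedged sketch, with `jobs` as in the cells above and the class names assumed from that tutorial rather than from this notebook:

```python
job = jobs[0]

# Class names assumed from the linked written tutorial; confirm them in
# your browser's developer tools before relying on them.
company = job.find("h3", class_="company")
location = job.find("p", class_="location")
print(company.text.strip(), "|", location.text.strip())
```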
" - ] - }, - "execution_count": 9, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "jobs[0] # let's check out just one of them" ] @@ -2035,90 +168,32 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "
[... removed cell output: the repr of the title element, "Penetration Testing Trainee (Remote USA)" ...]
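The updated cells below read the job URL straight from the card's Apply link, which on the fake jobs site is already absolute. As the deleted note further down observes, other sites serve relative links; `urljoin()` handles both cases, so a sketch like this (again assuming `jobs`) stays portable:

```python
from urllib.parse import urljoin

# With an absolute href, urljoin() returns it unchanged; with a relative
# one, it prepends the base URL. string= is the modern spelling of the
# deprecated text= keyword used below.
base_url = "https://realpython.github.io/fake-jobs/"
href = jobs[0].find("a", string="Apply")["href"]
print(urljoin(base_url, href))
```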
" - ] - }, - "execution_count": 10, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "title = jobs[0].find(\"h2\")\n", - "title" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "\n", - "Penetration Testing Trainee (Remote USA)" - ] - }, - "execution_count": 11, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ - "title_link = title.find(\"a\")\n", - "title_link" + "title_element = jobs[0].find(\"h2\")\n", + "title_element" ] }, { "cell_type": "code", - "execution_count": 12, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "'\\nPenetration Testing Trainee (Remote USA)'" - ] - }, - "execution_count": 12, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ - "link_text = title_link.text\n", - "link_text" + "title = title_element.text\n", + "title" ] }, { "cell_type": "code", - "execution_count": 13, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "'Penetration Testing Trainee (Remote USA)'" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ - "# clean it up\n", - "link_text.strip()" + "# clean it up - not necessary here, but often helpful to remove whitespace\n", + "title.strip()" ] }, { @@ -2130,38 +205,18 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ - "job_titles = [job.find(\"h2\").find(\"a\").text.strip() for job in jobs]" + "job_titles = [job.find(\"h2\").text.strip() for job in jobs]" ] }, { "cell_type": "code", - "execution_count": 15, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "['Penetration Testing Trainee (Remote USA)',\n", - " 'Data Engineer Summer Internship (REMOTE)',\n", - " 'Python & JavaScript Developer',\n", - " 'Alternative Data Research Analyst',\n", - " 'Data Technician (Full- or Part-Time)',\n", - " 'Python Developer - Compliance',\n", - " 'Content Contributor: Deep Learning with TensorFlow',\n", - " 'Subject Matter Expert: Deep Learning with TensorFlow',\n", - " 'Junior Front End / Full Stack Software Engineer',\n", - " '2020 Enterprise Data Accelerated Talent Entry Program']" - ] - }, - "execution_count": 15, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "job_titles" ] @@ -2177,71 +232,21 @@ }, { "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "\n", - "Penetration Testing Trainee (Remote USA)" - ] - }, - "execution_count": 16, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "title_link" - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "'/rc/clk?jk=487b30db63184515&fccid=bf0600f0f252b45b&vjs=3'" - ] - }, - "execution_count": 22, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "title_link[\"href\"]" - ] - }, - { - "cell_type": "markdown", + "execution_count": null, "metadata": {}, + "outputs": [], "source": [ - "That's a **relative link**. In order to be able to access the resource, you will need to assemble the absolute URL." 
+ "apply_link = jobs[0].find(\"a\", text=\"Apply\")\n", + "apply_link" ] }, { "cell_type": "code", - "execution_count": 18, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "'https://www.indeed.com/rc/clk?jk=487b30db63184515&fccid=bf0600f0f252b45b&vjs=3'" - ] - }, - "execution_count": 18, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ - "base_url = \"https://www.indeed.com\"\n", - "job_url = base_url + title_link[\"href\"]\n", + "job_url = apply_link[\"href\"]\n", "job_url" ] }, @@ -2254,7 +259,7 @@ }, { "cell_type": "code", - "execution_count": 19, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -2264,25 +269,13 @@ }, { "cell_type": "code", - "execution_count": 20, + "execution_count": null, "metadata": { - "collapsed": true, "jupyter": { "outputs_hidden": true } }, - "outputs": [ - { - "data": { - "text/plain": [ - "'\\n\\nPenetration Testing Trainee (Remote USA) - Florida, NY - Indeed.com\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nFind jobsCompany reviewsFind salariesUpload your resumeSign inEmployers / Post Job\\n\\nWhatWhereFind JobsAdvanced Job SearchPenetration Testing Trainee (Remote USA)BreachLock-Florida, NYRemoteOtherWho are we?\\nBreachLock is a security startup that offers a unique SaaS platform delivering on-demand, continuous, and scalable security testing suitable for modern cloud and DevOps powered businesses. The BreachLock platform leverages both human-powered penetration testing and AI-powered automated scans to create a powerful and easy to use solution that delivers continuous and on-demand vulnerability management. BreachLocks’s modern SaaS-based approach redefines the old school and time-consuming pen test model into fast and comprehensive security as service. As a result, CIO’s and CISO’s get a single pane view into their application and network security posture. 
The BreachLock platform facilitates collaboration between your DevOps and BreachLock security researchers, empowering them to fix security gaps at the speed of business.\\nSome of our achievements include:\\nOne of the fastest-growing SaaS companies in Cyber Security\\nCyber Security Innovator for Analysis and Testing category 2019 – SC Magazine\\nTop 10 Vulnerability Management Solution for 2019 – Enterprise Security Magazine\\nMost promising Cyber Security startup 2019 – CIO Review\\nCyber Security Innovator for the year 2019 – Mirror Review\\nTop 10 Vulnerability Assessment vendor in Gartner Peerinsights\\nWho you are\\nYou want to work with global leaders in Cyber Security\\nHave a passion for various disciplines of Cyber Security\\nHave track record that proves you have invested time in research and learning about security via:\\nWriting blogs, articles, research papers\\nBug bounty\\nAttended training related to Cyber Security\\nHave certifications like CEH or ISO 27001\\nHave developed projects using AI, Machine Learning or Machine Learning technologies\\nAbout the opportunity\\nFunctions you will perform may include one or more of the following:\\nTechnical writing\\nSecurity analysis\\nDocumentation\\nManual Testing of inhouse products\\nSecurity Research\\nCompetitor analysis and Testing of various security products\\nPenetration Testing\\nVulnerability scanning\\nPython development\\nAI, Machine Learning research\\nYou are based in the United States and can work without additional sponsorship/VISA\\nThis is a telecommute and remote position\\nYou work on a flexible schedule and manage your deliverables\\nYou have a choice to work on a three to five days per week schedule\\nYou get support from our security experts to learn our processes and perform your day to day activities\\nYou will get an opportunity to test your limits in this promising startup\\nYou will be working alongside international experts\\nIndustry-standard financial benefits\\nStrong career prospects in an early-stage startup\\nYou get a chance to continue as an employee based on your performance as a traineeBreachLock - 30+ days ago - save jobreport job - original jobApply NowApply On Company SiteSave this jobShare this jobJobs at BreachLock in Florida, NYCompany InfoFollowGet job updates from BreachLockLet employers find youThousands of employers search for candidates on IndeedUpload your resumeHiring LabCareer AdviceBrowse JobsBrowse CompaniesSalariesFind CertificationsIndeed EventsWork at IndeedCountriesAboutHelp Center© 2020 IndeedDo Not Sell My Personal InformationPrivacy CenterCookies, Privacy and TermsLet Employers Find YouUpload Your Resume\\n\\n\\n\\n\\n\\n\\n'" - ] - }, - "execution_count": 20, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "job_soup.text" ] @@ -2299,7 +292,7 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -2313,7 +306,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.0" + "version": "3.11.0" } }, "nbformat": 4, diff --git a/build-a-web-scraper/04_pipeline.ipynb b/build-a-web-scraper/04_pipeline.ipynb index 256748db9c..d12e68dc35 100644 --- a/build-a-web-scraper/04_pipeline.ipynb +++ b/build-a-web-scraper/04_pipeline.ipynb @@ -12,15 +12,28 @@ "- Target & Save Specific Information You Want" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## ⚠️ Durabilty Warning 
⚠️\n", + "\n", + "Like [mentioned in the course](https://realpython.com/lessons/challenge-of-durability/), websites frequently change. Unfortunately the job board that you'll see in the course, indeed.com, has started to block scraping of their site since the recording of the course.\n", + "\n", + "Just like in the associated written tutorial on [web scraping with beautiful soup](https://realpython.com/beautiful-soup-web-scraper-python/#scrape-the-fake-python-job-site), you can instead use [Real Python's fake jobs site](https://realpython.github.io/fake-jobs/) to practice scraping a static website.\n", + "\n", + "All the concepts discussed in the course lessons are still accurate. Translating what you see onto a different website will be a good learning opportunity where you'll have to synthesize the information and apply it practically." + ] + }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Your Tasks:\n", "\n", - "- Scrape the first 100 available search results\n", + "- Scrape all 100 available job postings\n", "- Generalize your code to allow searching for different locations/jobs\n", - "- Pick out information about the URL, job title, and job location\n", + "- Pick out information about the apply URL, job title, and job location\n", "- Save the results to a file" ] }, @@ -40,8 +53,7 @@ "source": [ "### Part 1: Inspect\n", "\n", - "- How do the URLs change when you navigate to the next results page?\n", - "- How do the URLs change when you use a different location and/or job title search?\n", + "- How do the URLs change when you navigate to a job detail?\n", "- Which HTML elements contain the link, title, and location of each job?" ] }, @@ -58,8 +70,9 @@ "source": [ "### Part 2: Scrape\n", "\n", - "- Build the code to fetch the first 100 search results. This means you will need to automatically navigate to multiple results pages\n", - "- Write functions that allow you to specify the job title, location, and amount of results as arguments" + "- Build the code to fetch all 100 available job postings.\n", + "- Write functions that allow you to specify the job title, location, and amount of results as arguments\n", + "- Also fetch the information provided on each job details page. For this, you'll need to automatically follow URLs that you've fetched when getting the job postings." ] }, { @@ -75,8 +88,9 @@ "source": [ "### Part 3: Parse\n", "\n", - "- Sieve through your HTML soup to pick out only the job title, link, and location\n", - "- Format the results in a readable format (e.g. JSON)\n", + "- Sieve through your HTML soup to pick out only the job title, link, and location from the main page\n", + "- Sieve through the HTML of each details page to get the job description and combine it with the other information\n", + "- Format the results in a readable format (e.g. 
JSON, TXT, TOML, ...)\n", "- Save the results to a file" ] }, @@ -90,7 +104,7 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -104,7 +118,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.0" + "version": "3.11.0" } }, "nbformat": 4, diff --git a/build-a-web-scraper/05_pipeline_solution.ipynb b/build-a-web-scraper/05_pipeline_solution.ipynb deleted file mode 100644 index 359b992aa8..0000000000 --- a/build-a-web-scraper/05_pipeline_solution.ipynb +++ /dev/null @@ -1,500 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Possible Solution: Build A Pipeline\n", - "\n", - "- Combine Your Knowledge of the Website, `requests` and `bs4`\n", - "- Automate Your Scraping Process Across Multiple Pages\n", - "- Generalize Your Code For Varying Searches\n", - "- Target & Save Specific Information You Want" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Your Tasks:\n", - "\n", - "- Scrape the first 100 available search results\n", - "- Generalize your code to allow searching for different locations/jobs\n", - "- Pick out information about the URL, job title, and job location\n", - "- Save the results to a file" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import requests\n", - "from bs4 import BeautifulSoup" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "---\n", - "\n", - "### Part 1: Inspect\n", - "\n", - "- How do the URLs change when you navigate to the next results page?\n", - "- How do the URLs change when you use a different location and/or job title search?\n", - "- Which HTML elements contain the link, title, and location of each job?" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Next Page**: The `start=` parameter gets added and incremented by the value of `10` for each additional page. This is because each results page displays 10 job results.\n", - "\n", - "E.g.: " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Different Location/Job Title**: The values for the query parameters `q` (for job title) and `l` (for location) change accordingly." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "page = requests.get(\"https://www.indeed.com/jobs?q=python&l=new+york\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**HTML Elements**: A single job posting lives inside of a `div` element with the class name `result`. Inside there are other elements. You can find the specific info you're looking for here:\n", - "\n", - "- **Link**: In the `href` attribute of the `` Element that is a child of the title `
<h2>` element\n", - "- **Title**: The text of the link in the `<h2>
` element which also contains the link URL mentioned above\n", - "- **Location**: A `` element with the telling class name `location`" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "---\n", - "\n", - "### Part 2: Scrape\n", - "\n", - "- Build the code to fetch the first 100 search results. This means you will need to automatically navigate to multiple results pages\n", - "- Write functions that allow you to specify the job title, location, and amount of results as arguments" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "page_2 = requests.get(\n", - " \"https://www.indeed.com/jobs?q=python&l=new+york&start=20\"\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Every 10 results means you're on a new page. Let's make that an argument to a function:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def get_jobs(page=1):\n", - " \"\"\"Fetches the HTML from a search for Python jobs in New York on Indeed.com from a specified page.\"\"\"\n", - " base_url_indeed = \"https://www.indeed.com/jobs?q=python&l=new+york&start=\"\n", - " results_start_num = page * 10\n", - " url = f\"{base_url_indeed}{results_start_num}\"\n", - " page = requests.get(url)\n", - " return page" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "get_jobs(3)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "get_jobs(4)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Great! Let's customize this function some more to allow for different search queries and search locations:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def get_jobs(title, location, page=1):\n", - " \"\"\"Fetches the HTML from a search for Python jobs in New York on Indeed.com from a specified page.\"\"\"\n", - " loc = location.replace(\" \", \"+\") # for multi-part locations\n", - " base_url_indeed = f\"https://www.indeed.com/jobs?q={title}&l={loc}&start=\"\n", - " results_start_num = page * 10\n", - " url = f\"{base_url_indeed}{results_start_num}\"\n", - " page = requests.get(url)\n", - " return page" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "get_jobs(\"python\", \"new york\", 3)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "With a generalized way of scraping the page done, you can move on to picking out the information you need by parsing the HTML." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "---\n", - "\n", - "### Part 3: Parse\n", - "\n", - "- Sieve through your HTML soup to pick out only the job title, link, and location\n", - "- Format the results in a readable format (e.g. 
JSON)\n", - "- Save the results to a file" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's start by getting access to all interesting search results for one page:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "site = get_jobs(\"python\", \"new york\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "soup = BeautifulSoup(site.content)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "results = soup.find(id=\"resultsCol\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "jobs = results.find_all(\"div\", class_=\"result\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Job Titles** can be found like this:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "job_titles = [job.find(\"h2\").find(\"a\").text.strip() for job in jobs]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "job_titles" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Link URLs** need to be assembled, and can be found like this:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "base_url = \"https://www.indeed.com\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "job_links = [base_url + job.find(\"h2\").find(\"a\")[\"href\"] for job in jobs]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "job_links" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Locations** can be picked out of the soup by their class name:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "job_locations = [job.find(class_=\"location\").text for job in jobs]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "job_locations" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's assemble all this info into a function, so you can pick out the pieces and save them to a useful data structure:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def parse_info(soup):\n", - " \"\"\"\n", - " Parses HTML containing job postings and picks out job title, location, and link.\n", - "\n", - " args:\n", - " soup (BeautifulSoup object): A parsed bs4.BeautifulSoup object of a search results page on indeed.com\n", - "\n", - " returns:\n", - " job_list (list): A list of dictionaries containing the title, link, and location of each job posting\n", - " \"\"\"\n", - " results = soup.find(id=\"resultsCol\")\n", - " jobs = results.find_all(\"div\", class_=\"result\")\n", - " base_url = \"https://www.indeed.com\"\n", - "\n", - " job_list = list()\n", - " for job in jobs:\n", - " title = job.find(\"h2\").find(\"a\").text.strip()\n", - " link = base_url + job.find(\"h2\").find(\"a\")[\"href\"]\n", - " location = job.find(class_=\"location\").text\n", - " job_list.append({\"title\": title, \"link\": link, \"location\": location})\n", - "\n", - " return job_list" 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's give it a try:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "page = get_jobs(\"python\", \"new york\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "soup = BeautifulSoup(page.content)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "results = parse_info(soup)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "results" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "And let's add a final step of generalization:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def get_job_listings(title, location, amount=100):\n", - " results = list()\n", - " for page in range(amount // 10):\n", - " site = get_jobs(title, location, page=page)\n", - " soup = BeautifulSoup(site.content)\n", - " page_results = parse_info(soup)\n", - " results += page_results\n", - " return results" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "r = get_job_listings(\"python\", \"new york\", 100)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "len(r)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "r[42]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "---\n", - "\n", - "### Keep Expanding!\n", - "\n", - "Currently you are only fetching the title, link, and location of the job. Change that to also get the **company name**. Maybe you also want to know the beginning of the **text blurb** that describes what the job is about? You could also build this script out to follow the links you gathered and fetch the individual job listings' detail pages for even more information.\n", - "\n", - "The sky is the limit, and the more you train, the better you will get at this. 
:)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.0" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/build-a-web-scraper/Pipfile b/build-a-web-scraper/Pipfile deleted file mode 100644 index 2c619bf39c..0000000000 --- a/build-a-web-scraper/Pipfile +++ /dev/null @@ -1,13 +0,0 @@ -[[source]] -name = "pypi" -url = "https://pypi.org/simple" -verify_ssl = true - -[dev-packages] - -[packages] -jupyterlab = "*" -beautifulsoup4 = "*" - -[requires] -python_version = "3.8" diff --git a/build-a-web-scraper/Pipfile.lock b/build-a-web-scraper/Pipfile.lock deleted file mode 100644 index fe954558c0..0000000000 --- a/build-a-web-scraper/Pipfile.lock +++ /dev/null @@ -1,446 +0,0 @@ -{ - "_meta": { - "hash": { - "sha256": "f8fcc479f5f1691e915d5dffd570980b72616513c59d7d7bc083028d041b5241" - }, - "pipfile-spec": 6, - "requires": { - "python_version": "3.8" - }, - "sources": [ - { - "name": "pypi", - "url": "https://pypi.org/simple", - "verify_ssl": true - } - ] - }, - "default": { - "appnope": { - "hashes": [ - "sha256:5b26757dc6f79a3b7dc9fab95359328d5747fcb2409d331ea66d0272b90ab2a0", - "sha256:8b995ffe925347a2138d7ac0fe77155e4311a0ea6d6da4f5128fe4b3cbe5ed71" - ], - "markers": "sys_platform == 'darwin'", - "version": "==0.1.0" - }, - "attrs": { - "hashes": [ - "sha256:08a96c641c3a74e44eb59afb61a24f2cb9f4d7188748e76ba4bb5edfa3cb7d1c", - "sha256:f7b7ce16570fe9965acd6d30101a28f62fb4a7f9e926b3bbc9b61f8b04247e72" - ], - "version": "==19.3.0" - }, - "backcall": { - "hashes": [ - "sha256:38ecd85be2c1e78f77fd91700c76e14667dc21e2713b63876c0eb901196e01e4", - "sha256:bbbf4b1e5cd2bdb08f915895b51081c041bac22394fdfcfdfbe9f14b77c08bf2" - ], - "version": "==0.1.0" - }, - "beautifulsoup4": { - "hashes": [ - "sha256:73cc4d115b96f79c7d77c1c7f7a0a8d4c57860d1041df407dd1aae7f07a77fd7", - "sha256:a6237df3c32ccfaee4fd201c8f5f9d9df619b93121d01353a64a73ce8c6ef9a8", - "sha256:e718f2342e2e099b640a34ab782407b7b676f47ee272d6739e60b8ea23829f2c" - ], - "index": "pypi", - "version": "==4.9.1" - }, - "bleach": { - "hashes": [ - "sha256:2bce3d8fab545a6528c8fa5d9f9ae8ebc85a56da365c7f85180bfe96a35ef22f", - "sha256:3c4c520fdb9db59ef139915a5db79f8b51bc2a7257ea0389f30c846883430a4b" - ], - "version": "==3.1.5" - }, - "certifi": { - "hashes": [ - "sha256:5ad7e9a056d25ffa5082862e36f119f7f7cec6457fa07ee2f8c339814b80c9b1", - "sha256:9cd41137dc19af6a5e03b630eefe7d1f458d964d406342dd3edf625839b944cc" - ], - "version": "==2020.4.5.2" - }, - "chardet": { - "hashes": [ - "sha256:84ab92ed1c4d4f16916e05906b6b75a6c0fb5db821cc65e70cbd64a3e2a5eaae", - "sha256:fc323ffcaeaed0e0a02bf4d117757b98aed530d9ed4531e3e15460124c106691" - ], - "version": "==3.0.4" - }, - "decorator": { - "hashes": [ - "sha256:41fa54c2a0cc4ba648be4fd43cff00aedf5b9465c9bf18d64325bc225f08f760", - "sha256:e3a62f0520172440ca0dcc823749319382e377f37f140a0b99ef45fecb84bfe7" - ], - "version": "==4.4.2" - }, - "defusedxml": { - "hashes": [ - "sha256:6687150770438374ab581bb7a1b327a847dd9c5749e396102de3fad4e8a3ef93", - "sha256:f684034d135af4c6cbb949b8a4d2ed61634515257a67299e5f940fbaa34377f5" - ], - "version": "==0.6.0" - }, - "entrypoints": { - "hashes": [ - 
"sha256:589f874b313739ad35be6e0cd7efde2a4e9b6fea91edcc34e58ecbb8dbe56d19", - "sha256:c70dd71abe5a8c85e55e12c19bd91ccfeec11a6e99044204511f9ed547d48451" - ], - "version": "==0.3" - }, - "idna": { - "hashes": [ - "sha256:7588d1c14ae4c77d74036e8c22ff447b26d0fde8f007354fd48a7814db15b7cb", - "sha256:a068a21ceac8a4d63dbfd964670474107f541babbd2250d61922f029858365fa" - ], - "version": "==2.9" - }, - "ipykernel": { - "hashes": [ - "sha256:731adb3f2c4ebcaff52e10a855ddc87670359a89c9c784d711e62d66fccdafae", - "sha256:a8362e3ae365023ca458effe93b026b8cdadc0b73ff3031472128dd8a2cf0289" - ], - "version": "==5.3.0" - }, - "ipython": { - "hashes": [ - "sha256:0ef1433879816a960cd3ae1ae1dc82c64732ca75cec8dab5a4e29783fb571d0e", - "sha256:1b85d65632211bf5d3e6f1406f3393c8c429a47d7b947b9a87812aa5bce6595c" - ], - "version": "==7.15.0" - }, - "ipython-genutils": { - "hashes": [ - "sha256:72dd37233799e619666c9f639a9da83c34013a73e8bbc79a7a6348d93c61fab8", - "sha256:eb2e116e75ecef9d4d228fdc66af54269afa26ab4463042e33785b887c628ba8" - ], - "version": "==0.2.0" - }, - "jedi": { - "hashes": [ - "sha256:cd60c93b71944d628ccac47df9a60fec53150de53d42dc10a7fc4b5ba6aae798", - "sha256:df40c97641cb943661d2db4c33c2e1ff75d491189423249e989bcea4464f3030" - ], - "version": "==0.17.0" - }, - "jinja2": { - "hashes": [ - "sha256:89aab215427ef59c34ad58735269eb58b1a5808103067f7bb9d5836c651b3bb0", - "sha256:f0a4641d3cf955324a89c04f3d94663aa4d638abe8f733ecd3582848e1c37035" - ], - "version": "==2.11.2" - }, - "json5": { - "hashes": [ - "sha256:703cfee540790576b56a92e1c6aaa6c4b0d98971dc358ead83812aa4d06bdb96", - "sha256:af1a1b9a2850c7f62c23fde18be4749b3599fd302f494eebf957e2ada6b9e42c" - ], - "version": "==0.9.5" - }, - "jsonschema": { - "hashes": [ - "sha256:4e5b3cf8216f577bee9ce139cbe72eca3ea4f292ec60928ff24758ce626cd163", - "sha256:c8a85b28d377cc7737e46e2d9f2b4f44ee3c0e1deac6bf46ddefc7187d30797a" - ], - "version": "==3.2.0" - }, - "jupyter-client": { - "hashes": [ - "sha256:3a32fa4d0b16d1c626b30c3002a62dfd86d6863ed39eaba3f537fade197bb756", - "sha256:cde8e83aab3ec1c614f221ae54713a9a46d3bf28292609d2db1b439bef5a8c8e" - ], - "version": "==6.1.3" - }, - "jupyter-core": { - "hashes": [ - "sha256:394fd5dd787e7c8861741880bdf8a00ce39f95de5d18e579c74b882522219e7e", - "sha256:a4ee613c060fe5697d913416fc9d553599c05e4492d58fac1192c9a6844abb21" - ], - "version": "==4.6.3" - }, - "jupyterlab": { - "hashes": [ - "sha256:7b5bd4a05330a01c8522ee7f1cda5cb2e0d96412d9e1e879a19b3afb63d4ac69", - "sha256:8eb5920f2ac1aaa98d5e24da6c5909e4ad91a2ee6712d756318572069f715b4d" - ], - "index": "pypi", - "version": "==2.1.4" - }, - "jupyterlab-server": { - "hashes": [ - "sha256:3398e401b95da868bc96bdaa44fa61252bf3e68fc9dd1645bd93293cce095f6c", - "sha256:ee62690778c90b07a62a9bc5e6f530eebe8cd7550a0ef0bd1363b1f2380e1797" - ], - "version": "==1.1.5" - }, - "markupsafe": { - "hashes": [ - "sha256:00bc623926325b26bb9605ae9eae8a215691f33cae5df11ca5424f06f2d1f473", - "sha256:09027a7803a62ca78792ad89403b1b7a73a01c8cb65909cd876f7fcebd79b161", - "sha256:09c4b7f37d6c648cb13f9230d847adf22f8171b1ccc4d5682398e77f40309235", - "sha256:1027c282dad077d0bae18be6794e6b6b8c91d58ed8a8d89a89d59693b9131db5", - "sha256:13d3144e1e340870b25e7b10b98d779608c02016d5184cfb9927a9f10c689f42", - "sha256:24982cc2533820871eba85ba648cd53d8623687ff11cbb805be4ff7b4c971aff", - "sha256:29872e92839765e546828bb7754a68c418d927cd064fd4708fab9fe9c8bb116b", - "sha256:43a55c2930bbc139570ac2452adf3d70cdbb3cfe5912c71cdce1c2c6bbd9c5d1", - "sha256:46c99d2de99945ec5cb54f23c8cd5689f6d7177305ebff350a58ce5f8de1669e", - 
"sha256:500d4957e52ddc3351cabf489e79c91c17f6e0899158447047588650b5e69183", - "sha256:535f6fc4d397c1563d08b88e485c3496cf5784e927af890fb3c3aac7f933ec66", - "sha256:596510de112c685489095da617b5bcbbac7dd6384aeebeda4df6025d0256a81b", - "sha256:62fe6c95e3ec8a7fad637b7f3d372c15ec1caa01ab47926cfdf7a75b40e0eac1", - "sha256:6788b695d50a51edb699cb55e35487e430fa21f1ed838122d722e0ff0ac5ba15", - "sha256:6dd73240d2af64df90aa7c4e7481e23825ea70af4b4922f8ede5b9e35f78a3b1", - "sha256:717ba8fe3ae9cc0006d7c451f0bb265ee07739daf76355d06366154ee68d221e", - "sha256:79855e1c5b8da654cf486b830bd42c06e8780cea587384cf6545b7d9ac013a0b", - "sha256:7c1699dfe0cf8ff607dbdcc1e9b9af1755371f92a68f706051cc8c37d447c905", - "sha256:88e5fcfb52ee7b911e8bb6d6aa2fd21fbecc674eadd44118a9cc3863f938e735", - "sha256:8defac2f2ccd6805ebf65f5eeb132adcf2ab57aa11fdf4c0dd5169a004710e7d", - "sha256:98c7086708b163d425c67c7a91bad6e466bb99d797aa64f965e9d25c12111a5e", - "sha256:9add70b36c5666a2ed02b43b335fe19002ee5235efd4b8a89bfcf9005bebac0d", - "sha256:9bf40443012702a1d2070043cb6291650a0841ece432556f784f004937f0f32c", - "sha256:ade5e387d2ad0d7ebf59146cc00c8044acbd863725f887353a10df825fc8ae21", - "sha256:b00c1de48212e4cc9603895652c5c410df699856a2853135b3967591e4beebc2", - "sha256:b1282f8c00509d99fef04d8ba936b156d419be841854fe901d8ae224c59f0be5", - "sha256:b2051432115498d3562c084a49bba65d97cf251f5a331c64a12ee7e04dacc51b", - "sha256:ba59edeaa2fc6114428f1637ffff42da1e311e29382d81b339c1817d37ec93c6", - "sha256:c8716a48d94b06bb3b2524c2b77e055fb313aeb4ea620c8dd03a105574ba704f", - "sha256:cd5df75523866410809ca100dc9681e301e3c27567cf498077e8551b6d20e42f", - "sha256:cdb132fc825c38e1aeec2c8aa9338310d29d337bebbd7baa06889d09a60a1fa2", - "sha256:e249096428b3ae81b08327a63a485ad0878de3fb939049038579ac0ef61e17e7", - "sha256:e8313f01ba26fbbe36c7be1966a7b7424942f670f38e666995b88d012765b9be" - ], - "version": "==1.1.1" - }, - "mistune": { - "hashes": [ - "sha256:59a3429db53c50b5c6bcc8a07f8848cb00d7dc8bdb431a4ab41920d201d4756e", - "sha256:88a1051873018da288eee8538d476dffe1262495144b33ecb586c4ab266bb8d4" - ], - "version": "==0.8.4" - }, - "nbconvert": { - "hashes": [ - "sha256:21fb48e700b43e82ba0e3142421a659d7739b65568cc832a13976a77be16b523", - "sha256:f0d6ec03875f96df45aa13e21fd9b8450c42d7e1830418cccc008c0df725fcee" - ], - "version": "==5.6.1" - }, - "nbformat": { - "hashes": [ - "sha256:049af048ed76b95c3c44043620c17e56bc001329e07f83fec4f177f0e3d7b757", - "sha256:276343c78a9660ab2a63c28cc33da5f7c58c092b3f3a40b6017ae2ce6689320d" - ], - "version": "==5.0.6" - }, - "notebook": { - "hashes": [ - "sha256:3edc616c684214292994a3af05eaea4cc043f6b4247d830f3a2f209fa7639a80", - "sha256:47a9092975c9e7965ada00b9a20f0cf637d001db60d241d479f53c0be117ad48" - ], - "version": "==6.0.3" - }, - "packaging": { - "hashes": [ - "sha256:4357f74f47b9c12db93624a82154e9b120fa8293699949152b22065d556079f8", - "sha256:998416ba6962ae7fbd6596850b80e17859a5753ba17c32284f67bfff33784181" - ], - "version": "==20.4" - }, - "pandocfilters": { - "hashes": [ - "sha256:b3dd70e169bb5449e6bc6ff96aea89c5eea8c5f6ab5e207fc2f521a2cf4a0da9" - ], - "version": "==1.4.2" - }, - "parso": { - "hashes": [ - "sha256:158c140fc04112dc45bca311633ae5033c2c2a7b732fa33d0955bad8152a8dd0", - "sha256:908e9fae2144a076d72ae4e25539143d40b8e3eafbaeae03c1bfe226f4cdf12c" - ], - "version": "==0.7.0" - }, - "pexpect": { - "hashes": [ - "sha256:0b48a55dcb3c05f3329815901ea4fc1537514d6ba867a152b581d69ae3710937", - "sha256:fc65a43959d153d0114afe13997d439c22823a27cefceb5ff35c2178c6784c0c" - ], - "markers": "sys_platform != 'win32'", - "version": 
"==4.8.0" - }, - "pickleshare": { - "hashes": [ - "sha256:87683d47965c1da65cdacaf31c8441d12b8044cdec9aca500cd78fc2c683afca", - "sha256:9649af414d74d4df115d5d718f82acb59c9d418196b7b4290ed47a12ce62df56" - ], - "version": "==0.7.5" - }, - "prometheus-client": { - "hashes": [ - "sha256:983c7ac4b47478720db338f1491ef67a100b474e3bc7dafcbaefb7d0b8f9b01c", - "sha256:c6e6b706833a6bd1fd51711299edee907857be10ece535126a158f911ee80915" - ], - "version": "==0.8.0" - }, - "prompt-toolkit": { - "hashes": [ - "sha256:563d1a4140b63ff9dd587bda9557cffb2fe73650205ab6f4383092fb882e7dc8", - "sha256:df7e9e63aea609b1da3a65641ceaf5bc7d05e0a04de5bd45d05dbeffbabf9e04" - ], - "version": "==3.0.5" - }, - "ptyprocess": { - "hashes": [ - "sha256:923f299cc5ad920c68f2bc0bc98b75b9f838b93b599941a6b63ddbc2476394c0", - "sha256:d7cc528d76e76342423ca640335bd3633420dc1366f258cb31d05e865ef5ca1f" - ], - "markers": "os_name != 'nt'", - "version": "==0.6.0" - }, - "pygments": { - "hashes": [ - "sha256:647344a061c249a3b74e230c739f434d7ea4d8b1d5f3721bc0f3558049b38f44", - "sha256:ff7a40b4860b727ab48fad6360eb351cc1b33cbf9b15a0f689ca5353e9463324" - ], - "version": "==2.6.1" - }, - "pyparsing": { - "hashes": [ - "sha256:c203ec8783bf771a155b207279b9bccb8dea02d8f0c9e5f8ead507bc3246ecc1", - "sha256:ef9d7589ef3c200abe66653d3f1ab1033c3c419ae9b9bdb1240a85b024efc88b" - ], - "version": "==2.4.7" - }, - "pyrsistent": { - "hashes": [ - "sha256:28669905fe725965daa16184933676547c5bb40a5153055a8dee2a4bd7933ad3" - ], - "version": "==0.16.0" - }, - "python-dateutil": { - "hashes": [ - "sha256:73ebfe9dbf22e832286dafa60473e4cd239f8592f699aa5adaf10050e6e1823c", - "sha256:75bb3f31ea686f1197762692a9ee6a7550b59fc6ca3a1f4b5d7e32fb98e2da2a" - ], - "version": "==2.8.1" - }, - "pyzmq": { - "hashes": [ - "sha256:07fb8fe6826a229dada876956590135871de60dbc7de5a18c3bcce2ed1f03c98", - "sha256:13a5638ab24d628a6ade8f794195e1a1acd573496c3b85af2f1183603b7bf5e0", - "sha256:15b4cb21118f4589c4db8be4ac12b21c8b4d0d42b3ee435d47f686c32fe2e91f", - "sha256:21f7d91f3536f480cb2c10d0756bfa717927090b7fb863e6323f766e5461ee1c", - "sha256:2a88b8fabd9cc35bd59194a7723f3122166811ece8b74018147a4ed8489e6421", - "sha256:342fb8a1dddc569bc361387782e8088071593e7eaf3e3ecf7d6bd4976edff112", - "sha256:4ee0bfd82077a3ff11c985369529b12853a4064320523f8e5079b630f9551448", - "sha256:54aa24fd60c4262286fc64ca632f9e747c7cc3a3a1144827490e1dc9b8a3a960", - "sha256:58688a2dfa044fad608a8e70ba8d019d0b872ec2acd75b7b5e37da8905605891", - "sha256:5b99c2ae8089ef50223c28bac57510c163bfdff158c9e90764f812b94e69a0e6", - "sha256:5b9d21fc56c8aacd2e6d14738021a9d64f3f69b30578a99325a728e38a349f85", - "sha256:5f1f2eb22aab606f808163eb1d537ac9a0ba4283fbeb7a62eb48d9103cf015c2", - "sha256:6ca519309703e95d55965735a667809bbb65f52beda2fdb6312385d3e7a6d234", - "sha256:87c78f6936e2654397ca2979c1d323ee4a889eef536cc77a938c6b5be33351a7", - "sha256:8952f6ba6ae598e792703f3134af5a01af8f5c7cf07e9a148f05a12b02412cea", - "sha256:931339ac2000d12fe212e64f98ce291e81a7ec6c73b125f17cf08415b753c087", - "sha256:956775444d01331c7eb412c5fb9bb62130dfaac77e09f32764ea1865234e2ca9", - "sha256:97b6255ae77328d0e80593681826a0479cb7bac0ba8251b4dd882f5145a2293a", - "sha256:aaa8b40b676576fd7806839a5de8e6d5d1b74981e6376d862af6c117af2a3c10", - "sha256:af0c02cf49f4f9eedf38edb4f3b6bb621d83026e7e5d76eb5526cc5333782fd6", - "sha256:b08780e3a55215873b3b8e6e7ca8987f14c902a24b6ac081b344fd430d6ca7cd", - "sha256:ba6f24431b569aec674ede49cad197cad59571c12deed6ad8e3c596da8288217", - "sha256:bafd651b557dd81d89bd5f9c678872f3e7b7255c1c751b78d520df2caac80230", - 
"sha256:bfff5ffff051f5aa47ba3b379d87bd051c3196b0c8a603e8b7ed68a6b4f217ec", - "sha256:cf5d689ba9513b9753959164cf500079383bc18859f58bf8ce06d8d4bef2b054", - "sha256:dcbc3f30c11c60d709c30a213dc56e88ac016fe76ac6768e64717bd976072566", - "sha256:f9d7e742fb0196992477415bb34366c12e9bb9a0699b8b3f221ff93b213d7bec", - "sha256:faee2604f279d31312bc455f3d024f160b6168b9c1dde22bf62d8c88a4deca8e" - ], - "version": "==19.0.1" - }, - "requests": { - "hashes": [ - "sha256:43999036bfa82904b6af1d99e4882b560e5e2c68e5c4b0aa03b655f3d7d73fee", - "sha256:b3f43d496c6daba4493e7c431722aeb7dbc6288f52a6e04e7b6023b0247817e6" - ], - "version": "==2.23.0" - }, - "send2trash": { - "hashes": [ - "sha256:60001cc07d707fe247c94f74ca6ac0d3255aabcb930529690897ca2a39db28b2", - "sha256:f1691922577b6fa12821234aeb57599d887c4900b9ca537948d2dac34aea888b" - ], - "version": "==1.5.0" - }, - "six": { - "hashes": [ - "sha256:30639c035cdb23534cd4aa2dd52c3bf48f06e5f4a941509c8bafd8ce11080259", - "sha256:8b74bedcbbbaca38ff6d7491d76f2b06b3592611af620f8426e82dddb04a5ced" - ], - "version": "==1.15.0" - }, - "soupsieve": { - "hashes": [ - "sha256:1634eea42ab371d3d346309b93df7870a88610f0725d47528be902a0d95ecc55", - "sha256:a59dc181727e95d25f781f0eb4fd1825ff45590ec8ff49eadfd7f1a537cc0232" - ], - "version": "==2.0.1" - }, - "terminado": { - "hashes": [ - "sha256:4804a774f802306a7d9af7322193c5390f1da0abb429e082a10ef1d46e6fb2c2", - "sha256:a43dcb3e353bc680dd0783b1d9c3fc28d529f190bc54ba9a229f72fe6e7a54d7" - ], - "version": "==0.8.3" - }, - "testpath": { - "hashes": [ - "sha256:60e0a3261c149755f4399a1fff7d37523179a70fdc3abdf78de9fc2604aeec7e", - "sha256:bfcf9411ef4bf3db7579063e0546938b1edda3d69f4e1fb8756991f5951f85d4" - ], - "version": "==0.4.4" - }, - "tornado": { - "hashes": [ - "sha256:0fe2d45ba43b00a41cd73f8be321a44936dc1aba233dee979f17a042b83eb6dc", - "sha256:22aed82c2ea340c3771e3babc5ef220272f6fd06b5108a53b4976d0d722bcd52", - "sha256:2c027eb2a393d964b22b5c154d1a23a5f8727db6fda837118a776b29e2b8ebc6", - "sha256:5217e601700f24e966ddab689f90b7ea4bd91ff3357c3600fa1045e26d68e55d", - "sha256:5618f72e947533832cbc3dec54e1dffc1747a5cb17d1fd91577ed14fa0dc081b", - "sha256:5f6a07e62e799be5d2330e68d808c8ac41d4a259b9cea61da4101b83cb5dc673", - "sha256:c58d56003daf1b616336781b26d184023ea4af13ae143d9dda65e31e534940b9", - "sha256:c952975c8ba74f546ae6de2e226ab3cc3cc11ae47baf607459a6728585bb542a", - "sha256:c98232a3ac391f5faea6821b53db8db461157baa788f5d6222a193e9456e1740" - ], - "version": "==6.0.4" - }, - "traitlets": { - "hashes": [ - "sha256:70b4c6a1d9019d7b4f6846832288f86998aa3b9207c6821f3578a6a6a467fe44", - "sha256:d023ee369ddd2763310e4c3eae1ff649689440d4ae59d7485eb4cfbbe3e359f7" - ], - "version": "==4.3.3" - }, - "urllib3": { - "hashes": [ - "sha256:3018294ebefce6572a474f0604c2021e33b3fd8006ecd11d62107a5d2a963527", - "sha256:88206b0eb87e6d677d424843ac5209e3fb9d0190d0ee169599165ec25e9d9115" - ], - "version": "==1.25.9" - }, - "wcwidth": { - "hashes": [ - "sha256:79375666b9954d4a1a10739315816324c3e73110af9d0e102d906fdb0aec009f", - "sha256:8c6b5b6ee1360b842645f336d9e5d68c55817c26d3050f46b235ef2bc650e48f" - ], - "version": "==0.2.4" - }, - "webencodings": { - "hashes": [ - "sha256:a0af1213f3c2226497a97e2b3aa01a7e4bee4f403f95be16fc9acd2947514a78", - "sha256:b36a1c245f2d304965eb4e0a82848379241dc04b865afcc4aab16748587e1923" - ], - "version": "==0.5.1" - } - }, - "develop": {} -} diff --git a/build-a-web-scraper/README.md b/build-a-web-scraper/README.md index 28e03e772e..533eab9c4b 100644 --- a/build-a-web-scraper/README.md +++ b/build-a-web-scraper/README.md @@ -1,16 
+1,52 @@ # Code Repository for Web Scraping Course -This repository contains Jupyter Notebooks with code examples relating to the Real Python video course on Building a Web Scraper with `requests` and Beautiful Soup. +This repository contains Jupyter Notebooks with code examples relating to the Real Python video course on [Building a Web Scraper with `requests` and Beautiful Soup](https://realpython.com/courses/web-scraping-beautiful-soup/). -The notebooks 01-03 represent the **web scraping pipeline** discussed in the course: +## Setup + +Create and activate a virtual environment, then install the pinned dependencies (`requests`, `beautifulsoup4`, and `jupyter`) from `requirements.txt`: + +```bash +$ python -m venv venv +$ source venv/bin/activate +# PS> venv\Scripts\activate # on Windows +(venv) $ python -m pip install -r requirements.txt +``` + +Once all the dependencies are installed, you can start the Jupyter notebook server: + +```bash +(venv) $ jupyter notebook +``` + +Now you can open the notebook that you want to work on. + +## Notebook Files + +The notebooks 01-03 represent the **web scraping pipeline** discussed in the course: - **Part 1: Inspect** `01_inspect.ipynb` - **Part 2: Scrape** `02_scrape.ipynb` - **Part 3: Parse** `03_parse.ipynb` -The notebooks 04-05 contain tasks to work on individually for each learner to keep practicing the discussed concepts and personalize the project for themselves: +Notebook 04 contains tasks to work on individually so you can keep practicing the discussed concepts and personalize the project for yourself: - **Tasks** `04_pipeline.ipynb` -- **Solution** `05_pipeline_solution.ipynb` -Attempt to build out your individual pipeline by yourself and use the solution document only if you get stuck. All the best, and keep learning! :) +Attempt to build out your individual pipeline by yourself. When you're done with the suggested practice website, try to repeat the process with a different website. All the best, and keep learning! :) + +## ⚠️ Durability Warning ⚠️ + +As [mentioned in the course](https://realpython.com/lessons/challenge-of-durability/), websites frequently change. Unfortunately, the job board that you'll see in the course, indeed.com, has started to block scraping of its site since the recording of the course. + +Just like in the associated written tutorial on [web scraping with Beautiful Soup](https://realpython.com/beautiful-soup-web-scraper-python/#scrape-the-fake-python-job-site), you can instead use [Real Python's fake jobs site](https://realpython.github.io/fake-jobs/) to practice scraping a static website. + +All the concepts discussed in the course lessons are still accurate. Translating what you see onto a different website will be a good learning opportunity where you'll have to synthesize the information and apply it practically. A short sketch below shows one way to get started. + +## About the Author + +Martin Breuss - Email: martin@realpython.com + +## License + +Distributed under the MIT license. See `LICENSE` in the root directory of this `materials` repo for more information. 
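Since indeed.com now blocks scrapers, here is a minimal sketch of the same inspect-scrape-parse pipeline pointed at the fake jobs site instead. The `ResultsContainer` id and the `card-content`, `title`, `company`, and `location` class names are assumptions based on the fake jobs page's markup at the time of writing, so confirm them in your browser's developer tools before relying on them:

```python
import requests
from bs4 import BeautifulSoup

# The fake jobs site is static, so one plain GET fetches every posting.
URL = "https://realpython.github.io/fake-jobs/"
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")

# The id and class names below come from inspecting the page and may change.
results = soup.find(id="ResultsContainer")
job_cards = results.find_all("div", class_="card-content")

job_list = []
for card in job_cards:
    job_list.append(
        {
            "title": card.find("h2", class_="title").text.strip(),
            "company": card.find("h3", class_="company").text.strip(),
            "location": card.find("p", class_="location").text.strip(),
        }
    )

print(len(job_list))
print(job_list[0])
```

Because everything lives on a single static page, there is no pagination loop to write, which makes this a gentler target than the paginated search results shown in the course.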
diff --git a/build-a-web-scraper/requirements.in b/build-a-web-scraper/requirements.in new file mode 100644 index 0000000000..1d6208858a --- /dev/null +++ b/build-a-web-scraper/requirements.in @@ -0,0 +1,3 @@ +jupyter +requests +beautifulsoup4 \ No newline at end of file diff --git a/build-a-web-scraper/requirements.txt b/build-a-web-scraper/requirements.txt index 41dca1cd1b..969d7ffaa5 100644 --- a/build-a-web-scraper/requirements.txt +++ b/build-a-web-scraper/requirements.txt @@ -1,51 +1,303 @@ -appnope==0.1.0 -attrs==19.3.0 -backcall==0.1.0 +# +# This file is autogenerated by pip-compile with Python 3.10 +# by the following command: +# +# pip-compile requirements.in +# +anyio==3.6.2 + # via jupyter-server +appnope==0.1.3 + # via + # ipykernel + # ipython +argon2-cffi==21.3.0 + # via + # jupyter-server + # nbclassic + # notebook +argon2-cffi-bindings==21.2.0 + # via argon2-cffi +arrow==1.2.3 + # via isoduration +asttokens==2.2.1 + # via stack-data +attrs==22.2.0 + # via jsonschema +backcall==0.2.0 + # via ipython beautifulsoup4==4.9.1 -bleach==3.1.5 + # via + # -r requirements.in + # nbconvert +bleach==5.0.1 + # via nbconvert certifi==2020.4.5.2 + # via requests +cffi==1.15.1 + # via argon2-cffi-bindings chardet==3.0.4 -decorator==4.4.2 -defusedxml==0.6.0 -entrypoints==0.3 + # via requests +comm==0.1.2 + # via ipykernel +debugpy==1.6.4 + # via ipykernel +decorator==5.1.1 + # via ipython +defusedxml==0.7.1 + # via nbconvert +entrypoints==0.4 + # via jupyter-client +executing==1.2.0 + # via stack-data +fastjsonschema==2.16.2 + # via nbformat +fqdn==1.5.1 + # via jsonschema idna==2.9 -ipykernel==5.3.0 -ipython==7.15.0 + # via + # anyio + # jsonschema + # requests +ipykernel==6.19.4 + # via + # ipywidgets + # jupyter + # jupyter-console + # nbclassic + # notebook + # qtconsole +ipython==8.7.0 + # via + # ipykernel + # ipywidgets + # jupyter-console ipython-genutils==0.2.0 -jedi==0.17.0 -Jinja2==2.11.2 -json5==0.9.5 -jsonschema==3.2.0 -jupyter-client==6.1.3 -jupyter-core==4.6.3 -jupyterlab==2.1.4 -jupyterlab-server==1.1.5 -MarkupSafe==1.1.1 -mistune==0.8.4 -nbconvert==5.6.1 -nbformat==5.0.6 -notebook==6.0.3 -packaging==20.4 -pandocfilters==1.4.2 -parso==0.7.0 + # via + # nbclassic + # notebook + # qtconsole +ipywidgets==8.0.3 + # via jupyter +isoduration==20.11.0 + # via jsonschema +jedi==0.18.2 + # via ipython +jinja2==3.1.2 + # via + # jupyter-server + # nbclassic + # nbconvert + # notebook +jsonpointer==2.3 + # via jsonschema +jsonschema[format-nongpl]==4.17.3 + # via + # jupyter-events + # nbformat +jupyter==1.0.0 + # via -r requirements.in +jupyter-client==7.4.8 + # via + # ipykernel + # jupyter-console + # jupyter-server + # nbclassic + # nbclient + # notebook + # qtconsole +jupyter-console==6.4.4 + # via jupyter +jupyter-core==5.1.0 + # via + # jupyter-client + # jupyter-server + # nbclassic + # nbclient + # nbconvert + # nbformat + # notebook + # qtconsole +jupyter-events==0.5.0 + # via jupyter-server +jupyter-server==2.0.2 + # via + # nbclassic + # notebook-shim +jupyter-server-terminals==0.4.3 + # via jupyter-server +jupyterlab-pygments==0.2.2 + # via nbconvert +jupyterlab-widgets==3.0.4 + # via ipywidgets +markupsafe==2.1.1 + # via + # jinja2 + # nbconvert +matplotlib-inline==0.1.6 + # via + # ipykernel + # ipython +mistune==2.0.4 + # via nbconvert +nbclassic==0.4.8 + # via notebook +nbclient==0.7.2 + # via nbconvert +nbconvert==7.2.7 + # via + # jupyter + # jupyter-server + # nbclassic + # notebook +nbformat==5.7.1 + # via + # jupyter-server + # nbclassic + # nbclient + # 
nbconvert + # notebook +nest-asyncio==1.5.6 + # via + # ipykernel + # jupyter-client + # nbclassic + # notebook +notebook==6.5.2 + # via jupyter +notebook-shim==0.2.2 + # via nbclassic +packaging==22.0 + # via + # ipykernel + # jupyter-server + # nbconvert + # qtpy +pandocfilters==1.5.0 + # via nbconvert +parso==0.8.3 + # via jedi pexpect==4.8.0 + # via ipython pickleshare==0.7.5 -prometheus-client==0.8.0 -prompt-toolkit==3.0.5 -ptyprocess==0.6.0 -Pygments==2.6.1 -pyparsing==2.4.7 -pyrsistent==0.16.0 -python-dateutil==2.8.1 -pyzmq==19.0.1 + # via ipython +platformdirs==2.6.0 + # via jupyter-core +prometheus-client==0.15.0 + # via + # jupyter-server + # nbclassic + # notebook +prompt-toolkit==3.0.36 + # via + # ipython + # jupyter-console +psutil==5.9.4 + # via ipykernel +ptyprocess==0.7.0 + # via + # pexpect + # terminado +pure-eval==0.2.2 + # via stack-data +pycparser==2.21 + # via cffi +pygments==2.13.0 + # via + # ipython + # jupyter-console + # nbconvert + # qtconsole +pyrsistent==0.19.2 + # via jsonschema +python-dateutil==2.8.2 + # via + # arrow + # jupyter-client +python-json-logger==2.0.4 + # via jupyter-events +pyyaml==6.0 + # via jupyter-events +pyzmq==24.0.1 + # via + # ipykernel + # jupyter-client + # jupyter-server + # nbclassic + # notebook + # qtconsole +qtconsole==5.4.0 + # via jupyter +qtpy==2.3.0 + # via qtconsole requests==2.23.0 -Send2Trash==1.5.0 -six==1.15.0 + # via -r requirements.in +rfc3339-validator==0.1.4 + # via jsonschema +rfc3986-validator==0.1.1 + # via jsonschema +send2trash==1.8.0 + # via + # jupyter-server + # nbclassic + # notebook +six==1.16.0 + # via + # asttokens + # bleach + # python-dateutil + # rfc3339-validator +sniffio==1.3.0 + # via anyio soupsieve==2.0.1 -terminado==0.8.3 -testpath==0.4.4 -tornado==6.0.4 -traitlets==4.3.3 + # via beautifulsoup4 +stack-data==0.6.2 + # via ipython +terminado==0.17.1 + # via + # jupyter-server + # jupyter-server-terminals + # nbclassic + # notebook +tinycss2==1.2.1 + # via nbconvert +tornado==6.2 + # via + # ipykernel + # jupyter-client + # jupyter-server + # nbclassic + # notebook + # terminado +traitlets==5.8.0 + # via + # comm + # ipykernel + # ipython + # ipywidgets + # jupyter-client + # jupyter-core + # jupyter-events + # jupyter-server + # matplotlib-inline + # nbclassic + # nbclient + # nbconvert + # nbformat + # notebook + # qtconsole +uri-template==1.2.0 + # via jsonschema urllib3==1.25.9 -wcwidth==0.2.4 + # via requests +wcwidth==0.2.5 + # via prompt-toolkit +webcolors==1.12 + # via jsonschema webencodings==0.5.1 + # via + # bleach + # tinycss2 +websocket-client==1.4.2 + # via jupyter-server +widgetsnbextension==4.0.4 + # via ipywidgets
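One loose end from the pipeline notebook: its task list ends with saving the results to a file, a step none of the remaining cells demonstrate. Here is a minimal sketch, assuming the parsing step produced a list of dictionaries like the ones `parse_info()` returns; `save_jobs()`, `load_jobs()`, and the `jobs.json` filename are hypothetical names chosen for illustration:

```python
import json

def save_jobs(job_list, path="jobs.json"):
    # Write the scraped postings to disk as pretty-printed JSON.
    with open(path, "w", encoding="utf-8") as file:
        json.dump(job_list, file, indent=2, ensure_ascii=False)

def load_jobs(path="jobs.json"):
    # Read the postings back into a list of dictionaries.
    with open(path, encoding="utf-8") as file:
        return json.load(file)

# Illustrative usage with a single made-up record:
save_jobs([{"title": "Python Developer", "location": "New York, NY"}])
print(load_jobs())
```

JSON preserves the list-of-dictionaries shape, so a saved scrape can be reloaded later without re-fetching the site.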