diff --git a/.github/workflows/python-package.yml b/.github/workflows/python-package.yml index 787034b..631ef3b 100644 --- a/.github/workflows/python-package.yml +++ b/.github/workflows/python-package.yml @@ -56,4 +56,4 @@ jobs: UDEMY_PASSWORD: ${{ secrets.UDEMY_PASSWORD }} CI_TEST: "True" run: | - poetry run python udemy_enroller_chrome.py + poetry run python udemy_enroller.py --browser=chrome diff --git a/CHANGELOG.md b/CHANGELOG.md index 0913438..12eb3ab 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -5,6 +5,18 @@ All notable changes to this project will be documented in this file. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). +## [2.0.0] - 2021-01-19 + +### Added + +- New coupon source from discudemy.com +- Refactored to have generic scrapers and manager +- Improved performance (asyncio) +- Packaged and published to PyPI +- Added cli args --debug, --tutorialbar, --discudemy +- Removed unpopular cli arg -> --cache-hits +- Write settings/cache to home folder so we can persist settings between versions (installed from PyPI) + ## [1.0.0] - 2020-12-09 ### Added @@ -52,6 +64,8 @@ can continue as normal project running locally. Suitable for users who are not looking forward to contribute. 
+[2.0.0]: + https://github.com/aapatre/Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE/releases/tag/v2.0.0 [1.0.0]: https://github.com/aapatre/Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE/releases/tag/v1.0.0 [0.3]: diff --git a/MANIFEST.in b/MANIFEST.in new file mode 100644 index 0000000..582d6a5 --- /dev/null +++ b/MANIFEST.in @@ -0,0 +1 @@ +recursive-exclude tests * \ No newline at end of file diff --git a/README.md b/README.md index 3dd22ac..5564aa7 100644 --- a/README.md +++ b/README.md @@ -10,13 +10,16 @@ web-scraping and automation, this script will find the necessary Udemy Coupons **NOTE: THIS PROJECT IS NOT AFFILIATED WITH UDEMY.** -The code scrapes course links and coupons from -[tutorialbar.com](https://tutorialbar.com) +The code scrapes course links and coupons from: + - [tutorialbar.com](https://tutorialbar.com) + - [discudemy.com](https://discudemy.com) In case of any bugs or issues, please open an issue in github. Also, don't forget to **Fork & Star the repository if you like it!** +***We are also on [GitLab](https://gitlab.com/the-automators/Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE)*** + **_Video Proof:_** [![Udemy Auto-Course-Enroller](https://img.youtube.com/vi/IW8CCtv2k2A/0.jpg)](https://www.youtube.com/watch?v=IW8CCtv2k2A "GET PAID UDEMY Courses for FREE, Automatically with this Python Script!") @@ -69,63 +72,55 @@ Props to Davidd Sargent for making a super simple video tutorial. If you prefer [![GET Udemy Courses for FREE with Python | 2 Minute Tuesday](https://i.ytimg.com/vi/tdLsVoraMxw/hq720.jpg)](https://www.youtube.com/watch?v=tdLsVoraMxw "GET Udemy Courses for FREE with Python | 2 Minute Tuesday") -1 . Make sure to install all the requirements above. +1 .
Install from PyPI `pip install udemy_enroller` - Run the script and the cli will guide you through the settings required -- Otherwise you can rename the following file - [sample_settings.yaml](sample_settings.yaml) to **settings.yaml** and edit it - using a text editor and insert your **Udemy registered email in the email - section**, your **Udemy password in the password section**, and the **ZIP Code - in the zipcode section (if you reside in the United States or any other region - where Udemy asks for ZIP Code as Billing Info, else enter a random number)** - Additionally you can add your preferred languages and course categories. +- If you decide to save the settings they will be stored in your home directory:
+**Windows**: + C:/Users/CurrentUserName/.udemy_enroller
+**Linux**: + /home/username/.udemy_enroller -2 . Choose the appropriate file for your browser (from the list below): +2 . Choose the appropriate command for your browser (from the list below): - **Tested and works perfectly:** - Chrome: - [udemy_enroller_chrome.py](https://github.com/aapatre/Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE/blob/master/udemy_enroller_chrome.py) + [udemy_enroller --browser=chrome](https://github.com/aapatre/Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE/blob/master/udemy_enroller.py) - Chromium: - [udemy_enroller_chromium.py](https://github.com/aapatre/Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE/blob/master/udemy_enroller_chromium.py) + [udemy_enroller --browser=chromium](https://github.com/aapatre/Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE/blob/master/udemy_enroller.py) - Edge: - [udemy_enroller_edge.py](https://github.com/aapatre/Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE/blob/master/udemy_enroller_edge.py) + [udemy_enroller --browser=edge](https://github.com/aapatre/Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE/blob/master/udemy_enroller.py) - **Has issues when run on custom kernel but works fine on vanilla OS:** - Firefox: - [udemy_enroller_firefox.py(might require manual driver installation)](https://github.com/aapatre/Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE/blob/master/udemy_enroller_firefox.py) + [udemy_enroller --browser=firefox (might require manual driver installation)](https://github.com/aapatre/Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE/blob/master/udemy_enroller.py) - **Untested:** - Opera: - [udemy_enroller_opera.py](https://github.com/aapatre/Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE/blob/master/udemy_enroller_opera.py) - -- **Experimentation or other Browsers (especially Safari):** - - - [aka the old bot- requires manual driver 
setup](https://github.com/aapatre/Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE/blob/master/udemy_enroller_vanilla.py) - + [udemy_enroller --browser=opera](https://github.com/aapatre/Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE/blob/master/udemy_enroller.py) + - **Use at your own risk:** - - Vanilla - Internet Explorer: - [udemy_enroller_internet_explorer.py](https://github.com/aapatre/Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE/blob/master/udemy_enroller_internet_explorer.py) + [udemy_enroller --browser=internet_explorer](https://github.com/aapatre/Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE/blob/master/udemy_enroller.py) 3 . The script can be passed arguments: - `--help`: View full list of arguments available -- `--max-pages=`: Max number of pages to scrape from tutorialbar.com before exiting the script - `--browser=`: Run with a specific browser -- `--cache-hits=`: If we hit the cache this number of times in a row we will exit the script +- `--discudemy`: Run the discudemy scraper only +- `--tutorialbar`: Run the tutorialbar scraper only +- `--max-pages=`: Max number of pages to scrape from sites before exiting the script (default is 5) +- `--debug`: Enable debug logging 4 . Run the chosen script in terminal like so: -- `python udemy_enroller_firefox.py` - - Or by using the generic script: -- `python udemy_enroller.py --browser=firefox` +- `udemy_enroller --browser=firefox` 5 . The bot starts scraping the course links from the first **All Courses** page -on [Tutorial Bar](https://www.tutorialbar.com/all-courses/page/1) and starts +on [Tutorial Bar](https://www.tutorialbar.com/all-courses/page/1) and [DiscUdemy](https://www.discudemy.com/all) and starts enrolling you to Udemy courses. After it has enrolled you to courses from the -first page, it then moves to the next Tutorial Bar page and the cycle continues. +first page, it then moves to the next site page and the cycle continues. 
- Stop the script by pressing ctrl+c in terminal to stop the enrollment process. @@ -145,7 +140,7 @@ which of course I got for free! :) ### 2. How does the bot work? -The bot retrieves coupon links from Tutorial Bar's list to cut the prices and +The bot retrieves coupon links from Tutorial Bar's and DiscUdemy's lists to cut the prices and then uses Selenium's Browser automation features to login and enroll to the courses. Think of it this way: Epic Games & other clients like Steam provide you a handful of games each week, for free; Only in this case, we need a coupon code
diff --git a/core/tutorialbar.py b/core/tutorialbar.py deleted file mode 100644 index 9eea724..0000000 --- a/core/tutorialbar.py +++ /dev/null @@ -1,130 +0,0 @@ -import logging -from multiprocessing.dummy import Pool -from typing import List - -import requests -from bs4 import BeautifulSoup - -logger = logging.getLogger("udemy_enroller") - - -class TutorialBarScraper: - """ - Contains any logic related to scraping of data from tutorialbar.com - """ - - DOMAIN = "https://www.tutorialbar.com" - AD_DOMAINS = ("https://amzn",) - - def __init__(self, max_pages=None): - self.current_page = 0 - self.last_page = None - self.links_per_page = 12 - self.max_pages = max_pages - - def run(self) -> List: - """ - Runs the steps to scrape links from tutorialbar.com - - :return: list of udemy coupon links - """ - self.current_page += 1 - logger.info("Please Wait: Getting the course list from tutorialbar.com...") - course_links = self.get_course_links( - f"{self.DOMAIN}/all-courses/page/{self.current_page}/" - ) - - logger.info(f"Page: {self.current_page} of {self.last_page} scraped") - udemy_links = self.gather_udemy_course_links(course_links) - filtered_udemy_links = self._filter_ad_domains(udemy_links) - - for counter, course in enumerate(filtered_udemy_links): - logger.info(f"Received Link {counter + 1} : {course}") - - return filtered_udemy_links - - def script_should_run(self) -> bool: - """ - Returns boolean of whether or not we should continue checking tutorialbar.com - - :return: - """ - - should_run = True - if self.max_pages is not None: - should_run = self.max_pages > self.current_page - if not should_run: - logger.info( - f"Stopping loop. 
We have reached max number of pages to scrape: {self.max_pages}" - ) - return should_run - - def is_first_loop(self) -> bool: - """ - Simple check to see if this is the first time we have executed - - :return: boolean showing if this is the first loop of the script - """ - return self.current_page == 1 - - def _filter_ad_domains(self, udemy_links) -> List: - """ - Filter out any known ad domains from the links scraped - - :param list udemy_links: List of urls to filter ad domains from - :return: A list of filtered urls - """ - ad_links = set() - for link in udemy_links: - for ad_domain in self.AD_DOMAINS: - if link.startswith(ad_domain): - ad_links.add(link) - if ad_links: - logger.info(f"Removing ad links from courses: {ad_links}") - return list(set(udemy_links) - ad_links) - - def get_course_links(self, url: str) -> List: - """ - Gets the url of pages which contain the udemy link we want to get - - :param str url: The url to scrape data from - :return: list of pages on tutorialbar.com that contain Udemy coupons - """ - response = requests.get(url=url) - - soup = BeautifulSoup(response.content, "html.parser") - - links = soup.find_all("h3") - course_links = [link.find("a").get("href") for link in links] - - self.last_page = ( - soup.find("li", class_="next_paginate_link").find_previous_sibling().text - ) - - return course_links - - @staticmethod - def get_udemy_course_link(url: str) -> str: - """ - Gets the udemy course link - - :param str url: The url to scrape data from - :return: Coupon link of the udemy course - """ - response = requests.get(url=url) - soup = BeautifulSoup(response.content, "html.parser") - udemy_link = soup.find("span", class_="rh_button_wrapper").find("a").get("href") - return udemy_link - - def gather_udemy_course_links(self, courses: List[str]) -> List: - """ - Threaded fetching of the udemy course links from tutorialbar.com - - :param list courses: A list of tutorialbar.com course links we want to fetch the udemy links for - :return: list 
of udemy links - """ - thread_pool = Pool() - results = thread_pool.map(self.get_udemy_course_link, courses) - thread_pool.close() - thread_pool.join() - return results diff --git a/core/utils.py b/core/utils.py deleted file mode 100644 index 6383083..0000000 --- a/core/utils.py +++ /dev/null @@ -1,116 +0,0 @@ -import logging -from typing import Union - -from selenium.common.exceptions import ( - NoSuchElementException, - TimeoutException, - WebDriverException, -) -from selenium.webdriver.remote.webdriver import WebDriver - -from core import CourseCache, Settings, TutorialBarScraper, UdemyActions, exceptions - -logger = logging.getLogger("udemy_enroller") - - -def _redeem_courses( - driver: WebDriver, - settings: Settings, - max_pages: Union[int, None], - cache_hit_limit: int, -) -> None: - """ - Method to scrape courses from tutorialbar.com and enroll in them on udemy - - :param WebDriver driver: Webdriver used to enroll in Udemy courses - :param Settings settings: Core settings used for Udemy - :param int max_pages: Max pages to scrape from tutorialbar.com - :param int cache_hit_limit: If we hit the cache this many times in a row we exit the script - :return: - """ - cache = CourseCache() - tb_scraper = TutorialBarScraper(max_pages) - udemy_actions = UdemyActions(driver, settings) - udemy_actions.login() # login once outside while loop - - current_cache_hits = 0 - - while True: - # Check if we should exit the loop - if not tb_scraper.script_should_run(): - break - udemy_course_links = tb_scraper.run() - - for course_link in udemy_course_links: - try: - if course_link not in cache: - status = udemy_actions.redeem(course_link) - cache.add(course_link, status) - # Reset cache hit count as we haven't scraped this page before - current_cache_hits = 0 - else: - logger.info(f"In cache: {course_link}") - - # Increment the cache hit count since this link is in the cache - current_cache_hits += 1 - - # Exit the loop if we have reached the cache hit limit - if 
_reached_cache_hit_limit(cache_hit_limit, current_cache_hits): - return - except NoSuchElementException as e: - logger.error(e) - except TimeoutException: - logger.error(f"Timeout on link: {course_link}") - except WebDriverException: - logger.error(f"Webdriver exception on link: {course_link}") - except KeyboardInterrupt: - logger.error("Exiting the script") - raise - except exceptions.RobotException as e: - logger.error(e) - raise - except Exception as e: - logger.error(f"Unexpected exception: {e}") - finally: - if settings.is_ci_build: - logger.info("We have attempted to subscribe to 1 udemy course") - logger.info("Ending test") - return - - logger.info("Moving on to the next page of the course list on tutorialbar.com") - - -def _reached_cache_hit_limit(cache_hit_limit, cache_hits) -> bool: - """ - Check if we have reached the cache hit limit - - :param int cache_hit_limit: Limit on the number of cache hits in a row to allow - :param int cache_hits: Current number of cache hits in a row - :return: - """ - reached_hit_limit = cache_hit_limit <= cache_hits - if reached_hit_limit: - logger.info(f"Hit cache {cache_hits} times in a row. 
Exiting script") - return reached_hit_limit - - -def redeem_courses( - driver: WebDriver, - settings: Settings, - max_pages: Union[int, None], - cache_hit_limit: int, -) -> None: - """ - Wrapper of _redeem_courses so we always close browser on completion - - :param WebDriver driver: Webdriver used to enroll in Udemy courses - :param Settings settings: Core settings used for Udemy - :param int max_pages: Max pages to scrape from tutorialbar.com - :param int cache_hit_limit: If we hit the cache this many times in a row we exit the script - :return: - """ - try: - _redeem_courses(driver, settings, max_pages, cache_hit_limit) - finally: - logger.info("Closing browser") - driver.quit() diff --git a/logconfig.ini b/logconfig.ini deleted file mode 100644 index 5e628ec..0000000 --- a/logconfig.ini +++ /dev/null @@ -1,36 +0,0 @@ -[loggers] -keys=root,udemy_enroller - -[handlers] -keys=defaultHandler,consoleHandler - -[formatters] -keys=defaultFormatter,consoleFormatter - -[logger_root] -level=INFO -handlers=defaultHandler -qualname=root - -[logger_udemy_enroller] -level=INFO -handlers=defaultHandler,consoleHandler -qualname=udemy_enroller -propagate=0 - -[handler_defaultHandler] -class=FileHandler -formatter=defaultFormatter -args=("app.log", "a") - -[handler_consoleHandler] -class=StreamHandler -level=INFO -formatter=consoleFormatter -args=(sys.stdout,) - -[formatter_defaultFormatter] -format=%(asctime)s - %(name)s - %(levelname)s - %(module)s : %(message)s - -[formatter_consoleFormatter] -format=%(message)s \ No newline at end of file diff --git a/pyproject.toml b/pyproject.toml index 54f9dc6..a5b9523 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -1,23 +1,24 @@ [tool.poetry] name = "automatic-udemy-course-enroller-get-paid-udemy-courses-for-free" -version = "0.3" +version = "2.0.0" description = "" authors = [""] [tool.poetry.dependencies] python = "^3.8" selenium = "^3.141.0" -requests = "^2.24.0" beautifulsoup4 = "^4.9.3" "ruamel.yaml" = "^0.16.12" 
webdriver-manager = "^3.2.2" +aiohttp = "^3.7.3" [tool.poetry.dev-dependencies] black = "^20.8b1" isort = "^5.6.4" pytest = "^6.1.2" pytest-cov = "^2.10.1" +pytest-asyncio = "^0.14.0" [build-system] -requires = ["poetry-core>=1.0.0a5"] -build-backend = "poetry.core.masonry.api" +requires = ["setuptools", "wheel"] +build-backend = "setuptools.build_meta" diff --git a/requirements.txt b/requirements.txt index 7f2e09c..2ff857f 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,4 +1,4 @@ -requests +aiohttp beautifulsoup4 ruamel.yaml selenium diff --git a/sample_settings.yaml b/sample_settings.yaml index 7b6acf1..94d90f3 100644 --- a/sample_settings.yaml +++ b/sample_settings.yaml @@ -3,3 +3,4 @@ udemy: password: "ExamplePa$$w0rd" # Enter your Udemy password here zipcode: "12345" # If Udemy requires a zipcode for your country, enter it here. languages: [] # If you want to limit the languages of courses to claim e.g ["French", "Spanish"] + categories: [] # If you want to limit the categories of courses to claim \ No newline at end of file diff --git a/setup.py b/setup.py new file mode 100644 index 0000000..1d30fb2 --- /dev/null +++ b/setup.py @@ -0,0 +1,46 @@ +import pathlib + +from setuptools import find_packages, setup + +here = pathlib.Path(__file__).parent.resolve() + +long_description = (here / "README.md").read_text(encoding="utf-8") + +setup( + name="udemy-enroller", + version="2.0.0", + long_description=long_description, + long_description_content_type="text/markdown", + author="aapatre", + author_email="udemyenroller@gmail.com", + maintainer="fakeid cullzie", + url="https://github.com/aapatre/Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE", + classifiers=[ + "Development Status :: 4 - Beta", + "Intended Audience :: Education", + "License :: OSI Approved :: GNU Lesser General Public License v3 (LGPLv3)", + "Programming Language :: Python :: 3.8", + ], + keywords="udemy, education, enroll", + packages=find_packages( + exclude=["*tests*"], 
+ ), + python_requires=">=3.8, <4", + install_requires=[ + "aiohttp", + "beautifulsoup4", + "ruamel.yaml", + "selenium", + "webdriver-manager", + ], + setup_requires=["pytest-runner"], + extras_require={ + "dev": ["black", "isort"], + "test": ["pytest", "pytest-cov"], + }, + entry_points={ + "console_scripts": [ + "udemy_enroller=udemy_enroller.cli:main", + ], + }, +) diff --git a/tests/conftest.py b/tests/conftest.py index 0f306f9..0c44aa3 100644 --- a/tests/conftest.py +++ b/tests/conftest.py @@ -3,17 +3,21 @@ import pytest +from udemy_enroller.utils import get_app_dir + @pytest.fixture(scope="session", autouse=True) def test_file_dir(): + app_dir = get_app_dir() test_file_dir = "test_tmp" + full_dir = os.path.join(app_dir, test_file_dir) # Try to delete directory in case it wasn't deleted after last test run - if os.path.isdir(test_file_dir): - shutil.rmtree(test_file_dir) - yield os.mkdir(test_file_dir) + if os.path.isdir(full_dir): + shutil.rmtree(full_dir) + yield os.mkdir(full_dir) # Delete directory after all tests completed - if os.path.isdir(test_file_dir): - shutil.rmtree(test_file_dir) + if os.path.isdir(full_dir): + shutil.rmtree(full_dir) @pytest.fixture() diff --git a/tests/core/scrapers/__init__.py b/tests/core/scrapers/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/tests/core/test_tutorialbar.py b/tests/core/scrapers/test_tutorialbar.py similarity index 71% rename from tests/core/test_tutorialbar.py rename to tests/core/scrapers/test_tutorialbar.py index 0cc3929..2c9f902 100644 --- a/tests/core/test_tutorialbar.py +++ b/tests/core/scrapers/test_tutorialbar.py @@ -2,9 +2,28 @@ import pytest -from core import TutorialBarScraper +from udemy_enroller.scrapers.tutorialbar import TutorialBarScraper +class MockResponse: + def __init__(self, data, status): + self._data = data + self.status = status + + async def read(self): + return self._data + + async def json(self): + return self._data + + async def __aexit__(self, exc_type, exc, 
tb): + pass + + async def __aenter__(self): + return self + + +@pytest.mark.asyncio @pytest.mark.parametrize( "tutorialbar_course_page_link,tutorialbar_links,udemy_links", [ @@ -25,7 +44,7 @@ ) @mock.patch.object(TutorialBarScraper, "gather_udemy_course_links") @mock.patch.object(TutorialBarScraper, "get_course_links") -def test_run( +async def test_run( mock_get_course_links, mock_gather_udemy_course_links, tutorialbar_course_page_link, @@ -34,8 +53,8 @@ def test_run( ): mock_get_course_links.return_value = tutorialbar_links mock_gather_udemy_course_links.return_value = udemy_links - tbs = TutorialBarScraper() - links = tbs.run() + tbs = TutorialBarScraper(enabled=True) + links = await tbs.run() mock_get_course_links.assert_called_with(tutorialbar_course_page_link) mock_gather_udemy_course_links.assert_called_with(tutorialbar_links) @@ -43,29 +62,15 @@ def test_run( assert link in udemy_links -@pytest.mark.parametrize( - "page_number,is_first_page", - [(1, True), (2, False)], - ids=( - "First Page", - "Not first page", - ), -) -def test_check_page_number(page_number, is_first_page): - tbs = TutorialBarScraper() - tbs.current_page = page_number - assert tbs.is_first_loop() == is_first_page - - -@mock.patch("core.tutorialbar.requests") -def test_get_course_links(mock_requests, tutorialbar_main_page): +@pytest.mark.asyncio +@mock.patch("aiohttp.ClientSession.get") +async def test_get_course_links(mock_get, tutorialbar_main_page): url = "https://www.tutorialbar.com/main" - requests_response = mock.Mock() - requests_response.content = tutorialbar_main_page - mock_requests.get.return_value = requests_response - tbs = TutorialBarScraper() + + mock_get.return_value = MockResponse(tutorialbar_main_page, 200) + tbs = TutorialBarScraper(enabled=True) tbs.current_page = 1 - links = tbs.get_course_links(url) + links = await tbs.get_course_links(url) assert tbs.last_page == "601" assert links == [ @@ -82,3 +87,19 @@ def test_get_course_links(mock_requests, 
tutorialbar_main_page): "https://www.tutorialbar.com/quickbooks-pro-desktop-bookkeeping-business-easy-way/", "https://www.tutorialbar.com/quickbooks-online-bank-feeds-credit-card-feeds-2020/", ] + + +@pytest.mark.parametrize( + "enabled", + [ + (True,), + (False,), + ], + ids=("Test enabled", "Test disabled"), +) +def test_enable_status( + enabled, +): + + tbs = TutorialBarScraper(enabled=enabled) + assert tbs.is_disabled() is not enabled diff --git a/tests/core/test_cache.py b/tests/core/test_cache.py index a2e04ba..7500488 100644 --- a/tests/core/test_cache.py +++ b/tests/core/test_cache.py @@ -3,8 +3,8 @@ import pytest -from core import CourseCache -from core.udemy import UdemyStatus +from udemy_enroller import CourseCache +from udemy_enroller.udemy import UdemyStatus @pytest.mark.parametrize( @@ -78,7 +78,7 @@ ], ids=("Initialize cache and add data",), ) -@mock.patch("core.cache.datetime") +@mock.patch("udemy_enroller.cache.datetime") def test_cache( mock_dt, cache_file_name, @@ -171,7 +171,7 @@ def test_cache( ], ids=("Initialize cache and add data",), ) -@mock.patch("core.cache.datetime") +@mock.patch("udemy_enroller.cache.datetime") def test_cache_load( mock_dt, cache_file_name, diff --git a/tests/core/test_driver_manager.py b/tests/core/test_driver_manager.py index 50d5823..35ec783 100644 --- a/tests/core/test_driver_manager.py +++ b/tests/core/test_driver_manager.py @@ -2,8 +2,8 @@ import pytest -from core import DriverManager -from core.driver_manager import ( +from udemy_enroller import DriverManager +from udemy_enroller.driver_manager import ( ALL_VALID_BROWSER_STRINGS, VALID_EDGE_STRINGS, VALID_FIREFOX_STRINGS, @@ -33,13 +33,13 @@ "unsupported browser", ), ) -@mock.patch("core.driver_manager.webdriver") -@mock.patch("core.driver_manager.ChromeDriverManager") -@mock.patch("core.driver_manager.GeckoDriverManager") -@mock.patch("core.driver_manager.EdgeChromiumDriverManager") -@mock.patch("core.driver_manager.IEDriverManager") 
-@mock.patch("core.driver_manager.OperaDriverManager") -@mock.patch("core.driver_manager.ChromeType") +@mock.patch("udemy_enroller.driver_manager.webdriver") +@mock.patch("udemy_enroller.driver_manager.ChromeDriverManager") +@mock.patch("udemy_enroller.driver_manager.GeckoDriverManager") +@mock.patch("udemy_enroller.driver_manager.EdgeChromiumDriverManager") +@mock.patch("udemy_enroller.driver_manager.IEDriverManager") +@mock.patch("udemy_enroller.driver_manager.OperaDriverManager") +@mock.patch("udemy_enroller.driver_manager.ChromeType") def test_driver_manager_init( _, mock_opera_driver_manager, @@ -95,10 +95,10 @@ def test_driver_manager_init( ], ids=("chrome is ci build", "chrome is not ci build"), ) -@mock.patch("core.driver_manager.webdriver") -@mock.patch("core.driver_manager.ChromeOptions") -@mock.patch("core.driver_manager.ChromeDriverManager") -@mock.patch("core.driver_manager.ChromeType") +@mock.patch("udemy_enroller.driver_manager.webdriver") +@mock.patch("udemy_enroller.driver_manager.ChromeOptions") +@mock.patch("udemy_enroller.driver_manager.ChromeDriverManager") +@mock.patch("udemy_enroller.driver_manager.ChromeType") def test_driver_manager_ci_build( _, mock_chrome_driver_manager, diff --git a/tests/core/test_settings.py b/tests/core/test_settings.py index 5fdcd12..b0a680a 100644 --- a/tests/core/test_settings.py +++ b/tests/core/test_settings.py @@ -4,7 +4,8 @@ import pytest from ruamel.yaml import YAML -from core import Settings +from udemy_enroller import Settings +from udemy_enroller.utils import get_app_dir @pytest.mark.parametrize( @@ -59,7 +60,7 @@ def test_settings(email, password, zip_code, languages, categories, save, file_n "builtins.input", side_effect=[email, zip_code, languages, categories, save] ): with mock.patch("getpass.getpass", return_value=password): - settings_path = f"test_tmp/{file_name}" + settings_path = os.path.join(get_app_dir(), f"test_tmp/{file_name}") settings = Settings(settings_path) assert settings.email == email 
assert settings.password == password diff --git a/tests/test_udemy_enroller.py b/tests/test_udemy_enroller.py index 9e478ca..e88bad1 100644 --- a/tests/test_udemy_enroller.py +++ b/tests/test_udemy_enroller.py @@ -3,7 +3,7 @@ import pytest -from udemy_enroller import parse_args +from udemy_enroller.cli import parse_args @pytest.mark.parametrize( diff --git a/udemy_enroller.py b/udemy_enroller.py index 5e68366..a870b46 100644 --- a/udemy_enroller.py +++ b/udemy_enroller.py @@ -1,77 +1,4 @@ -# Install all the requirements by running requirements.py in IDLE or follow the alternate instructions at -# https://github.com/aapatre/Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE/ Make sure you have -# cleared all saved payment details on your Udemy account & the browser! -import argparse -from argparse import Namespace -from typing import Union - -from selenium.webdriver.remote.webdriver import WebDriver - -from core import ALL_VALID_BROWSER_STRINGS, DriverManager, Settings -from core.utils import redeem_courses - - -def run( - browser: str, - max_pages: Union[int, None], - cache_hit_limit: int, - driver: WebDriver = None, -): - """ - Run the udemy enroller script - - :param str browser: Name of the browser we want to create a driver for - :param int or None max_pages: Max number of pages to scrape from tutorialbar.com - :param int cache_hit_limit: If we hit the cache this many times in a row we exit the script - :param WebDriver driver: - :return: - """ - settings = Settings() - if driver is None: - dm = DriverManager(browser=browser, is_ci_build=settings.is_ci_build) - driver = dm.driver - redeem_courses(driver, settings, max_pages, cache_hit_limit) - - -def parse_args(browser=None, use_manual_driver=False) -> Namespace: - """ - Parse args from the CLI or use the args passed in - - :param str browser: Name of the browser we want to create a driver for - :param bool use_manual_driver: If True don't create a web driver using web driver manager - :return: 
Args to be used in the script - """ - parser = argparse.ArgumentParser(description="Udemy Enroller") - - parser.add_argument( - "--browser", - type=str, - default=browser, - choices=ALL_VALID_BROWSER_STRINGS, - help="Browser to use for Udemy Enroller", - ) - parser.add_argument( - "--max-pages", - type=int, - default=None, - help="Max pages to scrape from tutorialbar.com", - ) - parser.add_argument( - "--cache-hits", - type=int, - default=12, - help="If we hit the cache this number of times in a row we will exit the script", - ) - - args = parser.parse_args() - - if args.browser is None and not use_manual_driver: - parser.print_help() - else: - return args - +from udemy_enroller.cli import main if __name__ == "__main__": - args = parse_args() - if args: - run(args.browser, args.max_pages, args.cache_hits) + main() diff --git a/core/__init__.py b/udemy_enroller/__init__.py similarity index 53% rename from core/__init__.py rename to udemy_enroller/__init__.py index cb59e9a..b231ff8 100644 --- a/core/__init__.py +++ b/udemy_enroller/__init__.py @@ -1,9 +1,8 @@ -import logging.config - from .cache import CourseCache from .driver_manager import ALL_VALID_BROWSER_STRINGS, DriverManager +from .logging import load_logging_config +from .scrapers.manager import ScraperManager from .settings import Settings -from .tutorialbar import TutorialBarScraper from .udemy import UdemyActions -logging.config.fileConfig("logconfig.ini", disable_existing_loggers=False) +load_logging_config() diff --git a/core/cache.py b/udemy_enroller/cache.py similarity index 93% rename from core/cache.py rename to udemy_enroller/cache.py index c3683e0..6b648ed 100644 --- a/core/cache.py +++ b/udemy_enroller/cache.py @@ -2,6 +2,8 @@ import json import os +from udemy_enroller.utils import get_app_dir + class CourseCache: """ @@ -9,7 +11,7 @@ class CourseCache: """ def __init__(self, file_name=".course_cache"): - self._file_name = file_name + self._file_name = os.path.join(get_app_dir(), file_name) 
self._cache = [] self._load_cache() diff --git a/udemy_enroller/cli.py b/udemy_enroller/cli.py new file mode 100644 index 0000000..dc8fb1c --- /dev/null +++ b/udemy_enroller/cli.py @@ -0,0 +1,118 @@ +import argparse +import logging +from argparse import Namespace +from typing import Tuple, Union + +from udemy_enroller import ALL_VALID_BROWSER_STRINGS, DriverManager, Settings +from udemy_enroller.logging import get_logger +from udemy_enroller.runner import redeem_courses + +logger = get_logger() + + +def enable_debug_logging() -> None: + """ + Enable debug logging for the scripts + + :return: None + """ + logger.setLevel(logging.DEBUG) + for handler in logger.handlers: + handler.setLevel(logging.DEBUG) + logger.info(f"Enabled debug logging") + + +def determine_if_scraper_enabled( + tutorialbar_enabled: bool, + discudemy_enabled: bool, +) -> Tuple[bool, bool]: + """ + Determine what scrapers should be enabled and disabled + + :return: tuple containing boolean of what scrapers should run + """ + if not tutorialbar_enabled and not discudemy_enabled: + # Set both to True since user has not enabled a specific scraper i.e Run all scrapers + tutorialbar_enabled, discudemy_enabled = True, True + return tutorialbar_enabled, discudemy_enabled + + +def run( + browser: str, + tutorialbar_enabled: bool, + discudemy_enabled: bool, + max_pages: Union[int, None], +): + """ + Run the udemy enroller script + + :param str browser: Name of the browser we want to create a driver for + :param bool tutorialbar_enabled: + :param bool discudemy_enabled: + :param int max_pages: Max pages to scrape from sites (if pagination exists) + :return: + """ + settings = Settings() + dm = DriverManager(browser=browser, is_ci_build=settings.is_ci_build) + redeem_courses( + dm.driver, settings, tutorialbar_enabled, discudemy_enabled, max_pages + ) + + +def parse_args(browser=None) -> Namespace: + """ + Parse args from the CLI or use the args passed in + + :param str browser: Name of the browser we want 
to create a driver for + :return: Args to be used in the script + """ + parser = argparse.ArgumentParser(description="Udemy Enroller") + + parser.add_argument( + "--browser", + type=str, + default=browser, + choices=ALL_VALID_BROWSER_STRINGS, + help="Browser to use for Udemy Enroller", + ) + parser.add_argument( + "--tutorialbar", + action="store_true", + default=False, + help="Run tutorialbar scraper", + ) + parser.add_argument( + "--discudemy", + action="store_true", + default=False, + help="Run discudemy scraper", + ) + parser.add_argument( + "--max-pages", + type=int, + default=5, + help=f"Max pages to scrape from sites (if pagination exists) (Default is 5)", + ) + parser.add_argument( + "--debug", + action="store_true", + help="Enable debug logging", + ) + + args = parser.parse_args() + + if args.browser is None: + parser.print_help() + else: + return args + + +def main(): + args = parse_args() + if args: + if args.debug: + enable_debug_logging() + tutorialbar_enabled, discudemy_enabled = determine_if_scraper_enabled( + args.tutorialbar, args.discudemy + ) + run(args.browser, tutorialbar_enabled, discudemy_enabled, args.max_pages) diff --git a/core/driver_manager.py b/udemy_enroller/driver_manager.py similarity index 95% rename from core/driver_manager.py rename to udemy_enroller/driver_manager.py index b2b7506..c8542ea 100644 --- a/core/driver_manager.py +++ b/udemy_enroller/driver_manager.py @@ -1,5 +1,3 @@ -import logging - from selenium import webdriver from selenium.webdriver.chrome.options import Options as ChromeOptions from webdriver_manager.chrome import ChromeDriverManager @@ -8,6 +6,10 @@ from webdriver_manager.opera import OperaDriverManager from webdriver_manager.utils import ChromeType +from udemy_enroller.logging import get_logger + +logger = get_logger() + VALID_FIREFOX_STRINGS = {"ff", "firefox"} VALID_CHROME_STRINGS = {"chrome", "google-chrome"} VALID_CHROMIUM_STRINGS = {"chromium"} @@ -25,9 +27,6 @@ ) -logger = 
logging.getLogger("udemy_enroller") - - class DriverManager: def __init__(self, browser: str, is_ci_build: bool = False): self.driver = None @@ -87,6 +86,7 @@ def _build_ci_options_chrome(): # We need to run headless when using github CI options.add_argument("--headless") options.add_argument("user-agent={0}".format(user_agent)) + options.add_argument("accept-language=en-GB,en-US;q=0.9,en;q=0.8") options.add_argument("--window-size=1325x744") logger.info("This is a CI run") return options diff --git a/core/exceptions.py b/udemy_enroller/exceptions.py similarity index 51% rename from core/exceptions.py rename to udemy_enroller/exceptions.py index 68821fb..ba632a2 100644 --- a/core/exceptions.py +++ b/udemy_enroller/exceptions.py @@ -4,3 +4,11 @@ class RobotException(Exception): """ pass + + +class LoginException(Exception): + """ + You have failed to login to the Udemy site + """ + + pass diff --git a/udemy_enroller/http.py b/udemy_enroller/http.py new file mode 100644 index 0000000..5ea7f95 --- /dev/null +++ b/udemy_enroller/http.py @@ -0,0 +1,22 @@ +import aiohttp + +from udemy_enroller.logging import get_logger + +logger = get_logger() + + +async def get(url, headers={}): + """ + Send REST get request to the url passed in + + :param url: The Url to get call get request on + :param headers: The headers to pass with the get request + :return: data if any exists + """ + try: + async with aiohttp.ClientSession() as session: + async with session.get(url, headers=headers) as response: + text = await response.read() + return text + except Exception as e: + logger.error(f"Error in get request: {e}") diff --git a/udemy_enroller/logging.py b/udemy_enroller/logging.py new file mode 100644 index 0000000..f0e5eac --- /dev/null +++ b/udemy_enroller/logging.py @@ -0,0 +1,48 @@ +import logging +import logging.config +import os + +from udemy_enroller.utils import get_app_dir + + +class CustomFileHandler(logging.FileHandler): + """ + Allows us to log to the app directory + """ + + 
def __init__(self, file_name="app.log", mode="a"): + log_file_path = os.path.join(get_app_dir(), file_name) + super(CustomFileHandler, self).__init__(log_file_path, mode) + + +def load_logging_config() -> None: + """ + Load logging configuration + + :return: None + """ + + my_logger = logging.getLogger("udemy_enroller") + my_logger.setLevel(logging.INFO) + + # File handler + file_handler = CustomFileHandler() + log_format = "%(asctime)s - %(name)s - %(levelname)s - %(module)s : %(message)s" + formatter = logging.Formatter(fmt=log_format) + file_handler.setFormatter(formatter) + my_logger.addHandler(file_handler) + + # Basic format for streamhandler + stream_handler = logging.StreamHandler() + simple_format = logging.Formatter(fmt="%(message)s") + stream_handler.setFormatter(simple_format) + my_logger.addHandler(stream_handler) + + +def get_logger() -> logging.Logger: + """ + Convenience method to load the app logger + + :return: An instance of the app logger + """ + return logging.getLogger("udemy_enroller") diff --git a/udemy_enroller/runner.py b/udemy_enroller/runner.py new file mode 100644 index 0000000..c09e1ac --- /dev/null +++ b/udemy_enroller/runner.py @@ -0,0 +1,100 @@ +import asyncio +from typing import Union + +from selenium.common.exceptions import ( + NoSuchElementException, + TimeoutException, + WebDriverException, +) +from selenium.webdriver.remote.webdriver import WebDriver + +from udemy_enroller import ( + CourseCache, + ScraperManager, + Settings, + UdemyActions, + exceptions, +) +from udemy_enroller.logging import get_logger + +logger = get_logger() + + +def _redeem_courses( + driver: WebDriver, + settings: Settings, + scrapers: ScraperManager, +) -> None: + """ + Scrape courses from the enabled scrapers and enroll in them on Udemy + + :param WebDriver driver: Webdriver used to enroll in Udemy courses + :param Settings settings: Core settings used for Udemy + :param ScraperManager scrapers: Manager that runs the enabled scrapers + :return: + """ + cache = CourseCache() +
udemy_actions = UdemyActions(driver, settings) + udemy_actions.login() # login once outside while loop + loop = asyncio.get_event_loop() + + while True: + udemy_course_links = loop.run_until_complete(scrapers.run()) + + if udemy_course_links: + for course_link in udemy_course_links: + try: + if course_link not in cache: + status = udemy_actions.redeem(course_link) + cache.add(course_link, status) + else: + logger.debug(f"In cache: {course_link}") + except NoSuchElementException as e: + logger.error(e) + except TimeoutException: + logger.error(f"Timeout on link: {course_link}") + except WebDriverException: + logger.error(f"Webdriver exception on link: {course_link}") + except KeyboardInterrupt: + logger.error("Exiting the script") + return + except exceptions.RobotException as e: + logger.error(e) + return + except Exception as e: + logger.error(f"Unexpected exception: {e}") + finally: + if settings.is_ci_build: + logger.info("We have attempted to subscribe to 1 udemy course") + logger.info("Ending test") + return + else: + logger.info("All scrapers complete") + return + + +def redeem_courses( + driver: WebDriver, + settings: Settings, + tutorialbar_enabled: bool, + discudemy_enabled: bool, + max_pages: Union[int, None], +) -> None: + """ + Wrapper of _redeem_courses so we always close browser on completion + + :param WebDriver driver: Webdriver used to enroll in Udemy courses + :param Settings settings: Core settings used for Udemy + :param bool tutorialbar_enabled: Boolean signifying if tutorialbar scraper should run + :param bool discudemy_enabled: Boolean signifying if discudemy scraper should run + :param int max_pages: Max pages to scrape from sites (if pagination exists) + :return: + """ + try: + scrapers = ScraperManager(tutorialbar_enabled, discudemy_enabled, max_pages) + _redeem_courses(driver, settings, scrapers) + except exceptions.LoginException as e: + logger.error(str(e)) + finally: + logger.info("Closing browser") + driver.quit() diff --git 
a/udemy_enroller/scrapers/__init__.py b/udemy_enroller/scrapers/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/udemy_enroller/scrapers/base_scraper.py b/udemy_enroller/scrapers/base_scraper.py new file mode 100644 index 0000000..bc86d02 --- /dev/null +++ b/udemy_enroller/scrapers/base_scraper.py @@ -0,0 +1,123 @@ +import datetime +import logging +import re +from abc import ABC, abstractmethod +from enum import Enum +from typing import Optional + +logger = logging.getLogger("udemy_enroller") + + +class ScraperStates(Enum): + DISABLED = "DISABLED" + RUNNING = "RUNNING" + COMPLETE = "COMPLETE" + + +class BaseScraper(ABC): + def __init__(self): + self._state = None + self.scraper_name = None + self.max_pages = None + self.last_page = None + self.current_page = 0 + + @abstractmethod + async def run(self): + return + + @abstractmethod + async def get_links(self): + return + + @property + def state(self): + return self._state + + @state.setter + def state(self, value): + if any([ss for ss in ScraperStates if ss.value == value]): + self._state = value + + def set_state_disabled(self): + self.state = ScraperStates.DISABLED.value + logger.info(f"{self.scraper_name} scraper disabled") + + def set_state_running(self): + self.state = ScraperStates.RUNNING.value + logger.info(f"{self.scraper_name} scraper is running") + + def set_state_complete(self): + self.state = ScraperStates.COMPLETE.value + logger.info(f"{self.scraper_name} scraper complete") + + def is_disabled(self): + return self.state == ScraperStates.DISABLED.value + + def is_complete(self): + return self.state == ScraperStates.COMPLETE.value + + def should_run(self): + should_run = not self.is_disabled() and not self.is_complete() + if should_run: + self.set_state_running() + return should_run + + @staticmethod + def time_run(func): + async def wrapper(self): + start_time = datetime.datetime.utcnow() + try: + response = await func(self) + except Exception as e: + logger.error(f"Error while 
running {self.scraper_name} scraper: {e}") + self.set_state_complete() + return [] + end_time = datetime.datetime.utcnow() + logger.info( + f"Got {len(response)} links from {self.DOMAIN} in {(end_time - start_time).total_seconds():.2f} seconds" + ) + return response + + return wrapper + + def max_pages_reached(self) -> bool: + """ + Returns boolean of whether or not we should continue scraping the current site + + :return: + """ + + should_run = True + + if self.max_pages is not None: + should_run = self.max_pages > self.current_page + + if not should_run: + logger.info( + f"Stopping loop. We have reached max number of pages to scrape: {self.max_pages}" + ) + self.set_state_complete() + + if self.last_page == self.current_page: + logger.info( + f"Stopping loop. We have reached the last page to scrape: {self.last_page}" + ) + self.set_state_complete() + + return should_run + + @staticmethod + def validate_coupon_url(url) -> Optional[str]: + """ + Validate the udemy coupon url passed in + If it matches the pattern it is returned else it returns None + + :param url: The url to check the udemy coupon pattern for + :return: The validated url or None + """ + url_pattern = r"^https:\/\/www\.udemy\.com.*couponCode=.*$" + matching = re.match(url_pattern, url) + if matching is not None: + matching = matching.group() + return matching diff --git a/udemy_enroller/scrapers/comidoc.py b/udemy_enroller/scrapers/comidoc.py new file mode 100644 index 0000000..2685d4b --- /dev/null +++ b/udemy_enroller/scrapers/comidoc.py @@ -0,0 +1,107 @@ +import asyncio +import logging +from typing import List + +from bs4 import BeautifulSoup + +from udemy_enroller.http import get +from udemy_enroller.scrapers.base_scraper import BaseScraper + +logger = logging.getLogger("udemy_enroller") + + +class ComidocScraper(BaseScraper): + """ + Contains any logic related to scraping of data from comidoc.net + """ + + DOMAIN = "https://comidoc.net" + + def __init__(self, enabled, max_pages=None):
super().__init__() + self.scraper_name = "comidoc" + if not enabled: + self.set_state_disabled() + self.max_pages = max_pages + + @BaseScraper.time_run + async def run(self) -> List: + """ + Called to gather the udemy links + + :return: List of udemy course links + """ + links = await self.get_links() + logger.info( + f"Page: {self.current_page} of {self.last_page} scraped from comidoc.net" + ) + self.max_pages_reached() + return links + + async def get_links(self) -> List: + """ + Scrape udemy links from comidoc.net + + :return: List of udemy course urls + """ + comidoc_links = [] + self.current_page += 1 + coupons_data = await get(f"{self.DOMAIN}/coupons?page={self.current_page}") + soup = BeautifulSoup(coupons_data.decode("utf-8"), "html.parser") + for course_card in soup.find_all("div", class_="MuiPaper-root"): + all_links = course_card.find_all("a") + if len(all_links) == 2: + comidoc_links.append(f"{self.DOMAIN}{all_links[1].get('href')}") + + links = await self.gather_udemy_course_links(comidoc_links) + self.last_page = self._get_last_page(soup) + + return links + + @classmethod + async def get_udemy_course_link(cls, url: str) -> str: + """ + Gets the udemy course link + + :param str url: The url to scrape data from + :return: Coupon link of the udemy course + """ + + data = await get(url) + soup = BeautifulSoup(data.decode("utf-8"), "html.parser") + for link in soup.find_all("a", href=True): + udemy_link = cls.validate_coupon_url(link["href"]) + if udemy_link is not None: + return udemy_link + + async def gather_udemy_course_links(self, courses: List[str]): + """ + Async fetching of the udemy course links from comidoc.net + + :param list courses: A list of comidoc.net course links we want to fetch the udemy links for + :return: list of udemy links + """ + return [ + link + for link in await asyncio.gather(*map(self.get_udemy_course_link, courses)) + if link is not None + ] + + @staticmethod + def _get_last_page(soup: BeautifulSoup) -> int: + """ + Extract 
the last page number to scrape + + :param soup: + :return: The last page number to scrape + """ + all_pages = [] + for page_link in soup.find("ul", class_="MuiPagination-ul").find_all("li"): + pagination = page_link.find("a") + + if pagination: + page_number = pagination["aria-label"].split()[-1] + if page_number.isdigit(): + all_pages.append(int(page_number)) + + return max(all_pages) diff --git a/udemy_enroller/scrapers/discudemy.py b/udemy_enroller/scrapers/discudemy.py new file mode 100644 index 0000000..fdb791f --- /dev/null +++ b/udemy_enroller/scrapers/discudemy.py @@ -0,0 +1,108 @@ +import asyncio +import logging +from typing import List + +from bs4 import BeautifulSoup + +from udemy_enroller.http import get +from udemy_enroller.scrapers.base_scraper import BaseScraper + +logger = logging.getLogger("udemy_enroller") + + +class DiscUdemyScraper(BaseScraper): + """ + Contains any logic related to scraping of data from discudemy.com + """ + + DOMAIN = "https://discudemy.com" + + def __init__(self, enabled, max_pages=None): + super().__init__() + self.scraper_name = "discudemy" + if not enabled: + self.set_state_disabled() + self.max_pages = max_pages + + @BaseScraper.time_run + async def run(self) -> List: + """ + Called to gather the udemy links + + :return: List of udemy course links + """ + links = await self.get_links() + logger.info( + f"Page: {self.current_page} of {self.last_page} scraped from discudemy.com" + ) + self.max_pages_reached() + return links + + async def get_links(self) -> List: + """ + Scrape udemy links from discudemy.com + + :return: List of udemy course urls + """ + discudemy_links = [] + self.current_page += 1 + coupons_data = await get(f"{self.DOMAIN}/all/{self.current_page}") + soup = BeautifulSoup(coupons_data.decode("utf-8"), "html.parser") + for course_card in soup.find_all("a", class_="card-header"): + url_end = course_card["href"].split("/")[-1] + discudemy_links.append(f"{self.DOMAIN}/go/{url_end}") + + links = await 
self.gather_udemy_course_links(discudemy_links) + + for counter, course in enumerate(links): + logger.debug(f"Received Link {counter + 1} : {course}") + + self.last_page = self._get_last_page(soup) + + return links + + @classmethod + async def get_udemy_course_link(cls, url: str) -> str: + """ + Gets the udemy course link + + :param str url: The url to scrape data from + :return: Coupon link of the udemy course + """ + + data = await get(url) + soup = BeautifulSoup(data.decode("utf-8"), "html.parser") + for link in soup.find_all("a", href=True): + udemy_link = cls.validate_coupon_url(link["href"]) + if udemy_link is not None: + return udemy_link + + async def gather_udemy_course_links(self, courses: List[str]): + """ + Async fetching of the udemy course links from discudemy.com + + :param list courses: A list of discudemy.com course links we want to fetch the udemy links for + :return: list of udemy links + """ + return [ + link + for link in await asyncio.gather(*map(self.get_udemy_course_link, courses)) + if link is not None + ] + + @staticmethod + def _get_last_page(soup: BeautifulSoup) -> int: + """ + Extract the last page number to scrape + + :param soup: + :return: The last page number to scrape + """ + + return max( + [ + int(i.text) + for i in soup.find("ul", class_="pagination3").find_all("li") + if i.text.isdigit() + ] + ) diff --git a/udemy_enroller/scrapers/manager.py b/udemy_enroller/scrapers/manager.py new file mode 100644 index 0000000..849b5b9 --- /dev/null +++ b/udemy_enroller/scrapers/manager.py @@ -0,0 +1,40 @@ +import asyncio +from functools import reduce +from typing import List + +from udemy_enroller.scrapers.discudemy import DiscUdemyScraper +from udemy_enroller.scrapers.tutorialbar import TutorialBarScraper + + +class ScraperManager: + def __init__(self, tutorialbar_enabled, discudemy_enabled, max_pages): + self.tutorialbar_scraper = TutorialBarScraper( + tutorialbar_enabled, max_pages=max_pages + ) + self.discudemy_scraper = 
DiscUdemyScraper( + discudemy_enabled, max_pages=max_pages + ) + self._scrapers = (self.tutorialbar_scraper, self.discudemy_scraper) + + async def run(self) -> List: + """ + Runs any enabled scrapers and returns a list of links + + :return: list + """ + urls = [] + enabled_scrapers = self._enabled_scrapers() + if enabled_scrapers: + urls = reduce( + list.__add__, + await asyncio.gather(*map(lambda sc: sc.run(), enabled_scrapers)), + ) + return urls + + def _enabled_scrapers(self) -> List: + """ + Returns a list of scrapers that should run + + :return: + """ + return list(filter(lambda sc: sc.should_run(), self._scrapers)) diff --git a/udemy_enroller/scrapers/tutorialbar.py b/udemy_enroller/scrapers/tutorialbar.py new file mode 100644 index 0000000..6158b6d --- /dev/null +++ b/udemy_enroller/scrapers/tutorialbar.py @@ -0,0 +1,128 @@ +import asyncio +import logging +from typing import List + +from bs4 import BeautifulSoup + +from udemy_enroller.http import get +from udemy_enroller.scrapers.base_scraper import BaseScraper + +logger = logging.getLogger("udemy_enroller") + + +class TutorialBarScraper(BaseScraper): + """ + Contains any logic related to scraping of data from tutorialbar.com + """ + + DOMAIN = "https://www.tutorialbar.com" + AD_DOMAINS = ("https://amzn",) + + def __init__(self, enabled, max_pages=None): + super().__init__() + self.scraper_name = "tutorialbar" + if not enabled: + self.set_state_disabled() + self.last_page = None + self.max_pages = max_pages + + @BaseScraper.time_run + async def run(self) -> List: + """ + Runs the steps to scrape links from tutorialbar.com + + :return: list of udemy coupon links + """ + links = await self.get_links() + self.max_pages_reached() + return links + + async def get_links(self): + """ + Scrape udemy links from tutorialbar.com + + :return: List of udemy course urls + """ + self.current_page += 1 + course_links = await self.get_course_links( + f"{self.DOMAIN}/all-courses/page/{self.current_page}/" + ) + + 
logger.info( + f"Page: {self.current_page} of {self.last_page} scraped from tutorialbar.com" + ) + udemy_links = await self.gather_udemy_course_links(course_links) + links = self._filter_ad_domains(udemy_links) + + for counter, course in enumerate(links): + logger.debug(f"Received Link {counter + 1} : {course}") + + return links + + def _filter_ad_domains(self, udemy_links) -> List: + """ + Filter out any known ad domains from the links scraped + + :param list udemy_links: List of urls to filter ad domains from + :return: A list of filtered urls + """ + ad_links = set() + for link in udemy_links: + for ad_domain in self.AD_DOMAINS: + if link.startswith(ad_domain): + ad_links.add(link) + if ad_links: + logger.info(f"Removing ad links from courses: {ad_links}") + return list(set(udemy_links) - ad_links) + + async def get_course_links(self, url: str) -> List: + """ + Gets the url of pages which contain the udemy link we want to get + + :param str url: The url to scrape data from + :return: list of pages on tutorialbar.com that contain Udemy coupons + """ + text = await get(url) + if text is not None: + soup = BeautifulSoup(text.decode("utf-8"), "html.parser") + + links = soup.find_all("h3") + course_links = [link.find("a").get("href") for link in links] + + self.last_page = ( + soup.find("li", class_="next_paginate_link") + .find_previous_sibling() + .text + ) + + return course_links + + @staticmethod + async def get_udemy_course_link(url: str) -> str: + """ + Gets the udemy course link + + :param str url: The url to scrape data from + :return: Coupon link of the udemy course + """ + + text = await get(url) + if text is not None: + soup = BeautifulSoup(text.decode("utf-8"), "html.parser") + udemy_link = ( + soup.find("span", class_="rh_button_wrapper").find("a").get("href") + ) + return udemy_link + + async def gather_udemy_course_links(self, courses: List[str]): + """ + Async fetching of the udemy course links from tutorialbar.com + + :param list courses: A list of 
tutorialbar.com course links we want to fetch the udemy links for + :return: list of udemy links + """ + return [ + link + for link in await asyncio.gather(*map(self.get_udemy_course_link, courses)) + if link is not None + ] diff --git a/core/settings.py b/udemy_enroller/settings.py similarity index 96% rename from core/settings.py rename to udemy_enroller/settings.py index a3bcf9b..6c9f51b 100644 --- a/core/settings.py +++ b/udemy_enroller/settings.py @@ -1,12 +1,14 @@ import getpass -import logging import os.path from distutils.util import strtobool from typing import Dict, List from ruamel.yaml import YAML, dump -logger = logging.getLogger("udemy_enroller") +from udemy_enroller.logging import get_logger +from udemy_enroller.utils import get_app_dir + +logger = get_logger() class Settings: @@ -21,7 +23,7 @@ def __init__(self, settings_path="settings.yaml"): self.languages = [] self.categories = [] - self._settings_path = settings_path + self._settings_path = os.path.join(get_app_dir(), settings_path) self.is_ci_build = strtobool(os.environ.get("CI_TEST", "False")) self._init_settings() diff --git a/core/udemy.py b/udemy_enroller/udemy.py similarity index 75% rename from core/udemy.py rename to udemy_enroller/udemy.py index ea5a886..a3a2908 100644 --- a/core/udemy.py +++ b/udemy_enroller/udemy.py @@ -1,17 +1,16 @@ -import logging -import time from enum import Enum -from selenium.common.exceptions import NoSuchElementException +from selenium.common.exceptions import NoSuchElementException, TimeoutException from selenium.webdriver.common.by import By from selenium.webdriver.remote.webdriver import WebDriver, WebElement from selenium.webdriver.support import expected_conditions as EC from selenium.webdriver.support.ui import WebDriverWait -from core.exceptions import RobotException -from core.settings import Settings +from udemy_enroller.exceptions import LoginException, RobotException +from udemy_enroller.logging import get_logger +from udemy_enroller.settings 
import Settings -logger = logging.getLogger("udemy_enroller") +logger = get_logger() class UdemyStatus(Enum): @@ -59,7 +58,7 @@ def login(self, is_retry=False) -> None: is_robot = self._check_if_robot() if is_robot and not is_retry: input( - "Please solve the captcha before proceeding. Hit enter once solved " + "Before login. Please solve the captcha before proceeding. Hit enter once solved " ) self.login(is_retry=True) return @@ -67,7 +66,22 @@ def login(self, is_retry=False) -> None: raise RobotException("I am a bot!") raise e else: - # TODO: Verify successful login + user_dropdown_xpath = "//a[@data-purpose='user-dropdown']" + try: + WebDriverWait(self.driver, 10).until( + EC.presence_of_element_located((By.XPATH, user_dropdown_xpath)) + ) + except TimeoutException: + is_robot = self._check_if_robot() + if is_robot and not is_retry: + input( + "After login. Please solve the captcha before proceeding. Hit enter once solved " + ) + if self._check_if_robot(): + raise RobotException("I am a bot!") + self.logged_in = True + return + raise LoginException("Udemy user failed to login") self.logged_in = True def redeem(self, url: str) -> str: @@ -91,7 +105,7 @@ def redeem(self, url: str) -> str: ) if element_text not in self.settings.languages: - logger.info(f"Course language not wanted: {element_text}") + logger.debug(f"Course language not wanted: {element_text}") return UdemyStatus.UNWANTED_LANGUAGE.value if self.settings.categories: @@ -110,7 +124,7 @@ def redeem(self, url: str) -> str: if category in breadcrumbs: break else: - logger.info("Skipping course as it does not have a wanted category") + logger.debug("Skipping course as it does not have a wanted category") return UdemyStatus.UNWANTED_CATEGORY.value # Enroll Now 1 @@ -120,12 +134,13 @@ def redeem(self, url: str) -> str: EC.element_to_be_clickable((By.XPATH, buy_course_button_xpath)) ) - # Check if already enrolled - already_purchased_xpath = ( - "//div[starts-with(@class, 'buy-box--purchased-text-banner')]" - 
) - if self.driver.find_elements_by_xpath(already_purchased_xpath): - logger.info(f"Already enrolled in {course_name}") + # Check if already enrolled. If add to cart is available we have not yet enrolled + add_to_cart_xpath = "//div[@data-purpose='add-to-cart']" + add_to_cart_elements = self.driver.find_elements_by_xpath(add_to_cart_xpath) + if not add_to_cart_elements or ( + add_to_cart_elements and not add_to_cart_elements[0].is_displayed() + ): + logger.debug(f"Already enrolled in {course_name}") return UdemyStatus.ENROLLED.value # Click to enroll in the course @@ -146,17 +161,25 @@ def redeem(self, url: str) -> str: # Check if zipcode exists before doing this if self.settings.zip_code: - # Assume sometimes zip is not required because script was originally pushed without this + # zipcode is only required in certain regions (e.g USA) try: - zipcode_element = self.driver.find_element_by_id( - "billingAddressSecondaryInput" + element_present = EC.presence_of_element_located( + ( + By.ID, + "billingAddressSecondaryInput", + ) + ) + WebDriverWait(self.driver, 5).until(element_present).send_keys( + self.settings.zip_code ) - zipcode_element.send_keys(self.settings.zip_code) # After you put the zip code in, the page refreshes itself and disables the enroll button for a split # second. 
- time.sleep(1) - except NoSuchElementException: + enroll_button_is_clickable = EC.element_to_be_clickable( + (By.XPATH, enroll_button_xpath) + ) + WebDriverWait(self.driver, 5).until(enroll_button_is_clickable) + except (TimeoutException, NoSuchElementException): pass # Make sure the price has loaded @@ -178,7 +201,7 @@ def redeem(self, url: str) -> str: # This logic should work for different locales and currencies _numbers = "".join(filter(lambda x: x if x.isdigit() else None, _price)) if _numbers.isdigit() and int(_numbers) > 0: - logger.info( + logger.debug( f"Skipping course as it now costs {_price}: {course_name}" ) return UdemyStatus.EXPIRED.value diff --git a/udemy_enroller/utils.py b/udemy_enroller/utils.py new file mode 100644 index 0000000..4f183da --- /dev/null +++ b/udemy_enroller/utils.py @@ -0,0 +1,15 @@ +import os + + +def get_app_dir() -> str: + """ + Gets the app directory where all data related to the script is stored + + :return: + """ + app_dir = os.path.join(os.path.expanduser("~"), ".udemy_enroller") + + if not os.path.isdir(app_dir): + # If the app data dir does not exist create it + os.mkdir(app_dir) + return app_dir diff --git a/udemy_enroller_chrome.py b/udemy_enroller_chrome.py deleted file mode 100644 index a10a265..0000000 --- a/udemy_enroller_chrome.py +++ /dev/null @@ -1,15 +0,0 @@ -# Install all the requirements by running requirements.py in IDLE or follow the alternate instructions at -# https://github.com/aapatre/Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE/ Make sure you have -# cleared all saved payment details on your Udemy account & the browser! 
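The new `udemy_enroller/utils.py` above writes cache, settings, and logs to a per-user app directory so they persist across PyPI upgrades. A minimal standalone sketch of the same pattern — the `.my_app` directory name here is a hypothetical stand-in so the sketch does not touch the real `.udemy_enroller` folder:

```python
import os


def get_app_dir(dir_name=".my_app"):
    # Resolve a hidden directory in the user's home folder
    app_dir = os.path.join(os.path.expanduser("~"), dir_name)
    # Create it on first use so callers can write to it immediately
    os.makedirs(app_dir, exist_ok=True)
    return app_dir


# Any file placed here survives package upgrades/reinstalls
cache_path = os.path.join(get_app_dir(), ".course_cache")
```

Because the path is derived from the home directory rather than the working directory, the script behaves the same whether it is run from a source checkout or a `pip`-installed console entry point.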
-import warnings - -from udemy_enroller import parse_args, run - -if __name__ == "__main__": - browser = "chrome" - warnings.warn( - f"Please use `udemy_enroller.py --browser={browser}` as this script will be removed soon", - DeprecationWarning, - ) - args = parse_args(browser) - run(args.browser, args.max_pages, args.cache_hits) diff --git a/udemy_enroller_chromium.py b/udemy_enroller_chromium.py deleted file mode 100644 index bc21d33..0000000 --- a/udemy_enroller_chromium.py +++ /dev/null @@ -1,15 +0,0 @@ -# Install all the requirements by running requirements.py in IDLE or follow the alternate instructions at -# https://github.com/aapatre/Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE/ Make sure you have -# cleared all saved payment details on your Udemy account & the browser! -import warnings - -from udemy_enroller import parse_args, run - -if __name__ == "__main__": - browser = "chromium" - warnings.warn( - f"Please use `udemy_enroller.py --browser={browser}` as this script will be removed soon", - DeprecationWarning, - ) - args = parse_args(browser) - run(args.browser, args.max_pages, args.cache_hits) diff --git a/udemy_enroller_edge.py b/udemy_enroller_edge.py deleted file mode 100644 index e33eecf..0000000 --- a/udemy_enroller_edge.py +++ /dev/null @@ -1,15 +0,0 @@ -# Install all the requirements by running requirements.py in IDLE or follow the alternate instructions at -# https://github.com/aapatre/Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE/ Make sure you have -# cleared all saved payment details on your Udemy account & the browser! 
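The new CLI's `determine_if_scraper_enabled` (in `udemy_enroller/cli.py` above) treats "no scraper flags passed" as "run every scraper", so `--tutorialbar` and `--discudemy` act as opt-in filters rather than required switches. A small sketch of that default-on behaviour:

```python
from typing import Tuple


def determine_if_scraper_enabled(
    tutorialbar_enabled: bool,
    discudemy_enabled: bool,
) -> Tuple[bool, bool]:
    # If the user enabled nothing explicitly, run every scraper
    if not tutorialbar_enabled and not discudemy_enabled:
        return True, True
    # Otherwise respect exactly the flags that were passed
    return tutorialbar_enabled, discudemy_enabled
```

For example, `determine_if_scraper_enabled(False, False)` yields `(True, True)`, while `determine_if_scraper_enabled(True, False)` runs only the tutorialbar scraper.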
-import warnings
-
-from udemy_enroller import parse_args, run
-
-if __name__ == "__main__":
-    browser = "edge"
-    warnings.warn(
-        f"Please use `udemy_enroller.py --browser={browser}` as this script will be removed soon",
-        DeprecationWarning,
-    )
-    args = parse_args(browser)
-    run(args.browser, args.max_pages, args.cache_hits)
diff --git a/udemy_enroller_firefox.py b/udemy_enroller_firefox.py
deleted file mode 100644
index 5d5d770..0000000
--- a/udemy_enroller_firefox.py
+++ /dev/null
@@ -1,16 +0,0 @@
-# Install all the requirements by running requirements.py in IDLE or follow the alternate instructions at
-# https://github.com/aapatre/Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE/ Make sure you have
-# cleared all saved payment details on your Udemy account & the browser! For firefox you need to manually install the
-# driver on Arch Linux (sudo pacman -S geckodriver). Untested on other platforms.
-import warnings
-
-from udemy_enroller import parse_args, run
-
-if __name__ == "__main__":
-    browser = "firefox"
-    warnings.warn(
-        f"Please use `udemy_enroller.py --browser={browser}` as this script will be removed soon",
-        DeprecationWarning,
-    )
-    args = parse_args(browser)
-    run(args.browser, args.max_pages, args.cache_hits)
diff --git a/udemy_enroller_internet_explorer.py b/udemy_enroller_internet_explorer.py
deleted file mode 100644
index d1c8fa5..0000000
--- a/udemy_enroller_internet_explorer.py
+++ /dev/null
@@ -1,15 +0,0 @@
-# Install all the requirements by running requirements.py in IDLE or follow the alternate instructions at
-# https://github.com/aapatre/Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE/ Make sure you have
-# cleared all saved payment details on your Udemy account & the browser!
-import warnings
-
-from udemy_enroller import parse_args, run
-
-if __name__ == "__main__":
-    browser = "internet_explorer"
-    warnings.warn(
-        f"Please use `udemy_enroller.py --browser={browser}` as this script will be removed soon",
-        DeprecationWarning,
-    )
-    args = parse_args(browser)
-    run(args.browser, args.max_pages, args.cache_hits)
diff --git a/udemy_enroller_opera.py b/udemy_enroller_opera.py
deleted file mode 100644
index 8970d4a..0000000
--- a/udemy_enroller_opera.py
+++ /dev/null
@@ -1,15 +0,0 @@
-# Install all the requirements by running requirements.py in IDLE or follow the alternate instructions at
-# https://github.com/aapatre/Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE/ Make sure you have
-# cleared all saved payment details on your Udemy account & the browser!
-import warnings
-
-from udemy_enroller import parse_args, run
-
-if __name__ == "__main__":
-    browser = "opera"
-    warnings.warn(
-        f"Please use `udemy_enroller.py --browser={browser}` as this script will be removed soon",
-        DeprecationWarning,
-    )
-    args = parse_args(browser)
-    run(args.browser, args.max_pages, args.cache_hits)
diff --git a/udemy_enroller_vanilla.py b/udemy_enroller_vanilla.py
deleted file mode 100644
index 8e52539..0000000
--- a/udemy_enroller_vanilla.py
+++ /dev/null
@@ -1,34 +0,0 @@
-# Install all the requirements by running requirements.py in IDLE or follow the alternate instructions at
-# https://github.com/aapatre/Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE/ Make sure you have
-# cleared all saved payment details on your Udemy account & the browser!
-from selenium import webdriver
-
-from core import Settings
-from udemy_enroller import parse_args, run
-
-"""### **Enter the path/location of your webdriver**
-By default, the webdriver for Microsoft Edge browser has been chosen in the code below.
-
-Also, enter the location of your webdriver.
-"""
-
-
-if __name__ == "__main__":
-    args = parse_args(use_manual_driver=True)
-
-    settings = Settings()
-    # On windows you need the r (raw string) in front of the string to deal with backslashes.
-    # Replace this string with the path for your webdriver
-
-    path = r"..location\msedgedriver.exe"
-    driver = webdriver.Edge(path)
-    # driver = webdriver.Chrome(path)  # Uncomment for Google Chrome driver
-    # driver = webdriver.Firefox(path)  # Uncomment for Mozilla Firefox driver
-    # driver = webdriver.Edge(path)  # Uncomment for Microsoft Edge driver
-    # driver = webdriver.Safari(path)  # Uncomment for Apple Safari driver
-
-    # Maximizes the browser window since Udemy has a responsive design and the code only works
-    # in the maximized layout
-    driver.maximize_window()
-
-    run(args.browser, args.max_pages, args.cache_hits, driver=driver)