Merge pull request #119 from aapatre/develop

Release 0.3 PR
aapatre · Nov 26, 2020 · d3b37be
2 parents 7782977 + cae22ab

Showing 15 changed files with 362 additions and 320 deletions.
diff --git a/.gitignore b/.gitignore
@@ -224,4 +224,10 @@ Pipfile.lock
 
 settings.yaml
 
-pyproject.toml
+poetry.lock
+
+# Editor specific
+.vscode
+
+# Cache files
+.course_cache
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -6,6 +6,15 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to
 [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.3] - 2020-11-26
+
+### Added
+
+- Add configuration to choose the categories you would like your free courses to be under
+- Better handling of enrolled courses, invalid coupons, unwanted categories and unwanted languages
+- Basic caching of courses which we have previously tried to enroll in. Improves speed of subsequent runs
+- Give control back to the user when a robot check appears on Udemy login. Once it is solved, the user can hit enter
+  and the script continues as normal
 
 ## [0.2] - 2020-11-05
 
@@ -31,6 +40,8 @@ and this project adheres to
   project running locally. Suitable for users who are not looking forward to
   contribute.
 
+[0.3]:
+  https://github.com/aapatre/Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE/releases/tag/v0.3
 [0.2]:
   https://github.com/aapatre/Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE/releases/tag/v0.2
 [0.1]:

diff --git a/README.md b/README.md
@@ -75,7 +75,8 @@ get all the requirements installed in one go.
   using a text editor and insert your **Udemy registered email in the email
   section**, your **Udemy password in the password section**, and the **ZIP Code
   in the zipcode section (if you reside in the United States or any other region
-  where Udemy asks for ZIP Code as Billing Info, else enter a random number)**.
+  where Udemy asks for ZIP Code as Billing Info, else enter a random number)**.
+  Additionally, you can add your preferred languages and course categories.
 
 2 . Choose the appropriate file for your browser (from the list below):
 
@@ -88,7 +89,7 @@ get all the requirements installed in one go.
   - Edge:
     [udemy_enroller_edge.py](https://github.com/aapatre/Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE/blob/master/udemy_enroller_edge.py)
 
-- **Has issues:**
+- **Has issues when run on a custom kernel but works fine on a vanilla OS:**
 
   - Firefox:
     [udemy_enroller_firefox.py(requires manual driver installation)](https://github.com/aapatre/Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE/blob/master/udemy_enroller_firefox.py)
@@ -124,7 +125,7 @@ first page, it then moves to the next Tutorial Bar page and the cycle continues.
 ### 1. Can I get a specific course for free with this script?
 
 Unfortunately no, but let me assure you that you may be lucky enough to get a
-particular course for free when the instructor posts it's coupon code in order
+particular course for free when the instructor posts its coupon code in order
 to promote it. Also, over time you would build a library of courses by running
 the script often and have all the required courses in your collection. In fact,
 I made this course after completing a
@@ -161,8 +162,9 @@ it will save your precious time too! :)
 ### 5. Udemy has detected that I'm using automation tools to browse the website! What should I do?
 
 ![](https://i.imgur.com/pwseilE.jpg) Relax! This happens when you run the script
-several times in a short interval of time. Solve the captcha, close the browser,
-and simply re-run the script. Easy peasy lemon squeezy! 🍋🙃 <br /><br />
+several times in a short interval of time. Solve the captcha, press Enter in the terminal window you are running
+the script from, and the script will continue as normal.
+Easy peasy lemon squeezy! 🍋🙃 <br /><br />
 
 ### 6. The code compiles successfully but it's taking too long to work! IS there any way to fix that?
 
@@ -189,3 +191,9 @@ Take a look at our
 and help us on what you want or talk to us about your proposed changes.
 
 ---
+
+## Supporter
+
+[![JetBrains](https://i.imgur.com/h2R018M.jpg)](https://jetbrains.com/?from=udemy-free-course-enroller)
+
+Thanks to [JetBrains](https://jetbrains.com/?from=udemy-free-course-enroller) for supporting us. They are the makers of world-class IDEs and developer tooling. If you think their products might help you, please support them.
diff --git a/core/cache.py b/core/cache.py
@@ -0,0 +1,61 @@
+import datetime
+import json
+import os
+
+
+class CourseCache(object):
+    """
+    Basic cache to keep details on courses already scraped
+    """
+
+    def __init__(self):
+        self._file_name = ".course_cache"
+        self._cache = []
+        self._load_cache()
+
+    def __contains__(self, url: str) -> bool:
+        """
+        Simply check if the url is already in the cache
+
+        :param str url: URL to check the cache for
+        :return: True if the url is already in the cache, otherwise False
+        """
+        return url in [c["url"] for c in self._cache]
+
+    def _load_cache(self) -> None:
+        """
+        Load the cache into memory when we initialize
+
+        :return:
+        """
+        file_mode = "r" if os.path.isfile(self._file_name) else "w+"
+        with open(self._file_name, file_mode) as f:
+            cached_data = f.read().splitlines()
+            if cached_data:
+                self._cache = list(map(json.loads, cached_data))
+
+    def _append_cache(self, data: str) -> None:
+        """
+        Append the new data to the cache
+
+        :param str data: Data to append to the cache
+        :return:
+        """
+        with open(self._file_name, "a") as f:
+            f.write(f"{data}\n")
+
+    def add(self, url: str, status: str) -> None:
+        """
+        Add a result to our cache
+
+        :param str url: URL of the udemy course to cache
+        :param str status: The status of the course determined by the script
+        :return:
+        """
+        _data_to_cache = {
+            "url": url,
+            "status": status,
+            "date": datetime.datetime.utcnow().isoformat(),
+        }
+        self._cache.append(_data_to_cache)
+        self._append_cache(json.dumps(_data_to_cache))
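The cache added in this file writes one JSON object per line and reloads those lines on start-up, which is what lets later runs skip courses that were already tried. A minimal sketch of the same JSON-lines pattern follows. It is not the code from this PR: `JsonLinesCache`, the configurable `file_name` argument, the `"ENROLLED"` status string, and the example URL are all assumptions made for illustration, and membership is tracked in a set rather than by scanning a list.

```python
import datetime
import json
import os
import tempfile


class JsonLinesCache:
    """Hypothetical stand-in for CourseCache with a configurable file path."""

    def __init__(self, file_name: str):
        self._file_name = file_name
        self._urls = set()  # URLs seen so far, for constant-time lookups
        self._load()

    def __contains__(self, url: str) -> bool:
        return url in self._urls

    def _load(self) -> None:
        # A missing file simply means nothing has been cached yet.
        if not os.path.isfile(self._file_name):
            return
        with open(self._file_name, "r") as f:
            for line in f:
                line = line.strip()
                if line:
                    self._urls.add(json.loads(line)["url"])

    def add(self, url: str, status: str) -> None:
        record = {
            "url": url,
            "status": status,
            "date": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        }
        self._urls.add(url)
        # Append-only write: one JSON document per line.
        with open(self._file_name, "a") as f:
            f.write(json.dumps(record) + "\n")


def demo() -> bool:
    """Write with one cache instance, then read back with a fresh one."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, ".course_cache")
        first = JsonLinesCache(path)
        first.add("https://www.udemy.com/course/example/", "ENROLLED")
        second = JsonLinesCache(path)
        return "https://www.udemy.com/course/example/" in second
```

Calling `demo()` simulates two consecutive runs: the second instance sees the URL only because the first one appended it to the file.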
diff --git a/core/exceptions.py b/core/exceptions.py
@@ -0,0 +1,6 @@
+class RobotException(Exception):
+    """
+    You have been identified as a robot on the Udemy site
+    """
+
+    pass
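The changelog describes handing control back to the user when Udemy shows a robot check, then continuing once the user presses Enter. One way such a flow could be wired around an exception like this is sketched below. The function name `login_with_robot_check` and its `attempt_login` and `wait_for_user` parameters are hypothetical, and the sketch redefines `RobotException` locally so that it runs on its own; the actual call sites in this PR are not shown in this part of the diff.

```python
class RobotException(Exception):
    """Local copy for the sketch: raised when Udemy presents a robot check."""


def login_with_robot_check(attempt_login, wait_for_user=input) -> str:
    """Run a login callable and pause for the user if a robot check appears.

    `attempt_login` is any zero-argument callable that may raise
    RobotException. `wait_for_user` defaults to the built-in input(), so the
    script blocks until the user presses Enter.
    """
    try:
        attempt_login()
    except RobotException:
        wait_for_user("Please solve the captcha, then press Enter to continue: ")
        return "resumed"
    return "ok"
```

Passing `wait_for_user` as a parameter keeps the pause testable: a test can supply a callable that records the prompt instead of blocking on the terminal.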
diff --git a/core/settings.py b/core/settings.py
@@ -1,11 +1,9 @@
 import getpass
 import os.path
 from distutils.util import strtobool
-from typing import Dict
-from typing import List
+from typing import Dict, List
 
-from ruamel.yaml import dump
-from ruamel.yaml import YAML
+from ruamel.yaml import YAML, dump
 
 
 class Settings:
@@ -18,6 +16,7 @@ def __init__(self):
         self.password = None
         self.zip_code = None
         self.languages = []
+        self.categories = []
 
         self._settings_path = "settings.yaml"
         self.is_ci_build = strtobool(os.environ.get("CI", "False"))
@@ -65,6 +64,8 @@ def _load_user_settings(self) -> Dict:
             self.password = udemy_settings["password"]
             self.zip_code = udemy_settings.get("zipcode")
             self.languages = udemy_settings.get("languages")
+            self.categories = udemy_settings.get("categories")
+
         return settings
 
     def _generate_settings(self) -> None:
@@ -77,6 +78,7 @@ def _generate_settings(self) -> None:
         self.password = self._get_password()
         self.zip_code = self._get_zip_code()
         self.languages = self._get_languages()
+        self.categories = self._get_categories()
 
     def _get_email(self) -> str:
         """
@@ -109,12 +111,11 @@ def _get_zip_code() -> str:
 
         :return: The users udemy zip code
         """
-        zip_code = input(
-            "Please enter your zipcode (Not necessary in some regions): ")
+        zip_code = input("Please enter your zipcode (Not necessary in some regions): ")
         return zip_code
 
     @staticmethod
-    def _get_languages() -> List:
+    def _get_languages() -> List[str]:
         """
         Get input from user on the languages they want to get courses in
 
@@ -123,8 +124,23 @@ def _get_languages() -> List:
         languages = input(
             "Please enter your language preferences (comma separated list e.g. English,German): "
         )
-        return [lang.strip()
-                for lang in languages.split(",")] if languages else []
+        return [lang.strip() for lang in languages.split(",")] if languages else []
+
+    @staticmethod
+    def _get_categories() -> List[str]:
+        """Gets the categories the user wants.
+
+        :return: list of categories the user wants."""
+        categories = input(
+            "Please enter in a list of comma separated values of"
+            " the course categories you like, for example:\n"
+            "Development, Design\n> "
+        )
+        return (
+            [category.strip() for category in categories.split(",")]
+            if categories
+            else []
+        )
 
     def _save_settings(self) -> None:
         """
@@ -133,14 +149,14 @@ def _save_settings(self) -> None:
         :return:
         """
         yaml_structure = dict()
-        save_settings = input(
-            "Do you want to save settings for future use (Y/N): ")
+        save_settings = input("Do you want to save settings for future use (Y/N): ")
         if save_settings.lower() == "y":
             yaml_structure["udemy"] = {
                 "email": str(self.email),
                 "password": str(self.password),
                 "zipcode": str(self.zip_code),
                 "languages": self.languages,
+                "categories": self.categories,
             }
 
             with open(self._settings_path, "w+") as f:
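Both `_get_languages` and `_get_categories` in this diff split a comma-separated answer and strip each item. A small sketch of that parsing as one shared helper is shown below. `parse_preferences` is a hypothetical name, not a function in this PR, and unlike the code in the diff it also drops empty items (for example from a trailing comma) and treats `None` as an empty list, which is what `dict.get` returns when a key such as `categories` is missing from an older `settings.yaml`.

```python
from typing import List, Optional


def parse_preferences(raw: Optional[str]) -> List[str]:
    """Turn 'Development, Design' into ['Development', 'Design'].

    Returns an empty list for None or an empty string, and skips items that
    are empty after stripping whitespace.
    """
    if not raw:
        return []
    return [item.strip() for item in raw.split(",") if item.strip()]
```

With this helper, `parse_preferences("Development, Design")` yields two clean category names, while `parse_preferences("")` and `parse_preferences(None)` both yield `[]`.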

diff --git a/core/tutorialbar.py b/core/tutorialbar.py
@@ -11,6 +11,7 @@ class TutorialBarScraper:
     """
 
     DOMAIN = "https://www.tutorialbar.com"
+    AD_DOMAINS = ("https://amzn",)
 
     def __init__(self):
         self.current_page = 0
@@ -31,11 +32,12 @@ def run(self) -> List:
 
         print(f"Page: {self.current_page} of {self.last_page} scraped")
         udemy_links = self.gather_udemy_course_links(course_links)
+        filtered_udemy_links = self._filter_ad_domains(udemy_links)
 
-        for counter, course in enumerate(udemy_links):
+        for counter, course in enumerate(filtered_udemy_links):
             print(f"Received Link {counter + 1} : {course}")
 
-        return udemy_links
+        return filtered_udemy_links
 
     def is_first_loop(self) -> bool:
         """
@@ -45,6 +47,22 @@ def is_first_loop(self) -> bool:
         """
         return self.current_page == 1
 
+    def _filter_ad_domains(self, udemy_links) -> List:
+        """
+        Filter out any known ad domains from the links scraped
+
+        :param list udemy_links: List of urls to filter ad domains from
+        :return: A list of filtered urls
+        """
+        ad_links = set()
+        for link in udemy_links:
+            for ad_domain in self.AD_DOMAINS:
+                if link.startswith(ad_domain):
+                    ad_links.add(link)
+        if ad_links:
+            print(f"Removing ad links from courses: {ad_links}")
+        return list(set(udemy_links) - ad_links)
+
     def get_course_links(self, url: str) -> List:
         """
         Gets the url of pages which contain the udemy link we want to get
@@ -53,17 +71,17 @@ def get_course_links(self, url: str) -> List:
         :return: list of pages on tutorialbar.com that contain Udemy coupons
         """
         response = requests.get(url=url)
+
         soup = BeautifulSoup(response.content, "html.parser")
-        links = soup.find("div", class_="rh-post-wrapper").find_all("a")
-        self.last_page = links[-2].text
-        courses = []
 
-        x = 0
-        for _ in range(self.links_per_page):
-            courses.append(links[x].get("href"))
-            x += 3
+        links = soup.find_all("h3")
+        course_links = [link.find("a").get("href") for link in links]
+
+        self.last_page = (
+            soup.find("li", class_="next_paginate_link").find_previous_sibling().text
+        )
 
-        return courses
+        return course_links
 
     @staticmethod
     def get_udemy_course_link(url: str) -> str:
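The `_filter_ad_domains` helper added in this file builds its result from a set difference, so the returned list is deduplicated but is not guaranteed to keep the order in which links were scraped. If scrape order matters, an order-preserving variant could look like the sketch below. `filter_ad_domains` as a module-level function and its `ad_domains` parameter are assumptions for illustration; only the `("https://amzn",)` tuple comes from the diff.

```python
from typing import Iterable, List, Tuple

AD_DOMAINS: Tuple[str, ...] = ("https://amzn",)


def filter_ad_domains(
    links: Iterable[str], ad_domains: Tuple[str, ...] = AD_DOMAINS
) -> List[str]:
    """Drop ad links and duplicates while keeping first-seen order."""
    seen = set()
    kept = []
    for link in links:
        # str.startswith accepts a tuple and matches any of its prefixes.
        if link.startswith(ad_domains):
            continue
        if link in seen:
            continue
        seen.add(link)
        kept.append(link)
    return kept
```

Here duplicates are still removed, matching the set-based behaviour, but the first occurrence of each link stays in its original position.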