This project consists of Python scripts designed to crawl and extract data from the Credly platform. The main components of the project are:
- crawl-by-arg.py
- crawl-by-search-terms.py
- crawl-by-skills.py
- get-badges.py
- helper.py
Requirements:
- Python 3.x
- requests library
Install the requirements using the following command:
pip install requests
crawl-by-arg.py crawls the Credly platform using a single search term passed as a command-line argument.
Usage:
python crawl-by-arg.py <search_term>
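The argument handling for this script can be sketched as follows. This is a minimal illustration, not the script's actual code: `parse_search_term` is a hypothetical name, and the sketch assumes the script expects exactly one argument and exits with a usage message otherwise.

```python
import sys
from typing import List


def parse_search_term(argv: List[str]) -> str:
    """Return the single search term from an argv-style list.

    Hypothetical helper: assumes exactly one argument after the script name.
    """
    if len(argv) != 2:
        # sys.exit with a string prints it to stderr and exits with status 1
        sys.exit("Usage: python crawl-by-arg.py <search_term>")
    return argv[1]


# Inside the script this would typically be called as:
#     term = parse_search_term(sys.argv)
```

Because the shell splits arguments on whitespace, a multi-word term has to be quoted to arrive as a single argument, for example `python crawl-by-arg.py "data science"`.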
crawl-by-search-terms.py crawls the Credly platform using a list of search terms read from the data/search-terms.json file.
Usage:
python crawl-by-search-terms.py
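A minimal sketch of loading the terms from data/search-terms.json. The exact file layout is not documented here, so this hypothetical `load_search_terms` assumes the file holds either a JSON array of strings or a JSON object whose keys are the terms (an empty object `{}` then yields no terms).

```python
import json
from typing import List


def load_search_terms(path: str = "data/search-terms.json") -> List[str]:
    """Read search terms from a JSON file.

    Assumption: the file is either a JSON array of strings or a JSON
    object whose keys are the terms.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, dict):
        # Object form: treat the keys as the search terms
        return list(data.keys())
    # Array form: keep only string entries
    return [term for term in data if isinstance(term, str)]
```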
crawl-by-skills.py crawls the Credly platform using a list of skills read from the data/skills.json file.
Usage:
python crawl-by-skills.py
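Crawling a list of skills amounts to issuing one search per skill. The sketch below shows that loop with a pause between consecutive requests; it is illustrative only. `crawl_skills` is a hypothetical name, `fetch` stands in for whatever per-term request the real script makes, and the one-second default delay is an assumed politeness measure, not a documented Credly requirement.

```python
import time
from typing import Callable, Dict, Iterable, List


def crawl_skills(
    skills: Iterable[str],
    fetch: Callable[[str], List[dict]],
    delay: float = 1.0,
) -> Dict[str, List[dict]]:
    """Call `fetch` once per skill and collect the results by skill name.

    `fetch` is a placeholder for the real per-term request; the delay
    between calls is an illustrative assumption.
    """
    results: Dict[str, List[dict]] = {}
    for index, skill in enumerate(skills):
        if index > 0 and delay > 0:
            time.sleep(delay)  # pause between consecutive requests
        results[skill] = fetch(skill)
    return results
```

Passing the fetch function in, rather than calling the network directly, keeps the loop testable without hitting Credly.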
get-badges.py retrieves all badges for each organization listed in the data/organizations.json file and saves them to the data/badges.json file.
Usage:
python get-badges.py
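Since badges.json is written at the end of a potentially long run, one option is to write it atomically so an interrupted run does not leave a truncated file. This is a sketch of that design choice, not necessarily what get-badges.py does: `save_json_atomically` is a hypothetical name, and the data is assumed to be any JSON-serializable value (for example a dict keyed by organization).

```python
import json
import os
import tempfile
from typing import Any


def save_json_atomically(data: Any, path: str = "data/badges.json") -> None:
    """Write `data` as JSON to `path` without leaving a half-written file.

    The JSON goes to a temporary file in the same directory, which is then
    moved over the target with os.replace.
    """
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2, ensure_ascii=False)
        # Rename into place (atomic on POSIX within one filesystem)
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)  # clean up the temporary file on failure
        raise
```

If the process is killed before `os.replace` runs, the previous badges.json is left untouched and only a stray `.tmp` file remains.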
helper.py contains helper functions shared by the other scripts in this project, including:
- get_skills_file()
- get_organizations_file()
- get_badges_file()
- get_search_terms_file()
- get_items_by_search_term(search_term)
- search_terms()
- get_items_from_file(file_name)
- set_items_from_file(file_name, items)
- crawl_search_terms(terms)
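The file-related helpers above could plausibly look like the following. Only the function names come from the list; the bodies are guesses. The sketch assumes the data files live in a `data` directory relative to the project root and hold JSON. Returning `{}` for a missing or empty file is an added design choice, not documented behavior.

```python
import json
import os
from typing import Any

DATA_DIR = "data"  # assumption: paths are relative to the project root


def get_skills_file() -> str:
    return os.path.join(DATA_DIR, "skills.json")


def get_organizations_file() -> str:
    return os.path.join(DATA_DIR, "organizations.json")


def get_badges_file() -> str:
    return os.path.join(DATA_DIR, "badges.json")


def get_search_terms_file() -> str:
    return os.path.join(DATA_DIR, "search-terms.json")


def get_items_from_file(file_name: str) -> Any:
    """Load JSON from file_name, returning {} if the file is missing or empty."""
    try:
        with open(file_name, encoding="utf-8") as f:
            content = f.read()
    except FileNotFoundError:
        return {}
    return json.loads(content) if content.strip() else {}


def set_items_from_file(file_name: str, items: Any) -> None:
    """Overwrite file_name with items serialized as indented JSON."""
    with open(file_name, "w", encoding="utf-8") as f:
        json.dump(items, f, indent=2, ensure_ascii=False)
```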
Before running the scripts, make sure to create the necessary data files in the data directory:
- skills.json
- organizations.json
- badges.json
- search-terms.json
Each of these files should contain an empty JSON object {} if there is no initial data.
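The setup step above can be automated with a small script like this one. It is a convenience sketch, not part of the project: `init_data_dir` is a hypothetical name. It creates the data directory if needed and writes `{}` into any of the four files that do not exist yet, leaving existing files untouched.

```python
import json
from pathlib import Path
from typing import List

DATA_FILES = ["skills.json", "organizations.json", "badges.json", "search-terms.json"]


def init_data_dir(data_dir: str = "data") -> List[str]:
    """Create data_dir and any missing data files; return the names created."""
    directory = Path(data_dir)
    directory.mkdir(parents=True, exist_ok=True)
    created: List[str] = []
    for name in DATA_FILES:
        path = directory / name
        if not path.exists():
            # Start each missing file as an empty JSON object
            path.write_text(json.dumps({}) + "\n", encoding="utf-8")
            created.append(name)
    return created
```

Running it a second time is harmless: it finds all files present and returns an empty list.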