This Flask application retrieves a list of pages within a specified category of a Wikimedia Project and identifies which external links within each page are dead. It was developed to fulfill a request from the Lusophone technological wishlist and was further enhanced during an Outreachy project.
- Retrieves pages in a specified category from a Wikimedia Project;
- Checks each page for dead external links;
- Displays the results indicating which links are dead for each page.
- Clone the repository:
git clone https://github.com/WikiMovimentoBrasil/deadlinkscanner.git
- Set up a virtual environment (optional but recommended):
python -m venv venv
source venv/bin/activate # On Windows use venv\Scripts\activate
- Install the dependencies:
pip install -r requirements.txt
- Set up a config.yaml file with the following configuration:
SECRET_KEY: "YOUR_SECRET_KEY"
BABEL_DEFAULT_LOCALE: "DEFAULT_LANGUAGE"
APPLICATION_ROOT: "APPLICATION_ROOT/"
OAUTH_MWURI: "https://meta.wikimedia.org/w/index.php"
CONSUMER_KEY: "YOUR CONSUMER KEY"
CONSUMER_SECRET: "YOUR CONSUMER SECRET"
LANGUAGES: ["pt", "en", "<OTHER LANGUAGES>"]
- Run the application:
flask run
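Because a missing entry in config.yaml tends to surface only as a runtime error, it can help to check the parsed configuration before starting the server. The sketch below assumes that all seven keys listed in the installation steps are required and that the file has already been parsed into a dictionary; the helper name `missing_keys` is hypothetical and not part of the application.

```python
# Keys taken from the config.yaml example in the installation steps.
# Treating every one of them as mandatory is an assumption of this sketch.
REQUIRED_KEYS = (
    "SECRET_KEY",
    "BABEL_DEFAULT_LOCALE",
    "APPLICATION_ROOT",
    "OAUTH_MWURI",
    "CONSUMER_KEY",
    "CONSUMER_SECRET",
    "LANGUAGES",
)


def missing_keys(config):
    """Return the required keys that are absent or empty in a parsed config."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]
```

The dictionary could come from PyYAML, for example `yaml.safe_load(open("config.yaml"))`; an empty list from `missing_keys` means every expected key is present and non-empty.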
- Navigate to http://localhost:5000 in your web browser to access the application.
- Enter the URL of a category page on a Wikimedia project.
- Define the depth of subcategories you want to search.
- Click 'Submit' and wait for the application to retrieve and process the information.
- View the results in the CSV file showing which external links are dead for each page.
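The subcategory depth setting can be read as a depth-limited traversal of the category tree: depth 0 covers only the category itself, depth 1 adds its direct subcategories, and so on. The sketch below illustrates that idea with a breadth-first search over an in-memory mapping. It is an assumption about how such a limit might work, not the application's actual implementation; the function name `categories_within_depth` and the `subcategories` dictionary (standing in for API lookups) are hypothetical.

```python
from collections import deque


def categories_within_depth(root, subcategories, max_depth):
    """List the categories reachable from root in at most max_depth steps.

    subcategories maps a category name to the names of its direct
    subcategories. Visited names are tracked so that cycles, which do
    occur in wiki category graphs, cannot cause an endless loop.
    """
    seen = {root}
    order = [root]
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        if depth == max_depth:
            continue  # Depth limit reached: do not expand this category.
        for child in subcategories.get(category, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append((child, depth + 1))
    return order
```

For a mapping such as `{"A": ["B", "C"], "B": ["D"]}`, a depth of 1 starting from `"A"` yields `A`, `B`, and `C`, while a depth of 2 also includes `D`. Larger depths can multiply the number of pages to scan, so processing time grows accordingly.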
- This application was inspired by a request from the Lusophone technological wishlist.
- Development of this application was partially supported by an Outreachy project.
- The background code was inspired by the work of Alwoch Sophia in the DeadLinkChecker project.
This project is licensed under the MIT License - see the LICENSE file for details.