Headless and completely automated scraping of the following sites:
- aniworld.to
- s.to
- bs.to -> uses a headless browser for scraping and solving captchas
- If you are familiar with Docker, just use:

```bash
docker pull speedyconzales/series-scraper
```
and then either `docker run`:

```bash
docker run \
  --rm \
  -e PUID=[YOUR_USER_ID] \
  -e PGID=[YOUR_GROUP_ID] \
  -v [PATH_TO_YOUR_ANIME_FOLDER]:/app/anime \
  -v [PATH_TO_YOUR_SERIES_FOLDER]:/app/series \
  speedyconzales/series-scraper \
  s6-setuidgid abc \
  python3 main.py
```

followed by the arguments you want to provide
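For instance, a hedged example with placeholder IDs, paths, and a placeholder series URL (none of these values come from the repo itself):

```bash
# Hypothetical values throughout: UID/GID 1000, local media folders,
# and a placeholder <series-name> in the URL
docker run --rm \
  -e PUID=1000 -e PGID=1000 \
  -v /home/user/anime:/app/anime \
  -v /home/user/series:/app/series \
  speedyconzales/series-scraper \
  s6-setuidgid abc \
  python3 main.py "https://aniworld.to/anime/stream/<series-name>" -l Ger-Sub -s 2
```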
or `docker compose` with the following `docker-compose.yml`:

```yaml
services:
  series-scraper:
    image: speedyconzales/series-scraper
    container_name: series-scraper
    volumes:
      - [PATH_TO_YOUR_ANIME_FOLDER]:/app/anime
      - [PATH_TO_YOUR_SERIES_FOLDER]:/app/series
    environment:
      - PUID=[YOUR_USER_ID]
      - PGID=[YOUR_GROUP_ID]
      - TZ=Europe/Berlin
```
and run

```bash
docker compose run --rm series-scraper s6-setuidgid abc python3 main.py
```

followed by the arguments you want to provide.
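A hedged example run (placeholder URL; `0` selects the movies/specials, as described in the argument table below):

```bash
# Hypothetical example: scrape the movies/specials (season 0) of a series
docker compose run --rm series-scraper s6-setuidgid abc \
  python3 main.py "https://s.to/serie/stream/<series-name>" -s 0
```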
- If you don't want to use Docker, or there is no suitable Docker image available for your architecture, follow these steps to run the scraper:
  - clone the repository:

    ```bash
    git clone https://github.com/speedyconzales/series-scraper.git
    cd series-scraper
    ```

  - install the dependencies (see the setup list below)
  - run the scraper:

    ```bash
    python3 main.py
    ```

    followed by the arguments you want to provide
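Once the dependencies listed below are installed, a hedged example invocation (placeholder URL; flags as documented in the argument table) could look like:

```bash
# Hypothetical example: season 1, episodes 1-12, English sub, 4 concurrent downloads
python3 main.py "https://aniworld.to/anime/stream/<series-name>" -l Eng-Sub -s 1 -e 1-12 -t 4
```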
- install ffmpeg and make sure it is in PATH; verify with:

  ```bash
  ffmpeg -version
  ```

- install Chrome or Chromium
- install the required Python packages (this assumes you have Python 3.12 installed)
  - either via `pipenv` and the `Pipfile`:

    ```bash
    pipenv install
    pipenv shell
    ```

  - or create a virtual environment and install them via `pip` and `requirements.txt`:

    ```bash
    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt
    ```

- copy and rename `template.yml` to `config.yml` and fill in the required folder paths for the respective type of content (a hedged sketch follows this list)
- if you want to download from bs.to, make sure the `recaptcha-solver.crx` binary file is present in the `src/extensions` folder. Either download it with `git lfs` or download it as a raw file from the GitHub repo
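The exact keys are defined by `template.yml` in the repository; as a rough sketch only, assuming one folder path per content type (the key names below are hypothetical), `config.yml` could look like:

```yaml
# Hypothetical sketch of config.yml -- consult template.yml for the actual key names
anime: /home/user/anime    # target folder for anime downloads
series: /home/user/series  # target folder for series downloads
```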
| Argument | Function |
|---|---|
| `<url>` | Provide the `<url>` of the series. The series name has to be present in the URL. That means: navigate to one of the supported sites, search for the series you want to download, and simply copy/paste the URL. It should look like `https://aniworld.to/anime/stream/<series-name>`, `https://s.to/serie/stream/<series-name>`, or `https://bs.to/serie/<series-name>` |
| `--help` | Get a list of all available arguments |
| `-l, --language` | Default: `Deutsch`. Choose the language of the content: either `Ger-Sub`, `Eng-Sub`, or `English` |
| `-s, --season` | Default: all seasons are scraped, but not the movies or specials. Choose the season number; providing `0` as the season number scrapes the movies/specials of that series |
| `-e, --episode` | Default: all episodes of the season are scraped. Choose either one episode or a list of episodes separated by spaces; ranges are also allowed, e.g. `-e 2 3 10-15 17` |
| `-t, --threads` | Default: `2`. Specify the number of threads, i.e. concurrent downloads. Do not choose too high a number, as the server might block overly frequent requests |
| `-p, --provider` | Default: downloads follow this priority: Vidoza > VOE > Doodstream > Streamtape > Vidmoly > SpeedFiles. If the episode is not available on one hoster, the next is tried. Specify the hoster/provider you want to download from |
| `-a, --anime` | Declare this content as anime. Only useful for bs.to, as it does not distinguish between series and anime on the site |
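Putting several flags together, a hedged end-to-end example (placeholder URL; flags as documented in the table above):

```bash
# Hypothetical example: a bs.to anime, season 1, episodes 1-12,
# German sub, 4 concurrent downloads, preferring the VOE hoster
python3 main.py "https://bs.to/serie/<series-name>" -a -s 1 -e 1-12 -l Ger-Sub -t 4 -p VOE
```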
- Thanks to wolfswolke for the continuous implementation of aniworld_scraper