enhances quality of documentation (#19)
speedyconzales authored May 8, 2024
1 parent 4afc9b2 commit e6b7d8d
Showing 2 changed files with 81 additions and 22 deletions.
9 changes: 6 additions & 3 deletions .dockerignore
.git/
**/*.DS_Store
**/*.mp4
.idea/
.vscode/
downloaded_files/
.github/
**/__pycache__/
*.py[cod]
*$py.class
test/
config.yml
src/extensions/recaptcha-solver/
Dockerfile
.gitignore
.gitattributes
94 changes: 75 additions & 19 deletions README.md
# Series-Scraper

## Supported Sites
Headless and completely automated scraping of the following sites:
- [aniworld.to](https://aniworld.to)
- [s.to](https://s.to)
- [bs.to](https://bs.to) -> uses a headless browser for scraping and solving captchas
- [Streamtape](https://streamtape.com)

## Usage
* If you are familiar with docker
1. pull the image:
```bash
docker pull speedyconzales/series-scraper
```
2. and then:

* either `docker run`
```bash
docker run \
--rm \
-e PUID=[YOUR_USER_ID] \
-e PGID=[YOUR_GROUP_ID] \
-v [PATH_TO_YOUR_ANIME_FOLDER]:/app/anime \
-v [PATH_TO_YOUR_SERIES_FOLDER]:/app/series \
speedyconzales/series-scraper \
s6-setuidgid abc \
python3 main.py
```
followed by the [arguments](#arguments) you want to provide (a filled-in end-to-end sketch follows after this list)

* or `docker compose`
`docker-compose.yml`:
```yaml
services:
series-scraper:
image: speedyconzales/series-scraper
container_name: series-scraper
volumes:
- [PATH_TO_YOUR_ANIME_FOLDER]:/app/anime
- [PATH_TO_YOUR_SERIES_FOLDER]:/app/series
environment:
- PUID=[YOUR_USER_ID]
- PGID=[YOUR_GROUP_ID]
- TZ=Europe/Berlin
```
and run
```bash
docker compose run --rm series-scraper s6-setuidgid abc python3 main.py
```
followed by the [arguments](#arguments) you want to provide
* If you don't want to use docker or there is no suitable docker image available for your architecture, you can use the following steps to run the scraper:
1. clone the repository
```bash
git clone https://github.com/speedyconzales/series-scraper.git
```
2. install the [dependencies](#dependencies)
3. run the scraper
```bash
python3 main.py
```
followed by the [arguments](#arguments) you want to provide
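
As a hypothetical end-to-end sketch — the user/group IDs, host paths, and series URL below are placeholders, not values from this repository — a one-off dockerized run scraping season 1 with German subtitles could look like this:
```bash
docker run \
  --rm \
  -e PUID=1000 \
  -e PGID=1000 \
  -v /srv/media/anime:/app/anime \
  -v /srv/media/series:/app/series \
  speedyconzales/series-scraper \
  s6-setuidgid abc \
  python3 main.py "https://aniworld.to/anime/stream/<series-name>" -l Ger-Sub -s 1
```
The same argument string works for the docker compose and bare-metal variants; only the wrapper around `python3 main.py` changes.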
## Dependencies
1. install [ffmpeg](https://ffmpeg.org/download.html) and make sure it is in your `PATH` -> `ffmpeg -version` should succeed
2. install [chrome](https://www.google.com/chrome/) or [chromium](https://www.chromium.org/getting-involved/download-chromium/)
3. install the required python packages (the steps below assume python3.12 is installed)
* either via `pipenv` and Pipfile
```bash
pipenv install
pipenv shell
```
* or create a virtual environment and install them via `pip` and requirements.txt
```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
4. copy `template.yml` to `config.yml` and fill in the required folder paths for the respective content type (a hypothetical sketch follows after this list)
5. if you want to download from [bs.to](?plain=1#L7), make sure the recaptcha-solver.crx binary file is present in the `src/extensions` folder. Either download it with `git lfs` (example commands after this list) or download it as a raw file from the [github repo](https://github.com/speedyconzales/series-scraper/blob/main/src/extensions/recaptcha-solver.crx)
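
A minimal `config.yml` sketch for step 4 — the key names are an assumption inferred from the `/app/anime` and `/app/series` mounts used above, so check `template.yml` for the authoritative layout:
```yaml
# hypothetical sketch — template.yml is the source of truth for the key names
anime: /srv/media/anime
series: /srv/media/series
```
For step 5, if you take the `git lfs` route, the usual sequence from inside the cloned repository is:
```bash
git lfs install   # one-time: registers the git-lfs filters for your user
git lfs pull      # fetches LFS-tracked binaries such as recaptcha-solver.crx
```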
## Arguments
| Argument | Function |
| :--------------: | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `<url>` | Provide the `<url>` of the series. The `series-name` has to be present in the url. That means: navigate to one of the supported sites. Search for the series you want to download and simply copy/paste the url. The url should look like `https://aniworld.to/anime/stream/<series-name>` or `https://s.to/serie/stream/<series-name>` or `https://bs.to/serie/<series-name>` |
| `--help` | get a list of all available arguments |
| `-l, --language` | **Default:** `Deutsch`. Choose the language of the content: `Ger-Sub`, `Eng-Sub` or `English` |
| `-s, --season` | Choose the season number. If not specified all seasons will be scraped but not the movies or specials. -> Providing `0` as season number scrapes the respective movies or specials of that series |
| `-e, --episode` | Choose the episode number. If not specified all episodes of the season will be scraped |
| `-t, --threads` | **Default:** 2. Specify the number of threads, i.e. concurrent downloads. Do not choose too high a number, as the server might block overly frequent requests |
| `-a, --anime` | Declare this content as anime. Only useful for `bs.to` as it does not distinguish between series and anime on the site |
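
Putting the table together, a hypothetical invocation (the URLs are placeholders for real series pages):
```bash
# season 2, episode 5, German subtitles, 4 parallel downloads
python3 main.py "https://s.to/serie/stream/<series-name>" -l Ger-Sub -s 2 -e 5 -t 4

# season 0 = the movies/specials of the series, default language
python3 main.py "https://aniworld.to/anime/stream/<series-name>" -s 0
```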
## Credits
- [wolfswolke](https://github.com/wolfswolke) for the continuous implementation of [aniworld_scraper](https://github.com/wolfswolke/aniworld_scraper)
