Skip to content

Commit

Permalink
Merge branch 'params-change'
Browse files Browse the repository at this point in the history
  • Loading branch information
bigfarofa committed Sep 18, 2022
2 parents 02227b4 + c0b95fe commit 99c6a89
Show file tree
Hide file tree
Showing 13 changed files with 918 additions and 113 deletions.
23 changes: 14 additions & 9 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ Web scrapper built with [Puppeteer](https://developers.google.com/web/tools/pupp
I made this because I needed to download a lot of resources and automating the process seemed a fun way to learn about web scraping.

## DISCLOSURE
This scrapper was developed based on the interface of a education institution that I'm part of. So it might not work on yours.
This scrapper was developed based on the interface of a education institution that I was part of. So it might not work on yours.

## Pre-Requisites
Before installing the scrapper, you will need the following:
Expand All @@ -23,20 +23,25 @@ You can download here: [https://nodejs.org/en/](https://nodejs.org/en/)
- Build the project: `npx tsc`

## Run configuration
You can run the scrapper by simply using `npm run start` or `node ./dist/main.js`.
You can run the scrapper by simply using `npm run start` or `node ./dist/main.js`. Remember to pay attention to the command line for inputs.

There are some run configurations you can set up. Either via command line and/or configuration file (`scrapper-config.json`).
There are some run configurations you need to set up. Either via command line and/or configuration file (`scrapper-config.json`).


There is no `scrapper-config.json` because private information might be present there, so create one based on the file: `template-scrapper-config.json`.

### Config
If there there is a option that is present in both command line and configuration file, the CLI flag and JSON parameter will be seperated by a `|`.
If there there is a option that is present in both command line and configuration file, the CLI flag and JSON parameter will be seperated by a `|`. The required parameters are `authorizeUrl`, `waitPageAfterLogin`, `modulesListPage`.

- `--username <moodle-login-username>` | `username` The username you use to login
- `--download-path <path>` | `downloadPath` The path where the resources will be downloaded(By default it's the `./downloads` folder located in the root of the project)
- `--no-headless` When this flag is on, Puppeteer will be executed with the option with the headless mode deactivated. Headless mode allows the scrapper to run without displaying the UI.

- `--authorize-url` | `authorizeUrl` The usual moodle link you use to enter it. Include the protocol(e.g http, https, https://mymoodle.com)
- `--wait-page-after-login | waitPageAfterLogin` - What page should the scrapper wait after authenticating.
- `--modules-list-page | modulesListPage` - What page contains the list of modules. This scrapper uses the Dashboard as basis to obtain the modules list.
- `--username <moodle-login-username>` | `username` The username you use to login.
- `--download-path <path>` | `downloadPath` The path where the resources will be downloaded(By default it's the `./downloads` folder located in the root of the project).

- `--headless` | `headless` When this parameter is set to `true`, Puppeteer will be executed with the option with the [headless mode](https://pptr.dev/api/puppeteer.browserlaunchargumentoptions.headless/) activated. Headless mode allows the scrapper to run without displaying the UI. Default value is `false`. Optional.

- `--auth-method` | `authMethod` - It can have one of the following values: `"user-control"`, `"terminal-user-passw"`. Default `user-control`


Expand All @@ -45,8 +50,8 @@ It's useful if you don't want to input your credentials in the terminal, to see
Only disadvantage is that the UI will need to be displayed, which requires the headless mode to be disabled.


- - `terminal-user-passw` The username and password will be prompted by the terminal. Doing this way allows the scrapper/puppeteer to run in headless mode(or not with the flag `--no-headless`)
- - `terminal-user-passw` The username and password will be prompted by the terminal. Doing this way allows the scrapper/puppeteer to run in headless mode.

- - If you want to use `terminal-user-passw`, it can also be run with the shortcut: `npm run start:auth-terminal`


- - I prefer using `terminal-user-passw` because the UI does not provide much information, it will be mostly noise. Microsoft 2FA does not require extra input, just confirmation on the phone.
Loading

0 comments on commit 99c6a89

Please sign in to comment.