Merge branch 'release-0.4.0'
bigfarofa committed May 24, 2021
2 parents 97e6767 + e949671 commit 02227b4
Showing 12 changed files with 278 additions and 135 deletions.
21 changes: 19 additions & 2 deletions README.md
@@ -11,7 +11,12 @@ I made this because I needed to download a lot of resources and automating the p
## DISCLOSURE
This scrapper was developed against the interface of an educational institution that I'm part of, so it might not work with yours.

## Pre-Requisites
Before installing the scrapper, you will need Node.js `v12.x` or later.
You can download it here: [https://nodejs.org/en/](https://nodejs.org/en/)

## Installation

- Install the dependencies: `npm install`
@@ -29,7 +34,19 @@ There is no `scrapper-config.json` because private information might be present
If an option is available both on the command line and in the configuration file, the CLI flag and the JSON parameter are separated by a `|`.

- `--username <moodle-login-username>` | `username` The username you use to login
- `--download-path <path>` | `downloadPath` The path where the resources will be downloaded (by default, the `./downloads` folder)
- `--no-headless` When this flag is set, Puppeteer runs with headless mode disabled.
- `--download-path <path>` | `downloadPath` The path where the resources will be downloaded (by default, the `./downloads` folder in the root of the project)
- `--no-headless` When this flag is set, Puppeteer runs with headless mode disabled. Headless mode lets the scrapper run without displaying the browser UI.
- `--wait-page-after-login` | `waitPageAfterLogin` The page the scrapper should wait for after authenticating.
- `--auth-method` | `authMethod` One of `"user-control"` or `"terminal-user-passw"`. Defaults to `user-control`.


- - `user-control` lets you enter your username and password in the browser page, as in a normal login.
It's useful if you don't want to type your credentials in the terminal, if you want to watch the scrapper's steps, or if the authentication requires more input than just a username and password.
The only disadvantage is that the UI must be displayed, which requires headless mode to be disabled.


- - `terminal-user-passw` The terminal prompts for the username and password. This lets the scrapper/Puppeteer run in headless mode (or not, with the `--no-headless` flag).
- - To use `terminal-user-passw`, you can also run the shortcut: `npm run start:auth-terminal`


- - I prefer `terminal-user-passw` because the UI does not provide much information; it is mostly noise. Microsoft 2FA does not require extra input, just confirmation on the phone.
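
For reference, a `scrapper-config.json` covering the options above might look like the object below. This is only a sketch: the field names follow `IScrapperConfig` from this commit, while the username, host, and paths are placeholder assumptions and not real endpoints.

```typescript
// Shape mirrors IScrapperConfig from src/IScrapperConfig.ts in this commit.
interface ExampleScrapperConfig {
  username?: string;
  downloadPath?: string;
  authorizeUrl?: string;
  waitPageAfterLogin?: string;
  authMethod?: "user-control" | "terminal-user-passw";
}

// Placeholder values only: the host, paths, and username are hypothetical.
const exampleConfig: ExampleScrapperConfig = {
  username: "student@example.edu",
  downloadPath: "./downloads",
  authorizeUrl: "https://moodle.example.edu/login/index.php",
  waitPageAfterLogin: "https://moodle.example.edu/my/",
  authMethod: "terminal-user-passw",
};

// Serialize to the JSON text that would be saved as scrapper-config.json.
const exampleConfigJson: string = JSON.stringify(exampleConfig, null, 2);
console.log(exampleConfigJson);
```

Since the file may hold private information, it is probably best kept out of version control.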
5 changes: 3 additions & 2 deletions package.json
@@ -1,12 +1,13 @@
{
"name": "mooodle-resource-downloader",
"version": "0.3.0",
"version": "0.4.0",
"description": "Puppeteer scrapper to download moodle's resources",
"main": "./dist/main.js",
"scripts": {
"test": "echo \"Error: no test specified\" && exit 1",
"build": "npx tsc",
"start": "node ./dist/main.js"
"start": "node ./dist/main.js",
"start:auth-terminal": "node ./dist/main.js --auth-method terminal-user-passw"
},
"author": "Lucas Gomes",
"license": "MIT",
11 changes: 11 additions & 0 deletions src/IScrapperConfig.ts
@@ -0,0 +1,11 @@
export enum EnumAuthMethod {
TERMINAL_USER_PASSW = "terminal-user-passw",
USER_CONTROL = "user-control"
}
export interface IScrapperConfig {
username?: string;
downloadPath?: string;
authorizeUrl?: string;
waitPageAfterLogin?: string;
authMethod?: EnumAuthMethod;
}
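
The enum above pairs naturally with a small validation step, because a value read from the command line is just a string. The sketch below shows one way to narrow such a string to `EnumAuthMethod`; `parseAuthMethod` is a hypothetical helper that is not part of this commit, and the fallback to `user-control` follows the default described in the README.

```typescript
// Copied from src/IScrapperConfig.ts in this commit.
enum EnumAuthMethod {
  TERMINAL_USER_PASSW = "terminal-user-passw",
  USER_CONTROL = "user-control",
}

// Hypothetical helper (not in the commit): narrow a raw string to the enum,
// rejecting anything that is not a known auth method.
function parseAuthMethod(raw: string | undefined): EnumAuthMethod {
  if (raw === undefined || raw === "") {
    return EnumAuthMethod.USER_CONTROL; // documented default
  }
  switch (raw) {
    case EnumAuthMethod.TERMINAL_USER_PASSW:
      return EnumAuthMethod.TERMINAL_USER_PASSW;
    case EnumAuthMethod.USER_CONTROL:
      return EnumAuthMethod.USER_CONTROL;
    default:
      throw new Error(`Unknown auth method: ${raw}`);
  }
}

console.log(parseAuthMethod("terminal-user-passw"));
```

Failing early on an unknown value avoids silently falling through to the `user-control` branch when the flag is mistyped.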
118 changes: 75 additions & 43 deletions src/main.ts
@@ -5,13 +5,16 @@ import path from 'path';
import mime from 'mime-types';
import prompts from 'prompts';

import {ScrapperConfig} from './types';
import {EnumAuthMethod, IScrapperConfig} from './IScrapperConfig';
import {escapeRegExpSpecialCharacters} from './utils/string-utils';
import * as scriptUtils from './utils/script-utils';
import ScrapperNavigator from './scrapper-navigator';
import UserPasswAuth from './scrapper-navigator/authenticators/UserPasswAuth';
import UserControlAuth from './scrapper-navigator/authenticators/UserControlAuth';
import { ModulePage } from './scrapper-navigator/module-page';
import IAuthProcess from './scrapper-navigator/authenticators/IAuthProcess';

let scrapperConfig: ScrapperConfig = {};
let scrapperConfig: IScrapperConfig = {};

let scrapperConfigDir = "./scrapper-config.json";

@@ -29,65 +32,94 @@ if(fs.existsSync(scrapperConfigDir)){


async function execute() : Promise<void>{
let userEmail = "";
let userEmail: string | undefined = undefined;

let defaultDownloadPath = __dirname + "/../downloads";
let downloadPathInConfig = scrapperConfig.downloadPath || undefined;
let resourcesDownloadPath = getArgParam("--download-path") || downloadPathInConfig || defaultDownloadPath;

if (scrapperConfig.username) {
userEmail = scrapperConfig.username;
} else if (getArgParam("--username")) {
userEmail = getArgParam("--username") as string;
} else {
let userResponse = await prompts({
type: 'text',
name: 'username',
message: `Username:`
})
userEmail = userResponse.username;
let authMethod = getArgParam("--auth-method") || scrapperConfig.authMethod || EnumAuthMethod.USER_CONTROL;
let authorizeUrl = getArgParam("--authorize-url") || scrapperConfig.authorizeUrl;
let waitForPageAfterLogin = getArgParam("--wait-page-after-login") || scrapperConfig.waitPageAfterLogin;
if (!authorizeUrl) {
throw new Error("Authorization URL not provided.")
}

let passwdResponse = await prompts({
type: 'password',
name: 'password',
message: 'Password'
let isHeadless = true;
if (hasArg("--no-headless")) {
isHeadless = false;
} else if (authMethod === EnumAuthMethod.USER_CONTROL) {
isHeadless = false;
}
const browser = await puppeteer.launch({
headless: isHeadless,
product: "chrome",
timeout: 90000
});


let userPassword = passwdResponse.password;

if (!userEmail) {
throw new Error("USERNAME IS REQUIRED");
}
const page = await browser.newPage();

if (!userPassword) {
throw new Error("PASSWORD IS REQUIRED");
}
let authenticator: IAuthProcess | undefined = undefined;


if (authMethod === EnumAuthMethod.TERMINAL_USER_PASSW) {
if (scrapperConfig.username) {
userEmail = scrapperConfig.username;
} else if (getArgParam("--username")) {
userEmail = getArgParam("--username") as string;
} else {
let userResponse = await prompts({
type: 'text',
name: 'username',
message: `Username:`
})
userEmail = userResponse.username;
}

let passwdResponse = await prompts({
type: 'password',
name: 'password',
message: 'Password'
});

const browser = await puppeteer.launch({
headless: !hasArg("--no-headless"),
product: "chrome"
});

let userPassword = passwdResponse.password;

if (!userEmail) {
throw new Error("USERNAME IS REQUIRED");
}

if (!userPassword) {
throw new Error("PASSWORD IS REQUIRED");
}




const page = await browser.newPage();
authenticator = new UserPasswAuth(page, {
authorizeUrl: authorizeUrl,
username: userEmail,
password: userPassword,
waitForPageAfterLogin: waitForPageAfterLogin
});

let scrapperNavigator = new ScrapperNavigator(page);
} else {

let authorizeUrl = getArgParam("--authorize-url") || scrapperConfig.authorizeUrl;
let waitForPageAfterLogin = getArgParam("--wait-page-after-login") || scrapperConfig.waitPageAfterLogin;
if (!authorizeUrl) {
throw new Error("Authorization URL not provided.")
userEmail = getArgParam("--username") || scrapperConfig.username;
authenticator = new UserControlAuth(page, {
username: userEmail,
authorizeUrl: authorizeUrl,
waitForPageAfterLogin: waitForPageAfterLogin
});
}
await scrapperNavigator.authenticate({
authorizeUrl: authorizeUrl,
username: userEmail,
password: userPassword,
waitForPageAfterLogin: waitForPageAfterLogin
});





let scrapperNavigator = new ScrapperNavigator(page, {authenticator: authenticator});

await scrapperNavigator.authenticate();

console.log("AUTHENTICATION DONE")
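
The headless decision in `execute()` depends on two inputs: the `--no-headless` flag and the chosen auth method. The sketch below isolates that rule as a pure function so it can be checked without launching a browser; `resolveHeadless` is a hypothetical name and does not exist in the commit.

```typescript
type AuthMethod = "user-control" | "terminal-user-passw";

// Hypothetical pure helper mirroring the branch in execute():
// the browser is visible if --no-headless is passed, or if the user
// has to type credentials into the page (user-control).
function resolveHeadless(noHeadlessFlag: boolean, authMethod: AuthMethod): boolean {
  if (noHeadlessFlag) {
    return false;
  }
  return authMethod !== "user-control";
}

// Print the resolved value for every combination of inputs.
const methods: AuthMethod[] = ["user-control", "terminal-user-passw"];
for (const method of methods) {
  for (const flag of [false, true]) {
    console.log(method, "no-headless:", flag, "headless:", resolveHeadless(flag, method));
  }
}
```

Only `terminal-user-passw` without `--no-headless` ends up headless, which matches the README's note that `user-control` needs the UI.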

4 changes: 4 additions & 0 deletions src/scrapper-navigator/authenticators/IAuthProcess.ts
@@ -0,0 +1,4 @@

export default interface IAuthProcess {
authenticate() : Promise<void>;
}
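
The interface above lets the navigator stay unaware of how login happens. A minimal sketch of that delegation follows, assuming a simplified stand-in for `ScrapperNavigator`; `RecordingAuth` and `NavigatorSketch` are hypothetical names used only for illustration.

```typescript
// Same shape as IAuthProcess in this commit.
interface IAuthProcess {
  authenticate(): Promise<void>;
}

// Hypothetical authenticator that records calls instead of driving a browser.
class RecordingAuth implements IAuthProcess {
  calls = 0;
  async authenticate(): Promise<void> {
    this.calls += 1;
  }
}

// Simplified stand-in for ScrapperNavigator: it delegates to whichever
// authenticator it was constructed with.
class NavigatorSketch {
  private readonly authenticator: IAuthProcess;
  constructor(authenticator: IAuthProcess) {
    this.authenticator = authenticator;
  }
  authenticate(): Promise<void> {
    return this.authenticator.authenticate();
  }
}

const recordingAuth = new RecordingAuth();
const navigatorSketch = new NavigatorSketch(recordingAuth);
const authDone: Promise<void> = navigatorSketch.authenticate();
authDone.then(() => console.log("authenticate() calls:", recordingAuth.calls));
```

Swapping in a different login flow then only requires another class that implements `authenticate()`.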
56 changes: 56 additions & 0 deletions src/scrapper-navigator/authenticators/UserControlAuth.ts
@@ -0,0 +1,56 @@
import IAuthProcess from './IAuthProcess';
import {AuthenticateConfig} from '../types';
import { Page } from 'puppeteer';
import asyncTimeout from '../../utils/async-timeout';

export interface IUserControlAuthConfig {
authorizeUrl: string;
goToPageAfterLogin?: string;
waitForPageAfterLogin?: string;
username?: string;
}



export default class UserControlAuth implements IAuthProcess {
page: Page;
config: IUserControlAuthConfig;
constructor(page: Page, config: IUserControlAuthConfig) {
this.page = page;
this.config = config;
}

async authenticate(){
let config = this.config;
await this.page.goto(config.authorizeUrl);

if (config.username) {
await this.page.type('[name="UserName"]', config.username);
await asyncTimeout(1000);
await this.page.click("#nextButton");
}
if (config.waitForPageAfterLogin) {

await this.page.waitForResponse(config.waitForPageAfterLogin);
//await this.procedureWaitForPageAfterLogin(this.page, config.waitForPageAfterLogin);
}
if (config.goToPageAfterLogin) {
await this.page.goto(config.goToPageAfterLogin);
}
console.log("MOODLE PAGE YEAH!");
}

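procedureWaitForPageAfterLogin(page: Page, waitForPageAfterLoginUrl: string) : Promise<void>{
return new Promise((resolve) => {
// Keep listening until the expected URL finishes loading; `once` would stop after the first request.
const onRequestFinished = (e: {url(): string}) => {
if(e.url() === waitForPageAfterLoginUrl) {
page.off("requestfinished", onRequestFinished);
resolve();
}
};
page.on("requestfinished", onRequestFinished);
});
}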

}
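
The commit waits for the post-login page with `page.waitForResponse`, and keeps an event-based helper as a commented-out alternative. If the event-based route were used, it would probably want a timeout so a missed URL cannot hang the run. The sketch below shows that idea against a minimal event-source interface; `RequestEvents`, `waitForRequestFinished`, and `FakeSource` are hypothetical names, and the assumption is that Puppeteer's `Page` exposes compatible `on`/`off` methods.

```typescript
// Minimal shapes so the sketch runs without Puppeteer (assumed to be
// compatible with Page's "requestfinished" event).
interface RequestLike {
  url(): string;
}
type RequestHandler = (req: RequestLike) => void;
interface RequestEvents {
  on(event: "requestfinished", handler: RequestHandler): unknown;
  off(event: "requestfinished", handler: RequestHandler): unknown;
}

// Resolve when a finished request matches targetUrl; reject after timeoutMs.
function waitForRequestFinished(source: RequestEvents, targetUrl: string, timeoutMs: number): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    const handler: RequestHandler = (req) => {
      if (req.url() !== targetUrl) {
        return; // not the page we are waiting for, keep listening
      }
      clearTimeout(timer);
      source.off("requestfinished", handler);
      resolve();
    };
    const timer = setTimeout(() => {
      source.off("requestfinished", handler);
      reject(new Error(`Timed out waiting for ${targetUrl}`));
    }, timeoutMs);
    source.on("requestfinished", handler);
  });
}

// Tiny fake event source used only to exercise the helper.
class FakeSource implements RequestEvents {
  private handlers: RequestHandler[] = [];
  on(_event: "requestfinished", handler: RequestHandler): this {
    this.handlers.push(handler);
    return this;
  }
  off(_event: "requestfinished", handler: RequestHandler): this {
    this.handlers = this.handlers.filter((h) => h !== handler);
    return this;
  }
  emit(url: string): void {
    this.handlers.slice().forEach((h) => h({ url: () => url }));
  }
  listenerCount(): number {
    return this.handlers.length;
  }
}
```

The listener is detached on both the success and the timeout path, so repeated logins would not accumulate handlers.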
81 changes: 81 additions & 0 deletions src/scrapper-navigator/authenticators/UserPasswAuth.ts
@@ -0,0 +1,81 @@
import IAuthProcess from './IAuthProcess';
import {AuthenticateConfig} from '../types';
import { Page } from 'puppeteer';
import asyncTimeout from '../../utils/async-timeout';


export interface IUserPasswAuthenticateConfig extends AuthenticateConfig {
username: string;
password: string;
goToPageAfterLogin?: string;
waitForPageAfterLogin?: string;
}



/**
* Enters the username and password on the user's behalf.
* If the login requires two-factor authentication,
* it might be useful to run the script in non-headless mode.
*
*/
export default class UserPasswAuth implements IAuthProcess {
page: Page;
config: IUserPasswAuthenticateConfig;
constructor(page: Page, config: IUserPasswAuthenticateConfig) {
this.page = page;
this.config = config;
}
async authenticate(){
let config = this.config;
// Both original branches read the same object, so a single check is enough.
if (!config || !config.username) {
throw new Error("USERNAME_REQUIRED");
}
let username = config.username;

if (!config.password) {
throw new Error("PASSWORD_REQUIRED");
}

if (!config.authorizeUrl) {
throw new Error("AUTHORIZE_URL_REQUIRED");
}

let password = config.password;

console.log("[WARNING] If you're using 2 Factor Authentication, remember to check your phone for any prompts.");
await this.page.goto(config.authorizeUrl);
await this.page.screenshot({path: './screenshots/login.png'});
console.log("Login Page entered")
await this.page.type('[name="UserName"]', username);
console.log("Username typed");
await this.page.screenshot({path: './screenshots/email_inserted.png'});
await asyncTimeout(1000);
await this.page.click("#nextButton");
await this.page.waitForSelector("#passwordInput", {
visible: true
});
await asyncTimeout(1000);
console.log("Password input is visible");
await this.page.screenshot({path: './screenshots/submitted_username.png'});
await this.page.type('#passwordInput', password);
console.log("Password typed");
await this.page.screenshot({path: './screenshots/password_filled.png'});
await asyncTimeout(1000);
await this.page.click("#submitButton");
console.log("Form submitted");

if (config.waitForPageAfterLogin) {
await this.page.waitForResponse(config.waitForPageAfterLogin);
}
if (config.goToPageAfterLogin) {
await this.page.goto(config.goToPageAfterLogin);
}
console.log("MOODLE PAGE YEAH!");
return;
}
}
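
`UserPasswAuth` writes several screenshots under `./screenshots`. Assuming Puppeteer does not create missing parent directories when given a `path`, those calls would fail on a fresh checkout unless the folder already exists. A minimal sketch of a guard follows; `ensureParentDir` is a hypothetical helper, and the demo writes to a temporary directory rather than the project tree.

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// Hypothetical helper: make sure the directory that will hold filePath exists.
function ensureParentDir(filePath: string): string {
  const dir = path.dirname(filePath);
  // recursive: true creates intermediate folders and does not throw
  // if the directory is already there.
  fs.mkdirSync(dir, { recursive: true });
  return dir;
}

// Demo against a throwaway temp directory.
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), "scrapper-demo-"));
const createdDir = ensureParentDir(path.join(demoRoot, "screenshots", "login.png"));
console.log("screenshot directory ready:", fs.existsSync(createdDir));
```

In the authenticator, such a helper could be called once before the first `page.screenshot(...)`.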