
CLI-utility for downloading html-pages and resources


zluuba/page-loader


Page Loader


Page Loader is a library and command-line utility that downloads web pages over the network and saves them to the local drive.
To try it, follow the instructions below.

Requirements

Installation

Clone this repo or install it directly with pip:

git clone https://github.com/zluuba/page-loader.git
pip install --user git+https://github.com/zluuba/page-loader.git

If you cloned the repo, install the package and its dependencies:

cd page-loader
make install

Commands

Options

page-loader [-h] [-o OUTPUT] [-d] url

-h, --help                # print help text
-o, --output              # set output directory (default: current working directory)
-d, --debug               # show debug messages
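The usage line above has the shape of the output produced by Python's standard argparse module. The sketch below shows how these options could be declared with argparse; it is an illustration under that assumption, not the project's own code, and the function name build_parser and the current-directory default for --output are assumptions.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented options (illustrative, not the project's code)."""
    # argparse adds -h/--help automatically.
    parser = argparse.ArgumentParser(
        prog="page-loader",
        description="Download an html page and its resources.",
    )
    # Positional argument: the address of the page to download.
    parser.add_argument("url", help="address of the page to download")
    # Assumption: the output directory defaults to the current working directory.
    parser.add_argument(
        "-o", "--output", default=os.getcwd(), help="set output directory"
    )
    # store_true: args.debug is False unless -d/--debug is passed.
    parser.add_argument(
        "-d", "--debug", action="store_true", help="show debug messages"
    )
    return parser
```

For example, build_parser().parse_args(["-d", "https://google.com"]) gives an object whose url is "https://google.com", whose debug flag is True, and whose output is the current working directory.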

Page loader commands

Outputs brief documentation for how to invoke the program.

page-loader --help

Download the HTML page at <url> and all the resources it references through img, link and script tags.
The page is saved to the current working directory, and a folder named after the URL (for example, google-com_files)
is created next to it to hold the downloaded resources.

page-loader <url>

// file tree before:                file tree after:
// root/                                root/
//  |__mydir/                            |__mydir/
//     |__file.txt                          |__file.txt
//                                       |__google-com.html              ← loaded html page
//                                       |__google-com_files/            ← resources folder
//                                          |__google-logo.png           ← resource
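Finding the resources amounts to scanning the page for img, link and script tags and resolving their src or href attributes against the page URL. A minimal sketch using Python's standard html.parser module follows; the names ResourceCollector and find_resources are hypothetical, and the sketch only collects URLs. It does not download anything or modify the saved page.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin

# Assumed mapping: which attribute holds the resource URL for each tag.
RESOURCE_ATTRS = {"img": "src", "script": "src", "link": "href"}


class ResourceCollector(HTMLParser):
    """Collect absolute URLs referenced by img, link and script tags."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.resources: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr_name = RESOURCE_ATTRS.get(tag)
        if attr_name is None:
            return  # not a resource tag
        value = dict(attrs).get(attr_name)
        if value:  # skips inline scripts and tags without the attribute
            # Resolve relative paths such as "/logo.png" against the page URL.
            self.resources.append(urljoin(self.page_url, value))


def find_resources(page_url: str, html: str) -> List[str]:
    """Return resource URLs in the order they appear in the page."""
    collector = ResourceCollector(page_url)
    collector.feed(html)
    collector.close()
    return collector.resources
```

For a tag such as <img src="/textinputassistant/tia.png"> on https://google.com, this returns "https://google.com/textinputassistant/tia.png", the same URL that appears in the debug log example.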

Download the HTML page and all its resources to the specified directory instead of the current working directory.

page-loader -o <dir> <url>

// file tree before:                 file tree after:
// root/                                 root/
//  |__mydir/                             |__mydir/
//     |__file.txt                           |__file.txt
//                                           |__google-com.html          ← loaded html page
//                                           |__google-com_files/        ← resources folder
//                                              |__google-logo.png       ← resource
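The file names in these trees and in the debug log example (google-com.html, google-com_files) suggest that both names are derived from the URL. The sketch below assumes a simple rule (drop the scheme and any trailing slash, then replace each run of non-alphanumeric characters with "-") and joins the result with the -o directory, falling back to the current working directory. The functions url_to_name and output_paths are hypothetical, and the real tool's rule may differ, for example in how it treats query strings.

```python
import os
import re
from typing import Optional, Tuple
from urllib.parse import urlparse


def url_to_name(url: str) -> str:
    """Turn a URL into a file-system-friendly base name (assumed rule)."""
    parsed = urlparse(url)
    # Keep host and path only; the scheme and query are dropped in this sketch.
    raw = (parsed.netloc + parsed.path).rstrip("/")
    return re.sub(r"[^A-Za-z0-9]+", "-", raw)


def output_paths(url: str, output_dir: Optional[str] = None) -> Tuple[str, str]:
    """Return (html_file_path, resource_dir_path) inside output_dir."""
    # No -o value given: use the current working directory.
    base_dir = os.path.abspath(output_dir or os.getcwd())
    name = url_to_name(url)
    return (
        os.path.join(base_dir, f"{name}.html"),
        os.path.join(base_dir, f"{name}_files"),
    )
```

On a POSIX system, output_paths("https://google.com", "/users/human/cd") returns the two paths that appear in the debug log example: /users/human/cd/google-com.html and /users/human/cd/google-com_files.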

The debug option prints all debug messages.
Without it, you will only see messages about resources being downloaded.

page-loader -d <url>

// What you will see in the terminal:
// $ page-loader -d https://google.com
//
//   Recieved URL: "https://google.com"
//   Recieved Path: "/users/human/cd"
//   Getting data from "https://google.com"
//   Resource was find: "https://google.com/textinputassistant/tia.png"
//   Writing HTML-file to: "/users/human/cd/google-com.html"
//   Creating resource dir: "/users/human/cd/google-com_files"
//   Downloading resource: "https://google.com/textinputassistant/tia.png"
//   Finishing program...
//
//   Page successfully loaded: "google-com.html"
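One common way to get this behaviour is to choose the log level from the flag: DEBUG when -d is given, INFO otherwise, so debug messages are filtered out unless requested. The sketch below uses Python's standard logging module; the logger name page_loader and the function configure_logging are assumptions, not the project's actual setup.

```python
import logging


def configure_logging(debug: bool) -> logging.Logger:
    """Hypothetical setup: DEBUG level with -d, INFO level otherwise."""
    logger = logging.getLogger("page_loader")  # assumed logger name
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    # Add a handler only once, so repeated calls do not duplicate output.
    if not logger.handlers:
        handler = logging.StreamHandler()  # writes to stderr
        handler.setFormatter(logging.Formatter("%(message)s"))
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = configure_logging(debug=False)
    log.debug('Received URL: "%s"', "https://google.com")  # hidden without -d
    log.info('Downloading resource: "%s"', "https://google.com/textinputassistant/tia.png")
```

Keeping the handler fixed and changing only the level means every call site logs unconditionally, and the -d flag alone decides what reaches the terminal.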

Additional

To check the code against the project's coding standards with the linter, use this command:

make lint

To make sure the project is working correctly:

make test

This command shows the percentage of test coverage:

make test-coverage

If you have changed the code, reinstall the package to apply your changes:

make reinstall

Demos

Package setup

page-loader-setup.mp4

Usage

page-loader-usage.mp4

Page Loader | by zluuba
