wayback-machine-auto-save

A worker that saves a list of web pages to the Internet Archive's Wayback Machine (WM).

Limitation

WM seems to allow about 240 successful requests per day per client, whether the user is logged in or not. This counter resets at 00:00 UTC.
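If a URL list is longer than the daily allowance, the work has to be spread over several days. Below is a minimal sketch, assuming the observed figure of 240 per day and the 00:00 UTC reset described above. The names DAILY_QUOTA, seconds_until_reset and split_batches are hypothetical helpers and are not part of this project.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Sequence

# Assumed quota, taken from the observation above; not an official WM figure.
DAILY_QUOTA = 240


def seconds_until_reset(now: datetime) -> float:
    """Seconds from `now` (a timezone-aware datetime) until the next 00:00 UTC."""
    now_utc = now.astimezone(timezone.utc)
    next_midnight = (now_utc + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now_utc).total_seconds()


def split_batches(urls: Sequence[str], quota: int = DAILY_QUOTA) -> List[Sequence[str]]:
    """Split a URL list into per-day batches no larger than the assumed quota."""
    return [urls[i:i + quota] for i in range(0, len(urls), quota)]
```

For example, a scheduler could submit one batch, then sleep for seconds_until_reset(datetime.now(timezone.utc)) before starting the next one.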

Quick Start

File of URLs

Prepare a file containing the list of URLs to save. Both TXT and JSON files are accepted.

TXT format

The script reads each line as one URL. Example:

https://www.gnu.org/fun/
https://www.gnu.org/fun/jokes/10-kinds-of-people.html
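The one-URL-per-line rule can be sketched as follows. This is an illustration, not the project's own loader. Skipping blank lines, trimming whitespace and assuming UTF-8 encoding are all assumptions. parse_txt_urls and load_txt_urls are hypothetical names.

```python
from pathlib import Path
from typing import List


def parse_txt_urls(text: str) -> List[str]:
    """Return one URL per non-blank line, trimmed of surrounding whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_txt_urls(path: str) -> List[str]:
    # Assumes the file is UTF-8 encoded.
    return parse_txt_urls(Path(path).read_text(encoding="utf-8"))
```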

JSON format

The script loads the value of every url attribute in the JSON. Example:

[
{"url": "https://www.gnu.org/fun/"},
{"url": "https://www.gnu.org/fun/jokes/10-kinds-of-people.html"}
]
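A minimal sketch of collecting every url value is shown below. It walks the whole parsed JSON tree, so nested objects are also searched. That behavior is an assumption: the real script may only read a top-level array like the one above. collect_urls and load_json_urls are hypothetical names.

```python
import json
from typing import Any, List


def collect_urls(node: Any) -> List[str]:
    """Recursively gather the string values of every "url" key in parsed JSON."""
    found: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "url" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(collect_urls(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_urls(item))
    return found


def load_json_urls(path: str) -> List[str]:
    with open(path, encoding="utf-8") as f:
        return collect_urls(json.load(f))
```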

Save to Wayback Machine

To save the URLs in urls.txt to WM, run:

python main.py urls.txt

Example output:

cookies: None
proxies: None
WaybackMachineAPI inited.
urls: ['https://www.gnu.org/fun/', 'https://www.gnu.org/fun/jokes/10-kinds-of-people.html']

Http post: https://web.archive.org/save/https://www.gnu.org/fun/
status_code: 200
WM accept saving https://www.gnu.org/fun/, job_id: spn2-51ef937fdcccbcf485e2d092417ee320a2043b52
Save page successful: https://www.gnu.org/fun/, job_id: spn2-51ef937fdcccbcf485e2d092417ee320a2043b52

Http post: https://web.archive.org/save/https://www.gnu.org/fun/jokes/10-kinds-of-people.html
status_code: 200
WM accept saving https://www.gnu.org/fun/jokes/10-kinds-of-people.html, job_id: spn2-60a192c5877dd50c7eb416a0565cfc345e6003c0
Save page successful: https://www.gnu.org/fun/jokes/10-kinds-of-people.html, job_id: spn2-60a192c5877dd50c7eb416a0565cfc345e6003c0
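The log suggests that each URL is submitted as an HTTP POST to https://web.archive.org/save/ followed by the page URL, and that the reply carries a job id beginning with "spn2-". Below is a sketch, based only on that log, that builds the request without sending it. It also checks whether a string has the job-id shape seen above. The request body, the headers, and the job-id pattern are all assumptions. build_save_request and looks_like_job_id are hypothetical names, not this project's API.

```python
import re
import urllib.request
from typing import Dict, Optional

# Endpoint prefix as it appears in the example log ("Http post: ...").
SAVE_ENDPOINT = "https://web.archive.org/save/"

# Job-id shape inferred from the example log: "spn2-" plus hex digits (assumption).
JOB_ID_PATTERN = re.compile(r"spn2-[0-9a-f]+")


def build_save_request(url: str, cookie_header: Optional[str] = None) -> urllib.request.Request:
    """Build, but do not send, a POST asking WM to save `url`."""
    headers: Dict[str, str] = {}
    if cookie_header:
        headers["Cookie"] = cookie_header
    # An empty body is an assumption; the log only shows the target URL.
    return urllib.request.Request(SAVE_ENDPOINT + url, data=b"", headers=headers, method="POST")


def looks_like_job_id(value: str) -> bool:
    """True if `value` matches the job-id shape seen in the example output."""
    return JOB_ID_PATTERN.fullmatch(value) is not None
```

To actually send the request, pass it to urllib.request.urlopen with a timeout and check the status code, as the example output does.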

Optional Arguments

Cookies

Provide only the value of the logged-in-sig cookie; it is roughly 300 characters long.

Argument: -c COOKIES

Example: -c "123456 XXXXXXXXXX"
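The example output prints "cookies: None" when -c is omitted. This suggests that the -c value is mapped onto a cookie mapping, or onto nothing for anonymous use. Below is a sketch of that mapping under this assumption. The value is passed through unchanged, including any spaces. build_cookies and cookie_header are hypothetical names.

```python
from typing import Dict, Optional


def build_cookies(sig: Optional[str]) -> Optional[Dict[str, str]]:
    """Map the -c value onto the logged-in-sig cookie; None means not logged in."""
    if not sig:
        return None
    return {"logged-in-sig": sig}


def cookie_header(cookies: Optional[Dict[str, str]]) -> Optional[str]:
    """Render a cookie mapping as a Cookie header value (values passed through as-is)."""
    if not cookies:
        return None
    return "; ".join(f"{name}={value}" for name, value in cookies.items())
```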

Proxy

Sets the proxy used for both HTTP and HTTPS requests.

Argument: -p PROXY

Example: -p http://127.0.0.1:8888
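A single -p value covers both schemes. The sketch below shows one way to express that with the standard library. The {"http": ..., "https": ...} mapping is also the shape that the requests library accepts for its proxies argument, but whether this project uses requests is not stated here. build_proxies and make_opener are hypothetical names.

```python
import urllib.request
from typing import Dict, Optional


def build_proxies(proxy: Optional[str]) -> Optional[Dict[str, str]]:
    """Use one proxy URL for both http and https; None means no explicit proxy."""
    if not proxy:
        return None
    return {"http": proxy, "https": proxy}


def make_opener(proxy: Optional[str]) -> urllib.request.OpenerDirector:
    """Build a urllib opener that routes http and https through `proxy` when given."""
    proxies = build_proxies(proxy)
    handlers = [urllib.request.ProxyHandler(proxies)] if proxies else []
    return urllib.request.build_opener(*handlers)
```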

Example of Saving with Cookies and Proxy

python main.py urls.txt -p http://127.0.0.1:8888 -c "123456 XXXXXXXXXX"
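The command line above takes a positional URL file and the optional -c and -p flags. A sketch of an argparse parser that accepts it is shown below. It is not the project's own argument loader. Only the short flags come from this README. The destination names and help texts are assumptions, and make_parser is a hypothetical name.

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    """Parser accepting: FILE [-c COOKIES] [-p PROXY]."""
    parser = argparse.ArgumentParser(
        description="Save the URLs in a TXT or JSON file to the Wayback Machine."
    )
    parser.add_argument("file", help="TXT or JSON file of URLs")
    parser.add_argument("-c", dest="cookies", metavar="COOKIES", default=None,
                        help="value of the logged-in-sig cookie")
    parser.add_argument("-p", dest="proxy", metavar="PROXY", default=None,
                        help="proxy for http and https")
    return parser
```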

bac0id/wayback-machine-auto-save

Folders and files

Latest commit

History

Repository files navigation

wayback-machine-auto-save

Limitation

Quick Start

File of URLs

Save to Wayback Machine

Optional Arguments

Cookies

Proxy

Example of Save with Cookies And Proxy

About

Topics

Resources

License

Stars

Watchers

Forks

Languages