tweet-archive

Goal: Provide a functional workflow to create an offline archive of scraped tweets- uninhibited by the platform's API restrictions- as a more complete alternative to the native archive feature available from a platform account.

Note: The majority of the value here lies in the hacky shell scripts which assemble the tweet data into a static frontend that you can enjoy. After intial setup, if you do not want the tweets pooled together (regardless of whether they are from your Likes, Bookmarks, etc.) and sorted by date posted, or you want to customize the pages' appearances, you would be best to fork this repo and customize the shell scripts to your liking.

Dependencies:

a twitter account (can be a temporary account)
WFDownloader
Docker Desktop
At the moment, unfortunately Chrome
A Unix-like shell

Part 1 - Pulling Down Tweet Data

clone down this repository and unzip it
open WFDownloader
- on macOS, you will likely need to allow the applets through Gatekeeper, and launch via EnableStart.command
with WFDownloader open, navigate to Tasks → Login via inbuilt browser
load the following url into the inbuilt browser and log into your account: https://twitter.com/i/flow/login
- you may now close the inbuilt browser window if you'd prefer, since it has your current session login (or you can continue using it for retrieving urls in the following procedure if you prefer)
for each page's tweets you would like to archive (e.g. the url to your likes, to your bookmarks, to your profile, to another user's profile, etc.) you will need to copy out the url of that page and follow this procedure:
1. from the main WFDownloader window, select Add
  - paste the url into the Link Address field
  - browse to, or set the path to, the recently downloaded and decompressed repository directory
  - select Config and ensure that the Fetch_Mode field is set to Fetch media only; then select Accept
  - select Confirm and wait for all resolveable tweets' media links to be found
  - select Confirm again to close this window, and from the main window, select Resume (the new batch should already be selected)
2. wait for the downloads in the batch to complete, and then once again select Add
  - your previous url and folder path should already be present in the fields; this time, select Config and set the Fetch_Mode field to Fetch tweet urls for export; then select Accept
  - select Confirm and wait for all resolveable tweets' links to be found
  - select Confirm again to close this window, and from the main window, select Resume (the new batch should already be selected)
  - at this point, you should have both the media and json data, for the tweets served at the current url, saved locally

Part 2 - Generating a Static Frontend

open Docker Desktop so that it is running in the background
open your terminal emulator and navigate to the repository directory
run docker build -t tweet-archive .; docker run -it --rm -v "$(pwd):/scripts" tweet-archive (repeat these two commands any time you want to re-generate the archive after making your own changes)
- if you have the gnu core utilities installed, you can try running bash create_pages.sh instead of using Docker (errors may persist due to different flags available from the BSD package equivalents; this is why I suggest using Docker)

Viewing Your Tweet Archive

quit Chrome if it is running
in your terminal emulator, launch a Chrome session with reduced security to disable CORS restrictions: (for you convenience, you may want to save the below command to a .command file, a .cmd file, or an alias, per your platform)
- macOS: open /Applications/Google\ Chrome.app --args --user-data-dir="/var/tmp/Chrome dev session" --disable-web-security
- Linux: google-chrome --disable-web-security
- Windows: chrome.exe --user-data-dir="C:/Chrome dev session" --disable-web-security
in your file manager, navigate to the top level of the repository directory and then right-click open the index.html file with this active Chrome session
on the index page, you should see that the tweets are divided into chronological subgroups of 15 tweets per hyperlinked page (this prevents long load times for media)
- each page is hyperlinked with the earliest tweet's post date per subgrouping
when you're finished viewing your archive, quit Chrome to end this session with reduced security

Known Issues

intermittent playback issues without the autoplay attribute
intermittent playback issues on local servers, partially due to using autoplay
inability to open archive on alternative browsers (disabling CORS restrictions is increasingly difficult on Safari, Firefox, and others)

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
create_pages.sh		create_pages.sh
gen_index.sh		gen_index.sh
page_index.html		page_index.html

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

tweet-archive

Part 1 - Pulling Down Tweet Data

Part 2 - Generating a Static Frontend

Viewing Your Tweet Archive

Known Issues

About

Languages

License

mcthomas/tweet-archive

Folders and files

Latest commit

History

Repository files navigation

tweet-archive

Part 1 - Pulling Down Tweet Data

Part 2 - Generating a Static Frontend

Viewing Your Tweet Archive

Known Issues

About

Topics

Resources

License

Stars

Watchers

Forks

Languages