space efficient storage for a million EDG binaries #200

Open
milahu opened this issue May 25, 2022 · 0 comments
milahu commented May 25, 2022

the "million EDG binaries" (about 30MB zipped, 140MB raw per release) would compress well with git, because consecutive releases are mostly identical

transfer size would stay the same, but storage size would be much smaller, so there would be no need for an amazon S3 server

migrate tarballs to git:

#!/bin/sh

if [ -d gitrepo ]; then
  echo "error: folder exists: gitrepo. to run test again, run: rm -rf gitrepo"
  exit 1
fi

mkdir gitrepo

git -C gitrepo init

# https://github.com/rose-compiler/rose/blob/weekly/src/frontend/CxxFrontend/EDG_VERSION
release_list="$(cat <<EOF
roseBinaryEDG-5-0-x86_64-pc-linux-gnu-gnu-10-5.0.11.77.1
roseBinaryEDG-5-0-x86_64-pc-linux-gnu-gnu-10-5.0.11.78.1
roseBinaryEDG-5-0-x86_64-pc-linux-gnu-gnu-10-5.0.11.79.1
roseBinaryEDG-5-0-x86_64-pc-linux-gnu-gnu-10-5.0.11.80.1
roseBinaryEDG-5-0-x86_64-pc-linux-gnu-gnu-10-5.0.11.81.1
roseBinaryEDG-5-0-x86_64-pc-linux-gnu-gnu-10-5.0.11.82.1
roseBinaryEDG-5-0-x86_64-pc-linux-gnu-gnu-10-5.0.11.82.2
roseBinaryEDG-5-0-x86_64-pc-linux-gnu-gnu-10-5.0.11.82.3
EOF
)"

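for release in $release_list
do
  echo "adding $release"
  # download and unpack only if not already present
  [ -e "$release.tar.gz" ] || wget "http://edg-binaries.rosecompiler.org/$release.tar.gz"
  [ -d "$release" ] || tar -xf "$release.tar.gz"
  # the glob skips dotfiles, so .libs is copied explicitly
  # note: cp only adds or overwrites, files dropped in a later release stay in gitrepo
  cp -r "$release"/* "$release/.libs" gitrepo/

  # TODO use release date for commit + tag
  git -C gitrepo add .
  git -C gitrepo commit -m "$release"
  git -C gitrepo tag "$release"

  rm -rf "$release"
done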

echo raw size
du -sh gitrepo/.git
echo
echo compressing ...
time git -C gitrepo gc
echo
echo compressed size
du -sh gitrepo/.git
echo
echo total size of tarballs
du -shc roseBinaryEDG-*.tar.gz | tail -n1

output:

raw size
247M	gitrepo/.git

compressing ...
Enumerating objects: 35, done.
Counting objects: 100% (35/35), done.
Delta compression using up to 4 threads
Compressing objects: 100% (34/34), done.
Writing objects: 100% (35/35), done.
Total 35 (delta 15), reused 0 (delta 0), pack-reused 0

real	0m57.203s
user	0m52.688s
sys	0m3.180s

compressed size
34M	gitrepo/.git

total size of tarballs
213M	total
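
the TODO in the script (use release date for commit + tag) could be handled roughly like this. this is only a sketch: `commit_with_date` is a made-up helper name, and it assumes that the mtime of the downloaded tarball is close enough to the release date (wget normally takes it from the server's Last-Modified header, but i have not checked that for this server). it also assumes GNU `date`, where `-r FILE` prints the file's modification time

```shell
#!/bin/sh
# commit_with_date REPO TARBALL RELEASE
# hypothetical helper: commit the staged changes in REPO and tag them as RELEASE,
# both dated by the modification time of TARBALL.
# assumption: the tarball mtime approximates the release date.
commit_with_date() {
  repo="$1"
  tarball="$2"
  release="$3"

  # git's internal date format: "<unix seconds> <timezone offset>"
  # GNU date: -r FILE reads the file's mtime, +%s prints unix seconds
  stamp="$(date -r "$tarball" +%s) +0000"

  GIT_AUTHOR_DATE="$stamp" GIT_COMMITTER_DATE="$stamp" \
    git -C "$repo" commit -q -m "$release"

  # a lightweight tag carries no date of its own, so use an annotated tag.
  # the tagger date is taken from GIT_COMMITTER_DATE
  GIT_COMMITTER_DATE="$stamp" \
    git -C "$repo" tag -a -m "$release" "$release"
}
```

in the loop, this would replace the `git commit` and `git tag` lines: `commit_with_date gitrepo "$release.tar.gz" "$release"`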

fetching a tarball would be as simple as

wget https://github.com/rose-compiler/edg-binaries/archive/roseBinaryEDG-5-0-x86_64-pc-linux-gnu-gnu-10-5.0.11.82.3.tar.gz
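
the same kind of tarball can also be produced locally from the test repo with `git archive`, to check that a tag really reproduces a release. minimal sketch, `export_release` is a made-up helper name, and the top-level directory name inside the archive (`--prefix`) is my choice here, a hosted archive may use a different one

```shell
#!/bin/sh
# export_release REPO TAG
# hypothetical helper: write TAG.tar.gz into the current directory,
# containing the tree of TAG from the git repo in REPO.
export_release() {
  repo="$1"
  tag="$2"

  # -C changes into the repo first, so the output path must be absolute
  git -C "$repo" archive --format=tar.gz --prefix="$tag/" \
    -o "$PWD/$tag.tar.gz" "$tag"
}
```

usage: `export_release gitrepo roseBinaryEDG-5-0-x86_64-pc-linux-gnu-gnu-10-5.0.11.82.3`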

compression can be optimized by

compiling object code with the -ffunction-sections and -fdata-sections compiler flags. This has the effect that if you 'insert' a function into a translation unit, the insertion does not cause all of the addresses to change across the whole object file.

https://github.com/elfshaker/elfshaker#applicability
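
for a build that reads the usual `CFLAGS` / `CXXFLAGS` environment variables, the two flags could be added like this. sketch only: `with_section_flags` is a made-up helper, `-O2` is just a placeholder for the existing flags, and whether the EDG binary build honors these variables is an assumption i have not verified

```shell
#!/bin/sh
# with_section_flags FLAGS...
# hypothetical helper: print the given compiler flags plus the two section flags
with_section_flags() {
  echo "$* -ffunction-sections -fdata-sections"
}

# export the extended flags before running the build
CFLAGS="$(with_section_flags -O2)"
CXXFLAGS="$(with_section_flags -O2)"
export CFLAGS CXXFLAGS
echo "CFLAGS=$CFLAGS"
```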
