Finer reference trace #325

Merged
merged 1 commit into from
Dec 5, 2023
4 changes: 3 additions & 1 deletion .github/configs/wordlist.txt
@@ -748,4 +748,6 @@ Lazar
Cvetkovic
cvetkovic
ethz
lazar
lazar
xvzf
untar
2 changes: 1 addition & 1 deletion .github/workflows/integration_tests.yaml
@@ -43,7 +43,7 @@ jobs:
- name: Drawing samples
run: |
tar -xzvf $tpath/inputs/preprocessed.tar.gz -C $tpath/inputs/
python -m sampler sample --source_trace $tpath/inputs/preprocessed --output $tpath/sampled --min-size 10 --step-size=10 --max-size=50
python -m sampler sample --source_trace $tpath/inputs/preprocessed --original_trace $tpath/inputs/preprocessed --output $tpath/sampled --min-size 10 --step-size=10 --max-size=50

# - name: Plotting results
# run: |
3 changes: 0 additions & 3 deletions .gitignore
@@ -5,10 +5,7 @@ analysis
tmp
data/out
data/azure
data/traces/*
!data/traces/example/
data/traces/reference/*/*.csv
!data/traces/reference/
pkg/generator/*.png
pkg/generator/*.txt
pkg/driver/*.csv
1 change: 1 addition & 0 deletions data/traces/reference/.gitignore
@@ -0,0 +1 @@
*.csv
4 changes: 2 additions & 2 deletions data/traces/reference/sampled_150.tar.gz
Git LFS file not shown
4 changes: 4 additions & 0 deletions docs/loader.md
@@ -124,6 +124,10 @@ For to configure the workload for load generator, please refer to `docs/configur
There are a couple of constants that should not be exposed to the users. They can be examined and changed
in `pkg/common/constants.go`.

Sample sizes appropriate for performance evaluation vary depending on the platform.
As a starting point for fine-tuning, we suggest at most 5 functions per core with SMT disabled.
For example, this means at most 80 functions on a 16-core node. With larger sample sizes, replaying the trace may cause function invocations to fail.
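
The sizing rule above is simple arithmetic. As a minimal sketch (the helper name `suggested_max_functions` and its parameters are hypothetical and not part of the loader), the starting-point upper bound could be computed from the number of physical cores:

```python
def suggested_max_functions(physical_cores: int, functions_per_core: int = 5) -> int:
    """Return a starting-point upper bound on the sample size for one node.

    Hypothetical helper, not part of the loader. It encodes the rule of thumb
    of at most 5 functions per core, counting physical cores (SMT disabled).
    """
    if physical_cores < 1:
        raise ValueError("physical_cores must be at least 1")
    return physical_cores * functions_per_core


if __name__ == "__main__":
    # A 16-core node corresponds to the 80-function example in the text.
    print(suggested_max_functions(16))
```

The result is only a starting point for fine-tuning. The appropriate value still depends on the platform.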

## Build the image for a synthetic function

The reason for existence of Firecracker and container version is because of different ports for gRPC server. Firecracker
33 changes: 21 additions & 12 deletions docs/sampler.md
@@ -12,7 +12,7 @@ git lfs install
cd sampler
git lfs fetch
git lfs checkout
pip install -r requirements.txt
pip install -r ../requirements.txt
```

## Pre-processing the original trace (mandatory)
@@ -91,9 +91,9 @@ monotonic load increase (in terms of resource usage) when sweeping the sample si
```console
python3 -m sampler sample -h

usage: sample [-h] -t path -o path [-min integer] [-st integer] [-max integer] [-tr integer]
usage: sample [-h] -t path -orig path -o path [-min integer] [-st integer] [-max integer] [-tr integer]

optional arguments:
options:
-h, --help show this help message and exit
-t path, --source_trace path
Path to trace to draw samples from
@@ -113,22 +113,31 @@ optional arguments:

## Reference traces

The reference traces are stored in `data/traces/reference` folder of this repository, as `preprocessed.tar.gz` and
`sampled.tar.gz` files stored in Git LFS.
The reference traces are stored in the `data/traces/reference` folder of this repository as the `preprocessed_150.tar.gz` and
`sampled_150.tar.gz` files, which are tracked with Git LFS.

`preprocessed_150.tar.gz` contains the preprocessed traces for the original Azure trace for day 1, 09:00:00-11:30:00 (150
minutes total). The 150-minute trace captures only approximately half of all functions from the original Azure trace, but it is
better suited to shorter experiments (10 minutes to 2 hours).

`sampled_150.tar.gz` contains the traces sampled from the preprocessed trace in `preprocessed_150.tar.gz`. Sample sizes are
10-200 functions with step 10, 200-3k with step 50, and 3k-24k with step 1k.

`preprocessed.tar.gz` contains the preprocessed traces for the original Azure trace for day 1, 09:00:00-11:30:00 (150
minutes total).
You can untar the tarballs with the following commands:

`sampled.tar.gz` contains the sampled traces for preprocessed trace from `preprocessed.tar.gz`. Sample sizes are 50-3k
functions with step 50 and 3k-24k with step 1k.
```console
tar -xvzf sampled_150.tar.gz
tar -xvzf preprocessed_150.tar.gz
```

The reference traces were obtained by running the following commands:

```console
python3 -m preprocess -t data/azure/ -o data/reference/preprocessed_150 -s 00:09:00 -dur 150
python3 -m sampler preprocess -t data/azure/ -o data/traces/reference/preprocessed_150 -s 00:09:00 -dur 150

python3 -m sample -t data/reference/preprocessed_150 -o data/reference/sampled_150 -min 3000 -st 1000 -max 24000 -tr 16
python3 -m sample -t data/reference/sampled_150/samples/3000 -o data/reference/sampled_150 -min 50 -st 50 -max 3000 -tr 16
python3 -m sampler sample -t data/traces/reference/preprocessed_150 -orig data/traces/reference/preprocessed_150 -o data/traces/reference/sampled_150 -min 3000 -st 1000 -max 24000 -tr 16
python3 -m sampler sample -t data/traces/reference/sampled_150/samples/3000 -orig data/traces/reference/preprocessed_150 -o data/traces/reference/sampled_150 -min 200 -st 50 -max 3000 -tr 16
python3 -m sampler sample -t data/traces/reference/sampled_150/samples/200 -orig data/traces/reference/preprocessed_150 -o data/traces/reference/sampled_150 -min 10 -st 10 -max 200 -tr 16
```
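
The three `sampler sample` passes above are chained: each later pass draws from the smallest sample of the previous one (`samples/3000`, then `samples/200`). As a rough sketch of which sample sizes this should produce, the following enumerates them from the `-min`, `-max` and `-st` values. It assumes that each pass includes both of its bounds, which the commands do not confirm. The names `PASSES` and `reference_sample_sizes` are illustrative and not part of the sampler.

```python
from typing import List, Tuple

# (min, max, step) for each sampling pass, copied from the -min/-max/-st flags.
PASSES: List[Tuple[int, int, int]] = [
    (3000, 24000, 1000),
    (200, 3000, 50),
    (10, 200, 10),
]


def reference_sample_sizes(passes: List[Tuple[int, int, int]] = PASSES) -> List[int]:
    """List the distinct sample sizes, assuming each pass includes both bounds."""
    sizes = set()
    for lo, hi, step in passes:
        sizes.update(range(lo, hi + 1, step))
    return sorted(sizes)


if __name__ == "__main__":
    sizes = reference_sample_sizes()
    # Print how many distinct sizes there are, plus the smallest and largest.
    print(len(sizes), sizes[0], sizes[-1])
```

Sizes that sit on a boundary between two passes (200 and 3000) are counted once here.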

## Tools
4 changes: 2 additions & 2 deletions requirements.txt
@@ -1,7 +1,7 @@
matplotlib==3.7.2
numpy==1.26.1
numpy==1.24.4
pandas==1.3.5
scipy==1.11.2
scipy==1.10.1
Comment on lines +2 to +4
Contributor

@cvetkovic We noticed that this bump of the numpy and scipy versions makes it impossible to use the sampler on Ubuntu 20.04 (I think that's the reason, but it might be another factor). Do you know a proper way of fixing that?

Contributor Author

It is due to the Python version, but still, the Python version is tied to the Ubuntu version, which is annoying, so I lowered the dependency versions. numpy 1.26.1 and scipy 1.11.2 require Python 3.9, and we don't have that as the default on Ubuntu 20.04.

Member

OK for now, but we need to move to Ubuntu 22 for measurements. It is about time, but we need to plan this upgrade vHive-wide.

pytest==7.4.0
cloudpickle==2.2.1
seaborn==0.13.0
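
The review thread above comes down to which interpreter is available: numpy 1.26.1 and scipy 1.11.2 need Python 3.9 or later, while Ubuntu 20.04 ships Python 3.8 by default. As a minimal sketch (the helper `compatible_pins` is hypothetical and not part of the repository), the choice between the two sets of pins in this diff can be expressed as a version check:

```python
import sys
from typing import Dict, Tuple


def compatible_pins(version: Tuple[int, int] = sys.version_info[:2]) -> Dict[str, str]:
    """Pick the numpy/scipy pins from this diff for an interpreter version.

    Hypothetical helper for illustration only. It assumes the newer pair
    needs Python >= 3.9 and that the older pair works on Python 3.8.
    """
    if version >= (3, 9):
        return {"numpy": "1.26.1", "scipy": "1.11.2"}
    return {"numpy": "1.24.4", "scipy": "1.10.1"}


if __name__ == "__main__":
    # Show the pins that match the interpreter running this script.
    print(compatible_pins())
```

The PR itself takes the simpler route of pinning the lower versions for everyone. A per-interpreter split like the one sketched here could also be written with environment markers in `requirements.txt`, but that is not what this change does.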
1 change: 1 addition & 0 deletions sampler/__main__.py
@@ -108,6 +108,7 @@ def main():
sample_parser.add_argument(
'-orig',
'--original_trace',
required=True,
metavar='path',
default=None,
help='Path to the Azure (or other original) trace files, required to maximize the derived sample\'s representativity (WD from the original trace)'
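
With `required=True`, argparse rejects any invocation that omits `-orig`, so the `default=None` that remains in the call is never used. This is also why the integration workflow now passes `--original_trace`. A minimal sketch of that behaviour follows, using a cut-down parser with only the two trace arguments. The function `build_parser` is illustrative, and the real parser in `sampler/__main__.py` defines more options.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a cut-down parser with only the two trace arguments.

    Illustrative only: the real parser in sampler/__main__.py has more options.
    """
    parser = argparse.ArgumentParser(prog="sample")
    parser.add_argument("-t", "--source_trace", required=True, metavar="path")
    parser.add_argument("-orig", "--original_trace", required=True,
                        metavar="path", default=None)
    return parser


if __name__ == "__main__":
    parser = build_parser()

    # Both arguments supplied: parsing succeeds.
    args = parser.parse_args(["-t", "preprocessed", "-orig", "preprocessed"])
    print(args.source_trace, args.original_trace)

    # -orig omitted: argparse prints a usage error and raises SystemExit.
    try:
        parser.parse_args(["-t", "preprocessed"])
    except SystemExit as exc:
        print("exit code", exc.code)
```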