Implement a single-bucket design. #186

Merged May 5, 2024 · 49 commits

Commits
86c22bd
[system] Updating the resource definition in preparation for improved…
mcopik May 14, 2022
05532e7
[aws] Move towards single bucket design on AWS for benchmark invocation
mcopik Sep 23, 2023
9bcc6c0
[benchmarks] Single bucket for 210.thumbnailer Python
mcopik Sep 23, 2023
d1ffbe1
[aws] Fix generation of unique file name on S3 - remove typo
mcopik Sep 23, 2023
8330fe8
[aws] Fix generation of unique file name on S3 for Node.js, and updat…
mcopik Sep 23, 2023
1b30184
[system] Update caching of storage files to correctly handle initiali…
mcopik Sep 23, 2023
4fc9467
[benchmarks] Update video-processing and compression to handle single…
mcopik Sep 23, 2023
0e1cabf
[benchmarks] Fix 120.uploader and video-processing to single bucket d…
mcopik Sep 23, 2023
07b8587
[aws] [benchmarks] Use single bucket for code deployment and 411.imag…
mcopik Sep 23, 2023
7bdc95c
[benchmarks] Adapt the rest of benchmarks to a new design
mcopik Sep 23, 2023
7d124fa
[system] [aws] Change definition of list_buckets method for more gene…
mcopik Oct 18, 2023
d35d3df
[system] Move the resource initialization such that we can find exist…
mcopik Oct 18, 2023
ab0a40b
[system] Ensure the new resource ID follows a predefined prefix
mcopik Oct 18, 2023
dc9909c
[aws] Add remove bucket option for storage
mcopik Oct 18, 2023
f23c846
[system] Add CLI helper to clean and remove old buckets
mcopik Oct 18, 2023
d71f2cc
[regression] Update regression to use a single resource ID for deploy…
mcopik Oct 18, 2023
c968d1e
[gcp] adapt to new API
mcopik Apr 11, 2024
785c4c7
[azure] Update credentials generator to support accounts with multipl…
mcopik Apr 11, 2024
71f2db4
[aws] Avoid infinite loop in pypy install
mcopik Apr 11, 2024
0352d6d
[aws] Correctly throw an error on duplicated S3 buckets
mcopik Apr 12, 2024
fa46bf0
[system] Verify buckets do not exist before creating
mcopik Apr 12, 2024
e241ffb
[azure] Support new mode of reusing resource groups and data storage …
mcopik Apr 12, 2024
3bb1ca2
[system] Pass resource prefix as an option
mcopik Apr 12, 2024
a29a5b8
[azure] Linting
mcopik Apr 12, 2024
67731c2
[azure] Add listing of resource groups
mcopik Apr 12, 2024
36ea134
[azure] Add deleting of resource groups
mcopik Apr 12, 2024
23293c4
[azure] Support sharing CLI instance
mcopik Apr 12, 2024
5a1c140
[azure] Fix incorrect paths in storage
mcopik Apr 12, 2024
84c45e9
[system] Ensure input data is reuploaded
mcopik Apr 12, 2024
64e0bf6
[azure] Improved regression handling of CLI instance
mcopik Apr 12, 2024
fb9d0e8
[system] Correctly recognize buckets without uploaded data
mcopik Apr 12, 2024
d4da66e
[azure] Linting issues
mcopik Apr 12, 2024
82f15d3
[gcp] Final adaptations to the new storage interface
mcopik Apr 12, 2024
191966b
[azure] Fix output path of Python benchmarks
mcopik Apr 12, 2024
4e60e22
[gcp] Fix paths in the Node.js version
mcopik Apr 12, 2024
d192034
[azure] Fix paths in the Node.js version
mcopik Apr 12, 2024
9508893
[gcp] Fix creation of promises for upload
mcopik Apr 12, 2024
55a895e
[local] Update to the single bucket design
mcopik Apr 12, 2024
18642f2
[system] Make resource id optional
mcopik Apr 12, 2024
217ee4c
[local] Adapt minio resources to be compatible with openwhisk
mcopik Apr 13, 2024
b765cd0
[whisk] fix incorrect path in storage
mcopik May 3, 2024
c6129a5
[whisk] Fix caching logic
mcopik May 3, 2024
56135c1
[whisk] Update documentations on OpenWhisk
mcopik May 4, 2024
5786975
[whisk] Fix bug in sharing a list across many processes
mcopik May 4, 2024
296c30f
[whisk] Adapt regression to a single initialization
mcopik May 4, 2024
0d3986d
[whisk] Fix minio version to ensure compatibility with older Node ver…
mcopik May 5, 2024
387b3ec
[whisk] Fix path of outputs for nodejs
mcopik May 5, 2024
4056b69
[system] Fix linting issues
mcopik May 5, 2024
59f6c5a
[system] Add necessary mypy package
mcopik May 5, 2024
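
Taken together, these commits replace the per-benchmark input and output buckets with one shared bucket per deployment, plus per-benchmark path prefixes for input and output data. Handlers now receive the shared bucket name in event['bucket']['bucket'] and treat event['bucket']['input'] / event['bucket']['output'] as prefixes inside it. A minimal sketch of the payload before and after the change (bucket and prefix names are illustrative, not taken from the code):

# Old design: separate buckets per benchmark (illustrative names).
event_old = {
    'object': {'key': 'sample.jpg'},
    'bucket': {
        'input': 'sebs-210-thumbnailer-input',
        'output': 'sebs-210-thumbnailer-output',
    },
}

# Single-bucket design: one shared bucket plus path prefixes.
event_new = {
    'object': {'key': 'sample.jpg'},
    'bucket': {
        'bucket': 'sebs-benchmarks',        # shared bucket, illustrative name
        'input': '210.thumbnailer/input',   # prefix inside the bucket, illustrative
        'output': '210.thumbnailer/output', # prefix inside the bucket, illustrative
    },
}

# Handlers then compose full object keys from prefix + name, as in the diffs below:
#     client.download(bucket, os.path.join(input_prefix, key), download_path)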
2 changes: 1 addition & 1 deletion benchmarks/100.webapps/110.dynamic-html/input.py
@@ -8,7 +8,7 @@
 def buckets_count():
     return (0, 0)
 
-def generate_input(data_dir, size, input_buckets, output_buckets, upload_func):
+def generate_input(data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func):
     input_config = {'username': 'testname'}
     input_config['random_len'] = size_generators[size]
     return input_config
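
The diff above shows the new generate_input contract used throughout this PR: benchmarks_bucket names the shared bucket, while input_paths and output_paths carry the prefixes allocated inside it, replacing the old input_buckets/output_buckets lists. A hedged sketch of a generator that actually uses storage (110.dynamic-html itself needs none; the key and prefix values are illustrative):

def generate_input(data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func):
    # upload_func(prefix_idx, key, filepath) uploads the file under input_paths[prefix_idx].
    upload_func(0, 'payload.bin', '/path/to/payload.bin')  # illustrative upload
    input_config = {'object': {'key': 'payload.bin'}, 'bucket': {}}
    input_config['bucket']['bucket'] = benchmarks_bucket
    input_config['bucket']['input'] = input_paths[0]    # e.g. '120.uploader/input'
    input_config['bucket']['output'] = output_paths[0]  # e.g. '120.uploader/output'
    return input_config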
3 changes: 2 additions & 1 deletion benchmarks/100.webapps/120.uploader/input.py
@@ -11,8 +11,9 @@
 def buckets_count():
     return (0, 1)
 
-def generate_input(data_dir, size, input_buckets, output_buckets, upload_func):
+def generate_input(data_dir, size, benchmarks_bucket, input_buckets, output_buckets, upload_func):
     input_config = {'object': {}, 'bucket': {}}
     input_config['object']['url'] = url_generators[size]
+    input_config['bucket']['bucket'] = benchmarks_bucket
     input_config['bucket']['output'] = output_buckets[0]
     return input_config
7 changes: 4 additions & 3 deletions benchmarks/100.webapps/120.uploader/nodejs/function.js
@@ -15,7 +15,8 @@ function streamToPromise(stream) {
 }
 
 exports.handler = async function(event) {
-    let output_bucket = event.bucket.output
+    let bucket = event.bucket.bucket
+    let output_prefix = event.bucket.output
     let url = event.object.url
     let upload_key = path.basename(url)
     let download_path = path.join('/tmp', upload_key)
@@ -26,10 +27,10 @@ exports.handler = async function(event) {
     var keyName;
     let upload = promise.then(
         async () => {
-            [keyName, promise] = storage_handler.upload(output_bucket, upload_key, download_path);
+            [keyName, promise] = storage_handler.upload(bucket, path.join(output_prefix, upload_key), download_path);
            await promise;
         }
     );
     await upload;
-    return {bucket: output_bucket, url: url, key: keyName}
+    return {bucket: bucket, url: url, key: keyName}
 };
7 changes: 4 additions & 3 deletions benchmarks/100.webapps/120.uploader/python/function.py
@@ -11,7 +11,8 @@
 
 def handler(event):
 
-    output_bucket = event.get('bucket').get('output')
+    bucket = event.get('bucket').get('bucket')
+    output_prefix = event.get('bucket').get('output')
     url = event.get('object').get('url')
     name = os.path.basename(url)
     download_path = '/tmp/{}'.format(name)
@@ -22,14 +23,14 @@ def handler(event):
     process_end = datetime.datetime.now()
 
     upload_begin = datetime.datetime.now()
-    key_name = client.upload(output_bucket, name, download_path)
+    key_name = client.upload(bucket, os.path.join(output_prefix, name), download_path)
     upload_end = datetime.datetime.now()
 
     process_time = (process_end - process_begin) / datetime.timedelta(microseconds=1)
     upload_time = (upload_end - upload_begin) / datetime.timedelta(microseconds=1)
     return {
         'result': {
-            'bucket': output_bucket,
+            'bucket': bucket,
             'url': url,
             'key': key_name
         },
9 changes: 6 additions & 3 deletions benchmarks/200.multimedia/210.thumbnailer/input.py
@@ -12,15 +12,18 @@ def buckets_count():
 :param output_buckets:
 :param upload_func: upload function taking three params(bucket_idx, key, filepath)
 '''
-def generate_input(data_dir, size, input_buckets, output_buckets, upload_func):
+def generate_input(data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func):
+
     for file in glob.glob(os.path.join(data_dir, '*.jpg')):
         img = os.path.relpath(file, data_dir)
         upload_func(0, img, file)
 
+    #TODO: multiple datasets
     input_config = {'object': {}, 'bucket': {}}
     input_config['object']['key'] = img
     input_config['object']['width'] = 200
     input_config['object']['height'] = 200
-    input_config['bucket']['input'] = input_buckets[0]
-    input_config['bucket']['output'] = output_buckets[0]
+    input_config['bucket']['bucket'] = benchmarks_bucket
+    input_config['bucket']['input'] = input_paths[0]
+    input_config['bucket']['output'] = output_paths[0]
     return input_config
12 changes: 7 additions & 5 deletions benchmarks/200.multimedia/210.thumbnailer/nodejs/function.js
@@ -5,22 +5,24 @@ const sharp = require('sharp'),
 let storage_handler = new storage.storage();
 
 exports.handler = async function(event) {
-    input_bucket = event.bucket.input
-    output_bucket = event.bucket.output
+
+    bucket = event.bucket.bucket
+    input_prefix = event.bucket.input
+    output_prefix = event.bucket.output
     let key = event.object.key
     width = event.object.width
     height = event.object.height
     let pos = key.lastIndexOf('.');
     let upload_key = key.substr(0, pos < 0 ? key.length : pos) + '.png';
 
     const sharp_resizer = sharp().resize(width, height).png();
-    let read_promise = storage_handler.downloadStream(input_bucket, key);
-    let [writeStream, promise, uploadName] = storage_handler.uploadStream(output_bucket, upload_key);
+    let read_promise = storage_handler.downloadStream(bucket, path.join(input_prefix, key));
+    let [writeStream, promise, uploadName] = storage_handler.uploadStream(bucket, path.join(output_prefix, upload_key));
     read_promise.then(
         (input_stream) => {
             input_stream.pipe(sharp_resizer).pipe(writeStream);
         }
     );
     await promise;
-    return {bucket: output_bucket, key: uploadName}
+    return {bucket: output_prefix, key: uploadName}
 };
11 changes: 6 additions & 5 deletions benchmarks/200.multimedia/210.thumbnailer/python/function.py
@@ -27,8 +27,9 @@ def resize_image(image_bytes, w, h):
 
 def handler(event):
 
-    input_bucket = event.get('bucket').get('input')
-    output_bucket = event.get('bucket').get('output')
+    bucket = event.get('bucket').get('bucket')
+    input_prefix = event.get('bucket').get('input')
+    output_prefix = event.get('bucket').get('output')
     key = unquote_plus(event.get('object').get('key'))
     width = event.get('object').get('width')
     height = event.get('object').get('height')
@@ -39,7 +40,7 @@ def handler(event):
     #resize_image(download_path, upload_path, width, height)
     #client.upload(output_bucket, key, upload_path)
     download_begin = datetime.datetime.now()
-    img = client.download_stream(input_bucket, key)
+    img = client.download_stream(bucket, os.path.join(input_prefix, key))
     download_end = datetime.datetime.now()
 
     process_begin = datetime.datetime.now()
@@ -48,15 +49,15 @@ def handler(event):
     process_end = datetime.datetime.now()
 
     upload_begin = datetime.datetime.now()
-    key_name = client.upload_stream(output_bucket, key, resized)
+    key_name = client.upload_stream(bucket, os.path.join(output_prefix, key), resized)
     upload_end = datetime.datetime.now()
 
     download_time = (download_end - download_begin) / datetime.timedelta(microseconds=1)
     upload_time = (upload_end - upload_begin) / datetime.timedelta(microseconds=1)
     process_time = (process_end - process_begin) / datetime.timedelta(microseconds=1)
     return {
         'result': {
-            'bucket': output_bucket,
+            'bucket': bucket,
             'key': key_name
         },
         'measurement': {
7 changes: 4 additions & 3 deletions benchmarks/200.multimedia/220.video-processing/input.py
@@ -12,7 +12,7 @@ def buckets_count():
 :param output_buckets:
 :param upload_func: upload function taking three params(bucket_idx, key, filepath)
 '''
-def generate_input(data_dir, size, input_buckets, output_buckets, upload_func):
+def generate_input(data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func):
     for file in glob.glob(os.path.join(data_dir, '*.mp4')):
         img = os.path.relpath(file, data_dir)
         upload_func(0, img, file)
@@ -21,6 +21,7 @@ def generate_input(data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func):
     input_config['object']['key'] = img
     input_config['object']['op'] = 'watermark'
     input_config['object']['duration'] = 1
-    input_config['bucket']['input'] = input_buckets[0]
-    input_config['bucket']['output'] = output_buckets[0]
+    input_config['bucket']['bucket'] = benchmarks_bucket
+    input_config['bucket']['input'] = input_paths[0]
+    input_config['bucket']['output'] = output_paths[0]
     return input_config
benchmarks/200.multimedia/220.video-processing/python/function.py
@@ -52,8 +52,10 @@ def transcode_mp3(video, duration, event):
 operations = { 'transcode' : transcode_mp3, 'extract-gif' : to_gif, 'watermark' : watermark }
 
 def handler(event):
-    input_bucket = event.get('bucket').get('input')
-    output_bucket = event.get('bucket').get('output')
+
+    bucket = event.get('bucket').get('bucket')
+    input_prefix = event.get('bucket').get('input')
+    output_prefix = event.get('bucket').get('output')
     key = event.get('object').get('key')
     duration = event.get('object').get('duration')
     op = event.get('object').get('op')
@@ -69,7 +71,7 @@ def handler(event):
         pass
 
     download_begin = datetime.datetime.now()
-    client.download(input_bucket, key, download_path)
+    client.download(bucket, os.path.join(input_prefix, key), download_path)
     download_size = os.path.getsize(download_path)
     download_stop = datetime.datetime.now()
 
@@ -80,16 +82,16 @@ def handler(event):
     upload_begin = datetime.datetime.now()
     filename = os.path.basename(upload_path)
     upload_size = os.path.getsize(upload_path)
-    client.upload(output_bucket, filename, upload_path)
+    upload_key = client.upload(bucket, os.path.join(output_prefix, filename), upload_path)
     upload_stop = datetime.datetime.now()
 
     download_time = (download_stop - download_begin) / datetime.timedelta(microseconds=1)
     upload_time = (upload_stop - upload_begin) / datetime.timedelta(microseconds=1)
     process_time = (process_end - process_begin) / datetime.timedelta(microseconds=1)
     return {
         'result': {
-            'bucket': output_bucket,
-            'key': filename
+            'bucket': bucket,
+            'key': upload_key
         },
         'measurement': {
             'download_time': download_time,
7 changes: 4 additions & 3 deletions benchmarks/300.utilities/311.compression/input.py
@@ -22,7 +22,7 @@ def upload_files(data_root, data_dir, upload_func):
 :param output_buckets:
 :param upload_func: upload function taking three params(bucket_idx, key, filepath)
 '''
-def generate_input(data_dir, size, input_buckets, output_buckets, upload_func):
+def generate_input(data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func):
 
     # upload different datasets
     datasets = []
@@ -32,6 +32,7 @@ def generate_input(data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func):
 
     input_config = {'object': {}, 'bucket': {}}
     input_config['object']['key'] = datasets[0]
-    input_config['bucket']['input'] = input_buckets[0]
-    input_config['bucket']['output'] = output_buckets[0]
+    input_config['bucket']['bucket'] = benchmarks_bucket
+    input_config['bucket']['input'] = input_paths[0]
+    input_config['bucket']['output'] = output_paths[0]
     return input_config
11 changes: 6 additions & 5 deletions benchmarks/300.utilities/311.compression/python/function.py
@@ -18,14 +18,15 @@ def parse_directory(directory):
 
 def handler(event):
 
-    input_bucket = event.get('bucket').get('input')
-    output_bucket = event.get('bucket').get('output')
+    bucket = event.get('bucket').get('bucket')
+    input_prefix = event.get('bucket').get('input')
+    output_prefix = event.get('bucket').get('output')
     key = event.get('object').get('key')
     download_path = '/tmp/{}-{}'.format(key, uuid.uuid4())
     os.makedirs(download_path)
 
     s3_download_begin = datetime.datetime.now()
-    client.download_directory(input_bucket, key, download_path)
+    client.download_directory(bucket, os.path.join(input_prefix, key), download_path)
     s3_download_stop = datetime.datetime.now()
     size = parse_directory(download_path)
 
@@ -36,15 +37,15 @@ def handler(event):
     s3_upload_begin = datetime.datetime.now()
     archive_name = '{}.zip'.format(key)
     archive_size = os.path.getsize(os.path.join(download_path, archive_name))
-    key_name = client.upload(output_bucket, archive_name, os.path.join(download_path, archive_name))
+    key_name = client.upload(bucket, os.path.join(output_prefix, archive_name), os.path.join(download_path, archive_name))
     s3_upload_stop = datetime.datetime.now()
 
     download_time = (s3_download_stop - s3_download_begin) / datetime.timedelta(microseconds=1)
     upload_time = (s3_upload_stop - s3_upload_begin) / datetime.timedelta(microseconds=1)
     process_time = (compress_end - compress_begin) / datetime.timedelta(microseconds=1)
     return {
         'result': {
-            'bucket': output_bucket,
+            'bucket': bucket,
             'key': key_name
         },
         'measurement': {
7 changes: 4 additions & 3 deletions benchmarks/400.inference/411.image-recognition/input.py
@@ -21,7 +21,7 @@ def upload_files(data_root, data_dir, upload_func):
 :param output_buckets:
 :param upload_func: upload function taking three params(bucket_idx, key, filepath)
 '''
-def generate_input(data_dir, size, input_buckets, output_buckets, upload_func):
+def generate_input(data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func):
 
     # upload model
     model_name = 'resnet50-19c8e357.pth'
@@ -38,6 +38,7 @@ def generate_input(data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func):
     input_config = {'object': {}, 'bucket': {}}
     input_config['object']['model'] = model_name
     input_config['object']['input'] = input_images[0][0]
-    input_config['bucket']['model'] = input_buckets[0]
-    input_config['bucket']['input'] = input_buckets[1]
+    input_config['bucket']['bucket'] = benchmarks_bucket
+    input_config['bucket']['input'] = input_paths[1]
+    input_config['bucket']['model'] = input_paths[0]
     return input_config
benchmarks/400.inference/411.image-recognition/python/function.py
@@ -25,22 +25,23 @@
 
 def handler(event):
 
-    model_bucket = event.get('bucket').get('model')
-    input_bucket = event.get('bucket').get('input')
+    bucket = event.get('bucket').get('bucket')
+    input_prefix = event.get('bucket').get('input')
+    model_prefix = event.get('bucket').get('model')
     key = event.get('object').get('input')
     model_key = event.get('object').get('model')
     download_path = '/tmp/{}-{}'.format(key, uuid.uuid4())
 
     image_download_begin = datetime.datetime.now()
     image_path = download_path
-    client.download(input_bucket, key, download_path)
+    client.download(bucket, os.path.join(input_prefix, key), download_path)
     image_download_end = datetime.datetime.now()
 
     global model
     if not model:
         model_download_begin = datetime.datetime.now()
         model_path = os.path.join('/tmp', model_key)
-        client.download(model_bucket, model_key, model_path)
+        client.download(bucket, os.path.join(model_prefix, model_key), model_path)
         model_download_end = datetime.datetime.now()
         model_process_begin = datetime.datetime.now()
         model = resnet50(pretrained=False)
4 changes: 2 additions & 2 deletions benchmarks/500.scientific/501.graph-pagerank/input.py
@@ -5,7 +5,7 @@
 }
 
 def buckets_count():
-    return (1, 1)
+    return (0, 0)
 
-def generate_input(data_dir, size, input_buckets, output_buckets, upload_func):
+def generate_input(data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func):
     return { 'size': size_generators[size] }
4 changes: 2 additions & 2 deletions benchmarks/500.scientific/502.graph-mst/input.py
@@ -5,7 +5,7 @@
 }
 
 def buckets_count():
-    return (1, 1)
+    return (0, 0)
 
-def generate_input(data_dir, size, input_buckets, output_buckets, upload_func):
+def generate_input(data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func):
     return { 'size': size_generators[size] }
4 changes: 2 additions & 2 deletions benchmarks/500.scientific/503.graph-bfs/input.py
@@ -5,7 +5,7 @@
 }
 
 def buckets_count():
-    return (1, 1)
+    return (0, 0)
 
-def generate_input(data_dir, size, input_buckets, output_buckets, upload_func):
+def generate_input(data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func):
     return { 'size': size_generators[size] }
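
The three graph benchmarks above also flip buckets_count() from (1, 1) to (0, 0). Judging from the generators in this PR, the tuple declares how many input and output storage locations a benchmark requests, and these purely compute-bound benchmarks need none. A sketch of the apparent convention (my reading of the diffs, not an authoritative spec):

def buckets_count():
    # (number of input prefixes, number of output prefixes) the harness
    # should allocate inside the shared benchmarks bucket; (0, 0) means
    # the benchmark touches no object storage at all.
    return (0, 0)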
8 changes: 5 additions & 3 deletions benchmarks/500.scientific/504.dna-visualisation/input.py
@@ -3,12 +3,14 @@
 def buckets_count():
     return (1, 1)
 
-def generate_input(data_dir, size, input_buckets, output_buckets, upload_func):
+def generate_input(data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func):
+
     for file in glob.glob(os.path.join(data_dir, '*.fasta')):
         data = os.path.relpath(file, data_dir)
         upload_func(0, data, file)
     input_config = {'object': {}, 'bucket': {}}
     input_config['object']['key'] = data
-    input_config['bucket']['input'] = input_buckets[0]
-    input_config['bucket']['output'] = output_buckets[0]
+    input_config['bucket']['bucket'] = benchmarks_bucket
+    input_config['bucket']['input'] = input_paths[0]
+    input_config['bucket']['output'] = output_paths[0]
     return input_config