Commit

Merge pull request #83 from leseb/tax-s3
data: request a taxonomy tree
MichaelClifford authored Oct 10, 2024
2 parents 46a8374 + 8836097 commit acd41f3
Showing 3 changed files with 7 additions and 6 deletions.
8 changes: 5 additions & 3 deletions standalone/README.md
@@ -116,13 +116,15 @@ In this scenario the name of the bucket is `sdg-data` and the tarball file is `d
```bash
ilab data generate
mv generated data
-tar -czvf data.tar.gz data model
+tar -czvf data.tar.gz data model taxonomy
aws cp data.tar.gz s3://sdg-data/data.tar.gz
```

> [!CAUTION]
-> Ensures SDG data are in a directory called "data" and the model is in a directory called "model".
-> The tarball must contain two top-level directories: `data` and `model`.
+> Ensures SDG data are in a directory called "data".
+> Ensures the model to train is in a directory called "model".
+> Ensures that the taxonomy tree used to generate the SDG data is in a directory called "taxonomy".
+> The tarball must contain three top-level directories: `data`, `model` and `taxonomy`.
> [!CAUTION]
> Make sure the tarball format is .tar.gz.
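
The packaging rule in the README change can be sketched in Python. This is a minimal illustration, not part of the commit: the helper name `build_tarball` and the constant `REQUIRED_DIRS` are hypothetical, and the sketch assumes the three directories sit side by side under one root. The upload step is left out; with the AWS CLI, object copies are normally spelled `aws s3 cp`.

```python
import tarfile
from pathlib import Path

# Directory names the updated README requires at the top level of the tarball.
REQUIRED_DIRS = ("data", "model", "taxonomy")


def build_tarball(root: Path, output: Path) -> None:
    """Pack the required directories found under `root` into a .tar.gz file.

    Hypothetical helper: refuses to build an archive if any directory is missing.
    """
    missing = [name for name in REQUIRED_DIRS if not (root / name).is_dir()]
    if missing:
        raise FileNotFoundError(f"missing directories: {', '.join(missing)}")

    # "w:gz" writes a gzip-compressed archive, matching the .tar.gz requirement.
    with tarfile.open(output, "w:gz") as archive:
        for name in REQUIRED_DIRS:
            # arcname keeps each directory at the top level of the archive.
            archive.add(root / name, arcname=name)
```

Calling `build_tarball(Path("."), Path("data.tar.gz"))` from the directory that holds `data`, `model` and `taxonomy` would produce an archive with the layout the caution note describes.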
3 changes: 1 addition & 2 deletions standalone/standalone.py
@@ -313,7 +313,7 @@ def upload_s3_file():
top_level_dirs=$(tar --exclude='*/*' --list --file {data_pvc_mount_path}/data.tar.gz)
# Loop through the expected directories and check if they exist in the archive
-for dir in data model; do
+for dir in data model taxonomy; do
if ! echo "$top_level_dirs" | grep -q "^$dir/$"; then
echo "Archive does not contain a '$dir' directory"
exit 1
@@ -1268,7 +1268,6 @@ def data_processing(train_args: TrainingArgs) -> None:

container = kubernetes.client.V1Container(
name="sdg-preprocess",
-# image="quay.io/tcoufal/ilab-sdg:latest",
image=RHELAI_IMAGE,
command=["/bin/sh", "-ce"],
args=[
2 changes: 1 addition & 1 deletion standalone/standalone.tpl
@@ -298,7 +298,7 @@ if [ "$STRATEGY" == "download" ]; then
top_level_dirs=$(tar --exclude='*/*' --list --file {data_pvc_mount_path}/data.tar.gz)
# Loop through the expected directories and check if they exist in the archive
-for dir in data model; do
+for dir in data model taxonomy; do
if ! echo "$top_level_dirs" | grep -q "^$dir/$"; then
echo "Archive does not contain a '$dir' directory"
exit 1
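
The shell check changed in `standalone.py` and `standalone.tpl` lists only the top-level entries of the archive and fails if any expected directory entry is absent. A rough Python equivalent, useful for checking a tarball locally before uploading it, might look like the sketch below. The function name `missing_top_level_dirs` is hypothetical, and the sketch mirrors the strictness of the `grep -q "^$dir/$"` test: an archive whose entries are stored as `./data` rather than `data` would be reported as missing the directory.

```python
import tarfile

# Directories the updated check expects at the top level of data.tar.gz.
EXPECTED_DIRS = ("data", "model", "taxonomy")


def missing_top_level_dirs(archive_path, expected=EXPECTED_DIRS):
    """Return the expected directories that are not top-level entries of the archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        top_level = {
            member.name.rstrip("/")
            for member in archive.getmembers()
            # Keep only directory entries whose name has no path separator,
            # i.e. entries that sit directly at the root of the archive.
            if member.isdir() and "/" not in member.name.rstrip("/")
        }
    return [name for name in expected if name not in top_level]
```

An empty list means the archive would pass the check; otherwise each returned name corresponds to one "Archive does not contain a '$dir' directory" failure.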