data: request a taxonomy tree
As part of the data we pull from S3, we need another folder that
contains the taxonomy tree.

Signed-off-by: Sébastien Han <[email protected]>
leseb committed Oct 10, 2024
1 parent 5c6483d commit 8836097
Showing 3 changed files with 7 additions and 6 deletions.
8 changes: 5 additions & 3 deletions standalone/README.md
@@ -116,13 +116,15 @@ In this scenario the name of the bucket is `sdg-data` and the tarball file is `d
```bash
ilab data generate
mv generated data
-tar -czvf data.tar.gz data model
+tar -czvf data.tar.gz data model taxonomy
aws cp data.tar.gz s3://sdg-data/data.tar.gz
```

> [!CAUTION]
-> Ensures SDG data are in a directory called "data" and the model is in a directory called "model".
-> The tarball must contain two top-level directories: `data` and `model`.
+> Ensures SDG data are in a directory called "data".
+> Ensures the model to train is in a directory called "model".
+> Ensures that the taxonomy tree used to generate the SDG data is in a directory called "taxonomy".
+> The tarball must contain three top-level directories: `data`, `model` and `taxonomy`.
> [!CAUTION]
> Make sure the tarball format is .tar.gz.
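
The README change above requires three top-level directories in the archive. As a minimal sketch of the same packaging step in Python (the helper name `build_archive` and the constant `REQUIRED_DIRS` are hypothetical and not part of this commit), the standard-library `tarfile` module can produce an equivalent `.tar.gz` and refuse to build one when a directory is missing:

```python
import tarfile
from pathlib import Path

# Directory names taken from the README caution block in this commit.
REQUIRED_DIRS = ("data", "model", "taxonomy")


def build_archive(root: Path, archive: Path) -> None:
    """Pack the required directories at the top level of a gzip-compressed tarball.

    Hypothetical helper for illustration; the commit itself uses the `tar` CLI.
    """
    missing = [name for name in REQUIRED_DIRS if not (root / name).is_dir()]
    if missing:
        raise FileNotFoundError(f"missing directories: {', '.join(missing)}")

    with tarfile.open(archive, "w:gz") as tar:
        for name in REQUIRED_DIRS:
            # arcname keeps each directory at the top level of the archive,
            # regardless of where `root` lives on disk.
            tar.add(root / name, arcname=name)
```

Calling `build_archive(Path("."), Path("data.tar.gz"))` from the directory that holds `data`, `model` and `taxonomy` should give an archive with the same layout as the `tar -czvf` command in the README.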
3 changes: 1 addition & 2 deletions standalone/standalone.py
@@ -313,7 +313,7 @@ def upload_s3_file():
top_level_dirs=$(tar --exclude='*/*' --list --file {data_pvc_mount_path}/data.tar.gz)
# Loop through the expected directories and check if they exist in the archive
-for dir in data model; do
+for dir in data model taxonomy; do
if ! echo "$top_level_dirs" | grep -q "^$dir/$"; then
echo "Archive does not contain a '$dir' directory"
exit 1
@@ -1268,7 +1268,6 @@ def data_processing(train_args: TrainingArgs) -> None:

container = kubernetes.client.V1Container(
name="sdg-preprocess",
-# image="quay.io/tcoufal/ilab-sdg:latest",
image=RHELAI_IMAGE,
command=["/bin/sh", "-ce"],
args=[
2 changes: 1 addition & 1 deletion standalone/standalone.tpl
@@ -298,7 +298,7 @@ if [ "$STRATEGY" == "download" ]; then
top_level_dirs=$(tar --exclude='*/*' --list --file {data_pvc_mount_path}/data.tar.gz)
# Loop through the expected directories and check if they exist in the archive
-for dir in data model; do
+for dir in data model taxonomy; do
if ! echo "$top_level_dirs" | grep -q "^$dir/$"; then
echo "Archive does not contain a '$dir' directory"
exit 1
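
For readers who want to check an archive locally before uploading it, the shell loop in the two hunks above can be approximated in Python. This is only a sketch under stated assumptions: the function name `missing_top_level_dirs` is hypothetical, and the check relies on `tarfile` reporting directory members without a trailing slash, so a member named exactly `data` corresponds to the `^data/$` pattern used by `grep`.

```python
import tarfile

# Same directory list as the updated shell loop in this commit.
REQUIRED_DIRS = ("data", "model", "taxonomy")


def missing_top_level_dirs(archive_path: str) -> list[str]:
    """Return the required directories that are absent from the archive's top level.

    Hypothetical helper for illustration; the commit performs this check in shell.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        # Keep only directory entries whose name is exactly one of the required
        # names, which excludes nested paths such as "data/taxonomy".
        present = {
            member.name
            for member in tar.getmembers()
            if member.isdir() and member.name in REQUIRED_DIRS
        }
    return [name for name in REQUIRED_DIRS if name not in present]
```

An empty list means the archive passes the same check the job runs; a non-empty list names the directories that would trigger the "Archive does not contain a '$dir' directory" error.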