Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reactivate the integration tests for remote file access (GCS, S3) #210

Open
wants to merge 4 commits into
base: dev
Choose a base branch
from

Conversation

tramora
Copy link
Collaborator

@tramora tramora commented Jul 11, 2024

Docker is installed in the khiopspydev docker image so that it can start the local fake servers .
The container must be started with the --privileged option otherwise docker inside the container would not start.
The fake servers are pre-provisioned with sample files to be ready before the tests run

closes #175, #53

@tramora tramora force-pushed the 175-integrationtests-gcs-s3-remote-files branch 5 times, most recently from da00cae to 1dc349c Compare July 13, 2024 21:21
@tramora tramora force-pushed the 175-integrationtests-gcs-s3-remote-files branch from 1dc349c to 6593606 Compare August 14, 2024 16:18
@tramora
Copy link
Collaborator Author

tramora commented Aug 14, 2024

@popescu-v & @folmos-at-orange : a detail we forgot the S3 & GCS drivers are not released publicly and thus not available in the dev image. When trying to access to the remote files during the tests khiops complains

error : there is no driver available for the URI scheme 'gs://'
error : there is no driver available for the URI scheme 's3://'

@tramora tramora force-pushed the 175-integrationtests-gcs-s3-remote-files branch from cb7c126 to 35c3a6f Compare August 19, 2024 12:19
@popescu-v
Copy link
Collaborator

@popescu-v & @folmos-at-orange : a detail we forgot the S3 & GCS drivers are not released publicly and thus not available in the dev image. When trying to access to the remote files during the tests khiops complains

error : there is no driver available for the URI scheme 'gs://'
error : there is no driver available for the URI scheme 's3://'

Yes, AFAIU we need to wait for the drivers to be publicly released before the tests can work.

@tramora tramora force-pushed the 175-integrationtests-gcs-s3-remote-files branch 5 times, most recently from 18cf7e9 to a750996 Compare September 4, 2024 15:33
@tramora tramora self-assigned this Sep 24, 2024
@tramora tramora force-pushed the 175-integrationtests-gcs-s3-remote-files branch 2 times, most recently from ed8cb75 to 571f402 Compare September 25, 2024 12:07
@tramora tramora force-pushed the 175-integrationtests-gcs-s3-remote-files branch 5 times, most recently from beab51e to 04fbd2b Compare October 9, 2024 15:04
@tramora tramora force-pushed the 175-integrationtests-gcs-s3-remote-files branch 2 times, most recently from e9011a8 to 3351a38 Compare October 21, 2024 09:56
@tramora tramora force-pushed the 175-integrationtests-gcs-s3-remote-files branch 6 times, most recently from 3e90f62 to 53fdbc3 Compare December 12, 2024 00:22
@tramora
Copy link
Collaborator Author

tramora commented Dec 16, 2024

There is an issue remaining that breaks the tests and that I cannot fix right now.

It seems that khiops under the conda env cannot use the remote files drivers installed system wide. They cannot be installed via conda as there is no conda release yet.

This execution

/root/miniforge3/bin/conda run -n py3.12_conda  coverage run -m xmlrunner -o "reports/py3.12_conda" tests.test_remote_access.KhiopsS3RemoteFileTests.test_khiops_classifier_with_remote_access

fails with

khiops.core.exceptions.KhiopsRuntimeError: khiops execution had errors (return code 1):
Contents of stdout:
error : there is no driver available for the URI scheme 's3://'
error : Input command file s3://s3-bucket/khiops-cicd/tmp/KhiopsClassifier_fit_224bc1d9-4374-4f5e-a665-4e577ad107e6/build_dictionary_from_data_table_7ff01b03-06af-46d1-90ad-1564340c2d83._kh : Unable to open file
error : there is no driver available for the URI scheme 's3://'

@popescu-v
Copy link
Collaborator

There is an issue remaining that breaks the tests and that I cannot fix right now.

It seems that khiops under the conda env cannot use the remote files drivers installed system wide. They cannot be installed via conda as there is no conda release yet.

This is true. Hence, IMHO the way to go for the current PR is to test the remote access only in the native case (Conda environments for test, but with the native khiops-core and drivers packages installed).

Then, as soon as the Conda packages for the drivers are available, a new PR should activate these integration tests in full Conda environments.

@tramora tramora force-pushed the 175-integrationtests-gcs-s3-remote-files branch 3 times, most recently from 2e88e35 to 4602e44 Compare December 17, 2024 17:14
.github/workflows/dev-docker.yml Outdated Show resolved Hide resolved
.github/workflows/dev-docker.yml Outdated Show resolved Hide resolved
@@ -43,7 +43,7 @@ jobs:
# because the `env` context is only accessible at the step level;
# hence, it is hard-coded
image: |-
ghcr.io/khiopsml/khiops-python/khiopspydev-ubuntu22.04:${{ inputs.image-tag || 'latest' }}
ghcr.io/khiopsml/khiops-python/khiopspydev-ubuntu22.04:${{ inputs.image-tag || '10.2.3.0.s3-gcs-remote-files' }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why it isn't latest here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may have misunderstood the process agreed with @popescu-v.
As long as the PR is in review, the specific image build with this branch is not set yet as the latest.
Just before the merge then I can change to the latest.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. To fix beforem merging then.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes; in this case we need specific tag, as the Dockerfiles have been changed in the current PR. If the PR hadn't changed the Dockerfiles, then latest could have been used.

Hence, the process would be:

  1. (before merging) Manually launch the Docker build workflow (cannot be set to latest on the PR!) to check that everything is OK
  2. (before merging) Change tag to 'latest' in the Test workflow (hence, the tests would fail! Their workflow should be canceled from the outset)
  3. Merge the PR into dev
  4. Manually launch Docker build workflow on dev, while setting Set to latest and Publish; thus, the Docker modifications from the (just merged PR) would be tagged latest.
  5. (Just in case) Manually launch all tests (including long ones) on dev to check that everything is OK.

.github/workflows/tests.yml Outdated Show resolved Hide resolved
.github/workflows/tests.yml Outdated Show resolved Hide resolved
tests/test_remote_access.py Outdated Show resolved Hide resolved
Comment on lines 106 to 107
Temporary skip the S3 and GCS tests under a conda environment
until the drivers for remote files access
are released as conda packages
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> Temporarily skip the S3 and GCS tests in a conda environment ...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo fixed

tests/test_remote_access.py Outdated Show resolved Hide resolved
tests/test_remote_access.py Outdated Show resolved Hide resolved
Comment on lines 134 to +150
def setUp(self):
self.skip_if_no_config()
if self.is_in_a_conda_env() and self.should_skip_in_a_conda_env():
# TODO : Temporary skip the test under a conda environment
# until the drivers for remote files access
# are released as conda packages
self.skipTest(
f"Remote test case {self.remote_access_test_case()} "
"in a conda environment is currently skipped"
)
self.print_test_title()

def tearDown(self):
# Cleanup the output dir (the files within and the folder)
if hasattr(self, "folder_name_to_clean_in_teardown"):
for filename in fs.list_dir(self.folder_name_to_clean_in_teardown):
fs.remove(
fs.get_child_path(
self.folder_name_to_clean_in_teardown, filename
)
)
fs.remove(self.folder_name_to_clean_in_teardown)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid using hasattr and setting up your own temporary urls, instead of a per test setup/tear-down, I'd use one for the whole class.

  • In the setupClass method you simply set the env var KHIOPS_TMP_DIR with a fresh runner.
  • In the tearDownClass method you clean up everything and restore the original temporary directory and runner.

Something like this example

import unittest
import os
import subprocess
from khiops import core as kh
from khiops.core.internals.runner import KhiopsLocalRunner


class TestWithAltRunner(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Save the original runner and temp dir
        cls.original_runner = kh.get_runner()
        cls.original_khiops_temp_dir = os.environ.get("KHIOPS_TMP_DIR")

        # Set the test suite temp dir
        os.environ["KHIOPS_TMP_DIR"] = "/tmp/mytmpdir/" # replace here with the URL
        cls.runner = KhiopsLocalRunner()
    @classmethod
    def tearDownClass(cls):
        # Restore the original runner and temp dir
        kh.set_runner(cls.original_runner)
        if cls.original_khiops_temp_dir is None:
            del os.environ["KHIOPS_TMP_DIR"]
        else:
            os.environ["KHIOPS_TMP_DIR"] = cls.original_khiops_temp_dir

        # Clean the whole alternative temporary file
        # ...

    def test(self):
        print("\n======= Current test runner =========")
        kh.get_runner().print_status()

Thierry RAMORASOAVINA added 4 commits December 20, 2024 19:05
… non-conda environments

- 2 fake servers (fakes3 and fake-gcs-server) are used locally instead of a real S3 and GCS servers.
- The fake servers are provisioned with sample files before the tests run
- During the construction of the khiopsdev docker image we need to install the s3/gcs drivers required by khiops
- As these drivers are not released yet as conda packages we must disable the tests in conda environments (except when using a docker runner)
…s, upload_file and download_file do not return a response but raises an exception in case of error
@tramora tramora force-pushed the 175-integrationtests-gcs-s3-remote-files branch from 4602e44 to a5604b6 Compare December 20, 2024 18:05
)
# normalize the raised exception
except Exception as e:
Copy link
Collaborator

@popescu-v popescu-v Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would restrict the "catch" here to S3UploadFailedError:

except S3UploadFailedError as exc:
    raise RuntimeError(...) from exc

Thus, wenever there is a different (unexpected) exception, it would not be handled and whence would be propagated to the caller code so that we know about it (e.g. from inspecting the CI logs).

# Avoid resolving a fake s3-bucket.localhost hostname
# Alternate builders (buildx via moby buildkit) mount /etc/hosts read-only, the following command will fail
# echo "127.0.0.1 s3-bucket.localhost" >> /etc/hosts
# You will have to add the `add-hosts` input instead (https://github.com/docker/build-push-action/#inputs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we do this directly here? This Docker file is only useful in the CI context anyway.

@@ -79,17 +118,47 @@ def skip_if_no_config(self):
"has no configuration available"
)

@staticmethod
def is_in_a_conda_env():
# CONDA_DEFAULT_ENV and CONDA_PROMPT_MODIFIER
Copy link
Collaborator

@popescu-v popescu-v Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, what we want here is to distinguish whether the khiops-core Khiops binaries package is installed in a Conda environment or in a native way (in the OS):

  • if installed natively, then these tests can be run
  • otherwise, they cannot be, until the Conda driver packages are shipped.

In order to detect whether we are running a Conda-installer khiops-core, we can rely on the fact that Conda-installed khiops-core has the Khiops executables in the same place as the Conda environment's bin directory. Hence, we can do something like this:

@staticmethod
def is_in_a_conda_env():
    """Detects whether this is run from a Conda environment and khiops-core is installed in the same env"""
   
   # Get path to the Khiops executable
    khiops_path = kh.get_runner()._khiops_path

    # If $(dirname khiops_path) is identical to $CONDA_PREFIX/bin, then return True
    conda_prefix = os.environ.get("CONDA_PREFIX")
    return conda_prefix is not None and os.path.join(conda_prefix, "bin") == os.path.dirname(khiops_path)

# CONDA_DEFAULT_ENV and CONDA_PROMPT_MODIFIER
# are the only usable env variables that let us know we are in a conda env
conda_default_env = os.environ.get("CONDA_DEFAULT_ENV")
return conda_default_env and conda_default_env.endswith("_conda")
Copy link
Collaborator

@popescu-v popescu-v Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .endswith leaks CI details in the tests (as the _conda env names for Conda-installed khiops-core are a CI detail).

The proposal from the previous comment avoids this.

def setUp(self):
self.skip_if_no_config()
if self.is_in_a_conda_env() and self.should_skip_in_a_conda_env():
# TODO : Temporary skip the test under a conda environment
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rephrase TODO comment to something like
# TODO: Temporary skip the test for Conda-installed khiops-core

@@ -177,17 +251,21 @@ def test_train_predictor_fail_and_log_with_remote_access(self):
log_file_path = fs.get_child_path(
self._khiops_temp_dir, f"khiops_log_{uuid.uuid4()}.log"
)

# no cleaning required as an error would raise without any result produced
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change "an error would raise" to "an exception would be raised".


runner.samples_dir = f"s3://{bucket_name}/khiops-cicd/samples"
resources_directory = KhiopsTestHelper.get_resources_dir()
# WARNING : khiops temp files cannot be remote
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add empty line before the comment (to have paragraphs).

resources_directory = KhiopsTestHelper.get_resources_dir()
# WARNING : khiops temp files cannot be remote
cls._khiops_temp_dir = f"{resources_directory}/tmp/khiops-cicd"
# whereas root_temp_dir
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete "whereas".

Add empty line before the beginning of the comment (to have the commented block as a paragraph).

runner.samples_dir = f"gs://{bucket_name}/khiops-cicd/samples"
cls._khiops_temp_dir = f"gs://{bucket_name}/khiops-cicd/tmp"
resources_directory = KhiopsTestHelper.get_resources_dir()
# WARNING : khiops temp files cannot be remote
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add empty line before comment.

resources_directory = KhiopsTestHelper.get_resources_dir()
# WARNING : khiops temp files cannot be remote
cls._khiops_temp_dir = f"{resources_directory}/tmp/khiops-cicd"
# whereas root_temp_dir
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete "whereas". Add empty line before the first line of the comment.

Copy link
Collaborator

@popescu-v popescu-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reactivate the S3 and GCS filesystem integration tests GCS tests do not completely clean themselves
3 participants