Changes from previous PR on Running automated tests via CICD #67

Merged · 4 commits · Sep 18, 2024

MukuFlash03

The previous PR #62 commit history got messed up due to rebasing to handle the commits that had missing signoffs. Created a new PR to bring over the final changes from there.

The goal was to have automated tests run as a part of the CI/CD pipeline with GitHub Actions.

The commit history and detailed discussion can be seen in the previous PR #62

The previous PR commit history got messed up due to rebasing to handle the commits that had missing signoffs.
Created a new PR to bring over the final changes from there.

PR link: EVerest#62

Signed-off-by: Mahadik, Mukul Chandrakant <[email protected]>
Collaborator

@shankari shankari left a comment

@MukuFlash03

What if a matrix job B is dependent on matrix job A, but matrix job A hasn't finished building its image yet? Then matrix job B might use the older image which it would have pulled, and not the locally built updated image.
In this case, the tests would be run on an older image and could falsely pass.

I don't understand this. The tests cannot run on an older image because every image has a new tag, and you specify the new tag while testing the image. The whole point of having unique tags is to ensure that every image is unique and the behavior of the image is reproducible. You need to ensure that the new image tag is actually passed in to the docker compose properly

So, the matrix job for the mqtt service uses the pulled image layers from the cache for the manager image (it pulled them in the first place since the manager image was used as part of the docker compose), and this pulled cached image did not have the test script file.

I am not sure what you mean by this. Since the manager image with the new tag has not been built, it will not exist to pull
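For reference, this is one way the new tag could be passed through to docker compose from a workflow step. This is only a sketch: the step name is illustrative, and it assumes TAG has already been computed or read from .env and exported to the job environment; the compose file quoted later in this thread interpolates ${TAG}.

# Sketch only: step name is illustrative; TAG is assumed to be in the job
# environment (e.g. exported via $GITHUB_ENV in an earlier step).
- name: Run automated tests against the newly built images
  shell: bash
  run: |
    echo "Running tests against TAG=${TAG}"
    docker compose --project-name everest-ac-automated-testing \
      -f docker-compose.automated-tests.yml up --abort-on-container-exit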

@MukuFlash03
Author

MukuFlash03 commented Jul 17, 2024

I don't understand this. The tests cannot run on an older image because every image has a new tag, and you specify the new tag while testing the image. The whole point of having unique tags is to ensure that every image is unique and the behavior of the image is reproducible. You need to ensure that the new image tag is actually passed in to the docker compose properly

Do you mean to say that the existing workflow yaml file produces a new tag for each image?
If not, should I be manually tagging the image with a randomly generated tag within the workflow and then using it?

There is a step in cicd.yaml that uses the TAG defined in the .env file, but it only reads this TAG; it does not generate a new one.

- name: Ensure Docker image version is not referencing an existing release
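For context, a minimal sketch of how a workflow step can read the TAG from .env and expose it to later steps; the actual implementation of this step in cicd.yaml may differ, and only the step name above is taken from the real workflow.

# Sketch only: the real step in cicd.yaml may be implemented differently.
- name: Read TAG from .env and expose it to later steps
  shell: bash
  run: |
    # .env defines TAG=<x.y.z>; make it available to later steps
    source .env
    echo "TAG=${TAG}" >> "$GITHUB_ENV"
    # the "Ensure Docker image version is not referencing an existing release"
    # step can then check that ghcr.io does not already have this tag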

Or, should I be manually updating the TAG in the .env file?
But this doesn't make sense, as we can't keep updating the .env file for every commit pushed to main.


I am not sure what you mean by this. Since the manager image with the new tag has not been built, it will not exist to pull

Right now, the docker-compose.automated-tests.yml file uses the same TAG from the .env file.

  mqtt-server:
    image: ghcr.io/everest/everest-demo/mqtt-server:${TAG}
...
...
  manager:
    image: ghcr.io/everest/everest-demo/manager:${TAG}

So, as seen in this workflow run, it starts pulling the matching tagged image from GHCR if that image isn't available locally.

@MukuFlash03
Author

MukuFlash03 commented Jul 17, 2024

I looked at the commit history specifically for the .env file to understand what you meant about having unique tags.
I also factored in what you had said earlier about building images in case of changes to non-demo files in this comment.

We don't need to rebuild the image if the only change is to the demo files. In this case, it is fine to run the demo script directly.
If there are changes to the non-demo files, we do need to rebuild the images. And we do. We just don't push them. We should be able to test against the locally built images from the previous jobs. Please read up on docker images pull and cache - similar to git repos, if you don't explicitly specify pull, docker will only pull the remote image if it does not exist locally. We have just built these images as part of the github action, so when you run the tests, if you do so in the same environment, they will use the newly built versions.


So, looking at the commit history, I thought that perhaps whenever there were changes to the Dockerfiles, or to any contents of a service directory (e.g. manager/, mosquitto/, nodered/), the .env file was updated as well.

This can be seen in this commit: 33767cc

But that is not always the case; there are commits where the TAG in .env is not updated despite changes to the image build context: 0feea2d


@shankari
Collaborator

Or, should I be manually updating the TAG in the .env file?

This. We build on each commit but we only push when there is a new release, aka the .env file is updated.
Please look at the history of the file and the discussion in the related issue in the US-JOET repo.
We can potentially revisit this later, but not in this first step which was intended to be a small and easy expansion.
Just because we generate and push on every commit in e-mission doesn't mean that is the only way.

@MukuFlash03
Author

MukuFlash03 commented Jul 17, 2024

I did a test workflow run by updating the TAG in .env file. I tried two scenarios, but both workflows FAILED.

  • Without the conditional execution of automated tests (workflow run)
  • With the conditional execution of automated tests for only manager matrix job (workflow run)

Both runs resorted to pulling the image with the TAG but did not find it.

  1. The run without the if condition failed in the mqtt-server matrix job, as it couldn't find the manager image with the new TAG.
    I initially thought this made sense and also matched what you said:

Since the manager image with the new tag has not been built, it will not exist to pull

Error in workflow run:

Run echo "Running docker compose up..."
Running docker compose up...
time="2024-07-17T22:04:22Z" level=warning msg="/home/runner/work/everest-demo/everest-demo/docker-compose.automated-tests.yml: `version` is obsolete"
 manager Pulling 
 manager Error manifest unknown
Error response from daemon: manifest unknown
Error: Process completed with exit code 18.

  2. But the workflow run with the if condition to run automated tests only in the manager matrix job still failed.
    This time, however, it failed in the manager matrix job and not the mqtt matrix job as was the case in (1) above.
    Additionally, this time it was the mqtt image that was not found, not the manager image.

Error in workflow run:

Run echo "Running docker compose up..."
Running docker compose up...
time="2024-07-[1](https://github.com/MukuFlash03/everest-demo/actions/runs/9982058883/job/27587057199#step:9:1)7T22:02:12Z" level=warning msg="/home/runner/work/everest-demo/everest-demo/docker-compose.automated-tests.yml: `version` is obsolete"
 mqtt-server Pulling 
 mqtt-server Error manifest unknown
Error response from daemon: manifest unknown
Error: Process completed with exit code 18.

In this run, the mqtt image with the new TAG was built and exported to the GitHub Actions cache in its own matrix job, but the manager matrix job wasn't able to access it.
From my previous workflow runs, I've seen that cached image layers can be accessed across workflow runs.
But an image built in a parallel matrix job isn't being fetched within the same workflow run.

@MukuFlash03
Author

Found some discussions on tags; going through those, I found that they do touch on some of the points I've mentioned before:

@shankari
Collaborator

It seems like there's a fairly trivial fix for this - have you tried changing the order of the jobs in the matrixed workflow so that the manager is built last?

@MukuFlash03
Author

It seems like there's a fairly trivial fix for this - have you tried changing the order of the jobs in the matrixed workflow so that the manager is built last?

I don't think this would work, as the documentation states that the order of entries only defines the order in which the jobs are created.
But given that it's a matrix strategy, all the jobs are still run in parallel.

I did test this out and the workflow still failed as expected, with the same results as above.

    strategy:
      matrix:
        include:
          - host_namespace: ghcr.io/everest/everest-demo
            image_name: mqtt-server
            context: ./mosquitto
          - host_namespace: ghcr.io/everest/everest-demo
            image_name: nodered
            context: ./nodered
          - host_namespace: ghcr.io/everest/everest-demo
            image_name: manager
            context: ./manager

I also tried setting the concurrency to 1 using max-parallel (documentation), which ensures that each matrix job runs only after the previous one has finished. Despite this, the results were the same: the workflow failed for the manager job because it didn't find the latest mqtt image with the new TAG I added. Images are built in their respective matrix jobs but cannot be accessed from the other jobs.

    strategy:
      max-parallel: 1
      matrix:
        include:

I also tried deleting the cached data in my forked repo, hoping that the workflow would then detect the freshly cached data.
But yet again, it did not work.


@MukuFlash03
Author

What I’m trying now:

1. Separating the Build -> Test -> Push steps into separate jobs.

I had hoped that the images tagged with the new TAG in .env that I manually updated would be found by the automated tests job.
I even cleared the cache so I could be sure that the first job itself built the images from scratch.
But it failed again.

Workflow run failed.

My assumption is that docker compose is unable to pull images from the cache.
It always goes to GHCR to fetch the image.

@MukuFlash03
Author

MukuFlash03 commented Jul 18, 2024

Confirmed: it's docker compose that is unable to use the latest built images from the GitHub Actions cache!

To confirm this, I’m going to continue on error or skip the automated tests job for now and see if the last Push job runs and is able to pull images with the latest tag from the cache.
This time I’m not clearing cache.

This will also confirm whether the 1st job uses the cached layers from the previous workflow runs or not.
It should use them, since they are part of the GitHub Actions Cache, which is available under the Actions tab for a repository and is scoped to the GitHub user or organization.


Yes, the Build and the Push jobs worked without the automated tests job included.
Workflow run passed.

Additionally, the initial Build job was able to use cached image layers from the previous workflow run.
And the following Push job was able to use the newly tagged image by importing from the GitHub Actions Cache.

@MukuFlash03
Author

MukuFlash03 commented Jul 18, 2024

Finally got Docker compose to work without making any changes to the docker compose file.
And more importantly without rebuilding images !

Here's the workflow run


Problem

The problem, as identified in the 2nd workflow run output in this comment above, was that the mqtt image with the latest tag was not being found.

And this was surprising since the job before it had just built the images and cached the image layers to the GitHub Actions cache. So why was docker compose still pulling images from the GitHub Container Registry?


GitHub Actions cache vs GitHub Container Registry

The answer was in the two GitHub locations I just mentioned above: GitHub Actions cache vs GitHub Container Registry.

So, the build jobs import from and export to the GitHub Actions Cache, which is 'local' storage in the sense that it is specific to the user's or organization's repository (documentation). In our case, the images are not stored as complete images; rather, the image layers are cached as blob objects.

The GitHub Container Registry is a different service altogether that stores the final images. This is where docker compose defaults to pulling images from when it does not find them locally.
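One way to make this fallback explicit is the Compose pull_policy field; this is a sketch of an option, not something the current compose file sets.

# Sketch: docker-compose.automated-tests.yml could pin the pull policy so that
# compose errors out immediately if the locally built image is missing,
# instead of silently falling back to pulling from GHCR.
services:
  manager:
    image: ghcr.io/everest/everest-demo/manager:${TAG}
    pull_policy: never   # only use an image already loaded in the local daemon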

@MukuFlash03
Author

Solution: How did I get the image to be fetched from GitHub Actions cache?

For the image with the latest tag to be available in the current job's runner instance, I re-used the docker build push action to load both the mqtt-server and manager service images that were built in the previous build job:

      - name: Import mqtt-server from GitHub Actions Cache to Docker
        uses: docker/build-push-action@v6
        with:
          load: true
          context: ./mosquitto
          push: false
          tags: ${{ steps.meta_mqtt.outputs.tags }}
          labels: ${{ steps.meta_mqtt.outputs.labels }}
          cache-from: type=gha,scope=mqtt-server
          cache-to: type=gha,mode=max,scope=mqtt-server

This looks very similar to the way we build images in the 1st job with the docker build push action. A key difference is that the earlier job was actually a set of matrix jobs, so for each individual service (mqtt-server, nodered, manager), only its respective image was fetched from the cache, since the context and the `cache-from` scope key refer to the current matrix job's parameters.

This explains why, in the mqtt-server matrix job, the manager image wasn't found, and vice versa, as observed in this comment.
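For comparison, the matrixed build step in the first job looks roughly like this. This is paraphrased: the cache-from / cache-to lines match the ones quoted in the review below, the rest is a sketch.

# Sketch of the first (matrixed) build job: each matrix entry only restores
# and saves its own cache scope, so e.g. the manager job never sees the
# mqtt-server layers this way.
- name: Build image for the current matrix entry
  uses: docker/build-push-action@v6
  with:
    context: ${{ matrix.context }}        # ./mosquitto, ./nodered or ./manager
    push: false
    cache-from: type=gha,scope=${{ matrix.image_name }}
    cache-to: type=gha,mode=max,scope=${{ matrix.image_name }}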

@MukuFlash03
Author

I got this working by separating the Build -> Test -> Push into different jobs as opposed to different steps.
The reason I did this was just to isolate the steps and understand which one specifically wasn't working and why.

Now that we know this, we can also get this to work while keeping them in a single job, by using conditional statements to run steps only for certain matrix jobs (mostly the manager job, for the automated tests, for now).

Doing this now.

@shankari
Collaborator

Note that your issue is that the matrixed workflow creates multiple jobs, one for each input.
docker build push action has an example of how to share images across jobs. Maybe start with that instead of reinventing the wheel?!

@MukuFlash03
Author

MukuFlash03 commented Jul 18, 2024

docker build push action has an example of how to share images across jobs.
Yes, that's what I am heading towards as well.

I did take a look at it initially and mentioned it here as an approach.
At that point I was considering a separate job for testing, which would have required the manager image to be uploaded / downloaded as an artifact as well; that wasn't feasible since the manager's built image is around 7.5 GB. Hence I didn't go ahead with it.

But now, having narrowed down the core issue, I understand that I would only need the mqtt-server image to be made available in the manager's matrix job so the automated tests can run.

So, will finalize on that approach now.

…x job

Found an issue where the latest tagged image was not being fetched by the docker compose command.
The reason was that the matrix job only loaded its respective image from the cache.
But the docker compose command for running automated tests uses both manager and mqtt server images.

To solve this, initially I used docker build push action to reload the mqtt image in the matrix job for manager.
But then finally went with the approach of using artifacts to share the mqtt image between jobs - uploaded in mqtt matrix job, downloaded in manager matrix job.

Signed-off-by: Mahadik, Mukul Chandrakant <[email protected]>
@MukuFlash03
Author

MukuFlash03 commented Jul 19, 2024

Latest commit now using artifacts to share the mqtt image and make it accessible in the matrix job for manager.

This is based on the example to share images from the documentation.

Four new steps introduced (sketched below):

  • In mqtt matrix job: 1) Saving docker image to .tar file 2) Uploading .tar file as artifact
  • In manager matrix job: 3) Downloading .tar file as artifact 4) Loading image from .tar file
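A rough sketch of those four steps, modeled on the image-sharing example in the docker/build-push-action documentation. The step names, artifact name, and tar path are illustrative rather than the exact ones in the workflow; the if conditions mirror the ones quoted in the review below, and TAG is assumed to be in the job environment.

# In the mqtt-server matrix job: save the freshly built image and upload it.
- name: Save mqtt-server image to a tar file
  if: ${{ matrix.image_name == 'mqtt-server' }}
  shell: bash
  run: docker save ghcr.io/everest/everest-demo/mqtt-server:${TAG} -o /tmp/mqtt-server.tar

- name: Upload mqtt-server image as an artifact
  if: ${{ matrix.image_name == 'mqtt-server' }}
  uses: actions/upload-artifact@v4
  with:
    name: mqtt-server-image
    path: /tmp/mqtt-server.tar

# In the manager matrix job: download the artifact and load the image so that
# docker compose finds both images locally.
- name: Download mqtt-server image artifact
  if: ${{ matrix.image_name == 'manager' }}
  uses: actions/download-artifact@v4
  with:
    name: mqtt-server-image
    path: /tmp

- name: Load mqtt-server image from the tar file
  if: ${{ matrix.image_name == 'manager' }}
  shell: bash
  run: docker load -i /tmp/mqtt-server.tar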

Collaborator

@shankari shankari left a comment

@MukuFlash03 I am fine with this PR with the changes below. Can you please apply them? I will then merge. We can move on to polishing this later; notably thinking about whether we actually need the MQTT server to be built every time.

cache-from: type=gha,scope=${{ matrix.image_name }}
cache-to: type=gha,mode=max,scope=${{ matrix.image_name }}

# Following fours steps are specifically for running automated steps which includes loading the
Collaborator

Suggested change
# Following fours steps are specifically for running automated steps which includes loading the
# Following four steps are specifically for running automated steps which includes loading the

Author

Changed.

Comment on lines +112 to +115
if: ${{ matrix.image_name == 'mqtt-server' }}
id: save-mqtt-image
shell: bash
run: |
Collaborator

do we not have to save the node-red container as well? I guess since this is an automated test, maybe not. Can you please add a comment here to clarify?

Author

Added comment.

run: |
docker images
echo "Running docker compose up..."
docker compose --project-name everest-ac-demo \
Collaborator

Suggested change
docker compose --project-name everest-ac-demo \
docker compose --project-name everest-ac-automated-testing \

Author

Modified.

Signed-off-by: Mahadik, Mukul Chandrakant <[email protected]>
@MukuFlash03
Author

I've committed the suggested changes.


As for the CI/CD failing, I haven't encountered this error before:

Error: buildx failed with: ERROR: failed to solve: process "/bin/sh -c git clone https://github.com/EVerest/everest-core.git         && cd everest-core         && git checkout ${EVEREST_VERSION}         && cd ..         && mkdir -p /ext/scripts         && mv install.sh /ext/scripts/install.sh         && mv everest-core /ext/source         && /entrypoint.sh run-script install" did not complete successfully: exit code: 1

I went ahead and tested it on my forked repo with the following configurations, and it still failed.
This is surprising since all of these had passed (or failed, for the failure test case) as expected whenever I had tested them before.

Different testing configurations used:

  1. Created new branch based off the PR branch (run-automated-tests); only changes being addition of test script that I've been using for success/failure test case. [ commit, workflow run ]

  2. Reused local branch I used for testing in my forked repo; used run-test.sh as entrypoint. [ commit, workflow run ]

  3. Reused local branch, used test script for success test case as entrypoint. [ commit, workflow run ]


Found some related public issues / discussions:
Buildx failure issue (link), Failed run in a repo (link)


I suspect the issue is with either:

  • everest-core -> since the failure occurs in step no. 3 of the Dockerfile, which clones the everest-core repo
  • docker buildx -> since the related issues I've mentioned encountered this error when using buildx

@MukuFlash03
Author

Also checked out the main branch after syncing my fork with the parent repo and pulling the latest changes from remote origin.
The workflow failed with the same error. [ commit, workflow run ]

@MukuFlash03
Author

Investigated this more to narrow down the root cause.
The steps I took included commenting out layers in manager/Dockerfile to confirm that the failing layer was this RUN layer:

# Cloning the repo now and copying files over
RUN git clone https://github.com/EVerest/everest-core.git \
        && cd everest-core \
        && git checkout ${EVEREST_VERSION} \
        && cd .. \
        && mkdir -p /ext/scripts \
        && mv install.sh /ext/scripts/install.sh \
        && mv everest-core /ext/source \
        # Don't run the test-and-install script since it deletes the build directory!
        && /entrypoint.sh run-script install

Next, within the RUN command I commented out some of the steps; the next level down was the entrypoint.sh script, which lives in everest-ci. This script executes the manager/install.sh script, as that is the argument passed to it.

Now in this install.sh script, I added some log statements:

#!/bin/sh
cd /ext/source \
&& mkdir build \
&& cd build \
&& echo "CI/CD Test: Running cmake" \
&& cmake -DBUILD_TESTING=ON .. \
&& echo "CI/CD Test: Running make j6" \
&& make install -j6 \
&& echo "CI/CD Test: Running make install_everest_testing" \
&& make install_everest_testing


On running the workflow again, the logs only had one echo statement - "CI/CD Test: Running cmake" - meaning the install.sh script couldn't proceed past the cmake -DBUILD_TESTING=ON .. command.

Workflow run [you can search for "CI/CD Test: Running" in the GitHub Actions workflow logs]

It's during this command, which configures and sets up the build, that the CI/CD fails.

@shankari
Collaborator

shankari commented Aug 9, 2024

@MukuFlash03 I don't actually see that you have identified the root cause.

I understand that the CI/CD fails while configuring and setting up the build, but what is the cause of the failure in the configuration? We can't actually merge this PR while the workflow is broken.

@MukuFlash03
Author

MukuFlash03 commented Aug 9, 2024

Right, I took a deeper look at the GitHub Actions workflow logs.
I have listed the expected successful logs vs. the failure logs below.

The files involved in the error are in everest-core/cmake.


  1. Expected Successful Logs
    Workflow run from the last successfully merged PR Add general citrine support #58 (commit) into the main everest-demo repo
#9 104.2 -- Performing Test C_COMPILER_SUPPORTS_WFORMAT_SIGNEDNESS
#9 104.3 -- Performing Test C_COMPILER_SUPPORTS_WFORMAT_SIGNEDNESS - Success
#9 104.3 -- CPM: adding package [email protected] (v0.2.1 to /ext/cache/cpm/everest-utils/eb27d0f4a46a0b821ae0631482182139d41c3e60/everest-utils)
#9 105.3 [edm]: Using metadata file: /ext/source/build/everest-metadata.yaml
#9 105.6 Found ev-cli version '0.0.24' which satisfies the requirement of ev-cli version '0.0.24'
#9 105.6 -- APPENDING /ext/source to EVEREST_PROJECT_DIRS
#9 105.6 -- Adding type definitions from /ext/source/types
#9 105.6 -- Adding error definitions from /ext/source/errors
#9 105.6 -- Adding interface definitions from /ext/source/interfaces
#9 105.6 -- Setting up C++ module API

  2. Failure Logs
    Workflow run from this current PR Changes from previous PR on Running automated tests via CICD #67
#8 91.44 -- Performing Test C_COMPILER_SUPPORTS_WFORMAT_SIGNEDNESS
#8 91.50 -- Performing Test C_COMPILER_SUPPORTS_WFORMAT_SIGNEDNESS - Success
#8 91.55 -- CPM: adding package [email protected] (v0.2.1 to /ext/cache/cpm/everest-utils/eb27d0f4a46a0b821ae0631482182139d41c3e60/everest-utils)
#8 92.28 [edm]: Using metadata file: /ext/source/build/everest-metadata.yaml
#8 92.37 CMake Error at cmake/ev-cli.cmake:4 (find_program):
#8 92.37   Could not find EV_CLI using the following names: ev-cli
#8 92.37 Call Stack (most recent call first):
#8 92.37   cmake/ev-project-bootstrap.cmake:3 (include)
#8 92.37   CMakeLists.txt:103 (include)
#8 92.37 
#8 92.37 
#8 92.38 -- Configuring incomplete, errors occurred!
#8 92.38 See also "/ext/source/build/CMakeFiles/CMakeOutput.log".
#8 92.38 See also "/ext/source/build/CMakeFiles/CMakeError.log".
#8 ERROR: process "/bin/sh -c git clone https://github.com/EVerest/everest-core.git         && cd everest-core         && git checkout ${EVEREST_VERSION}         && cd ..         && mkdir -p /ext/scripts         && mv install.sh /ext/scripts/install.sh         && mv everest-core /ext/source         && /entrypoint.sh run-script install" did not complete successfully: exit code: 1
------
 > [3/7] RUN git clone https://github.com/EVerest/everest-core.git         && cd everest-core         && git checkout 2024.3.0         && cd ..         && mkdir -p /ext/scripts         && mv install.sh /ext/scripts/install.sh         && mv everest-core /ext/source         && /entrypoint.sh run-script install:
92.37 CMake Error at cmake/ev-cli.cmake:4 (find_program):
92.37   Could not find EV_CLI using the following names: ev-cli
92.37 Call Stack (most recent call first):
92.37   cmake/ev-project-bootstrap.cmake:3 (include)
92.37   CMakeLists.txt:103 (include)
92.37 
92.37 
92.38 -- Configuring incomplete, errors occurred!
92.38 See also "/ext/source/build/CMakeFiles/CMakeOutput.log".
92.38 See also "/ext/source/build/CMakeFiles/CMakeError.log".

@MukuFlash03
Author

MukuFlash03 commented Aug 10, 2024

I built the manager Docker image locally and took a look at the log files referenced in the docker build output, as mentioned above in this comment.

#8 92.38 See also "/ext/source/build/CMakeFiles/CMakeOutput.log".
#8 92.38 See also "/ext/source/build/CMakeFiles/CMakeError.log".

CMakeError.log
CMakeOutput.log

The CMakeOutput.log file doesn't seem to contain any errors.
CMakeError.log contains 150 errors on a simple search.

I am not sure if any of them are related to the everest project or to code from the repo.
I will need to look deeper into the code.
But just looking at the errors suggests they are unrelated to everest code and are C++ library errors, syntax errors, etc. (many of these look like the usual failed feature-detection probes that CMake runs via try_compile, which are expected to fail and still get logged).


Some error types seen are:

Type 1:

collect2: error: ld returned 1 exit status
make[1]: *** [CMakeFiles/cmTC_9b31f.dir/build.make:102: cmTC_9b31f] Error 1
make[1]: Leaving directory '/ext/source/build/CMakeFiles/CMakeTmp'
make: *** [Makefile:127: cmTC_9b31f/fast] Error 2

Type 2:

/ext/source/build/CMakeFiles/CMakeTmp/CheckSymbolExists.c:2:10: fatal error: openssl/base.h: No such file or directory
    2 | #include <openssl/base.h>

Type 3:

/ext/source/build/CMakeFiles/CheckTypeSize/SIZEOF___INT64.c:27:22: error: '__int64' undeclared here (not in a function); did you mean 'u_int'?
   27 | #define SIZE (sizeof(__int64))

Type 4:

/ext/source/build/CMakeFiles/CMakeTmp/CheckSymbolExists.c: In function 'main':
/ext/source/build/CMakeFiles/CMakeTmp/CheckSymbolExists.c:41:19: error: 'stricmp' undeclared (first use in this function); did you mean 'strncmp'?
   41 |   return ((int*)(&stricmp))[argc];

Type 5:

/ext/cache/cpm/libcurl/deeb9f2c9bd611ea683f1475bb4c2b7e3a5d7642/libcurl/CMake/CurlTests.c: In function 'main':
/ext/cache/cpm/libcurl/deeb9f2c9bd611ea683f1475bb4c2b7e3a5d7642/libcurl/CMake/CurlTests.c:406:3: error: too many arguments to function 'fsetxattr'
  406 |   fsetxattr(0, 0, 0, 0, 0, 0);

Type 6:

/ext/cache/cpm/libcurl/deeb9f2c9bd611ea683f1475bb4c2b7e3a5d7642/libcurl/CMake/CurlTests.c:82:23: error: storage size of 'hdata' isn't known
   82 |   struct hostent_data hdata;

Type 7:

/ext/cache/cpm/libcurl/deeb9f2c9bd611ea683f1475bb4c2b7e3a5d7642/libcurl/CMake/CurlTests.c: In function 'main':
/ext/cache/cpm/libcurl/deeb9f2c9bd611ea683f1475bb4c2b7e3a5d7642/libcurl/CMake/CurlTests.c:383:51: error: subscripted value is neither array nor pointer nor vector
  383 |   check(strerror_r(EACCES, buffer, sizeof(buffer))[0]);

Type 8:

/ext/cache/cpm/libcurl/deeb9f2c9bd611ea683f1475bb4c2b7e3a5d7642/libcurl/CMake/CurlTests.c:430:31: error: expected ')' before numeric constant
  430 |   if(__builtin_available(macOS 10.12, *)) {}

@MukuFlash03
Author

Since I didn't find much info in the log files generated, coming back to the error message:

#8 92.37 CMake Error at cmake/ev-cli.cmake:4 (find_program):
#8 92.37   Could not find EV_CLI using the following names: ev-cli
#8 92.37 Call Stack (most recent call first):
#8 92.37   cmake/ev-project-bootstrap.cmake:3 (include)
#8 92.37   CMakeLists.txt:103 (include)

The EV_CLI name not being found could be the next thing to investigate:

Could not find EV_CLI using the following names: ev-cli

@MukuFlash03
Author

MukuFlash03 commented Aug 10, 2024

These are the two files mentioned in the error logs:

cmake/ev-cli.cmake
cmake/ev-project-bootstrap.cmake


Changes to cmake/ev-cli.cmake were last made on June 24th, 2024 - history

Changes to cmake/ev-project-bootstrap.cmake were last made on July 10th, 2024 - history


The last successful CI/CD run for everest-demo occurred on June 26th, 2024.

Changes to cmake/ev-cli.cmake were made before this last successful run, so this should not be the reason.

However, changes in cmake/ev-project-bootstrap.cmake were made after this last successful run and seem to be responsible for the error.

@MukuFlash03
Author

MukuFlash03 commented Aug 12, 2024

To investigate further, I changed the source repo in the RUN command to use a different branch in my forked repo for everest-core:

ARG EVEREST_VERSION=2024.3.0
ENV EVEREST_VERSION=${EVEREST_VERSION}

...
...

RUN git clone https://github.com/EVerest/everest-core.git \
        && cd everest-core \
        && git checkout ${EVEREST_VERSION} \
...
...

This, however, failed, saying that no version spec was found.
This was because my forked repo did not have any release tags, and hence nothing matched the EVEREST_VERSION.


The EVEREST_VERSION refers to the released / tagged version of the everest-core repo found here.

I took a look at the EVEREST_VERSIONs and the latest released tags and found that the currently used TAG 2024.3.0 was released in Apr 2024.

The latest TAG 2024.7.1 was released in August 2024.

@MukuFlash03
Author

MukuFlash03 commented Aug 12, 2024

So I began testing locally first by just building the manager image (since this is where the CI/CD was failing):

docker build --no-cache -t troubleshoot-image-2 manager/

As mentioned in this comment, I had added log statements to the manager/install.sh script, and I used them to see how far the configuration got:

&& cmake -DBUILD_TESTING=ON .. \
&& make install -j6 \
&& make install_everest_testing

I tested with these EVEREST_VERSION tags: 2024.4.0, 2024.5.0, 2024.7.0, 2024.7.1

FAILED - 2024.4.0, 2024.5.0

  • Fails during the cmake step; same ev_cli error encountered

FAILED - 2024.7.0

  • Passed the cmake step, moved on to make install -j6.
  • Went much further than these previous releases (close to 98% completion)

FAILED - 2024.7.1

  • Went even further, reached 100% but still failed as not all steps were completed.
  • Encountered a different error this time
1042.7 /usr/include/boost/asio/read_at.hpp:398:25: internal compiler error: Segmentation fault
1102.3 [100%] Built target OCPP201
1102.3 make: *** [Makefile:166: all] Error 2

@MukuFlash03
Author

Seeing that the newer release versions of everest-core were pretty close to building the manager docker image successfully, I thought of trying them out in the GitHub Actions workflows instead of locally.


1. Testing with a copy of the current main branch of everest-demo - PASSED

First, I made a copy of the current main branch of everest-demo, changed the EVEREST_VERSION in manager/Dockerfile and ran the workflow.

As noted in the above comment, with the currently defined EVEREST_VERSION = 2024.3.0, the workflow run had failed even for the main branch.

But in the workflow run that I triggered with EVEREST_VERSION = 2024.7.1 (commit, workflow run), the workflow PASSED!


2. Testing with my forked repo of everest-demo - FAILED

Next, I set EVEREST_VERSION = 2024.7.1 in my forked repo and this FAILED, but it almost PASSED.
The error was not the ev-cli cmake error; in fact, the entire build configuration setup had completed successfully.

The manager docker image was also built successfully.

It failed in the import / export docker image step, which is part of the Build and Push to Docker step in the docker-build-and-push-images job - the first job in the workflow. (commit, workflow run)

#15 loading layer 606172f40a81 249B / 249B 36.0s done
#15 ERROR: write /ext/source/build/modules/CMakeFiles/GenericPowermeter.dir/GenericPowermeter/main/powermeterImpl.cpp.gcno: no space left on device

#14 exporting to docker image format
#14 sending tarball 70.0s done
#14 ERROR: rpc error: code = Unknown desc = write /ext/source/build/modules/CMakeFiles/GenericPowermeter.dir/GenericPowermeter/main/powermeterImpl.cpp.gcno: no space left on device
------
 > exporting to docker image format:
------
------
 > importing to docker:
------

 1 warning found (use docker --debug to expand):
 - FromPlatformFlagConstDisallowed: FROM --platform flag should not use constant value "linux/x86_64" (line 1)
ERROR: failed to solve: rpc error: code = Unknown desc = write /ext/source/build/modules/CMakeFiles/GenericPowermeter.dir/GenericPowermeter/main/powermeterImpl.cpp.gcno: no space left on device
Reference
Check build summary support
Error: buildx failed with: ERROR: failed to solve: rpc error: code = Unknown desc = write /ext/source/build/modules/CMakeFiles/GenericPowermeter.dir/GenericPowermeter/main/powermeterImpl.cpp.gcno: no space left on device

No space left on device - a disk space limitation in the GitHub Actions runner image?
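Not part of this PR, but for reference: a common mitigation for runner disk exhaustion is to delete large preinstalled toolchains before the build. A sketch, assuming the standard ubuntu-latest runner layout:

# Sketch only (not used in this PR): reclaim disk space on a GitHub-hosted
# ubuntu runner before building large images.
- name: Free up disk space on the runner
  shell: bash
  run: |
    sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc
    df -h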

@MukuFlash03
Author

MukuFlash03 commented Aug 13, 2024

Next immediate goal before implementing anything:

  • Figure out WHY build failed or why did it pass?
  • What does EVEREST_VERSION tag have to do with it?
  • Different branches / tags / releases in the past would be expected to work. Why is the previous version not working?

Focus on these fundamental questions; not everything has to be implemented / executed always. Now that I've tried so much stuff out, I should be able to answer these.

@andistorm

Hey @MukuFlash03;
regarding your request in today's working group meeting (I couldn't find you on Zulip): I had a quick look at what you did. I am not 100% sure what the goal is here. Could you maybe just ping me on Zulip and explain your goal and summarize your questions regarding the matrix workflow? (https://lfenergy.zulipchat.com/login/)

Added the hardcoded build-kit-alpine image for EVEREST_VERSION = 2024.3.0, pinned to the SHA256 digest of the previous "latest" tag version, which has now been moved to the untagged list (7494bd6624aee3f882b4f1edbc589879e1d6d0ccc2c58f3f5c87ac1838ccd1de)

Found this image by going through the logs in the last successful run (in my forked repo) and looking at the FROM layer in the Docker build step.
fa60246

Image link:
https://github.com/everest/everest-ci/pkgs/container/build-kit-alpine/161694439

---

Expected: should pass, since it matches the configuration of the last successful run, with matching EVEREST_VERSION and SHA256 digest for the build-kit-alpine base image.

---

Will add more details in PR comments.

---

Signed-off-by: Mahadik, Mukul Chandrakant <[email protected]>
@MukuFlash03
Author

MukuFlash03 commented Sep 11, 2024

@shankari identified the likely cause of the issue where the build is failing for the manager image.

The problem might be the latest build-kit-alpine image which is used as the base image for manager.

FROM --platform=linux/x86_64 ghcr.io/everest/build-kit-alpine:latest

Now, the latest version may be compatible with the latest everest-core EVEREST_VERSION tags, but not with the older versions.
So for now, hardcoding the build-kit-alpine version to the last version compatible with EVEREST_VERSION = 2024.3.0 should work.


Finding the correct tag was slightly tricky since there isn't a way to know which versions were previously tagged as "latest".

A. Going through the list of tags here

The current latest points to a version released on Aug 8th, but my changes were from the third week of June through mid-July.

I tried testing different versions from around that time. I also saw that versions have download counts listed here, so perhaps the one with the highest downloads would work.
This was v1.2.0, released on June 14th, which seemed like the correct one.

Result of the workflow run: it failed as well, with the same ev-cli cmake error.


B. Inspecting the Workflow logs

I inspected the last successful workflow runs and looked at the Docker build step, hoping to find something more specific than "latest" for the build-kit-alpine image.

I did find the specific image finally:

#5 [1/8] FROM ghcr.io/everest/build-kit-alpine:latest@sha256:7494bd6624aee3f882b4f1edbc589879e1d6d0ccc2c58f3f5c87ac1838ccd1de
#5 resolve ghcr.io/everest/build-kit-alpine:latest@sha256:7494bd6624aee3f882b4f1edbc589879e1d6d0ccc2c58f3f5c87ac1838ccd1de done
#5 DONE 0.0s

The SHA256 digest uniquely identifies the image.
But I did not find this image in the list of tagged images here; I did, however, find it in the untagged images list here.

This image with the matching SHA256 digest (7494bd6) has a high number of downloads (4727) and was released 9 months ago.

Result of workflow run: The run passed successfully.

@shankari shankari merged commit 79d0f94 into EVerest:main Sep 18, 2024
5 checks passed