
Continuous Integration / Continuous Delivery woes #268

Open
8 of 15 tasks
cerna opened this issue Mar 19, 2020 · 87 comments

Comments

@cerna
Contributor

cerna commented Mar 19, 2020


Tracking progress:

  • Eat the Dovetail-Automata/mk-cross-builder repository
  • Eat the zultron/mk-cross-builder repository
  • Start building/testing the Machinekit-HAL Debian Bullseye packages
  • Automate Docker images build and upload to Github Packages
  • Sign built packages with a key using dpkg-sig
  • Upload packages to Machinekit/Machinekit-HAL Cloudsmith.io repository
  • Run CMOCKA tests
  • Run Python tests
  • Rework Debian builder docker image build script to use configuration JSON (one point for settings)
  • Rework Debian package builder script to use configuration JSON (one point for settings)
  • Run amd64, armhf and arm64 runtests in Drone Cloud CI
  • Rework Drone CI yaml script into saner form - JSONScript with autoregeneration by git hook
  • Start building packages and running tests on Ubuntu 18.04 Bionic
  • Rework Travis CI yaml script into saner form
  • Implement saner git hook solution using Lefthook

As you probably know, the Machinekit buildserver (jenkins.machinekit.io) was turned off. It was done quietly, so I have no idea when exactly it happened. It ran tests on pull requests and uploaded Debian packages into Machinekit's deb.machinekit.io repository. This is important for the C4, as without testing nothing can be merged, or rather everything has to be merged.

So as an emergency measure, the Travis build system was turned back on for the Machinekit-HAL repository. In response to a discussion with @luminize - where we agreed that some artifact output from CI runs would be nice, so users can download the .deb packages and install them with the dpkg -i command - I implemented a simple Github Actions workflow in quick&dirty style, as Travis doesn't allow keeping artifacts for later download and it's nice for users to have automated package builds in their forks. Github keeps the build artifacts for 90 days after the fact.

Given the volatile situation with the *.machinekit.io server, I think that its state should be conserved and package upload should be resumed into the Packagecloud and Cloudsmith repositories (for redundancy). I can start the upload now or when the package build for Machinekit-CNC is ready (currently the Machinekit-CNC repository has no CI for testing and package build), but I think the right time for it is after both Machinekit-HAL and Machinekit-CNC can be installed. I also think that it is time to drop Jessie support (Jessie will be obsolete in 3 months) and make sure that Machinekit-HAL runs and produces packages on Bullseye.

If I understand it correctly, the reason behind running our own Jenkins server was the 50 minute run limit of Travis CI. Well, the situation is vastly different today, with everybody and their mother giving open-source projects free minutes on their cloud services. So how relevant is, for example, this issue for the current state of the project? Given that the CI/CD rework will have to include solving the #195 issue, there is quite a big window for any changes. Including the build Dockerfiles will also hopefully solve the current issues with image availability. (Machinekit-HAL currently uses my own DockerHUB account, as there are some missing images in the DovetailAutomata account, which caused #263.)

CI tools of today also take a container-first approach, for which the current build_with_docker script is poorly suited. This script should be left for home use and the build commands exported to functions that can then be called directly from inside the container.
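
A minimal sketch of the container-first idea, assuming a builder image already exists (the image tag and the entry script name below are illustrative placeholders, not files that exist in the repository today):

# run one build step directly inside the builder container instead of via build_with_docker
docker run --rm -v "$(pwd)":/machinekit-hal -w /machinekit-hal \
    machinekit/mk-builder:amd64_10 \
    scripts/build_debian_packages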

There is also a provider called Drone Cloud which gives open-source projects free build minutes directly on armhf and arm64 hardware. This could be used to test Machinekit-HAL not only on amd64, as it is now, but also on ARM.

Last but not least, there are now cmocka tests in Machinekit-HAL. I am all for unit testing. However, no tests are actually run now and the build servers do not even install the cmocka Debian package. So that will also need to be addressed.

@zultron
Contributor

zultron commented Mar 20, 2020

Not too important, but for the sake of interest, the history of the CI was as follows. We initially built packages on a buildbot in my shop. The ARM builds were painfully slow because they were running in an ARM chroot on x86 hardware in emulated mode. Later, we moved CI to Travis. ARM builds took longer than the 50 minute limit on the 2-CPU Travis builders, so somebody hacked the build system to split the POSIX, Xenomai and RT-PREEMPT builds into separate builds. That worked to get everything built, except it ended up causing another problem. I think you're right that @ArcEye set up the Jenkins server and recombined the ARM builds in order to finally solve this problem: no time limits on the combined ARM builds. Somewhere in there, I went in and got proper cross-building to work, and later set up the Docker containers we now use; this was an alternate solution, since (combined) ARM build times were reduced to 10 minutes or so on Travis.

The GitHub Actions and Cloudsmith you're looking at weren't around when we were working on this, so maybe there's an easy solution for package distribution in there. Packagecloud's problem is that its draconian data storage and transfer limits made it impractical to use for public package distribution. If I were to try again today, and couldn't find any other simple, inexpensive solution like Packagecloud's but without limits, I would use Travis or any other CI system to upload built packages to a VPS running Docker images for receiving new packages and publishing them in a repo. I probably have scripts around somewhere that do this, or I'd likely try to integrate aptly.

Let me know if you need help, but it sounds like you have a more current understanding of the latest tools available today; things have changed quite a lot in the last several years.

@cerna
Contributor Author

cerna commented Mar 21, 2020

Thank you for the historical details about the build system, @zultron. I guess this video is related to this subject, right?

Do you have any idea how much traffic the Machinekit project needs? Packagecloud gives 25 GB of storage and 250 GB traffic (currently the watermark is at about 10 GB of stored packages and the traffic is negligible, as it wasn't really used). Cloudsmith gives 250 GB of storage and 1 TB traffic (they say these limits can be upped based on needs, but hard to say from the get go).

I asked on the Debian IRC channel on Freenode whether anybody knows of a FOSS project which uses the Cloudsmith service for .deb package distribution, but they just started arguing among themselves about it being proprietary and off-topic for Debian 🙄 So I don't know how big a project it is capable of serving.

Then there is the Oracle Cloud Always Free Tier with 10-100 GB storage and 10 TB transfer limits. And I was also considering using GitLab Pages for repository hosting. It is possible to maintain this by hand (or this) or probably use Aptly. They have a 10 GB limit on the repository and unlimited traffic (yeah, riiiight...).

We will probably have to solve this for the helper repository deb.machinekit.io (I don't have keys).

But in general, I would like for every piece of infrastructure to NOT BE dependent on any specific Machinekit developer or member.

I have created pull request #269 which eats the Dovetail-Automata/mk-cross-builder repository and closes the #195 issue. It would be nice if you could eyeball it. Later, I will take a knife and gut your other repository for goodies.

To make it work, I had to add the libglib2.0-dev, libgtk2.0-dev, tcl8.6-dev and tk8.6-dev packages to the build dependencies of Machinekit-HAL. I am not completely fine with it, mainly because of the hanky-panky that goes on with these packages in the Dockerfile. The container happily builds without these packages and then there are dangling symlinks.

@cerna
Contributor Author

cerna commented Mar 21, 2020

Bullseye is a problem. There are missing dependencies:

 machinekit-hal-build-deps : Depends: python-zmq (>= 14.0.1) but it is not installable
                             Depends: yapps2-runtime but it is not installable or
                                      python-yapps but it is not installable
                             Depends: python-pyftpdlib but it is not installable

Given that the same dependencies are in the Zultron/mk-cross-builder repository and there are built Docker images on DockerHUB, these packages must have been removed from the distribution in the last 5 months.

I am going to create a pull request with the Bullseye cross builder anyway, as this problem has nothing to do with it.

But, bloody Python.

@cerna
Contributor Author

cerna commented Mar 22, 2020

There is some discussion related to this issue on the Machinekit Google Groups forum.

@cerna
Contributor Author

cerna commented Mar 22, 2020

I have just downloaded pyzmq (the source for python-zmq), yapps2 (the source for python-yapps) and pyftpdlib (the source for python-pyftpdlib) through the pip(3) command. Given that there are official Python alternatives, why do these packages have to be installed from an APT repository?

(I don't understand Python and never liked it.)

@cerna
Contributor Author

cerna commented Mar 24, 2020

Update about the Github Actions logic flow for those who are interested:

Now that Machinekit-HAL integrates the 'mk-cross-builder' functionality, the building of new images has to be integrated into the general Machinekit-HAL CI/CD flow. That means the system has to decide when to just download the image from the Docker repository and when to build new images. I think that the most sensible solution is to build new images the moment a pull request or push event has commits changing the files from which the Docker images are built, i.e. scripts/containers/buildsystem/debian, scripts/buildsystem/debian and some other related scripts. And - of course - when the repository has no builder images in its own Github Packages repository - the idea is to create a solution which is as turn-key and isolated as it gets. In other cases - when there is a usable pre-existing image - just download it from the Docker repository and be done with it.
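
A rough sketch of that decision step, assuming the checkout has enough history and that BASE_SHA holds the commit to compare against (the variable name and the exact set of watched paths are illustrative):

# rebuild the builder images only when the watched paths changed since BASE_SHA
WATCHED="scripts/containers/buildsystem/debian scripts/buildsystem/debian"
if git diff --name-only "$BASE_SHA" HEAD -- $WATCHED | grep -q .; then
    echo "::set-output name=rebuild::true"    # 2020-era Github Actions output syntax
else
    echo "::set-output name=rebuild::false"
fi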

There lies the first problem - Github Packages is a free solution for open-source projects which uses the same login credentials as other Github services, so access from Github Actions is straightforward. But you cannot delete public packages; once a package is there, it is there. And to download a package - even a public one - you have to log in to the registry, and not just with your username and password; you have to create a token with package download access. (They must be on some serious quality product...) However, it's on one network and for the build system it is the best solution, I think.

So after deciding whether to build or pull, in the case of building, there will be jobs which build the builder images, one per type. The build itself is no problem. It takes on average up to ten minutes, and given that these workers run in parallel, it is 10 minutes total. The problem is how to share these Docker images with the other workers which will run the testing and package building. One way is to upload them to a Docker repository; that works very well. However, these images are not yet proven and you cannot delete them afterwards. So that is no good. Another way is to upload them as an artifact to GitHub storage and download them from the other jobs. And there lies a problem. GitHub says that for open source there is unlimited storage; in reality 'unlimited' means 5 GB per file and 50 GB in total (based on other users' testing). I was not able to upload Docker-saved images bigger than around 0.5 GB, much less the 2 GB tars. (It will upload the first five, then error out on all the rest.) The only way I was able to get them to upload is by compressing them with xz --best - but that takes on average 3x more time than the building process. But even then it is better to delete the Docker image artifacts afterwards - the only thing is you cannot do it from the same workflow, so realistically the first thing you have to do is delete all Docker image artifacts from previous workflows. (Again, insert the drug comment.)
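
For illustration, the save-compress-load round trip looks roughly like this (the image tag and file name are placeholders; xz -T0 only enables multi-threaded compression):

# producer job: serialize the freshly built image and squeeze it under the artifact limits
docker save machinekit-hal-builder:amd64_10 | xz --best -T0 > builder-amd64_10.tar.xz
# consumer job: after downloading the artifact, load the image back into Docker
xz -dc builder-amd64_10.tar.xz | docker load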

Fortunately, downloading and loading the artifact image into Docker in another job is fairly quick (2 minutes tops). A minor issue is that it sometimes fails. But that may be because I am currently using the v2-preview version of the download/upload artifact actions. (The second edition worked better on the bigger files.) Then the tests will run pretty much the same way as they are running now. If everything passes, another job will push these tested Docker images into the Github Packages Docker repository (and, if the user specified a TOKEN, user and repository, somewhere else).

This takes about an hour of parallel running (the maximum is 20 workers). Not great but doable. Another way to do it is to not build the Docker images beforehand in a job on which the other jobs wait, but to build them in each job separately. It would mean that all 14 jobs build the Docker image and run the tests/build the packages. Then these images would be thrown away and another job would rebuild the 11 builder images and push them to the Github Packages repository. (This cannot be done in the previous job, given that all jobs in a given matrix must finish with success.) The problem with this solution is obvious - testing and building would be done with different images, and the stored images would be different too. Given the time differences, there could be upstream changes which would have a chance to introduce subtle bugs.

I personally don't know which solution I should use. Maybe I am leaning more towards the one with the crazy wait times on xz --best, given that rebuilding images will not be something done very often. (Or at least I think it won't be.)

@lskillen

lskillen commented Mar 24, 2020

Sounds like @cloudsmith-io would be a lot easier, plus it supports Debian and RedHat natively. ;) It's worth remembering that Docker isn't package management, so building the packages first, then containerising that, is usually a much slicker, more flexible and more efficient solution; plus it lets native users install natively. If you need any help with that, just let us know (I work for Cloudsmith).

@zultron
Contributor

zultron commented Mar 25, 2020

Do you have any idea how much traffic the Machinekit project needs? Packagecloud gives 25 GB of storage and 250 GB traffic (currently the watermark is at about 10 GB of stored packages and the traffic is negligible, as it wasn't really used). Cloudsmith gives 250 GB of storage and 1 TB traffic (they say these limits can be upped based on needs, but hard to say from the get go).

That sounds like more than it used to be. Also be careful: if stored files ever exceed 25 GB, which will happen after many CI builds if old ones are not pruned, they will suspend further uploads until the next billing cycle, even if you go in after the fact and clean up space. I emailed them about that, and they said it wasn't a bug, but a feature.

But in general, I would like for every piece of infrastructure to NOT BE dependent on any specific Machinekit developer or member.

That became my goal after realizing my mistake hosting CI in my shop, but I never quite achieved it. The $5/month VPS with Docker images to manage the service was the best I ever figured out, but again, there are new services out there today and you seem pretty on top of it.

One other that could've worked was Suse's OBS. IIRC, since the repos migrated to deb.mk.io, they started supporting ARM-arch Debian builds, something that wasn't there before. It might be possible to build on Travis or other CI and upload results to a repo there.

I have created pull request #269 which eats the Dovetail-Automata/mk-cross-builder repository and closes the #195 issue. It would be nice if you could eyeball it.

That looks like a good way to pull the images in. I originally anticipated it becoming a new repo under the MK GH org, but integrating with the mk-hal repo sounds just as good, without having thought about it too hard.

To make it work, I had to add the libglib2.0-dev, libgtk2.0-dev, tcl8.6-dev and tk8.6-dev packages to the build dependencies of Machinekit-HAL. I am not completely fine with it, mainly because of the hanky-panky that goes on with these packages in the Dockerfile. The container happily builds without these packages and then there are dangling symlinks.

Have you checked that the "hanky panky" is still even necessary? I think there was a problem in older gcc versions where gcc --sysroot=/sysroot still looked in the default /usr/include for headers by default, but later was fixed to look in /sysroot/usr/include. Could be that some of that ugliness isn't necessary anymore.

Ultimately, someday Debian packaging multi-arch support might work properly, and then the need for ALL of that ugliness will go away. I bet that might never happen with TCL/TK, but maybe MK will be TCL/TK-free someday. I also remember cython being one of the blockers. When that's all resolved, cross-building Debian packages will be possible with automatic dependency resolution using standard tools and simple scripts out of Docker. Maybe I'll have to keep smoking my pipe several more years first, though.

@zultron
Contributor

zultron commented Mar 25, 2020

I have just downloaded pyzmq (the source for python-zmq), yapps2 (the source for python-yapps) and pyftpdlib (the source for python-pyftpdlib) through the pip(3) command. Given that there are official Python alternatives, why do these packages have to be installed from an APT repository?

Because APT, when installing Machinekit .debs, can't pull in those dependencies unless they're also available from an APT repository.

This is a packaging problem, not a Python problem. It used to be the same problem with e.g. libjansson and other non-Python sources that weren't packaged in Debian.

(I don't understand Python and never liked it.)

If this issue is turning into an arena for a battle about favorite programming languages, then my gambit is, "I like Python and I'm happy that many other projects I'm involved with, like ROS and RedHat, make extensive use of it."

@cerna
Contributor Author

cerna commented Mar 29, 2020

(...)One other that could've worked was Suse's OBS(...)

I was thinking about getting the ZeroMQ packages from there. But it seems that not all architectures currently supported in the Machinekit project are available there. I will add it to the to-investigate roster. Frankly, the upload can go pretty much anywhere and everywhere.

That looks like a good way to pull the images in. I originally anticipated it becoming a new repo under the MK GH org, but integrating with the mk-hal repo sounds just as good, without having thought about it too hard.

Actually, it was based on your comment. It just made more sense to me to do it this way, given that you need to somehow upgrade the images when new Debian package dependencies are added or updated, as they are a build input. This way it will be part of the normal Machinekit-HAL testing/package build flow.

(...)Have you checked that the "hanky panky" is still even necessary?(...)

No, I discovered that commit 05bad16 needs reverting, otherwise the build configure doesn't work, but that's all. I didn't want to follow too many rabbit holes. This is a change which can be done gradually in small steps in line with the C4, so I am trying to do it that way instead of as a big bang. (You would still need the packages I mentioned, as that is checked in configure.)

But when it is time to purge Jessie parts, that's the time to look into it more, I think.

(...)cross-building Debian packages will be possible with automatic(...)

There is (was) a developer - @Shaeto - in the legacy Machinekit/Machinekit repository interested in making Fedora .rpm packages. I have no idea if the logic could be somehow generalized above the distro-specific parts, but just to be sure I put everything into debian/ folders.

Of course, maybe with CMake support and CPack, that will be solved on its own. Or maybe the use of FPM/NFPM or other similar projects should be investigated.

@cerna
Contributor Author

cerna commented Mar 29, 2020

If this issue is turning into an arena for a battle about favorite programming languages, then my gambit is, "I like Python and I'm happy that many other projects I'm involved with, like ROS and RedHat, make extensive use of it."

This issue is about what is specified in the title and the first introductory post. I am a programmer, I solve problems. That being said, solving problems in areas I care about means that I understand the particularities of the given territory, and so I am able to produce a much better solution, i.e. elegant code. So far, I have been able to come up with the following ideas:

  1. Distribute problematic APT packages in special repository

  2. Install the python modules by pip in postinst script

  3. Use Python dh-virtualenv project

There is python-zmq available from the official ZeroMQ Open Build Service account, but no ARM builds. The yapps2-runtime is built from the yapps2 source, from which the yapps2 package is also built. Then there is python3-pyftpdlib, the Python 3 version of the Python 2 python-pyftpdlib.

But as you do like Python and have much bigger practical experience with it, I will gladly defer it to you. I just want Machinekit-HAL on Bullseye.

In this light, I would postpone the discussion about favourite languages and the associated altercation to later date. Looking through the calendar, my preference is never. How is that working for you? (We will have to consult with the TCL/TK guy yet.)

@cerna
Contributor Author

cerna commented Mar 29, 2020

@lskillen,
thank you for piping in. I thought I had seen that avatar somewhere - in the love-open-source example repository.

(...)worth remembering that Docker isn't package management, so building the packages first, then containerising that, is usually a much slicker, flexible and more efficient solution; plus it lets native users install natively(...)

Actually, the Docker images are only used for the .deb package build (and testing). They solve the build and runtime dependencies so the CI virtual machine doesn't have to do it on a per-package basis.

The end result is distributed as a .deb (and in future maybe as a .rpm).

If you need any help with that, just let us know (I work for Cloudsmith).

I will take you up on the offer:

  1. (You have a Circle-CI Orb already.) Of course, I can use the Cloudsmith-CLI application in Docker, but that's additional code and there are probably many projects which could use it.

  2. When I created the Machinekit organization and the Machinekit-HAL repository on @cloudsmith-io, I had to select the "I have enough pull in the project" box. So - as a company - how do you look at distributing packages in this repository which are needed as a runtime dependency, but are not an actual part of the project? (Like pyzmq, xenomai-*, libevl etc. - basically the ones which do not have official upstream repositories for all architectures and Debian versions, so we have to hack it.)

@zultron
Contributor

zultron commented Mar 31, 2020

1. Distribute problematic APT packages in special repository

The MK project has been known to do that in the past. The Jessie repo has (had?) kernel, ZMQ-related and other packages unavailable in the upstream distro. The effort to hand-roll a package for a 3rd-party APT repo turns out to be an annoying, but often trivial (using existing packaging sources), one-time pain.

2. Install the python modules by `pip` in `postinst` script

Using pip is fine for folks building from source, but using it in a postinst script defies common practice, and inventing new uses for postinst can be fraught with problems (as we've seen even in MK packaging where package scripts were used to manage symlinks).

3. Use Python [dh-virtualenv](https://github.com/spotify/dh-virtualenv) project

Looks like a pretty cool project for someone (else) to dig into.

There is python-zmq available from the official ZeroMQ Open Build Service account, but no ARM builds. The yapps2-runtime is built from the yapps2 source, from which the yapps2 package is also built. Then there is python3-pyftpdlib, the Python 3 version of the Python 2 python-pyftpdlib.

These are a potential source of packaging to help with option (1).

But as you do like Python and have much bigger practical experience with it, I will gladly defer it to you. I just want Machinekit-HAL on Bullseye.

Let's go with option (1) and publish .deb dependencies missing from upstream repos in a 3rd-party repo, repackaging from sources that already exist for other distros (Buster, Sid, etc.). This is the easiest option, especially given that we've done it already and it'll be easy to do it again.

In this light, I would postpone the discussion about favourite languages and the associated altercation to later date. Looking through the calendar, my preference is never. How is that working for you? (We will have to consult with the TCL/TK guy yet.)

Perfect! 100% on the same page, calendar page or otherwise.

On the other hand, I can find time on my calendar soon to help build packages for missing deps, a trivial task. If you had a list of exactly which packages are missing from Bullseye, that would save me half the work.

If the Docker CI images are still useful, @ArcEye submitted a PR at Dovetail-Automata/mk-cross-builder#8 to support Bullseye, which I'm feeling very embarrassed about having dropped right now.

I think the non-trivial part of the APT packaging equation is still going to be building the EMC application. If help is needed with that, I'll volunteer once MK-HAL is packaged up and online. Part of my LCNC-EMC port was to work out many of the fundamental build system issues left from the HAL/CNC repo split.

@lskillen

@cerna Just noticed your reply now. :-)

Actually, the Docker images are only used for the .deb package build (and testing). They solve the build and runtime dependencies so the CI Virtual Machine doesn't have to do it on per package basis.

The end result is distributed as a .deb (and in future maybe as a .rpm).

Fantastic; we have similar techniques for our own internal environments at Cloudsmith.

1. (You have a Circle-CI Orb already.) Of course, I can use the Cloudsmith-CLI application in Docker, but that's additional code and there are probably many projects which could use it.

Yup! Use it if you can. Any suggestions or code enhancements are also welcome.

2. When I created the Machinekit organization and the Machinekit-HAL repository on @cloudsmith-io, I had to select the "I have enough pull in the project" box. So - as a company - how do you look at distributing packages in this repository which are needed as a runtime dependency, but are not actual part of the project? (Like the `pyzmq`, `xenomai-*`, `libevl` etc - basically ones which do not have official upstream repositories for all architectures and Debian versions, so we have to hack it.)

We're more than happy for you to upload dependencies, assuming that you utilise the repository for distribution of your primary artefacts as well; i.e. it can't be for dependencies only. Other than that, the only real requirement is a (polite) link back to Cloudsmith.

However, you're free to organise your pipeline into multiple repositories; e.g. you can create a repository just for the dependencies, as long as you have another repository for your primary artefacts. That would keep your outputs separate from the artefacts.

In the future we'll help organise this automatically by labelling dependencies explicitly.

@cerna
Contributor Author

cerna commented Mar 31, 2020

@zultron,

On the other hand, I can find time on my calendar soon to help build packages for missing deps, a trivial task. If you had a list of exactly which packages are missing from Bullseye, that would save me half the work.

  • Newly installed Debian Bullseye on AMD64 from minimal install CD ISO with Debian Desktop Environment: Cinnamon, Print server and Standard System Utilities
  • Apt updated and upgraded
  • Added the Debian Bullseye Machinekit repository: deb http://deb.machinekit.io/debian bullseye main
  • Installed manually by sudo apt install command: git
  • Installed manually by sudo apt install --no-install-recommends command: devscripts, equivs
  • Called manually the mk-build-deps -ir command
  • Discovered missing packages python-zmq, python-protobuf, python-pyftpdlib, python-yapps
  • Missing packages removed from control.in file
  • Installed curl from the Debian repository, then followed the instructions on how to install pip: curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py, then installed it with python get-pip.py
  • Installed by pip packages: protobuf and pyftpdlib
  • Here ./configure stopped complaining
  • MAKE errored on "no module named yapps" -> hal/user_comps/Submakefile:31 ../bin/adx1345 Error 1 -> File ../bin/comp, line 24 from yapps import runtime
  • Installed by pip packages: yapps2
  • Make finished successfully and runtests finished successfully
    ==> Package python-zmq is completely useless in Build-depends and probably should be removed.

  • Cured the Dockerfile to not include the aforementioned packages (BTW, do these have to be specified in the Dockerfile separately? It is in the section Install Multi-Arch: foreign dependencies) and to include the pip tool and the aforementioned packages
  • Cured the build script to use Podman instead of Docker, as Docker has a problem with unix sockets on 5.x-ish kernels
  • Built the builder image for the AMD64_11 tag
  • Cured the packaging script to use Podman instead of Docker
  • Built the Machinekit-HAL packages
  • When trying to install the first machinekit-hal_0.3* .deb file, APT errored on non-installable packages: python-glade2, python-gtkgkxt1, python-zmq, python-protobuf, python-pyftpdlib, python-pydot and python-gst-1.0 or python-gst0.10
  • Removed the offending packages from Dependencies
  • Rebuilt the packages
  • Installed the machinekit-hal and machinekit-hal-posix packages without problems
  • The halrun command creates an instance and I can send other commands to it: tested newinst, newthread, addf, start

I think all of these issues are connected to the Python 2 End-Of-Life in January 2020, because there are python3-* packages for Bullseye in the official Debian repository. But those seem to be unusable in the current Machinekit state.

@cerna
Contributor Author

cerna commented Apr 1, 2020

@lskillen,
fantastic, thank you for the answer.

Turns out I am a moron; I read what I had written, and 🤦‍♂️. What I actually meant to ask:

Do you plan to introduce an official Github Actions action? (You have a Circle-CI Orb already.) Of course, I can use the Cloudsmith-CLI application in Docker, but that's additional code and there are probably many projects which could use it.

In other words, I left out the most important part of the question. 🤦‍♂️

Other than that, the only real requirement is a (polite) link back to Cloudsmith.

Sure, that's given.

However, you're free to organise your pipeline into multiple repositories; e.g. you can create a repository just for the dependencies, as long as you have another repository for your primary artefacts. That would keep your outputs separate from the artefacts.

So if I created a new repository Machinekit-dependencies with all dependencies for all Machinekit projects, and then had a repository Machinekit-HAL where I push build artifacts from this repository, and then a repository Machinekit-CNC where I push build artifacts from Machinekit-CNC, that would be an OK way to do it? Nice!

@cerna
Contributor Author

cerna commented Apr 1, 2020

If the Docker CI images are still useful, @ArcEye submitted a PR at Dovetail-Automata/mk-cross-builder#8 to support Bullseye, which I'm feeling very embarrassed about having dropped right now.

Yes, the Docker CI images are useful. I am not going to fundamentally change something which is currently working, and I don't have deep enough knowledge about it to feel comfortable making deep cuts. I might do some hacking. I actually picked it up in #270 and it is now part of Machinekit-HAL proper. The comment where I described how I built the packages uses this work.

I think the non-trivial part of the APT packaging equation is still going to be building the EMC application. If help is needed with that, I'll volunteer once MK-HAL is packaged up and online. Part of my LCNC-EMC port was to work out many of the fundamental build system issues left from the HAL/CNC repo split.

I haven't looked into it yet. I somehow hoped that there would be a CMake-based build flow first, but that's probably not going to be the case.

But I will get the Machinekit-HAL packaging up and running with publishing to @cloudsmith-io first and foremost. Taking the -CNC part into consideration would only slow everything down to a nothing-gets-finished level.

@lskillen

lskillen commented Apr 1, 2020

@cerna

Do you plan to introduce an official Github Actions action? (You have a Circle-CI Orb already.) Of course, I can use the Cloudsmith-CLI application in Docker, but that's additional code and there are probably many projects which could use it.

We have one that's been forked from a user of ours (who's OK with us taking ownership):
https://github.com/cloudsmith-io/action

It's incredibly spartan at the moment, but we'll almost certainly be tidying this up and publishing it to the GitHub marketplace as well. PRs welcome!

In other words, I left out the most important part of the question. 🤦‍♂

I did wonder!

So if I created a new repository Machinekit-dependencies with all dependencies for all Machinekit projects, and then had a repository Machinekit-HAL where I push build artifacts from this repository, and then a repository Machinekit-CNC where I push build artifacts from Machinekit-CNC, that would be an OK way to do it? Nice!

Yes, yes and yes. :-)

@cerna
Contributor Author

cerna commented Apr 2, 2020

I have looked at the Debian packages produced by the original Jenkins builder and discovered that these were signed by the dpkg-sig tool with the D030445104BADB8A5FC9544FF81BD2B7499BE968 sub-key of the Machinekit Signer.

mk@mk:~/Downloads/mktemp$ dpkg-sig --list machinekit-hal-posix-dbgsym_0.2.1561737052.git8ad6145-1~stretch_armhf.deb
Processing machinekit-hal-posix-dbgsym_0.2.1561737052.git8ad6145-1~stretch_armhf.deb...
builder
mk@mk:~/Downloads/mktemp$ dpkg-sig --verify machinekit-hal-posix-dbgsym_0.2.1561737052.git8ad6145-1~stretch_armhf.deb
Processing machinekit-hal-posix-dbgsym_0.2.1561737052.git8ad6145-1~stretch_armhf.deb...
GOODSIG _gpgbuilder D030445104BADB8A5FC9544FF81BD2B7499BE968 1561738832

And it got me thinking that this is probably a very good idea - it helps differentiate that a package was built from the official Machinekit/Machinekit-HAL code base and not from some fork. (Or the opposite, that the package is from a specific fork.)
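
Resuming that practice in CI would look roughly like this (a sketch only; the key ID is the sub-key listed above and would be replaced by whatever new sub-key gets created):

# sign the freshly built packages with the builder role and the signing sub-key
dpkg-sig --sign builder -k D030445104BADB8A5FC9544FF81BD2B7499BE968 machinekit-hal-*.deb
# verify the signatures the same way the old Jenkins output was checked above
dpkg-sig --verify machinekit-hal-*.deb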

But does anybody (@luminize, @zultron, @cdsteinkuehler) have the original Machinekit Signer primary key with which a new sub-key could be created for this job?

@cerna
Contributor Author

cerna commented Apr 16, 2020

I have been playing with multistrap for #273. There is a libck-dev with dependencies for Debian Buster now. However, it is not in the main distro repository, but in the buster-backports repository.

I don't think that should be a problem, but it is: I cannot get multistrap to satisfy the dependencies for machinekit-hal-build-deps primarily from the main repository and take only the packages which are not included there - i.e. libck-dev and libck0 - from buster-backports. Multistrap just installs the newest packages available. And that causes a problem, because standard apt installs with priority resolution, trying the normal repository first (which I think is the right course of action).
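
For comparison, with plain apt the intended behaviour can be made explicit with a pin like the one below (a sketch of standard apt_preferences usage; as described above, multistrap uses its own configuration and does not follow this kind of priority resolution):

# /etc/apt/preferences.d/libck-from-backports
Package: libck-dev libck0
Pin: release a=buster-backports
Pin-Priority: 500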

This fails on protobuf-compiler in the pipeline. (Headers are created with a different version.)

If anybody knows how to solve this, I am all ears.

@zultron
Contributor

zultron commented Apr 24, 2020

I started taking a look at the Bullseye deps. I suspect it'll take a full python3 upgrade to support Bullseye, and that will probably take some real rework. Some issues I have already found:

  • No python3-imaging-tk, python3-glade2, python3-gtkglext1 packages
  • No python3 yapps for Jessie (could be solved by backporting the package)

I started a branch, but only got as far as fixing the Docker image build, protobuf python bindings and comp generator before stopping. (This certainly deserves a new issue to further discuss python3 porting.)

@cerna
Contributor Author

cerna commented Apr 24, 2020

(...)and that will probably take some real rework(...)

Well, Python 2 has been EOL for a few months and the situation will keep worsening with packages going missing left and right, so it is probably high time to make the switch. But I just hope (given that the discussion about Python 3 is pretty old: #114 and machinekit/machinekit#563) it will not turn into a rabbit hole and take both the Bullseye and Python 3 projects with it to never-never land. (As I will not be very useful for this endeavour.)

No python3-imaging-tk, python3-glade2, python3-gtkglext1 packages

Looking at python-imaging-tk, it was meant as a transitional package. I guess the transition period ended, huh. To be frank, I am not terribly happy about Machinekit-HAL requiring UI packages; I have always thought of it as a headless suite which should not even have X11/Wayland support.

No python3 yapps for Jessie (could be solved by backporting the package)

Jessie will be losing LTS support in two months. I am not sure it is still worth the effort (given that this will take a few weeks) to have it supported for 14 days before I turn the builder off. I am just saying. Jessie had a good run, but it's time to let it go.

I started a branch, but only got as far as fixing the Docker image build, protobuf python bindings and comp generator before stopping.

LinuxCNC's @rene-dev recently started Python 3 work (at least he said so on IRC #linuxcnc-devel). Maybe it's worth having a looksie at his work and porting the parts which are still the same in both LinuxCNC and Machinekit-HAL?

@zultron
Contributor

zultron commented Apr 24, 2020

(...)and that will probably take some real rework(...)

Well, Python 2 has been EOL for a few months and the situation will keep worsening with packages going missing left and right, so it is probably high time to make the switch. But I just hope (given that the discussion about Python 3 is pretty old: #114 and machinekit/machinekit#563) it will not turn into a rabbit hole and take both the Bullseye and Python 3 projects with it to never-never land. (As I will not be very useful for this endeavour.)

Bullseye is going to be a real endeavor to bring up because of the python3 issue. As I said offline, I'd like to coordinate with the LCNC folks over this, since any changes to support python3 on the HAL side need to be mirrored on the EMC side, and going forward, building LCNC EMC against MK HAL is still the most sustainable plan for making packages available for MK HAL and its top application.

Looking at python-imaging-tk, it was meant as a transitional package. I guess the transition period ended, huh. To be frank, I am not terribly happy about Machinekit-HAL requiring UI packages, I always thought about it as a headless suite which should not even have X11/Wayland support.

I'm definitely in favor of factoring out anything TCL/TK, but I do think tools like halscope belong in HAL and are invaluable. Where do you draw the line, or what would you like to see in an ideal world?

Jessie will be slashing LTS in two months. I am not sure it is still worth the effort (given that this will take few weeks) to have it supported for 14 days before I turn the builder off. I am just saying. Jessie had a good run, but it's time to let it go.

No disagreement from here. I'd love to jettison some of the ugly stuff we have in the packaging and CI configuration needed to support Jessie. Last I heard, though, @dkhughes has a lot of BBBs out in the field still running Jessie, which is why instead of ditching it back then, we replaced the unmaintained Emdebian tool chain with Linaro.

LinuxCNC's @rene-dev recently started Python 3 work (at least he said so on IRC #linuxcnc-devel). Maybe it's worth having a looksie at his work and porting the parts which are still the same in both LinuxCNC and Machinekit-HAL?

LinuxCNC/linuxcnc#403 pretty much lays out what's already been done. @gibsonc appears to have done the heavy lifting porting the C-language bindings I started bumping up against a few hours into my naive attempt referenced above (like halmodule.cc). That would be pretty easy to pull over.

Still, I'm most interested in getting packages for released distros online first. If you'd like to point me to where the project is with that and how I can help, I'll spend a few days hammering on it.

@cerna
Contributor Author

cerna commented Apr 24, 2020

I'm definitely in favour of factoring out anything TCL/TK, but I do think tools like halscope belong in HAL and are invaluable. Where do you draw the line, or what would you like to see in an ideal world?

And I am up there with you. However, I think it should live in a separate repository/separate packages and be solved as the original idea dictates. (You will find that many of my ideas are pretty much in line with the original Haberler ones.) A ring buffer from the real-time side to a transshipment point, where the ring buffer frame is sent over a ZeroMQ socket to the display application. That way it can run on the same machine as Machinekit-HAL or on a notebook next to the device (like service guys are wont to do).

I have been thinking about doing something like that based on WPF/Noesis GUI (first for a HAL meter) but it is on a low burner for me so far. (More important things need to be done.)

No disagreement from here. I'd love to jettison some of the ugly stuff we have in the packaging and CI configuration needed to support Jessie. Last I heard, though, @dkhughes has a lot of BBBs out in the field still running Jessie, which is why instead of ditching it back then, we replaced the unmaintained Emdebian tool chain with Linaro.

That was a year ago, no? I am hoping that since then @dkhughes has twisted his customers' hands and is now mostly running on a distribution which will be supported a little longer. Otherwise, they will not be able to upgrade, unless of course he supports it on the side (or they just don't upgrade the Machinekit installation, which will be [pretty much] the same thing, just without security upgrades).

Still, I'm most interested in getting packages for released distros online first. If you'd like to point me to where the project is with that and how I can help, I'll spend a few days hammering on it.

I think it is mostly done. I just need to decide which packages to upload to Cloudsmith. After @luminize merges #274, which implements the Docker image caching and auto-build, I will just redo some less important parts. Like transforming the bash build commands into another scripting language to allow for hierarchical composition - so the same code will be reusable in a more streamlined way.

What I would like to see is #246, #250 and #200 done - even if Machinekit-CNC lags behind. I know @kinsamanka wanted to wait on it first, but it has dragged on too long. It shouldn't be such a problem for Machinekit-CNC package building and it has to be done. Better for it to hurt a little bit from the start than to wait another year for it.

So if you could take a look at it, I would be glad. (Not only would it mean better IDE support for Machinekit-HAL, but it would be great for other purposes too [I am in it mostly for the IDE support 😃].)

@cerna
Contributor Author

cerna commented Jun 26, 2020

OK, I have reimplemented the build functionality in Python 3: commits, and surprisingly it all seems to work (package building, image building and Machinekit-HAL testing).

Hopefully it will be cleaner for potential developers and I haven't just spent time reinventing the wheel. I just need to patch the Github Actions and then open a pull request.

(Even though it is definitely not perfect, it is good enough, and while trying to keep a nice linear history in git, I completely trashed it while rebasing multiple times. So I am a little tired of the endless history rewriting. I can change what needs to change later.)


And it looks like Github changed its GraphQL Packages API, so the current Github Actions workflow will fail. (It was a preview and now it should be integrated into the proper specification.) Not sure how long it hasn't been working, but nobody complained, so hopefully not that long.

@cerna
Contributor Author

cerna commented Jul 2, 2020

Pull request #288 removed the temp/temp tagging nonsense, reworked the Docker image building script (so far only local native builds; there is an issue with SSL certificates and curl somewhere and I have so far no idea where - it could be Docker buildx, Debian, QEMU or something else) and changed how the packages are built to standard Debian Multi-Arch. I am hoping that there will be no problems (of course), but testing on real hardware is testing on real hardware (@the-snowwhite, if I could bother you).

Now onto implementing the Travis CI arm64 build and Drone CI amd64, armhf and arm64 ones.

And mainly actually building and packaging EMCApplication.


BTW, good to know that /home/machinekit/build/machinekit-hal/tests/threads.0 fails non-deterministically even on an arm64 runner. @kinsamanka has some changes to it in his #250. I will have to look at whether that is a solution to this problem.

@the-snowwhite
Contributor

the-snowwhite commented Jul 2, 2020

@cerna Sure, albeit could you state in a bit more low-level, verbose way what you need to have tested and how?

@cerna
Contributor Author

cerna commented Jul 2, 2020

@the-snowwhite,
right, sorry...

Well, the code of the application itself was not changed; all that changed were a few minor build recipes and the way packages are built. When running RIP runtests, the tests are still green. So this should be OK.

What I need to know - in a nutshell - is whether you can see some degradation when running the newly built packages in your standard use cases in comparison to the old ones. I have looked at the binaries, and they all have the correct ELF header, but I could have missed some. Also, there was a change in how the Python binaries are named.

So, what I would appreciate is:

  • To install the packages
  • Try running whatever you normally run and look for any changes
  • Try Python HAL access modules
  • Try running runtests from the package install (machinekit-hal-dev now ships a runtests binary, so one can install the package, clone the machinekit-hal repository and invoke runtests machinekit-hal/tests) - some will fail because of how the runtests were originally implemented; a condensed sketch follows below
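
Condensed into commands, that last item would look roughly like this (package names as mentioned above; the exact set of packages to install depends on your setup):

sudo apt install machinekit-hal machinekit-hal-posix machinekit-hal-dev
git clone https://github.com/machinekit/machinekit-hal.git
runtests machinekit-hal/tests   # the runtests binary ships with machinekit-hal-dev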

@the-snowwhite
Contributor

the-snowwhite commented Jul 7, 2020

@cerna sorry for my late reply, for some reason I got no notification of your response.
I'm not sure if I can test my normal use cases with just a standalone machinekit-hal package. Currently I'm running a CNC router and a 3D printer via machinekit client and qtquickvcp with Python-style HAL configs.
These setups rely on files that never (or just not yet) made it into Machinekit-HAL, like this:
https://github.com/machinekit/machinekit-cnc/tree/master/lib/python/fdm
... !
I'm not sure if there are other (machinekit-cnc) dependencies; the fdm folder can be manually copied to the HAL config folder.
My Python files have these headers:

import sys
import os
import subprocess
import importlib
from machinekit import launcher
from machinekit import config
from time import *

import os
from machinekit import rtapi as rt
from machinekit import hal
from machinekit import config as c
from fdm.config import velocity_extrusion as ve
from fdm.config import base
from fdm.config import storage
from fdm_local.config import motion
import cramps as hardware

It would be nice to be able to run these setups without machinekit-cnc or l-cnc.

@the-snowwhite
Contributor

@cerna
I attempted running my OpenBuilds OX router (mklauncher) config from a fresh Debian Stretch SD card with only the latest machinekit-hal package installed:
The result in Machinekit client was:

starting configserver... done
starting machinekit... /bin/sh: 1: machinekit: not found

The culprit is that this run file requires the machinekit executable:
https://github.com/the-snowwhite/Hm2-soc_FDM/blob/master/Cramps/PY/OX/run.py#L34

@cerna
Contributor Author

cerna commented Jul 8, 2020

@the-snowwhite,
thank you for the testing!

Yes, the cut between Machinekit-HAL and Machinekit-CNC, and now what will become the EMCApplication, is not that clean (pretty bloody, actually) and there is definitely work to be done to clean it up. I personally consider the split into smaller repositories one of the best decisions (too bad it wasn't done sooner, given that talk about it goes back to the beginning) and wouldn't want Machinekit-HAL to start growing into a Machinekit [original]-like repository. That being said, I think that the CNC-specific Python modules should go into their own git repository with a separate package distribution route (while creating a clean tree dependency structure between all parts). Basically, push all the CNC stuff dependent on other CNC-specific parts out of Machinekit-HAL into its own logical home.

I quite like what @kinsamanka started in #250 in https://github.com/machinekit/machinekit-hal/blob/df5e884d31f8e2668ad80c1c2be66028a64cc3b4/debian/control.in - putting things into their own logical packages for distribution. (I have been getting my CMake knowledge into a hot, useful state and so far I am liking the modern approach much better than I last remembered, so transforming Machinekit-HAL to CMake is the next logical step along the line CI/CD - Docker builder - CMake buildsystem - package split. [Before any real development can happen, I guess.])

But until that happens, it will be kinda unusable for these kinds of tasks.

@cerna
Contributor Author

cerna commented Aug 15, 2020

@zultron was bitten in #293 by a problem stemming from Docker image caching in the Github Actions CI - it can happen that a Docker image built for one branch is used for the test build workflow of another branch. And because both branches can have fundamental differences, the run fails even though it should not. I realized this problem at the time I was implementing it, but given that for my work style it would not cause a problem, I decided to postpone solving it to a later date.

So now I had to think about it, had an aha moment and came up with the following design flow (development of a pull request is in progress; a small sketch of the image label handling follows the list):

  • If workflow is currently running from previous push, stop it
  • Add branch label to Docker image
  • Get all possible combinations of targets from JSON
  • Query all Docker images for the possible combinations and get the digest sha256, branch label and commit sha label from the images
  • Test if all images are present (if not, force a rebuild), if all are of the current branch (if not, force a rebuild) and if all are of the same git commit sha (if not, force a rebuild)
  • Test if there was some change to the watched files between the git commit sha from the labels and HEAD (if so, force a rebuild)
  • Store the information in output JSON
  • In each parallel (matrix) runner, after docker pull check the digest sha256 (if these are not the same, force a failure of the job)
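
A rough sketch of the label handling in that flow (the image name, label keys and variable names are illustrative only):

# at build time: stamp the builder image with the branch and commit it was built from
docker build --label io.machinekit.branch="$BRANCH" --label io.machinekit.commit="$GIT_SHA" \
    -t builder:latest .
# at decision time: read the labels back and compare against the current branch and HEAD
docker inspect --format '{{ index .Config.Labels "io.machinekit.branch" }}' builder:latest
docker inspect --format '{{ index .Config.Labels "io.machinekit.commit" }}' builder:latest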

Hopefully, this workflow will be a lot more immune to cross-branch issues and also solve the issue with force pushing.

BTW, the renaming in #295 left out the Docker image upload jobs and Debian packages upload job, so that should also be shortened to be in-line with the change.

@zultron
Contributor

zultron commented Aug 17, 2020

I'm looking at all the amazing CI infrastructure @cerna has put together here to help me get off the ground more quickly with #297, building the EtherLab driver for the same OS and architecture variants that the CI system builds for Machinekit.

After a whole lot of copy'n'paste, I'm starting to wish that some of that work was pulled out of the Machinekit-HAL repo and made independent, since so much of it is reusable, such as the Docker base images, Python container build scripts, JSON distro settings and entrypoint script; that is, almost all of it.

I don't want to expand the scope of this issue, so maybe this goes in a new one, but this would be very welcome if the project ends up with many separate repos all building packages in Docker for the same OS+arch matrix.

@zultron
Contributor

zultron commented Aug 17, 2020

[...] it can happen that a Docker image built for one branch is used for the test build workflow of another branch. And because both branches can have fundamental differences, the run fails even though it should not. [...]

So, now I had to think about it, had an aha moment and came up with following design flow (development of pull request in progress):
[...]

I'm sure you have this handled already, but just in case, for another project, I needed to do something similar.

We wanted to know if a checked out revision matched a Docker image, and pull a new one if not. The scheme I devised was to find a way to generate a reproducible hash of the files used to build the Docker image. The hash would only change when the image input files would change, and never otherwise. This hash was then built into the image tag, although it could have been put into an image label as well, say -l IMAGE_HASH=deadbeef.

So for this application, you'd do something similar: compute the hash from the repo files, query the Docker registry for an image with a matching IMAGE_HASH label, and then either pull the existing image or else build one if none exists.

I can produce the hash generation commands if needed. They're not rocket science, but there are a few gotchas we had to address before they were 100% reliable.
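
One way such a reproducible hash could be computed (a sketch only; the watched paths are the same illustrative ones as earlier, and the sort step addresses one of the gotchas, since find's output order is not stable):

# hash the content of every file that feeds the image build, in a stable order
WATCHED="scripts/containers/buildsystem/debian scripts/buildsystem/debian"
IMAGE_HASH=$(find $WATCHED -type f | sort | xargs sha256sum | sha256sum | cut -c1-12)
# bake the hash into the image so it can later be compared against the registry copy
docker build --label IMAGE_HASH="$IMAGE_HASH" -t builder:"$IMAGE_HASH" .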

@zultron
Contributor

zultron commented Aug 17, 2020

After a whole lot of copy'n'paste, I'm starting to wish that some of that work was pulled out of the Machinekit-HAL repo and made independent, since so much of it is reusable, such as the Docker base images, Python container build scripts, JSON distro settings and entrypoint script; that is, almost all of it.

Here's an example of what they do in a bunch of ROS-Industrial repos. Projects that use it simply check out the repo into their source tree in CI and run the scripts. This is pretty versatile, and works with a bunch of CI systems, a nice touch.

@zultron
Contributor

zultron commented Aug 19, 2020

After a whole lot of copy'n'paste, I'm starting to wish that some of that work was pulled out of the Machinekit-HAL repo and made independent, since so much of it is reusable, such as the Docker base images, Python container build scripts, JSON distro settings and entrypoint script; that is, almost all of it.

I've spent quite some hours now with the GH Actions CI config while working on #297, and while not done yet, at least the basic Docker image and package build flows work. One of the things I've done is try to remove bits specific to the MK-HAL repository and make them generic and configurable, in hopes of doing something like the above, separating out a shared CI config.

As I know very well from personal experience, CI configurations for the project are necessarily very complex. @cerna has done a fantastic job building up this new system, as well as vastly simplifying the Docker builders we used to use (which were nasty and hairy, and which I wrote!). Starting from the MK-HAL configuration for #297 has saved me unknown dozens of hours, since I could copy and paste 90% and make it work with only minor changes. I'm really pleased with it, so please keep that in mind even as I propose improvements below!

As it is now, there is a LOT of logic built into the workflow file in the form of embedded shell scripts. It's likely my own deficiency that these turn out to be quite delicate for me, and going through repeated iterations of pushing changes to GH, waiting several minutes for the Actions run, and going back to fix problems has been frustrating and time-consuming.

If the CI scripts were able to run stand-alone in a local dev environment, these iterations could be drastically shortened by being able to run individual steps independently, without having to queue up the entire workflow in GH Actions. The basic workflow could stay the same, with the GH Actions workflow keeping the same general structure and maintaining the same use of output parameters for carrying configuration data between jobs, encrypted secrets for access credentials, Docker registries for caching images, etc. There are already Python scripts used to build the container and packages, so it makes sense to convert workflow file shell script logic into Python; then, the differences between running the workflow in a local dev environment vs. the GH Actions environment could be encapsulated using object-oriented programming. In the same way, the workflow could be adapted to other CI systems, should the need arise, and of course the workflow can be shared between MK-HAL, the EtherLab Master and Linuxcnc-EtherCAT HAL driver repositories (and potentially the MK-EMC, though it already has a CI configuration), and improvements will benefit all repos.
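A rough illustration of the kind of encapsulation meant here - class and method names are invented, not an existing Machinekit API, and the build-script path is only assumed:

# Sketch: separate "what the workflow step does" from "where it runs", so the
# same step logic works both locally and inside GitHub Actions.
import os
import subprocess
from abc import ABC, abstractmethod

class CIEnvironment(ABC):
    @abstractmethod
    def secret(self, name: str) -> str: ...

    @abstractmethod
    def set_output(self, key: str, value: str) -> None: ...

class GithubActionsEnvironment(CIEnvironment):
    def secret(self, name: str) -> str:
        return os.environ[name]  # injected from encrypted repository secrets

    def set_output(self, key: str, value: str) -> None:
        print(f"::set-output name={key}::{value}")  # GH Actions output parameter

class LocalEnvironment(CIEnvironment):
    def secret(self, name: str) -> str:
        return os.environ.get(name, "")  # developer exports these by hand

    def set_output(self, key: str, value: str) -> None:
        print(f"{key}={value}")  # just echo to the terminal

def build_packages_step(env: CIEnvironment) -> None:
    # The step itself is identical in both environments (script path assumed).
    subprocess.run(["python3", "scripts/buildpackages.py"], check=True)
    env.set_output("packages_built", "true")

The GH Actions workflow would instantiate GithubActionsEnvironment, a developer iterating locally would use LocalEnvironment, and the step logic itself would never need to change between the two.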

Does the problem description make sense, and does the proposal sound reasonable?

@cerna
Copy link
Contributor Author

cerna commented Aug 19, 2020

Truth be told, I wasn't thinking about using this outside Machinekit-HAL. Of course, I am not saying that it is not possible. It is possible. But some analysis of common requirements and processes across all the repositories or projects where it could be used will be needed.

Here's an example of what they do in a bunch of ROS-Industrial repos. Projects that use it simply check out the repo into their source tree in CI and run the scripts. This is pretty versatile, and works with a bunch of CI systems, a nice touch.

This looks like a warehouse for commonly used scripts. Well, one can do something similar to this in Github Actions with one's own actions. These can be written in Typescript, or can be a Docker container (and then use practically any language possible), or, more recently, can be a plain script. The best part - and really the only part which makes it specific - is that the input/output handling is already solved. Drone has something similar with its modules, but not so versatile, I would say. I don't know about Travis; I don't think so, as the whole ecosystem is not pluggable and modular (I think they will try to introduce similar concepts to stay in the game and relevant, but it surely won't happen overnight). And - at least in the documentation portions I have read so far - they talk about simple bash scripts. (But at least they introduced workspaces - something like an immutable shared drive that following jobs can use.)

The Python scripts are nothing to write home about, but they sure could be abstracted into some classes, and then the repository would have only a basic settings Python object (or configuration JSON) which would specify the labels or arguments sent to the Docker builder. (I am also a bit afraid that this would start to reimplement some already existing project - isn't there something readily available and usable already?)
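As a purely hypothetical example of what such a per-repository settings object could look like (field names are invented, and the shared builder class does not exist yet):

# Sketch: the repository ships only a thin settings file; shared builder
# classes consume it. Field names below are invented for discussion.
import json
from dataclasses import dataclass, field

@dataclass
class BuilderSettings:
    image_name: str
    docker_build_args: dict = field(default_factory=dict)
    distributions: list = field(default_factory=list)  # e.g. [{"codename": "buster", "architectures": ["amd64", "armhf"]}]

    @classmethod
    def from_json(cls, path: str) -> "BuilderSettings":
        with open(path) as fh:
            return cls(**json.load(fh))

# settings = BuilderSettings.from_json("debian-distro-settings.json")
# DockerBuilder(settings).build()  # DockerBuilder would live in the shared CI repository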

The problem also is that Drone and Travis CI are both somewhat working, but actually in pretty terrible condition. It turns out that GitHub Actions is pretty advanced (even though I originally thought otherwise). And to enable the same functionality on both Drone and Travis, one has to use local git hooks managers with a (probably Dockerized) running environment - because only Github Actions (and Azure, as far as I know) allow dynamically creating and altering the recipes at runtime (up to a point, obviously). So I put it on the back burner (because for now it is good enough, and there is #200 and so on).

A nice thing about that repository: it has more pull requests than the whole of Machinekit-HAL. If Machinekit implemented something similar, should that presume that Machinekit-HAL would be integral to all of it - in other words, that it would be building and testing targets dependent on Machinekit-HAL - or do you want a more universal system/approach which would not depend on Machinekit in any way?

Because this looks like it is targeted at modules.

@cerna
Copy link
Contributor Author

cerna commented Aug 22, 2020

Travis CI integration in #299 is behaving oddly - for example job 108.5 failed, the log says that it failed, but the whole job is green, i.e. it passed.

Basically, the first four jobs - the ones that build RIP and run the runtests - failed (as they should), but the Debian package building ones all passed, even though they should have failed too.

This means that some part of the bash script is swallowing the error code, and the Travis runner gets 0 in cases where it should have got something else.

@zultron
Copy link
Contributor

zultron commented Aug 22, 2020

Travis CI integration in #299 is behaving oddly - for example job 108.5 failed, the log says that it failed, but the whole job is green, i.e. it passed.

That's because the script needs to either `set -e` or else replace the `;` characters with `&&`. Similar problem as in #293.

https://github.com/rene-dev/machinekit-hal/blob/python3/.travis.yml#L144-L163
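(If the embedded shell logic eventually moves into Python, as proposed earlier, the same failure propagation becomes explicit instead of depending on `set -e` or `&&` - a tiny sketch with made-up step commands:)

# Sketch: run the step commands from Python so that the first failure stops
# the step and the CI runner receives a non-zero exit code.
import subprocess
import sys

COMMANDS = [
    ["python3", "scripts/buildcontainerimage.py"],  # hypothetical step commands
    ["python3", "scripts/buildpackages.py"],
]

for command in COMMANDS:
    result = subprocess.run(command)
    if result.returncode != 0:
        sys.exit(result.returncode)  # the moral equivalent of `set -e` or `&&` chaining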

@cerna
Copy link
Contributor Author

cerna commented Aug 22, 2020

That's because the script needs to either set -e or else replace the ; characters with &&. Similar problem as in #293.

Yup, I did it at the same (similar) time. I am not surprised I fucked both the same way. I will create a PR later today.

cerna added a commit to cerna/machinekit-hal that referenced this issue Aug 22, 2020
Due to the problem described in machinekit#268 and machinekit#292, bash commands run in CI scripts can fail with a non-zero exit code, but the following command will hide this from the CI runner by itself exiting with zero.

Two basic ways of solving this issue exist: adding `set -e` at the top of the script block or pairing individual commands with the `&&` operator. Given that the previous work (machinekit#292) uses the operator solution, this one uses it too.
cerna added a commit to cerna/machinekit-hal that referenced this issue Aug 23, 2020
Due to the problem described in machinekit#268 and machinekit#292, bash commands run in CI scripts can fail with a non-zero exit code, but the following command will hide this from the CI runner by itself exiting with zero.

Two basic ways of solving this issue exist: adding `set -e` at the top of the script block or pairing individual commands with the `&&` operator. Given that the previous work (machinekit#292) uses the operator solution, this one uses it too.
@cerna
Copy link
Contributor Author

cerna commented Aug 24, 2020

@lskillen,
I am having trouble with the automatic installation script on Debian Bullseye in a Docker container. Have you encountered a similar problem?

The simplest way to reproduce this (the most important information is at the end):

mars@mars:~/Downloads$ docker run -it --rm debian:bullseye
root@9e2946ef8be8:/# apt update
Get:1 http://deb.debian.org/debian bullseye InRelease [116 kB]
Get:2 http://deb.debian.org/debian bullseye/main amd64 Packages [7675 kB]
Fetched 7791 kB in 4s (1746 kB/s)   
Reading package lists... Done
Building dependency tree       
Reading state information... Done
22 packages can be upgraded. Run 'apt list --upgradable' to see them.
root@9e2946ef8be8:/# apt install curl
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  ca-certificates krb5-locales libbrotli1 libcurl4 libgssapi-krb5-2 libk5crypto3 libkeyutils1 libkrb5-3 libkrb5support0 libldap-2.4-2 libldap-common libnghttp2-14 libpsl5 librtmp1 libsasl2-2
  libsasl2-modules libsasl2-modules-db libssh2-1 libssl1.1 openssl publicsuffix
Suggested packages:
  krb5-doc krb5-user libsasl2-modules-gssapi-mit | libsasl2-modules-gssapi-heimdal libsasl2-modules-ldap libsasl2-modules-otp libsasl2-modules-sql
The following NEW packages will be installed:
  ca-certificates curl krb5-locales libbrotli1 libcurl4 libgssapi-krb5-2 libk5crypto3 libkeyutils1 libkrb5-3 libkrb5support0 libldap-2.4-2 libldap-common libnghttp2-14 libpsl5 librtmp1
  libsasl2-2 libsasl2-modules libsasl2-modules-db libssh2-1 libssl1.1 openssl publicsuffix
0 upgraded, 22 newly installed, 0 to remove and 22 not upgraded.
Need to get 5243 kB of archives.
After this operation, 12.7 MB of additional disk space will be used.
Do you want to continue? [Y/n] 
Get:1 http://deb.debian.org/debian bullseye/main amd64 krb5-locales all 1.17-10 [94.6 kB]
Get:2 http://deb.debian.org/debian bullseye/main amd64 libssl1.1 amd64 1.1.1g-1 [1543 kB]
Get:3 http://deb.debian.org/debian bullseye/main amd64 openssl amd64 1.1.1g-1 [846 kB]
Get:4 http://deb.debian.org/debian bullseye/main amd64 ca-certificates all 20200601 [158 kB]
Get:5 http://deb.debian.org/debian bullseye/main amd64 libbrotli1 amd64 1.0.7-7 [267 kB]
Get:6 http://deb.debian.org/debian bullseye/main amd64 libkrb5support0 amd64 1.17-10 [64.6 kB]
Get:7 http://deb.debian.org/debian bullseye/main amd64 libk5crypto3 amd64 1.17-10 [115 kB]
Get:8 http://deb.debian.org/debian bullseye/main amd64 libkeyutils1 amd64 1.6.1-2 [15.4 kB]
Get:9 http://deb.debian.org/debian bullseye/main amd64 libkrb5-3 amd64 1.17-10 [366 kB]
Get:10 http://deb.debian.org/debian bullseye/main amd64 libgssapi-krb5-2 amd64 1.17-10 [156 kB]
Get:11 http://deb.debian.org/debian bullseye/main amd64 libsasl2-modules-db amd64 2.1.27+dfsg-2 [69.0 kB]
Get:12 http://deb.debian.org/debian bullseye/main amd64 libsasl2-2 amd64 2.1.27+dfsg-2 [106 kB]
Get:13 http://deb.debian.org/debian bullseye/main amd64 libldap-common all 2.4.50+dfsg-1 [92.9 kB]
Get:14 http://deb.debian.org/debian bullseye/main amd64 libldap-2.4-2 amd64 2.4.50+dfsg-1+b1 [228 kB]
Get:15 http://deb.debian.org/debian bullseye/main amd64 libnghttp2-14 amd64 1.41.0-3 [74.0 kB]
Get:16 http://deb.debian.org/debian bullseye/main amd64 libpsl5 amd64 0.21.0-1.1 [55.3 kB]
Get:17 http://deb.debian.org/debian bullseye/main amd64 librtmp1 amd64 2.4+20151223.gitfa8646d.1-2+b2 [60.8 kB]
Get:18 http://deb.debian.org/debian bullseye/main amd64 libssh2-1 amd64 1.8.0-2.1 [140 kB]
Get:19 http://deb.debian.org/debian bullseye/main amd64 libcurl4 amd64 7.68.0-1+b1 [322 kB]
Get:20 http://deb.debian.org/debian bullseye/main amd64 curl amd64 7.68.0-1+b1 [249 kB]
Get:21 http://deb.debian.org/debian bullseye/main amd64 libsasl2-modules amd64 2.1.27+dfsg-2 [104 kB]
Get:22 http://deb.debian.org/debian bullseye/main amd64 publicsuffix all 20200729.1725-1 [118 kB]
Fetched 5243 kB in 3s (1829 kB/s)     
debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package krb5-locales.
(Reading database ... 6760 files and directories currently installed.)
Preparing to unpack .../00-krb5-locales_1.17-10_all.deb ...
Unpacking krb5-locales (1.17-10) ...
Selecting previously unselected package libssl1.1:amd64.
Preparing to unpack .../01-libssl1.1_1.1.1g-1_amd64.deb ...
Unpacking libssl1.1:amd64 (1.1.1g-1) ...
Selecting previously unselected package openssl.
Preparing to unpack .../02-openssl_1.1.1g-1_amd64.deb ...
Unpacking openssl (1.1.1g-1) ...
Selecting previously unselected package ca-certificates.
Preparing to unpack .../03-ca-certificates_20200601_all.deb ...
Unpacking ca-certificates (20200601) ...
Selecting previously unselected package libbrotli1:amd64.
Preparing to unpack .../04-libbrotli1_1.0.7-7_amd64.deb ...
Unpacking libbrotli1:amd64 (1.0.7-7) ...
Selecting previously unselected package libkrb5support0:amd64.
Preparing to unpack .../05-libkrb5support0_1.17-10_amd64.deb ...
Unpacking libkrb5support0:amd64 (1.17-10) ...
Selecting previously unselected package libk5crypto3:amd64.
Preparing to unpack .../06-libk5crypto3_1.17-10_amd64.deb ...
Unpacking libk5crypto3:amd64 (1.17-10) ...
Selecting previously unselected package libkeyutils1:amd64.
Preparing to unpack .../07-libkeyutils1_1.6.1-2_amd64.deb ...
Unpacking libkeyutils1:amd64 (1.6.1-2) ...
Selecting previously unselected package libkrb5-3:amd64.
Preparing to unpack .../08-libkrb5-3_1.17-10_amd64.deb ...
Unpacking libkrb5-3:amd64 (1.17-10) ...
Selecting previously unselected package libgssapi-krb5-2:amd64.
Preparing to unpack .../09-libgssapi-krb5-2_1.17-10_amd64.deb ...
Unpacking libgssapi-krb5-2:amd64 (1.17-10) ...
Selecting previously unselected package libsasl2-modules-db:amd64.
Preparing to unpack .../10-libsasl2-modules-db_2.1.27+dfsg-2_amd64.deb ...
Unpacking libsasl2-modules-db:amd64 (2.1.27+dfsg-2) ...
Selecting previously unselected package libsasl2-2:amd64.
Preparing to unpack .../11-libsasl2-2_2.1.27+dfsg-2_amd64.deb ...
Unpacking libsasl2-2:amd64 (2.1.27+dfsg-2) ...
Selecting previously unselected package libldap-common.
Preparing to unpack .../12-libldap-common_2.4.50+dfsg-1_all.deb ...
Unpacking libldap-common (2.4.50+dfsg-1) ...
Selecting previously unselected package libldap-2.4-2:amd64.
Preparing to unpack .../13-libldap-2.4-2_2.4.50+dfsg-1+b1_amd64.deb ...
Unpacking libldap-2.4-2:amd64 (2.4.50+dfsg-1+b1) ...
Selecting previously unselected package libnghttp2-14:amd64.
Preparing to unpack .../14-libnghttp2-14_1.41.0-3_amd64.deb ...
Unpacking libnghttp2-14:amd64 (1.41.0-3) ...
Selecting previously unselected package libpsl5:amd64.
Preparing to unpack .../15-libpsl5_0.21.0-1.1_amd64.deb ...
Unpacking libpsl5:amd64 (0.21.0-1.1) ...
Selecting previously unselected package librtmp1:amd64.
Preparing to unpack .../16-librtmp1_2.4+20151223.gitfa8646d.1-2+b2_amd64.deb ...
Unpacking librtmp1:amd64 (2.4+20151223.gitfa8646d.1-2+b2) ...
Selecting previously unselected package libssh2-1:amd64.
Preparing to unpack .../17-libssh2-1_1.8.0-2.1_amd64.deb ...
Unpacking libssh2-1:amd64 (1.8.0-2.1) ...
Selecting previously unselected package libcurl4:amd64.
Preparing to unpack .../18-libcurl4_7.68.0-1+b1_amd64.deb ...
Unpacking libcurl4:amd64 (7.68.0-1+b1) ...
Selecting previously unselected package curl.
Preparing to unpack .../19-curl_7.68.0-1+b1_amd64.deb ...
Unpacking curl (7.68.0-1+b1) ...
Selecting previously unselected package libsasl2-modules:amd64.
Preparing to unpack .../20-libsasl2-modules_2.1.27+dfsg-2_amd64.deb ...
Unpacking libsasl2-modules:amd64 (2.1.27+dfsg-2) ...
Selecting previously unselected package publicsuffix.
Preparing to unpack .../21-publicsuffix_20200729.1725-1_all.deb ...
Unpacking publicsuffix (20200729.1725-1) ...
Setting up libkeyutils1:amd64 (1.6.1-2) ...
Setting up libpsl5:amd64 (0.21.0-1.1) ...
Setting up libssl1.1:amd64 (1.1.1g-1) ...
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (Can't locate Term/ReadLine.pm in @INC (you may need to install the Term::ReadLine module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.30.3 /usr/local/share/perl/5.30.3 /usr/lib/x86_64-linux-gnu/perl5/5.30 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl-base /usr/lib/x86_64-linux-gnu/perl/5.30 /usr/share/perl/5.30 /usr/local/lib/site_perl) at /usr/share/perl5/Debconf/FrontEnd/Readline.pm line 7.)
debconf: falling back to frontend: Teletype
Setting up libbrotli1:amd64 (1.0.7-7) ...
Setting up libsasl2-modules:amd64 (2.1.27+dfsg-2) ...
Setting up libnghttp2-14:amd64 (1.41.0-3) ...
Setting up krb5-locales (1.17-10) ...
Setting up libldap-common (2.4.50+dfsg-1) ...
Setting up libkrb5support0:amd64 (1.17-10) ...
Setting up libsasl2-modules-db:amd64 (2.1.27+dfsg-2) ...
Setting up librtmp1:amd64 (2.4+20151223.gitfa8646d.1-2+b2) ...
Setting up libk5crypto3:amd64 (1.17-10) ...
Setting up libsasl2-2:amd64 (2.1.27+dfsg-2) ...
Setting up libssh2-1:amd64 (1.8.0-2.1) ...
Setting up libkrb5-3:amd64 (1.17-10) ...
Setting up openssl (1.1.1g-1) ...
Setting up publicsuffix (20200729.1725-1) ...
Setting up libldap-2.4-2:amd64 (2.4.50+dfsg-1+b1) ...
Setting up ca-certificates (20200601) ...
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (Can't locate Term/ReadLine.pm in @INC (you may need to install the Term::ReadLine module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.30.3 /usr/local/share/perl/5.30.3 /usr/lib/x86_64-linux-gnu/perl5/5.30 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl-base /usr/lib/x86_64-linux-gnu/perl/5.30 /usr/share/perl/5.30 /usr/local/lib/site_perl) at /usr/share/perl5/Debconf/FrontEnd/Readline.pm line 7.)
debconf: falling back to frontend: Teletype
Updating certificates in /etc/ssl/certs...
126 added, 0 removed; done.
Setting up libgssapi-krb5-2:amd64 (1.17-10) ...
Setting up libcurl4:amd64 (7.68.0-1+b1) ...
Setting up curl (7.68.0-1+b1) ...
Processing triggers for libc-bin (2.31-2) ...
Processing triggers for ca-certificates (20200601) ...
Updating certificates in /etc/ssl/certs...
0 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...
done.
root@9e2946ef8be8:/# curl -1sLf \
>   'https://dl.cloudsmith.io/public/machinekit/machinekit/cfg/setup/bash.deb.sh' \
>   | bash
Executing the  setup script for the 'machinekit/machinekit' repository ...

   OK: Checking for required executable 'curl' ...
   OK: Checking for required executable 'apt-get' ...
   OK: Detecting your OS distribution and release using system methods ...
 ^^^^: OS detected as: debian  ()
 FAIL: Checking for apt dependency 'apt-transport-https' ...
   OK: Updating apt repository metadata cache ...
   OK: Attempting to install 'apt-transport-https' ...
 FAIL: Checking for apt dependency 'gnupg' ...
   OK: Attempting to install 'gnupg' ...
   OK: Importing 'machinekit/machinekit' repository GPG key into apt ...
 FAIL: Checking if upstream install config is OK ...
 >>>>: Failed to fetch configuration for your OS distribution release/version.
 >>>>: It looks like we don't currently support your distribution release and
 >>>>: version. This is something that we can fix by adding it to our list of
 >>>>: supported versions (see contact us below), or you can manually override
 >>>>: the values below to an equivalent distribution that we do support:
 >>>>: Here is what *was* detected/provided for your distribution:
 >>>>:
 >>>>:   distro:   'debian'
 >>>>:   version:  ''
 >>>>:   codename: ''
 >>>>:   arch:     'x86_64'
 >>>>:
 >>>>: You can force this script to use a particular value by specifying distro,
 >>>>: version, or codename via environment variable. E.g., to specify a distro
 >>>>: such as Ubuntu/Xenial (16.04), use the following:
 >>>>:
 >>>>:   <curl command> | distro=ubuntu version=16.04 codename=xenial sudo bash
 >>>>:
 >>>>: You can contact us at Cloudsmith ([email protected]) for further assistance.

root@9e2946ef8be8:/# 

cerna referenced this issue Sep 2, 2020
For some OS and distribution versions, the automatic Cloudsmith script has a problem and in the end no repositories are installed. Fortunately, the script allows the user to explicitly specify which repositories (for which OS) to install.
@cerna
Copy link
Contributor Author

cerna commented Sep 4, 2020

Looking again at the Travis CI configuration (which is now in a very precarious state and in need of rework), I started looking into the remote API, specifically at starting a new build by sending data to a remote endpoint as described in the Triggering builds documentation.

In practical terms, it would mean two "jobs" or "builds" for every git push or opened pull request. The first one (which would be specified in the .travis.yml file in the root of the given repository) would create the build config from debian-distro-settings.json or other well-known sources and trigger the second job through the API. That way one can get the same functionality (hopefully [famous last words]) as the current Github Actions workflow (which I take as a model).
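A rough sketch of how the first job could trigger the second one through the documented v3 "Triggering builds" endpoint (the repository slug, token variable and generated config below are placeholders):

# Sketch: the first Travis job generates the real build config (e.g. from
# debian-distro-settings.json) and triggers a second build via the API.
import json
import os
import requests

def trigger_build(slug: str, branch: str, config: dict) -> None:
    response = requests.post(
        f"https://api.travis-ci.com/repo/{slug.replace('/', '%2F')}/requests",
        headers={
            "Travis-API-Version": "3",
            "Authorization": f"token {os.environ['TRAVIS_API_TOKEN']}",  # placeholder secret
            "Content-Type": "application/json",
        },
        data=json.dumps({"request": {"branch": branch, "config": config}}),
    )
    response.raise_for_status()

# trigger_build("machinekit/machinekit-hal", "master", generated_config)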

Travis CI supports (I don't know for how long, but they are still calling it a beta version, so probably not that long) Importing Shared Build Configuration, which is something to investigate if the Machinekit organization goes with a separate Machinekit-CI repository.

The structure of debian-distro-settings.json will also have to change to encompass the cross-building capability, in other words from which BUILD architectures a given HOST can be built. (This is actually based on the packages available in the Debian repositories and the fact that so far Machinekit's projects are gcc based; for Clang it would be different.)

Also, what must be decided is whether Machinekit wants to test 32-bit versions on a 64-bit platform (where possible without the use of QEMU) - that is, testing i386 on amd64 machines and armhf on most arm64 servers (not all ARMv8 processors support the 32-bit instruction set). That will also need some form of representation in debian-distro-settings.json and changes in the build and test scripts; one possible shape is sketched below.
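Expressed as Python data purely for discussion (keys and values are a suggestion, not the current debian-distro-settings.json):

# Sketch: per HOST architecture, record which BUILD architectures can
# cross-compile it and where it can be tested natively without QEMU.
CROSS_BUILD_MATRIX = {
    "amd64": {"build_on": ["amd64"],          "testable_on": ["amd64"]},
    "i386":  {"build_on": ["amd64", "i386"],  "testable_on": ["amd64"]},
    "armhf": {"build_on": ["amd64"],          "testable_on": ["arm64"]},  # only on arm64 parts with 32-bit support
    "arm64": {"build_on": ["amd64", "arm64"], "testable_on": ["arm64"]},
}

def hosts_buildable_on(build_arch: str) -> list:
    return [host for host, spec in CROSS_BUILD_MATRIX.items()
            if build_arch in spec["build_on"]]

# hosts_buildable_on("amd64") -> ["amd64", "i386", "armhf", "arm64"]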

@cerna
Copy link
Contributor Author

cerna commented Dec 21, 2020

Well,
Travis CI is slowly but surely going away as a hub for Open Source continuous integration. On the 2nd of November, they stopped offering unlimited machine time for OSS and replaced it with a one-time trial offer - after this allotment is gone, you are done. (Well, there is a backdoor for OSS to ask for additional minutes, but it is on a per-request basis and projects have actually been turned down recently.) So, this is the end.

Machinekit as of now has about half of the trial minutes left.

Too bad - the Graviton2-based VMs were quite good and there is no alternative to them as of now.

I am going to limit the Travis builds to testing only the arm64 platform to preserve the minutes.
