proposal: "junction" nodes to repackage multiple pieces of Freight as a single piece of Freight #3193

Open
3 tasks done
aayushsrivastava opened this issue Dec 24, 2024 · 10 comments

Comments

@aayushsrivastava

aayushsrivastava commented Dec 24, 2024

Checklist

  • I've searched the issue queue to verify this is not a duplicate feature request.
  • I've pasted the output of kargo version, if applicable.
    • kargo version: v1.1.2
  • I've pasted logs, if applicable.

Proposed Feature

I want to select multiple freight from different freight timelines when creating a promotion to a stage. Currently, a promotion can only accommodate one freight.

Motivation

I am trying to create a CD pipeline for a team to deploy their microservices. Each microservice is modeled as its own Kargo Warehouse [1].

Changes to multiple microservices are accumulated in a uat stage and released together to the prod stage in a regular cadence.

During the manual promotion to prod, it is cumbersome to do promotions freight-by-freight. Since we want to also review the changes before updating the live GitOps branch, we will use git-open-pr and git-wait-for-pr in the promotion steps for the prod stage. This makes it worse since Kargo takes up to 5 minutes to confirm and move from a git-wait-for-pr step. With multiple freight, it takes up to (5 x N) minutes to complete the deployment of N microservices because of the delay in git-wait-for-pr step. There would also be N PRs to review.

If multiple Freight could be promoted together in one Promotion, we would only raise one PR and the delay in git-wait-for-pr step will also only take 5 minutes instead of (5 x N) minutes.
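
The arithmetic above can be sketched as a back-of-the-envelope model. The 5-minute figure is the worst-case git-wait-for-pr delay reported in this issue, not a Kargo constant, and the function names are hypothetical.

```python
# Worst-case delay (in minutes) before a Promotion moves past git-wait-for-pr,
# as observed by the issue author. This is an assumption, not a Kargo setting.
WAIT_FOR_PR_MINUTES = 5


def serial_wait(n_freight: int, wait: int = WAIT_FOR_PR_MINUTES) -> int:
    """Worst-case total wait when each piece of Freight needs its own Promotion."""
    return n_freight * wait


def combined_wait(n_freight: int, wait: int = WAIT_FOR_PR_MINUTES) -> int:
    """Worst-case total wait if all Freight shared one Promotion (and one PR)."""
    return wait if n_freight > 0 else 0


if __name__ == "__main__":
    for n in (1, 5, 25):
        print(f"N={n}: serial={serial_wait(n)} min, combined={combined_wait(n)} min")
```

The same ratio applies to review effort: N PRs in the serial case versus one in the combined case.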

[1] One may suggest we could use a single Freight with all images to work around this. The reason I am modeling microservice to Warehouse 1:1 is that it is easy to manually create a freight for "hotfixes". If I had instead created a single Warehouse that subscribes to all the images, a hotfix would require specifying the expected versions of every image during manual Freight Assembly which is cumbersome and mistake-prone.

Suggested Implementation

The Promotion CR could accept a list of freight instead of a single freight. That would be a breaking change for the CRD though.

There might be good reasons for allowing only a single Freight per Promotion; if so, I'm curious to know what they are.

@aayushsrivastava aayushsrivastava changed the title Promoting multiple freight in a single Promotion Promoting multiple Freight in a single Promotion Dec 24, 2024
@aayushsrivastava
Author

I thought more about the problem mentioned in the "Motivation" section, and how I could best solve it with the present feature set of Kargo.

I could create multiple "parallel" DAGs in one project. One DAG for each microservice. This would help avoid the "forced serialization" of microservice promotions which happens due to shared DAG.

I would still like to raise one PR, if possible. I'm wondering whether it's safe for the same logical stages in different DAGs to push to a single shared branch (e.g., a branch called promotion-request/prod) so that I can raise a single PR to be merged into the stage-specific branch. I don't see any problems, since Kargo performs a git pull --rebase during the git-push promotion step, and the git-open-pr step also seems to adopt existing PRs on the same branch on GitLab. The combination of the two makes me feel it is safe to do multiple promotions in parallel as long as they modify different paths in the Git repository.
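
The "different paths" condition can be stated precisely. The sketch below is only an illustration of that precondition: it does not call Kargo or Git, the function names are hypothetical, and it assumes each promotion's touched paths are known up front as repo-relative POSIX paths.

```python
from pathlib import PurePosixPath


def paths_overlap(a: str, b: str) -> bool:
    """True if two repo-relative paths are equal or one contains the other."""
    pa, pb = PurePosixPath(a), PurePosixPath(b)
    return pa == pb or pa in pb.parents or pb in pa.parents


def promotions_are_disjoint(paths_by_promotion: dict[str, list[str]]) -> bool:
    """True if no two promotions touch overlapping paths."""
    items = list(paths_by_promotion.items())
    for i, (_, left) in enumerate(items):
        for _, right in items[i + 1:]:
            if any(paths_overlap(x, y) for x in left for y in right):
                return False
    return True
```

For example, promotions writing to services/cart/values.yaml and services/search/values.yaml are disjoint, while one writing to services/cart and another to services/cart/values.yaml are not.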

There are a few UX changes that would still be needed to make the multiple DAGs pattern suitable for general-use.

  1. The image history grid should only show the stages from the DAG that deals with the image. Currently, it shows every single stage in a Kargo project which is a problem because the space is limited for that grid. I will probably create another issue for this.
  2. The slow scroll speed makes it challenging to perform operations on "parallel" DAGs. Issue: Kargo UI always centers and has slow scroll speed #3161

@aayushsrivastava aayushsrivastava changed the title Promoting multiple Freight in a single Promotion Promoting multiple pieces of Freight in a single Promotion Jan 2, 2025
@krancour
Member

krancour commented Jan 2, 2025

My typical advice is that if you have artifacts that must (for one reason or another) move through a pipeline as a unit, then those artifacts belong in a single piece of Freight. i.e. Should be created by the same Warehouse, and then the problem of needing to do n > 1 Promotions to effect a single logical state change goes away...

Changes to multiple microservices are accumulated in a uat stage and released together to the prod stage in a regular cadence.

The way you've worded this, I'm inferring that you actually don't care about the artifacts for various microservices progressing through the pipeline as a unit all the way up to and including UAT, and then for the "last mile" to Production, you do want to Progress all those artifacts as a single unit.

Have I got that right?

Although we do not seem to have an open issue for it, this exact use case has been on our radar for a long time, and you can see that reflected in places like stage.spec.requestedFreight where the kind of Freight you're looking for is expressed with an "origin." At present, a Warehouse is the only place from which Freight can originate, but we built this way because we anticipated more types of origins in the future -- and in particular, we've anticipated a "junction" whose purpose would be to "repackage" the contents of two or more pieces of Freight as a new, single piece of Freight so that all the artifacts referenced therein progress from that point forward as a unit.

The way I see something like this working: A junction would be as if a Warehouse and a Stage "had a baby," in that it would behave partly like one and partly like the other. Like a Stage, it could request Freight from multiple origins (Warehouses), and acceptable sources for those (i.e. upstream Stages) and you could "promote" Freight from any of those Warehouses to it. Unlike a Stage, it wouldn't deploy anything. Like a Warehouse, it would produce new Freight (in this case, by combining all its current Freight) either automatically or upon request.
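
A toy model of that description may help make it concrete. Everything here is hypothetical: no Junction type exists in Kargo today, and the class, field, and method names are invented for illustration. The sketch also assumes the junction only repackages once it holds Freight from every requested origin, which the proposal does not specify.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Freight:
    origin: str                 # name of the Warehouse (or junction) that produced it
    artifacts: tuple[str, ...]  # e.g. image references


@dataclass
class Junction:
    """Hypothetical node: requests Freight like a Stage, produces it like a Warehouse."""
    name: str
    requested_origins: frozenset[str]
    current: dict[str, Freight] = field(default_factory=dict)

    def promote(self, freight: Freight) -> None:
        """Stage-like: accept Freight from a requested origin. Nothing is deployed."""
        if freight.origin not in self.requested_origins:
            raise ValueError(f"{self.name} does not request Freight from {freight.origin}")
        self.current[freight.origin] = freight

    def repackage(self) -> Freight:
        """Warehouse-like: combine all current Freight into one new piece."""
        missing = self.requested_origins - set(self.current)
        if missing:
            raise RuntimeError(f"no Freight yet from: {sorted(missing)}")
        combined = tuple(
            artifact
            for origin in sorted(self.current)
            for artifact in self.current[origin].artifacts
        )
        return Freight(origin=self.name, artifacts=combined)
```

Promoting one piece of Freight per origin into the junction and then calling repackage() yields a single Freight whose origin is the junction itself, so a downstream Stage would need only one slot.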

As I said, this has been in the back of our heads for a long time, and your use case is the exact one for which we'd conceived this. Do you agree this would solve your problem?

This is, unfortunately, not trivial or we'd have done it already. In terms of size and complexity, this is the sort of feature that would eat up one entire development cycle all by itself.

We'll have to see how @jessesuen thinks this ought to be prioritized.

Edit: I'm tentatively changing the title to match this line of thinking, but can adjust it further if the thread turns in a different direction.

@krancour krancour changed the title Promoting multiple pieces of Freight in a single Promotion proposal: "junction" nodes to repackage multiple pieces of Freight as a single piece of Freight Jan 2, 2025
@Brightside56
Contributor

[1] One may suggest we could use a single Freight with all images to work around this. The reason I am modeling microservice to Warehouse 1:1 is that it is easy to manually create a freight for "hotfixes". If I had instead created a single Warehouse that subscribes to all the images, a hotfix would require specifying the expected versions of every image during manual Freight Assembly which is cumbersome and mistake-prone.

I know it's not as flexible as a spaceship implementation with repackaging, but see #2988

@krancour
Member

krancour commented Jan 4, 2025

I don't think loading up your Freight with a large number of artifacts is the answer. There have been a few other threads about this recently. I see a lot of people doing this and it's finally setting in that this isn't necessarily what people want, rather it's what people are getting stuck with when the patterns to solve their problems aren't clear. That's a gap we're going to be working very hard to address now.

@aayushsrivastava
Author

Thanks for the extensive proposal @krancour!

Let me try to elaborate the use case a bit more before I talk about the possible solutions. You can also validate whether this use case is what you folks had in mind when conceptualizing the "junction" node.


Our microservices are independently versioned and they are deployed independently (as independent Helm Releases today, but we plan to let ArgoCD adopt them as independent Argo Applications soon). I agree with @krancour that it may be "more correct" to have such independently deployed units as separate pieces of Freight that can be independently promoted in separate pipelines.

I think the essence of the problem is that in our teams, we are doing Continuous Delivery but not doing Continuous Deployment yet. Just adding definitions to make it unambiguous what I mean by this:

Continuous Delivery: Continuous Delivery is the practice of expanding your Continuous Integration (CI) usage to automatically re-deploy a proven build to a QA or UAT environment.
Continuous Deployment: An automatic push all the way into production; Maybe every commit
(Source)

Since we are not practicing Continuous Deployment for one reason or the other, we deploy all the changes once at the end of a development cycle/sprint to production. What this typically involves is picking all the artifacts last verified in one Stage and putting them on a downstream Stage. Often this is just an unemotional automated process and the expectation from teams is that the trunk is in a releasable state.

The way you've worded this, I'm inferring that you actually don't care about the artifacts for various microservices progressing through the pipeline as a unit all the way up to and including UAT, and then for the "last mile" to Production, you do want to Progress all those artifacts as a single unit.
Have I got that right?

Not quite.

First of all, it's not just the "last mile" to Production. We need to do it twice.
The first time is to manually promote all artifacts from the dev stage to uat. Then, generally within a day or two, we manually promote all the artifacts from uat to prod. It may seem a bit odd to some, but the reason is that we don't have a separate staging environment. Effectively, uat serves as staging for us, so we like to keep it close to prod most of the time for testing hotfixes. If a prod hotfix is needed while a new release is getting ready on uat, we just club it with the new release.

In essence, we are looking to minimize the friction of promoting all pieces of Freight (each Freight being one of N microservices) in a Kargo Project from one logical stage to another. An atomic promotion would be nice to have, but I think the more important concern is how quickly and easily I can execute this intention. We want to minimize the number of actions required in the UI/CLI.

Do note that this requirement is only within a team. A team could own 5-25 of such deployables. Any cross-team dependency is covered with Contract tests and the teams deploy independently of each other.


Junction nodes

Do you agree this would solve your problem?

I think it will definitely decrease the "friction" (fewer PRs to review), but it will violate the sanctity of a promotion dealing with only one deployable. That is the same problem being discussed here: #3203


I think there might be a few alternative enhancements for reducing the aforementioned friction that could be easier to implement:

  1. One-click manual promotion of the last Freight promoted and verified in a Stage to all its immediately downstream Stages (will only work if the Stage contains one Freight). With this, people will not have to mess around with the Freight Timeline to execute a simple stage-to-stage promotion in one DAG. This will increase efficiency when trying to perform similar manual promotions in multiple DAGs.

@krancour
Member

krancour commented Jan 4, 2025

Hey @aayushsrivastava, great discussion!

I identify with most of what you've said. A few things:

idk if it's important or not, but re: the overloaded meanings of "CD" ("continuous delivery" vs "continuous deployment"), I don't think there's widespread agreement on the definitions you cited. Personally, I make a slightly different distinction because it serves my purposes well. Using Amazon as an analogy, if you buy a tent, it's delivered to your doorstep. Delivery is getting something where it needs to go. You deploy the tent when you pitch it. So fwiw, looking at things from this angle has often helped me to maintain a distinction between the responsibilities of Kargo and the responsibilities of a gitops agent such as Argo CD. It's Kargo's job to get things where they need to go (delivery), which basically means wrangling your gitops repo into the correct state. The gitops agent does all the heavy lifting of actually applying that desired state (deployment).

To counteract any confusion over the two CDs or competing definitions, we've taken to calling Kargo a "continuous promotion" tool.

If I understand you correctly, you basically want to promote everything automatically and continuously up to a certain point (uat). You want promotion to prod to be manual. fwiw, this is quite common among Kargo users.

My original understanding of your problem may have been a little off. It's not that you need everything to move atomically into prod. It's just that it takes a long time to promote n different things you've "collected" in uat into prod if you have to do it one piece of Freight at a time. Is this more accurate?

So let's start with why we don't currently allow one Promotion to promote multiple pieces of Freight. It's mostly a matter of UX. When you want to promote one thing at a time, it's easy. You specify Freight and a Stage. What and where. When a Stage requests Freight from multiple origins, it's basically got slots for n different "kinds" of Freight. Now you want to promote and it becomes a question of, "How do you want to fill each of these n slots?" We couldn't quite figure out a good UX for that.
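
The "slots" framing can be sketched as a toy data model. This is not Kargo's implementation: the Stage class below, its slots field, and the promote_many method are hypothetical names used only to contrast a single-Freight promotion with a multi-Freight one.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Stage:
    """Toy model of a Stage that requests Freight from several origins."""
    name: str
    # One slot per requested origin (Warehouse name); None means still empty.
    slots: dict[str, Optional[str]] = field(default_factory=dict)

    def promote(self, origin: str, freight_id: str) -> None:
        """Single-Freight promotion: "what and where" fills exactly one slot."""
        if origin not in self.slots:
            raise ValueError(f"{self.name} does not request Freight from {origin}")
        self.slots[origin] = freight_id

    def promote_many(self, selection: dict[str, str]) -> None:
        """Hypothetical multi-Freight promotion: the caller must decide up front
        what goes in each slot, which is the UX question described above."""
        unknown = set(selection) - set(self.slots)
        if unknown:
            raise ValueError(f"unrequested origins: {sorted(unknown)}")
        self.slots.update(selection)
```

With promote, the user supplies one (origin, Freight) pair; with promote_many, the user has to supply a whole mapping, and any UI would need to collect that mapping just in time.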

The junction idea came about for cases where users really do need to move artifacts that moved freely and independently through other stages into prod as an atomic unit. This is surprisingly common. In your case, if I'm understanding you more correctly now, your concern is more about the speed with which you can move everything the last mile more so than an explicit requirement that all the artifacts must journey that last mile together. If it takes you five minutes to promote one and you've got five, that means someone's babysitting that process for 25 minutes. So, ultimately, I think the junction notion still would help you here.

If you've been using uat to "collect" all the things that need to go to prod, if you can then imagine a node "east" of uat and "west" of prod, and it accepts autopromotion from uat, it will continuously collect whatever's in uat. It can repackage them as a new kind of Freight to fill prod's one "slot" and then the limitation of promoting only one piece of Freight at a time is no longer problematic. There's nothing to babysit for 25 minutes. It's five. (Or less. You know you can speed things up by clicking refresh on the target Stage?) Even better (imho), I think it solves the UX problem that promoting multiple Freight at the same time presented. It continuously "collects" stuff and has it ready to go instead of asking you just-in-time how you want to fill every "slot" in the target Stage. And let me emphasize that the promotion from the junction to prod could still be manual.

I also love your "one click promote downstream" idea. We currently make you fiddle with selecting Freight from the timeline because we didn't want to constrain promotion to any given Stage to just whatever was currently in the previous Stage(s). (You might sometimes need to quickly roll prod back to its previous state and you wouldn't want to have to restore test, uat, etc. to that state first in order to get it there.) But there's absolutely no reason we can't have some kind of shortcut for "promote what's here now" downstream. I think the only challenge standing in the way of that is, again, UX. We'd need to find a way to very clearly and visually differentiate between the two actions of "promote something to my downstreams" vs "promote what I have now to my downstreams." This seems solvable and probably ought to be its own issue, because I think this is a great idea that can stand alone from the rest of this issue. I'll write it up as its own issue on Monday unless you beat me to it.
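
The difference between the two actions can be sketched in a few lines. This is a toy model under stated assumptions: the pipeline is a plain dict from Stage name to its immediate downstream Stages, each Stage holds at most one piece of Freight, and both function names are hypothetical rather than Kargo API.

```python
def promote_selected(freight: str, stage: str,
                     downstream: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Existing action: the user picks Freight from the timeline ("promote something")."""
    return [(freight, target) for target in downstream.get(stage, [])]


def promote_current(stage: str, current: dict[str, str],
                    downstream: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Proposed shortcut: "promote what I have now" to every immediate downstream."""
    if stage not in current:
        return []  # nothing in the Stage yet, so nothing to promote
    return promote_selected(current[stage], stage, downstream)
```

Each returned pair is (Freight, target Stage). The rollback case mentioned above still needs promote_selected, because the Freight being restored is no longer what the upstream Stage currently holds.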

@Brightside56
Contributor

Brightside56 commented Jan 6, 2025

I don't think loading up your Freight with a large number of artifacts is the answer. There have been a few other threads about this recently. I see a lot of people doing this and it's finally setting in that this isn't necessarily what people want, rather it's what people are getting stuck with when the patterns to solve their problems aren't clear. That's a gap we're going to be working very hard to address now.

In a greenfield with rainbows, pink ponies and without legacy - it's probably not the answer. In the real world I may have 10 monorepos with 10 CI-pipelines from one yaml-definition. Those pipelines are rendering helm-chart manifests (yes, with an unknown mutant with 3-5-7-10 images inside) main -> environment branches, for years, without constraints, without changing the approach to dev/ops, without refactoring application code, without coming to manager or board and saying "Hey! We have a bunch of problems... Most probably we don't have to solve these problems for another 5 years, but let's do this with Kargo"

people are getting stuck with when the patterns to solve their problems aren't clear

I didn't encounter these "problems" without Kargo. I don't consider them problems, but rather a pragmatic approach that lets a team write definitions of what they want and deploy it how they want, in cases where they don't need a complex delivery mechanism or don't yet know in detail which delivery mechanism they need. It's hard for me to understand why Kargo can't do exactly the same thing as a simple pipeline while upgrading this experience.

@krancour
Member

krancour commented Jan 6, 2025

@Brightside56 I sense your frustration, and I don't know what to tell you other than that I believe there's a lot more flexibility in Kargo than you may realize. It's just a tool for implementing your own processes. I really do believe that there's a percentage of users whose processes are good on paper, but end up implemented poorly through no fault of their own. There are large gaps in our docs and we are lacking examples for how scenarios like yours (due in part to its complexity) are best handled with Kargo. Those gaps are causing some users to end up with something that may be truly painful and is actually not quite what they'd wanted. I think a lot of this is going to be solved with the significant docs overhaul coming this quarter. In the meantime, please let's try to keep the conversation positive and productive.

@Brightside56
Contributor

Brightside56 commented Jan 6, 2025

I really do believe that there's a percentage of users whose processes are good on paper, but end up implemented poorly through no fault of their own

Delivery is applied on top of processes and product architecture; it does not exist separately. For example, I most probably don't need a delivery pattern that satisfies microservice architecture principles when my product doesn't. The processes or architecture may not be good, but improving them often requires solving 99 problems across teams, the product architecture, or the whole company.

Do these problems hurt? Will efforts to solve them bring real value to the business in a specific situation, or should it be done just for the sake of microservice architecture dogma or a well-organized process in itself? I think this question needs to be answered to determine whether this approach is what people need.

I don't think it matters whose fault it is. The pattern with many artifacts is far from perfect, but it's widely used because it often has business value, so delivery/promotion tooling should support it, at least as a starting point. I am sure that these difficulties (working with many artifacts in Freight) are a significant barrier to Kargo adoption, and one that could be easily eliminated. This looks like big low-hanging fruit.

IMO half of the problem is that it's impossible to change just one or a couple of images when creating new Freight. The other half is UI issues. I'm not talking about its beauty or convenience in the many-artifacts scenario, but about situations where the browser simply becomes unresponsive when working with a large number of artifacts, or where bugs make it physically difficult to find and set the required version of an image.

scenarios like yours (due in part to its complexity)

From my observations, there may be plenty of variations of such scenarios, and it may take a lot of time to address each of them separately. It would be much easier for me to go to my colleagues and offer value: something they can start working with tomorrow that doesn't break the existing approach and at the same time significantly improves on it, and that can later be evolved into something better.

@krancour
Member

krancour commented Jan 6, 2025

@Brightside56 you have a lot of valid observations, but I think this thread is getting a little off track.

The original topic was promoting multiple pieces of Freight concurrently. I explained our choice not to permit that and proposed that something else that's been on our radar for a long time addresses @aayushsrivastava's use case.

I'm not sure why we ended up on this tangent, but it's probably worthy of its own thread.

3 participants