StackStorm and Kubernetes Use Cases #38

Open
theankushjain opened this issue Jun 8, 2020 · 5 comments

@theankushjain

Patching your Kubernetes nodes is one solid use case that I've found, but we need to brainstorm and find others. I believe there are many use cases that are yet to be discovered.

@arm4b changed the title from "Discussion related to Use Cases in Kubernetes Cluster" to "StackStorm vs Kubernetes Use Cases" on Jun 8, 2020
@arm4b
Member

arm4b commented Jun 8, 2020

Examples based on the https://www.devoperandi.com/stackstorm-for-kubernetes-just-took-a-giant-leap-forward/ blog post:

  • Imagine being able to configure network policies through an automated StackStorm workflow based on a particular project's needs.
  • Think about how RBAC could be managed using our Kubernetes Authz Webhook through StackStorm.
  • Or how about kicking off Kubernetes Jobs to administer some cluster-level cleanup activity, but handing that off to your NOC.
  • Or allowing your Operations team to patch a HorizontalPodAutoscaler through a UI.
  • We could build a metadata framework derived from Kubernetes API annotations/labels for governance.

The possibilities are now literally endless.
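To make the label/annotation governance idea a little more concrete, here is a minimal sketch of the kind of check a StackStorm Python action could run over objects fetched from the Kubernetes API. It assumes the objects arrive as plain dicts shaped like API responses (`kind`, `metadata.labels`, ...); the `REQUIRED_LABELS` policy and the function name are hypothetical, not part of StackStorm or any existing pack.

```python
from typing import Dict, Iterable, List, Sequence

# Hypothetical governance policy: labels every object must carry.
REQUIRED_LABELS = ("team", "cost-center")


def find_governance_violations(
    objects: Iterable[Dict], required: Sequence[str] = REQUIRED_LABELS
) -> Dict[str, List[str]]:
    """Map "Kind/namespace/name" to the required labels that object is missing."""
    violations: Dict[str, List[str]] = {}
    for obj in objects:
        meta = obj.get("metadata", {})
        labels = meta.get("labels") or {}
        missing = [key for key in required if key not in labels]
        if missing:
            # Cluster-scoped objects (e.g. Namespaces) have no namespace; show "-".
            ref = "{}/{}/{}".format(
                obj.get("kind", "?"), meta.get("namespace", "-"), meta.get("name", "?")
            )
            violations[ref] = missing
    return violations
```

A workflow could then branch on a non-empty result, e.g. to notify the owning team or open a ticket.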

@arm4b
Member

arm4b commented Jun 8, 2020

  • K8s app Blue/Green Deployments.
    While K8s provides rolling updates for Pods, what's missing from the Kubernetes engine is Blue/Green deployment of the apps K8s is running.
    Following the https://www.ianlewis.org/en/bluegreen-deployments-kubernetes example, StackStorm Workflows can do that kind of automation in a cleaner, more maintainable way.
  • Adding ChatOps to K8s
    How about triggering a K8s deployment of a new version of your app from chat?
  • Upgrading the K8s cluster version.
    One of the first Google results about upgrading a K8s cluster version: https://medium.com/retailmenot-engineering/zero-downtime-kubernetes-cluster-upgrades-aab4cac943d2
    The HOWTO provides a series of steps and suggests setting up a new node pool and migrating K8s workloads from the old nodes to the new ones. All of this could be organized in a StackStorm workflow and automated, instead of repeating manual steps. As you gain new operational knowledge, the workflow can be improved with more logic and steps for edge cases.
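The Blue/Green idea boils down to: deploy the idle color next to the live one, wait until it is ready, then repoint the Service selector. The sketch below captures that decision logic only. It assumes Pods are labeled `app=<app>` and `color=<blue|green>` (hypothetical labels, similar in spirit to the linked post), and the `deploy`, `is_ready` and `patch_service` callbacks are hypothetical hooks that a real workflow would back with Kubernetes API calls or `kubectl`.

```python
from typing import Callable, Dict


def other_color(color: str) -> str:
    """Blue/Green alternates between exactly two colors."""
    if color not in ("blue", "green"):
        raise ValueError(f"unknown color: {color}")
    return "green" if color == "blue" else "blue"


def service_selector_patch(app: str, color: str) -> Dict:
    """Patch body that points a Service at the Pods of one color.

    Assumes Pods carry the hypothetical labels app=<app> and color=<color>.
    """
    return {"spec": {"selector": {"app": app, "color": color}}}


def blue_green_switch(
    app: str,
    live_color: str,
    deploy: Callable[[str], None],          # hypothetical: roll out the idle color
    is_ready: Callable[[str], bool],        # hypothetical: readiness / smoke test
    patch_service: Callable[[Dict], None],  # hypothetical: apply the Service patch
) -> str:
    """Deploy the idle color and switch traffic only if it becomes ready.

    Returns the color that is live after the call.
    """
    idle = other_color(live_color)
    deploy(idle)
    if not is_ready(idle):
        # Keep traffic on the old color; a workflow could alert or roll back here.
        return live_color
    patch_service(service_selector_patch(app, idle))
    return idle
```

Keeping the old color's Deployment around after the switch is what makes rollback cheap: switching back is just another selector patch (e.g. via `kubectl patch service <name> -p '<json>'`).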

@arm4b
Member

arm4b commented Jun 8, 2020

  • Security compliance checks & remediation for every new K8s Deployment
    StackStorm listens for new K8s Deployments and runs a series of security checks and rules against each one. If a security check fails, it can open a ticket with a report, create an alert, page a human, etc. Potential remediation actions: block the deployment, restrict network access, etc.
  • Security response automation based on K8s cluster audit events
    StackStorm can consume webhook requests from the K8s cluster audit service https://kubernetes.io/docs/tasks/debug-application-cluster/audit/ and react to these events.
  • Security key rotation for the K8s cluster/apps
    Rotating any security material is rarely easy and frequently involves repetitive steps (e.g. Service Account Key Rotation, kubernetes/kubernetes#20165). StackStorm can not only automate that as an external orchestrator, but also keep this important operational knowledge documented and maintained as code. This applies to K8s cluster administration in general and even to individual apps running in K8s.

@emptywee

emptywee commented Jun 8, 2020

While we generally leverage Spinnaker for k8s deployments, we mainly use StackStorm to provide ChatOps aliases for various day-to-day tasks like rolling restarts, regular restarts, maintenance mode on/off (bringing down a certain set of pods based on the type of maintenance), as well as Kubernetes and Flatcar/CoreOS version updates. Once our clusters grew to an unmanageable number of nodes (unmanageable by hand, I mean), I had to sit down and design a series of workflows to automate Kubernetes version updates. Since our clusters are our own self-managed and self-hosted clusters, initially based on CoreOS and now on Flatcar Linux, the approach is very specific (e.g. it's based on generating Ignition configs, and it uses self-written microservices to generate certs and configs, etc.), so I am not sure it's worth sharing the workflows as they are.

We also have NGINX Plus in front of the cluster, and iBGP-enabled, Calico-powered connectivity to the clusters, with route reflectors running on BIRD. All of these are used during a cluster update to take nodes out of rotation safely (mainly master nodes).

High level idea is pretty simple:

  1. Update master nodes one at a time: safely take each one out of rotation on the respective load balancer, decrease its BGP preference for routing, ensure a safe update, execute scripts if need be, update configuration files/manifests, optionally reboot it, put it back into rotation once it's back from the reboot, verify that the cluster is stable, and move on to the next node.
  2. Update each worker by cordoning and draining it, executing any scripts needed, updating configuration files/manifests, rebooting if requested, and uncordoning it once it's ready.
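The worker procedure in step 2 maps naturally onto a loop that a workflow (or a Python action) can drive. The sketch below is not the workflow described in this comment; it only illustrates the cordon, drain, update, wait, uncordon sequence under stated assumptions. The `kubectl cordon`, `kubectl drain` and `kubectl uncordon` commands are real; `run`, `update_node` and `wait_ready` are hypothetical hooks (in practice `run` could be `lambda cmd: subprocess.run(cmd, check=True)`, and `update_node` would cover scripts, config/manifests and the optional reboot).

```python
from typing import Callable, List, Sequence


def worker_drain_commands(node: str) -> List[List[str]]:
    """kubectl commands that take a worker out of scheduling before it is updated."""
    return [
        ["kubectl", "cordon", node],
        # --delete-emptydir-data is the kubectl >= 1.20 spelling (older: --delete-local-data).
        ["kubectl", "drain", node, "--ignore-daemonsets", "--delete-emptydir-data"],
    ]


def update_workers(
    nodes: Sequence[str],
    run: Callable[[List[str]], None],        # hypothetical command runner
    update_node: Callable[[str], None],      # hypothetical: scripts, configs, reboot
    wait_ready: Callable[[str], bool],       # hypothetical: node back and Ready?
) -> List[str]:
    """Update workers one at a time; stop at the first node that does not come back.

    Returns the nodes that were updated and uncordoned.
    """
    done: List[str] = []
    for node in nodes:
        for cmd in worker_drain_commands(node):
            run(cmd)
        update_node(node)
        if not wait_ready(node):
            break  # leave the node cordoned so a human can inspect it
        run(["kubectl", "uncordon", node])
        done.append(node)
    return done
```

Stopping at the first failed node, rather than continuing, keeps a bad update from spreading across the whole cluster; the master-node path would add the load balancer and BGP steps around the same loop.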

The workflows also write their ongoing status to a Redis instance, which we visualize using a simple Python Flask backend and an Angular frontend.

Here's what it looks like:

[Image: screenshot of the update status visualization]

@arm4b changed the title from "StackStorm vs Kubernetes Use Cases" to "StackStorm and Kubernetes Use Cases" on Jun 8, 2020
@mickmcgrath13

All great ideas!
In particular, I like:

  • K8S-based RBAC
  • utilizing K8S secrets
  • ChatOps for non-config-based tasks (restarts, etc.)
  • node patching/updating

Here are a few more I've come across:

  • K8S cleanup: a sensor to clean up old pods from Jobs (or any other "dangling" pods)
  • Helm support: Helm doesn't always work (a failed deployment might get you into a "can't deploy because there are no deployed releases" scenario, for example). A StackStorm sensor could detect that state and clean it up (if deployment logs live somewhere ST2 can reach them), or a webhook in the deployment script could say "if can't deploy [...], call ST2".
  • Dynamic environment cleanup: if you create dynamic environments (per pull request, for example) and don't have a direct way to clean them up, maybe give namespaces with a specific label a TTL via StackStorm.
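The namespace TTL idea could run as a periodic StackStorm rule (for example on a `core.st2.IntervalTimer` trigger) that selects expired namespaces and hands them to a delete action. Below is a minimal sketch of the selection step, assuming namespaces arrive as dicts shaped like the Kubernetes API objects; the `ttl-hours` label name is hypothetical and would be set by whatever pipeline creates the per-PR namespace.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List

TTL_LABEL = "ttl-hours"  # hypothetical label, e.g. set by the per-PR pipeline


def expired_namespaces(namespaces: Iterable[Dict], now: datetime) -> List[str]:
    """Names of labeled namespaces that are older than their TTL (in hours)."""
    expired: List[str] = []
    for ns in namespaces:
        meta = ns.get("metadata", {})
        ttl = (meta.get("labels") or {}).get(TTL_LABEL)
        if ttl is None:
            continue  # only namespaces that opted in are ever touched
        try:
            hours = int(ttl)
        except ValueError:
            continue  # skip malformed values rather than deleting anything
        # Kubernetes timestamps look like 2020-06-08T12:00:00Z (UTC).
        created = datetime.strptime(
            meta["creationTimestamp"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        if now - created > timedelta(hours=hours):
            expired.append(meta["name"])
    return expired
```

Making deletion opt-in via the label, and ignoring malformed values, keeps a cleanup rule like this from ever touching namespaces it was not meant to manage.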

...as has already been mentioned, the possibilities are endless. I tend to look for K8s solutions within the K8s ecosystem first, but there are definitely gaps that ST2 can fill.

Also, here's a bonus I came across a while back:
