Fix UI and other minor changes #333

Closed
wants to merge 29 commits into from

shreyas-badiger
Collaborator

uthark and others added 29 commits May 17, 2022 11:14
Signed-off-by: Oleg Atamanenko <[email protected]>

Co-authored-by: Shreyas Badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>
Signed-off-by: Lara Aydin <[email protected]>

Co-authored-by: Shreyas Badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>
* #arktika2474: lastnodeDrain & lastNodeTerminate not set

Signed-off-by: sbadla1 <[email protected]>

* #arktika2474: lastnodeDrain & lastNodeTerminate not set

Signed-off-by: sbadla1 <[email protected]>

* #arktika2474: lastnodeDrain & lastNodeTerminate not set

Signed-off-by: sbadla1 <[email protected]>

* #arktika2474: lastnodeDrain & lastNodeTerminate not set

Signed-off-by: sbadla1 <[email protected]>

* #arktika2474: lastnodeDrain & lastNodeTerminate not set

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>
Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>
Signed-off-by: Venkata Gunapati <[email protected]>
Signed-off-by: sbadiger <[email protected]>
* Fix metrics calculation issue in v1

Signed-off-by: xshao <[email protected]>

* Expose the rollingUpgrade status as metrics

Signed-off-by: xshao <[email protected]>

* Fix nil exception

Signed-off-by: xshao <[email protected]>

* Add unit test

Signed-off-by: xshao <[email protected]>

* Upgrade controller-runtime to v0.7.0

Signed-off-by: xshao <[email protected]>

* Revert "Upgrade controller-runtime to v0.7.0"

This reverts commit 4996bbf

Signed-off-by: xshao <[email protected]>

* Keep the new metrics only.

Signed-off-by: xshao <[email protected]>

* Keep the new metrics only.

Signed-off-by: xshao <[email protected]>

* Keep the new metrics only.

Signed-off-by: xshao <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Delete README.md

Signed-off-by: Eytan Avisror <[email protected]>

* delete all

Signed-off-by: Eytan Avisror <[email protected]>

* scaffolding

Signed-off-by: Eytan Avisror <[email protected]>

* add API

Signed-off-by: Eytan Avisror <[email protected]>

* initial code

Signed-off-by: Eytan Avisror <[email protected]>

* add more scaffolding

Signed-off-by: Eytan Avisror <[email protected]>

* Add kubernetes API calls

Signed-off-by: Eytan Avisror <[email protected]>

* aws API calls

Signed-off-by: Eytan Avisror <[email protected]>

* AWS API calls & Drift detection

Signed-off-by: Eytan Avisror <[email protected]>

* initial rotation logic

Signed-off-by: Eytan Avisror <[email protected]>

* Implemented RollingUpgrade object validation. (#176)

* Validation step to check Nodes and ASG launch configs

Signed-off-by: shreyas-badiger <[email protected]>

* Validating launch definition after a rolling upgrade

Signed-off-by: shreyas-badiger <[email protected]>

* Fix all the "make vet" errors in Controller V2 branch. (#177)

* Validation step to check Nodes and ASG launch configs

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Validating launch definition after a rolling upgrade

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Resolve error log message and return statement

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Adding Functional Test (#113)

* Adding BDD, workflow and badge

* Changing CI workflow job name

* Adding make manifests

* Clarifying cron time zone comment

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* release 0.13 (#115)

* release 0.13

* Update CHANGELOG.md

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* bump version (#116)

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Repo selection for CI and BDD workflows & CI step for releases (#117)

* CI-BDD not on forks & Step for releases (#2)

* Testing CI-BDD not on forks & Step for releases

* Adding step for image with tag git-tag

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Terminate unjoined nodes

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Resolving PR comments

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Set version and update CHANGELOG for version 0.14. (#121)

Co-authored-by: Shri Javadekar <[email protected]>
Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Bump version to 0.15-dev.

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Fix typo in README.md. (#125)

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Ignore the terminated instance during upgrade

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Added WARNING prefix in the logging

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Apply suggestions from code review

Co-authored-by: Kevin Downey <[email protected]>
Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Capitalize sprintf to Sprintf

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Upgrade to Go 1.15 (#128)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Fix few typos and simplify error returns, remove redundant types (#131)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Readiness gates implementation for eager mode (#130)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Adding Functional Test (#113)

* Adding BDD, workflow and badge

* Changing CI workflow job name

* Adding make manifests

* Clarifying cron time zone comment

Signed-off-by: sbadiger <[email protected]>

* Validation step to check Nodes and ASG launch configs (#112)

* Validation step to check Nodes and ASG launch configs

* Validating launch definition after a rolling upgrade

* Resolve error log message and return statement

Co-authored-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* release 0.13 (#115)

* release 0.13

* Update CHANGELOG.md

Signed-off-by: sbadiger <[email protected]>

* bump version (#116)

Signed-off-by: sbadiger <[email protected]>

* Repo selection for CI and BDD workflows & CI step for releases (#117)

* CI-BDD not on forks & Step for releases (#2)

* Testing CI-BDD not on forks & Step for releases

* Adding step for image with tag git-tag

Signed-off-by: sbadiger <[email protected]>

* Terminate unjoined nodes (#120)

* Validation step to check Nodes and ASG launch configs

* Validating launch definition after a rolling upgrade

* Resolve error log message and return statement

* Terminate unjoined nodes

* Resolving PR comments

Co-authored-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Set version and update CHANGELOG for version 0.14. (#121)

Co-authored-by: Shri Javadekar <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Bump version to 0.15-dev.

Signed-off-by: sbadiger <[email protected]>

* Fix bug when switching to launch templates (#136)

* Update rollingupgrade_controller.go

* Update rollingupgrade_controller.go

Signed-off-by: Eytan Avisror <[email protected]>

* spacing fixes

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Extract script runner to a separate type; fix work with env. variables (#132)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Set version and update CHANGELOG for version v0.15 (#137)

Signed-off-by: Shri Javadekar <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Bump version to v0.16-dev.

Signed-off-by: Shri Javadekar <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Propagate parent env variables to allow to talk with API Server (#144)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Bump Golang CI action to fix failed CI run (#146)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Simplify (#145)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Add Expiration to cache and do not refresh ASG if cache is not expired (#143)

Signed-off-by: Oleg Atamanenko <[email protected]>

Co-authored-by: Shri Javadekar <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Fix documentation for uniform across AZ Update strategy and fix typos (#147)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Move cluster state from package level to a cluster state impl (#148)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Simplify work with intstr type. (#149)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* If instance is in standby mode already, just return (#138)

Signed-off-by: Oleg Atamanenko <[email protected]>

Co-authored-by: Shri Javadekar <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Handle terminated instances gracefully. (#150)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Template version comparison fix (#155)

* get template version

Signed-off-by: Eytan Avisror <[email protected]>

* fix tests

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* release 0.16 (#157)

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* bump version to 0.17-dev (#158)

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Don't uncordon node on failure to run postDrain script when IgnoreDrainFailures set (#151)

* Don't uncordon node on failure to run postDrain script when IgnoreDrainFailures set

Signed-off-by: Adam Malcontenti-Wilson <[email protected]>

* Test node uncordon when postDrain / postDrainWait script fails

Signed-off-by: Adam Malcontenti-Wilson <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Abort on strategy failure instead of continuing (#152)

* Abort on strategy failure instead of continuing

Signed-off-by: Adam Malcontenti-Wilson <[email protected]>

* Remove unformatted error message placeholder

Signed-off-by: Adam Malcontenti-Wilson <[email protected]>

* Explicitly specify strategy for tests

Signed-off-by: Adam Malcontenti-Wilson <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* use NamespacedName (#160)

Signed-off-by: Eytan Avisror <[email protected]>

Co-authored-by: Shri Javadekar <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Set version and update CHANGELOG for version v0.17 (#161)

Signed-off-by: Shri Javadekar <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Bump version to v0.18-dev (#162)

Signed-off-by: Shri Javadekar <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Move constants to types so that they can be reused (#167)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Remove separate module for pkg/log (#168)

Signed-off-by: Oleg Atamanenko <[email protected]>

Co-authored-by: Shri Javadekar <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Bump dependencies. (#169)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* use standard fmt.Errorf to format error message; unify error format (#171)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Fix namespaced name order (#170)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Add instance id to the logs (#173)

Signed-off-by: Oleg Atamanenko <[email protected]>

Co-authored-by: Shri Javadekar <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Bump golang and busybox (#172)

Signed-off-by: Oleg Atamanenko <[email protected]>

Co-authored-by: Shri Javadekar <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Expose template list and other execution errors to logs (#166)

* Log and return wrapped launchtemplate error

Signed-off-by: Adam Malcontenti-Wilson <[email protected]>

* Expose execution error in logs

Signed-off-by: Adam Malcontenti-Wilson <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* output can contain other messages from API Server, so be more relaxed (#174)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Delete README.md

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* delete all

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* scaffolding

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* add API

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* initial code

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* add more scaffolding

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Add kubernetes API calls

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* aws API calls

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* AWS API calls & Drift detection

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* validate() function

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* modified validate()

Signed-off-by: sbadiger <[email protected]>

* modified validate()

Signed-off-by: sbadiger <[email protected]>

* initial rotation logic

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* basic script_runner without any modifications

Signed-off-by: sbadiger <[email protected]>

* Fix all the vet related errors

Signed-off-by: sbadiger <[email protected]>

Co-authored-by: Alfredo Garo <[email protected]>
Co-authored-by: Eytan Avisror <[email protected]>
Co-authored-by: Shri Javadekar <[email protected]>
Co-authored-by: Shri Javadekar <[email protected]>
Co-authored-by: Shri Javadekar <[email protected]>
Co-authored-by: Craig Robson <[email protected]>
Co-authored-by: Kevin Downey <[email protected]>
Co-authored-by: Oleg Atamanenko <[email protected]>
Co-authored-by: Shreyas Badiger <[email protected]>
Co-authored-by: Adam Malcontenti-Wilson <[email protected]>
Co-authored-by: Adam Malcontenti-Wilson <[email protected]>
Co-authored-by: Eytan Avisror <[email protected]>

* Controller v2: Implementation of Instance termination (#178)

* fix make vet errors.

Signed-off-by: sbadiger <[email protected]>

* Terminate instances and run v2 for first time.

Signed-off-by: sbadiger <[email protected]>

* Addressing review comments

Signed-off-by: sbadiger <[email protected]>

* addressing more review comments

Signed-off-by: sbadiger <[email protected]>

* Log error message

Signed-off-by: sbadiger <[email protected]>

* error handling for instance tagging

Signed-off-by: sbadiger <[email protected]>

* Migrate Script Runner (#179)

* Basic script runner

Signed-off-by: Eytan Avisror <[email protected]>

* Update upgrade.go

Signed-off-by: Eytan Avisror <[email protected]>

* Implemented node drain. (#181)

* Eager mode implementation (#183)

* Eager mode implementation

Signed-off-by: sbadiger <[email protected]>

* Metrics features (#189)

Signed-off-by: xshao <[email protected]>

* Process the batch rotation in parallel (#192)

* Process the batch rotation in parallel

Signed-off-by: sbadiger <[email protected]>

* addressing review comments

Signed-off-by: sbadiger <[email protected]>

* Move the DrainManager within ReplaceBatch(), to access one per RollingUpgrade CR (#195)

Signed-off-by: sbadiger <[email protected]>

* Refine metrics implementation to support goroutines (#196)

* Refine the metrics status

Signed-off-by: xshao <[email protected]>

* Refine the metrics status

Signed-off-by: xshao <[email protected]>

* Fix test case error

Signed-off-by: xshao <[email protected]>

* Use group instead of ASG

Signed-off-by: xshao <[email protected]>

* Ignore generated code (#201)

* Refine the metrics status

Signed-off-by: xshao <[email protected]>

* Refine the metrics status

Signed-off-by: xshao <[email protected]>

* Fix test case error

Signed-off-by: xshao <[email protected]>

* Use group instead of ASG

Signed-off-by: xshao <[email protected]>

* Ignore generated code

Signed-off-by: xshao <[email protected]>

* Ignore generated code

Signed-off-by: xshao <[email protected]>

* Fix bug in deleting the entry in syncMap (#203)

Signed-off-by: sbadiger <[email protected]>

* Unit tests for controller-v2 (#215)

* Unit tests

Signed-off-by: sbadiger <[email protected]>

* minor change in accessing the namespace name

Signed-off-by: sbadiger <[email protected]>

* move helper functions to a different file

Signed-off-by: sbadiger <[email protected]>

* #2285: rollup CR statistic metrics in v2 (#218)

* #2285: rollup CR statistic metrics in v2

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>

* #2285: renamed some methods related to metrics (#224)

Signed-off-by: sbadla1 <[email protected]>

* #2286: removed version from metric namespace (#227)

Signed-off-by: sbadla1 <[email protected]>

* Create RollingUpgradeContext (#234)

* #2285: rollup CR statistic metrics in v2 (#218)

* #2285: rollup CR statistic metrics in v2

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* log cloud discovery failure

Signed-off-by: sbadiger <[email protected]>

* Create RollingUpgrade Context

Signed-off-by: sbadiger <[email protected]>

* rollingupgrade context

Signed-off-by: sbadiger <[email protected]>

Co-authored-by: Sahil Badla <[email protected]>

* Resolve compile errors caused by merge conflict. (#235)

* #2285: rollup CR statistic metrics in v2 (#218)

* #2285: rollup CR statistic metrics in v2

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* log cloud discovery failure

Signed-off-by: sbadiger <[email protected]>

* Create RollingUpgrade Context

Signed-off-by: sbadiger <[email protected]>

* rollingupgrade context

Signed-off-by: sbadiger <[email protected]>

* resolve compile errors due to merge conflict

Signed-off-by: sbadiger <[email protected]>

Co-authored-by: Sahil Badla <[email protected]>

* upgrade-manager-v2: Move DrainManager back to Reconciler (#236)

* #2285: rollup CR statistic metrics in v2 (#218)

* #2285: rollup CR statistic metrics in v2

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* log cloud discovery failure

Signed-off-by: sbadiger <[email protected]>

* Create RollingUpgrade Context

Signed-off-by: sbadiger <[email protected]>

* rollingupgrade context

Signed-off-by: sbadiger <[email protected]>

* #2285: rollup CR statistic metrics in v2 (#218)

* #2285: rollup CR statistic metrics in v2

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* #2285: renamed some methods related to metrics (#224)

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* #2286: removed version from metric namespace (#227)

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* resolve compile errors due to merge conflict

Signed-off-by: sbadiger <[email protected]>

* move drain-manager to reconciler

Signed-off-by: sbadiger <[email protected]>

* initialize RollingUpgrade object

Signed-off-by: sbadiger <[email protected]>

* use bool instead of count for standby function

Signed-off-by: sbadiger <[email protected]>

* refactor in-progress and standby code

Signed-off-by: sbadiger <[email protected]>

* rename instance standby function

Signed-off-by: sbadiger <[email protected]>

* DrainManager changes in unit test files

Signed-off-by: sbadiger <[email protected]>

Co-authored-by: Sahil Badla <[email protected]>

* V2 controller metrics concurrency fix (#231)

* Refine the metrics status

Signed-off-by: xshao <[email protected]>

* Refine the metrics status

Signed-off-by: xshao <[email protected]>

* Fix test case error

Signed-off-by: xshao <[email protected]>

* Use group instead of ASG

Signed-off-by: xshao <[email protected]>

* Ignore generated code

Signed-off-by: xshao <[email protected]>

* Ignore generated code

Signed-off-by: xshao <[email protected]>

* Fix the concurrent issue

Signed-off-by: xshao <[email protected]>

* Fix the concurrent issue

Signed-off-by: xshao <[email protected]>

* Move metrics related functions into RollingUpgradeContext

Signed-off-by: xshao <[email protected]>

* Move metrics related functions into RollingUpgradeContext

Signed-off-by: xshao <[email protected]>

* Move metrics related functions into upgrade_metrics.go

Signed-off-by: xshao <[email protected]>

* Move metrics related functions into metrics.go

Signed-off-by: xshao <[email protected]>

* add missing parenthesis (#239)

* metricsMutex should be initialized (#240)

Signed-off-by: xshao <[email protected]>

* upgrade-manager-v2: Load test fixes (#245)

* upgrade-manager-v2: Move DrainManager back to Reconciler (#236)

* #2285: rollup CR statistic metrics in v2 (#218)

* #2285: rollup CR statistic metrics in v2

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* log cloud discovery failure

Signed-off-by: sbadiger <[email protected]>

* Create RollingUpgrade Context

Signed-off-by: sbadiger <[email protected]>

* rollingupgrade context

Signed-off-by: sbadiger <[email protected]>

* #2285: rollup CR statistic metrics in v2 (#218)

* #2285: rollup CR statistic metrics in v2

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* #2285: renamed some methods related to metrics (#224)

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* #2286: removed version from metric namespace (#227)

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* resolve compile errors due to merge conflict

Signed-off-by: sbadiger <[email protected]>

* move drain-manager to reconciler

Signed-off-by: sbadiger <[email protected]>

* initialize RollingUpgrade object

Signed-off-by: sbadiger <[email protected]>

* use bool instead of count for standby function

Signed-off-by: sbadiger <[email protected]>

* refactor in-progress and standby code

Signed-off-by: sbadiger <[email protected]>

* rename instance standby function

Signed-off-by: sbadiger <[email protected]>

* DrainManager changes in unit test files

Signed-off-by: sbadiger <[email protected]>

Co-authored-by: Sahil Badla <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* V2 controller metrics concurrency fix (#231)

* Refine the metrics status

Signed-off-by: xshao <[email protected]>

* Refine the metrics status

Signed-off-by: xshao <[email protected]>

* Fix test case error

Signed-off-by: xshao <[email protected]>

* Use group instead of ASG

Signed-off-by: xshao <[email protected]>

* Ignore generated code

Signed-off-by: xshao <[email protected]>

* Ignore generated code

Signed-off-by: xshao <[email protected]>

* Fix the concurrent issue

Signed-off-by: xshao <[email protected]>

* Fix the concurrent issue

Signed-off-by: xshao <[email protected]>

* Move metrics related functions into RollingUpgradeContext

Signed-off-by: xshao <[email protected]>

* Move metrics related functions into RollingUpgradeContext

Signed-off-by: xshao <[email protected]>

* Move metrics related functions into upgrade_metrics.go

Signed-off-by: xshao <[email protected]>

* Move metrics related functions into metrics.go

Signed-off-by: xshao <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* add missing parenthesis

Signed-off-by: sbadiger <[email protected]>

* load test fixes

Signed-off-by: sbadiger <[email protected]>

* handle scaling group not found

Signed-off-by: sbadiger <[email protected]>

* Update upgrade.go

Signed-off-by: sbadiger <[email protected]>

* log one level up

* remove double logging

Signed-off-by: sbadiger <[email protected]>

* final push before RC release. (#254)

* support IgnoreDrainFailures flag

Signed-off-by: sbadiger <[email protected]>

* add else condition

Signed-off-by: sbadiger <[email protected]>

* set min for maxUnavailable

Signed-off-by: sbadiger <[email protected]>

* calculateMaxUnavailable function

Signed-off-by: sbadiger <[email protected]>

* add a new column (completePercentage)

Signed-off-by: sbadiger <[email protected]>

* disable debug logs by default

Signed-off-by: sbadiger <[email protected]>

* Fix metrics collecting issue (#249)

* metricsMutex should be initialized

Signed-off-by: xshao <[email protected]>

* Use InProcessingNode instead of Stringp[] so that it can have the status of steps

Signed-off-by: xshao <[email protected]>

* Revert "Fix metrics collecting issue (#249)" (#256)

This reverts commit f5dd1cb5f76f2b78cb15c53daed14032a2a4c6ec.

* Fix metrics calculation issue (#258)

* metricsMutex should be initialized

Signed-off-by: xshao <[email protected]>

* Use InProcessingNode instead of Stringp[] so that it can have the status of steps

Signed-off-by: xshao <[email protected]>

* Make the change backward compatible

Signed-off-by: xshao <[email protected]>

* Make the change backward compatible

Signed-off-by: xshao <[email protected]>

* Add mutex for InProcessingNode deleting

Signed-off-by: xshao <[email protected]>

* Add a mock for test and update version in Makefile (#262)

Signed-off-by: sbadiger <[email protected]>

* and CR end time (#264)

Signed-off-by: sbadiger <[email protected]>

* upgrade-manager-v2: expose totalProcessing time and other metrics (#265)

* and CR end time

Signed-off-by: sbadiger <[email protected]>

* expose totalProcessing time and other metrics

Signed-off-by: sbadiger <[email protected]>

* addressing review comments

Signed-off-by: sbadiger <[email protected]>

* upgrade-manager-v2: remove function duplicate declaration. (#266)

* and CR end time

Signed-off-by: sbadiger <[email protected]>

* expose totalProcessing time and other metrics

Signed-off-by: sbadiger <[email protected]>

* addressing review comments

Signed-off-by: sbadiger <[email protected]>

* remove function duplication

Signed-off-by: sbadiger <[email protected]>

* Carry the metrics status in RollingUpgrade CR (#267)

* Update metrics status at same time

Signed-off-by: xshao <[email protected]>

* Update metrics status when terminating instance

Signed-off-by: xshao <[email protected]>

* Add terminated step

Signed-off-by: xshao <[email protected]>

* Add terminated step

Signed-off-by: xshao <[email protected]>

* Add terminated step

Signed-off-by: xshao <[email protected]>

* move cloud discovery after nodeInterval / drainInterval wait (#270)

Signed-off-by: sbadiger <[email protected]>

* upgrade-manager-v2: Add nodeEvents handler instead of a watch handler (#272)

* upgrade-manager-v2: remove function duplicate declaration. (#266)

* and CR end time

Signed-off-by: sbadiger <[email protected]>

* expose totalProcessing time and other metrics

Signed-off-by: sbadiger <[email protected]>

* addressing review comments

Signed-off-by: sbadiger <[email protected]>

* remove function duplication

Signed-off-by: sbadiger <[email protected]>

* Carry the metrics status in RollingUpgrade CR (#267)

* Update metrics status at same time

Signed-off-by: xshao <[email protected]>

* Update metrics status when terminating instance

Signed-off-by: xshao <[email protected]>

* Add terminated step

Signed-off-by: xshao <[email protected]>

* Add terminated step

Signed-off-by: xshao <[email protected]>

* Add terminated step

Signed-off-by: xshao <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* move cloud discovery after nodeInterval / drainInterval wait

Signed-off-by: sbadiger <[email protected]>

* Add watch event for cluster nodes instead of API calls

Signed-off-by: sbadiger <[email protected]>

* upon node deletion, remove it from syncMap as well

Signed-off-by: sbadiger <[email protected]>

* Add nodeEvents handler instead of watch handler

Signed-off-by: sbadiger <[email protected]>

* Ignore Reconciles on nodeEvents

Signed-off-by: sbadiger <[email protected]>

* Add comments

Signed-off-by: sbadiger <[email protected]>

Co-authored-by: Sheldon Shao <[email protected]>

* upgrade-manager-v2: Process next batch while waiting on nodeInterval period. (#273)

* upgrade-manager-v2: remove function duplicate declaration. (#266)

* and CR end time

Signed-off-by: sbadiger <[email protected]>

* expose totalProcessing time and other metrics

Signed-off-by: sbadiger <[email protected]>

* addressing review comments

Signed-off-by: sbadiger <[email protected]>

* remove function duplication

Signed-off-by: sbadiger <[email protected]>

* Carry the metrics status in RollingUpgrade CR (#267)

* Update metrics status at same time

Signed-off-by: xshao <[email protected]>

* Update metrics status when terminating instance

Signed-off-by: xshao <[email protected]>

* Add terminated step

Signed-off-by: xshao <[email protected]>

* Add terminated step

Signed-off-by: xshao <[email protected]>

* Add terminated step

Signed-off-by: xshao <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* move cloud discovery after nodeInterval / drainInterval wait

Signed-off-by: sbadiger <[email protected]>

* Add watch event for cluster nodes instead of API calls

Signed-off-by: sbadiger <[email protected]>

* upon node deletion, remove it from syncMap as well

Signed-off-by: sbadiger <[email protected]>

* Add nodeEvents handler instead of watch handler

Signed-off-by: sbadiger <[email protected]>

* Ignore Reconciles on nodeEvents

Signed-off-by: sbadiger <[email protected]>

* Add comments

Signed-off-by: sbadiger <[email protected]>

* Set nextbatch to standBy while waiting for terminate

* Avoid parallel reconcile operation per ASG

* add default requeue time

Co-authored-by: Sheldon Shao <[email protected]>

* upgrade-manager-v2: Fix unit tests (#275)

* Delete README.md

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* delete all

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* scaffolding

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* add API

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* initial code

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* add more scaffolding

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Add kubernetes API calls

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* aws API calls

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* AWS API calls & Drift detection

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* initial rotation logic

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Implemented RollingUpgrade object validation. (#176)

* Validation step to check Nodes and ASG launch configs

Signed-off-by: shreyas-badiger <[email protected]>

* Validating launch definition after a rolling upgrade

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Fix all the "make vet" errors in Controller V2 branch. (#177)

* Validation step to check Nodes and ASG launch configs

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Validating launch definition after a rolling upgrade

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Resolve error log message and return statement

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Adding Functional Test (#113)

* Adding BDD, workflow and badge

* Changing CI workflow job name

* Adding make manifests

* Clarifying cron time zone comment

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* release 0.13 (#115)

* release 0.13

* Update CHANGELOG.md

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* bump version (#116)

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Repo selection for CI and BDD workflows & CI step for releases (#117)

* CI-BDD not on forks & Step for releases (#2)

* Testing CI-BDD not on forks & Step for releases

* Adding step for image with tag git-tag

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Terminate unjoined nodes

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Resolving PR comments

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Set version and update CHANGELOG for version 0.14. (#121)

Co-authored-by: Shri Javadekar <[email protected]>
Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Bump version to 0.15-dev.

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Fix typo in README.md. (#125)

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Ignore the terminated instance during upgrade

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Added WARNING prefix in the logging

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Apply suggestions from code review

Co-authored-by: Kevin Downey <[email protected]>
Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Capitalize sprintf to Sprintf

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Upgrade to Go 1.15 (#128)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Fix few typos and simplify error returns, remove redundant types (#131)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Readiness gates implementation for eager mode (#130)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Adding Functional Test (#113)

* Adding BDD, workflow and badge

* Changing CI workflow job name

* Adding make manifests

* Clarifying cron time zone comment

Signed-off-by: sbadiger <[email protected]>

* Validation step to check Nodes and ASG launch configs (#112)

* Validation step to check Nodes and ASG launch configs

* Validating launch definition after a rolling upgrade

* Resolve error log message and return statement

Co-authored-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* release 0.13 (#115)

* release 0.13

* Update CHANGELOG.md

Signed-off-by: sbadiger <[email protected]>

* bump version (#116)

Signed-off-by: sbadiger <[email protected]>

* Repo selection for CI and BDD workflows & CI step for releases (#117)

* CI-BDD not on forks & Step for releases (#2)

* Testing CI-BDD not on forks & Step for releases

* Adding step for image with tag git-tag

Signed-off-by: sbadiger <[email protected]>

* Terminate unjoined nodes (#120)

* Validation step to check Nodes and ASG launch configs

* Validating launch definition after a rolling upgrade

* Resolve error log message and return statement

* Terminate unjoined nodes

* Resolving PR comments

Co-authored-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Set version and update CHANGELOG for version 0.14. (#121)

Co-authored-by: Shri Javadekar <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Bump version to 0.15-dev.

Signed-off-by: sbadiger <[email protected]>

* Fix bug when switching to launch templates (#136)

* Update rollingupgrade_controller.go

* Update rollingupgrade_controller.go

Signed-off-by: Eytan Avisror <[email protected]>

* spacing fixes

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Extract script runner to a separate type; fix work with env. variables (#132)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Set version and update CHANGELOG for version v0.15 (#137)

Signed-off-by: Shri Javadekar <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Bump version to v0.16-dev.

Signed-off-by: Shri Javadekar <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Propagate parent env variables to allow to talk with API Server (#144)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Bump Golang CI action to fix failed CI run (#146)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Simplify (#145)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Add Expiration to cache and do not refresh ASG if cache is not expired (#143)

Signed-off-by: Oleg Atamanenko <[email protected]>

Co-authored-by: Shri Javadekar <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Fix documentation for uniform across AZ Update strategy and fix typos (#147)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Move cluster state from package level to a cluster state impl (#148)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Simplify work with intstr type. (#149)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* If instance is in standby mode already, just return (#138)

Signed-off-by: Oleg Atamanenko <[email protected]>

Co-authored-by: Shri Javadekar <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Handle terminated instances gracefully. (#150)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Template version comparison fix (#155)

* get template version

Signed-off-by: Eytan Avisror <[email protected]>

* fix tests

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* release 0.16 (#157)

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* bump version to 0.17-dev (#158)

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Don't uncordon node on failure to run postDrain script when IgnoreDrainFailures set (#151)

* Don't uncordon node on failure to run postDrain script when IgnoreDrainFailures set

Signed-off-by: Adam Malcontenti-Wilson <[email protected]>

* Test node uncordon when postDrain / postDrainWait script fails

Signed-off-by: Adam Malcontenti-Wilson <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Abort on strategy failure instead of continuing (#152)

* Abort on strategy failure instead of continuing

Signed-off-by: Adam Malcontenti-Wilson <[email protected]>

* Remove unformatted error message placeholder

Signed-off-by: Adam Malcontenti-Wilson <[email protected]>

* Explicitly specify strategy for tests

Signed-off-by: Adam Malcontenti-Wilson <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* use NamespacedName (#160)

Signed-off-by: Eytan Avisror <[email protected]>

Co-authored-by: Shri Javadekar <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Set version and update CHANGELOG for version v0.17 (#161)

Signed-off-by: Shri Javadekar <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Bump version to v0.18-dev (#162)

Signed-off-by: Shri Javadekar <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Move constants to types so that they can be reused (#167)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Remove separate module for pkg/log (#168)

Signed-off-by: Oleg Atamanenko <[email protected]>

Co-authored-by: Shri Javadekar <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Bump dependencies. (#169)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* use standard fmt.Errorf to format error message; unify error format (#171)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Fix namespaced name order (#170)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Add instance id to the logs (#173)

Signed-off-by: Oleg Atamanenko <[email protected]>

Co-authored-by: Shri Javadekar <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Bump golang and busybox (#172)

Signed-off-by: Oleg Atamanenko <[email protected]>

Co-authored-by: Shri Javadekar <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Expose template list and other execution errors to logs (#166)

* Log and return wrapped launchtemplate error

Signed-off-by: Adam Malcontenti-Wilson <[email protected]>

* Expose execution error in logs

Signed-off-by: Adam Malcontenti-Wilson <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* output can contain other messages from API Server, so be more relaxed (#174)

Signed-off-by: Oleg Atamanenko <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Delete README.md

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* delete all

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* scaffolding

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* add API

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* initial code

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* add more scaffolding

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Add kubernetes API calls

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* aws API calls

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* AWS API calls & Drift detection

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* validate() function

Signed-off-by: shreyas-badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* modified validate()

Signed-off-by: sbadiger <[email protected]>

* modified validate()

Signed-off-by: sbadiger <[email protected]>

* initial rotation logic

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* basic script_runner without any modifications

Signed-off-by: sbadiger <[email protected]>

* Fix all the vet related errors

Signed-off-by: sbadiger <[email protected]>

Co-authored-by: Alfredo Garo <[email protected]>
Co-authored-by: Eytan Avisror <[email protected]>
Co-authored-by: Shri Javadekar <[email protected]>
Co-authored-by: Shri Javadekar <[email protected]>
Co-authored-by: Shri Javadekar <[email protected]>
Co-authored-by: Craig Robson <[email protected]>
Co-authored-by: Kevin Downey <[email protected]>
Co-authored-by: Oleg Atamanenko <[email protected]>
Co-authored-by: Shreyas Badiger <[email protected]>
Co-authored-by: Adam Malcontenti-Wilson <[email protected]>
Co-authored-by: Adam Malcontenti-Wilson <[email protected]>
Co-authored-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Controller v2: Implementation of Instance termination (#178)

* fix make vet errors.

Signed-off-by: sbadiger <[email protected]>

* Terminate instances and run v2 for first time.

Signed-off-by: sbadiger <[email protected]>

* Addressing review comments

Signed-off-by: sbadiger <[email protected]>

* addressing more review comments

Signed-off-by: sbadiger <[email protected]>

* Log error message

Signed-off-by: sbadiger <[email protected]>

* error handling for instance tagging

Signed-off-by: sbadiger <[email protected]>

* Migrate Script Runner (#179)

* Basic script runner

Signed-off-by: Eytan Avisror <[email protected]>

* Update upgrade.go

Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Implemented node drain. (#181)

Signed-off-by: sbadiger <[email protected]>

* Eager mode implementation (#183)

* Eager mode implementation

Signed-off-by: sbadiger <[email protected]>

* Metrics features (#189)

Signed-off-by: xshao <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Process the batch rotation in parallel (#192)

* Process the batch rotation in parallel

Signed-off-by: sbadiger <[email protected]>

* addressing review comments

Signed-off-by: sbadiger <[email protected]>

* Move the DrainManager within ReplaceBatch(), to access one per RollingUpgrade CR (#195)

Signed-off-by: sbadiger <[email protected]>

* Refine metrics implementation to support goroutines (#196)

* Refine the metrics status

Signed-off-by: xshao <[email protected]>

* Refine the metrics status

Signed-off-by: xshao <[email protected]>

* Fix test case error

Signed-off-by: xshao <[email protected]>

* Use group instead of ASG

Signed-off-by: xshao <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Ignore generated code  (#201)

* Refine the metrics status

Signed-off-by: xshao <[email protected]>

* Refine the metrics status

Signed-off-by: xshao <[email protected]>

* Fix test case error

Signed-off-by: xshao <[email protected]>

* Use group instead of ASG

Signed-off-by: xshao <[email protected]>

* Ignore generated code

Signed-off-by: xshao <[email protected]>

* Ignore generated code

Signed-off-by: xshao <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Fix bug in deleting the entry in syncMap (#203)

Signed-off-by: sbadiger <[email protected]>

* Unit tests for controller-v2 (#215)

* Unit tests

Signed-off-by: sbadiger <[email protected]>

* minor change in accessing the namespace name

Signed-off-by: sbadiger <[email protected]>

* move helper functions to a different file

Signed-off-by: sbadiger <[email protected]>

* #2285: rollup CR statistic metrics in v2 (#218)

* #2285: rollup CR statistic metrics in v2

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* #2285: renamed some methods related to metrics (#224)

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* #2286: removed version from metric namespace (#227)

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Create RollingUpgradeContext (#234)

* #2285: rollup CR statistic metrics in v2 (#218)

* #2285: rollup CR statistic metrics in v2

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* log cloud discovery failure

Signed-off-by: sbadiger <[email protected]>

* Create RollingUpgrade Context

Signed-off-by: sbadiger <[email protected]>

* rollingupgrade context

Signed-off-by: sbadiger <[email protected]>

Co-authored-by: Sahil Badla <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Resolve compile errors caused by merge conflict. (#235)

* #2285: rollup CR statistic metrics in v2 (#218)

* #2285: rollup CR statistic metrics in v2

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* log cloud discovery failure

Signed-off-by: sbadiger <[email protected]>

* Create RollingUpgrade Context

Signed-off-by: sbadiger <[email protected]>

* rollingupgrade context

Signed-off-by: sbadiger <[email protected]>

* resolve compile errors due to merge conflict

Signed-off-by: sbadiger <[email protected]>

Co-authored-by: Sahil Badla <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* upgrade-manager-v2: Move DrainManager back to Reconciler (#236)

* #2285: rollup CR statistic metrics in v2 (#218)

* #2285: rollup CR statistic metrics in v2

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* log cloud discovery failure

Signed-off-by: sbadiger <[email protected]>

* Create RollingUpgrade Context

Signed-off-by: sbadiger <[email protected]>

* rollingupgrade context

Signed-off-by: sbadiger <[email protected]>

* #2285: rollup CR statistic metrics in v2 (#218)

* #2285: rollup CR statistic metrics in v2

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* #2285: renamed some methods related to metrics (#224)

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* #2286: removed version from metric namespace (#227)

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* resolve compile errors due to merge conflict

Signed-off-by: sbadiger <[email protected]>

* move drain-manager to reconciler

Signed-off-by: sbadiger <[email protected]>

* initialize RollingUpgrade object

Signed-off-by: sbadiger <[email protected]>

* use bool instead of count for standby function

Signed-off-by: sbadiger <[email protected]>

* refactor in-progress and standby code

Signed-off-by: sbadiger <[email protected]>

* rename instance standby function

Signed-off-by: sbadiger <[email protected]>

* DrainManager changes in unit test files

Signed-off-by: sbadiger <[email protected]>

Co-authored-by: Sahil Badla <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* V2 controller metrics concurrency fix (#231)

* Refine the metrics status

Signed-off-by: xshao <[email protected]>

* Refine the metrics status

Signed-off-by: xshao <[email protected]>

* Fix test case error

Signed-off-by: xshao <[email protected]>

* Use group instead of ASG

Signed-off-by: xshao <[email protected]>

* Ignore generated code

Signed-off-by: xshao <[email protected]>

* Ignore generated code

Signed-off-by: xshao <[email protected]>

* Fix the concurrent issue

Signed-off-by: xshao <[email protected]>

* Fix the concurrent issue

Signed-off-by: xshao <[email protected]>

* Move metrics related functions into RollingUpgradeContext

Signed-off-by: xshao <[email protected]>

* Move metrics related functions into RollingUpgradeContext

Signed-off-by: xshao <[email protected]>

* Move metrics related functions into upgrade_metrics.go

Signed-off-by: xshao <[email protected]>

* Move metrics related functions into metrics.go

Signed-off-by: xshao <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* add missing parenthesis (#239)

Signed-off-by: sbadiger <[email protected]>

* metricsMutex should be initialized (#240)

Signed-off-by: xshao <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* upgrade-manager-v2: Load test fixes (#245)

* upgrade-manager-v2: Move DrainManager back to Reconciler (#236)

* #2285: rollup CR statistic metrics in v2 (#218)

* #2285: rollup CR statistic metrics in v2

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* log cloud discovery failure

Signed-off-by: sbadiger <[email protected]>

* Create RollingUpgrade Context

Signed-off-by: sbadiger <[email protected]>

* rollingupgrade context

Signed-off-by: sbadiger <[email protected]>

* #2285: rollup CR statistic metrics in v2 (#218)

* #2285: rollup CR statistic metrics in v2

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>

* #2285: updated metric flags

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* #2285: renamed some methods related to metrics (#224)

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* #2286: removed version from metric namespace (#227)

Signed-off-by: sbadla1 <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* resolve compile errors due to merge conflict

Signed-off-by: sbadiger <[email protected]>

* move drain-manager to reconciler

Signed-off-by: sbadiger <[email protected]>

* initialize RollingUpgrade object

Signed-off-by: sbadiger <[email protected]>

* use bool instead of count for standby function

Signed-off-by: sbadiger <[email protected]>

* refactor in-progress and standby code

Signed-off-by: sbadiger <[email protected]>

* rename instance standby function

Signed-off-by: sbadiger <[email protected]>

* DrainManager changes in unit test files

Signed-off-by: sbadiger <[email protected]>

Co-authored-by: Sahil Badla <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* V2 controller metrics concurrency fix (#231)

* Refine the metrics status

Signed-off-by: xshao <[email protected]>

* Refine the metrics status

Signed-off-by: xshao <[email protected]>

* Fix test case error

Signed-off-by: xshao <[email protected]>

* Use group instead of ASG

Signed-off-by: xshao <[email protected]>

* Ignore generated code

Signed-off-by: xshao <[email protected]>

* Ignore generated code

Signed-off-by: xshao <[email protected]>

* Fix the concurrent issue

Signed-off-by: xshao <[email protected]>

* Fix the concurrent issue

Signed-off-by: xshao <[email protected]>

* Move metrics related functions into RollingUpgradeContext

Signed-off-by: xshao <[email protected]>

* Move metrics related functions into RollingUpgradeContext

Signed-off-by: xshao <[email protected]>

* Move metrics related functions into upgrade_metrics.go

Signed-off-by: xshao <[email protected]>

* Move metrics related functions into metrics.go

Signed-off-by: xshao <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* add missing parenthesis

Signed-off-by: sbadiger <[email protected]>

* load test fixes

Signed-off-by: sbadiger <[email protected]>

* handle scaling group not found

Signed-off-by: sbadiger <[email protected]>

* Update upgrade.go

Signed-off-by: sbadiger <[email protected]>

* log one level up

* remove double logging

Signed-off-by: sbadiger <[email protected]>

* final push before RC release. (#254)

* support IgnoreDrainFailures flag

Signed-off-by: sbadiger <[email protected]>

* add else condition

Signed-off-by: sbadiger <[email protected]>

* set min for maxUnavailable

Signed-off-by: sbadiger <[email protected]>

* calculateMaxUnavailable function

Signed-off-by: sbadiger <[email protected]>

* add a new column (completePercentage)

Signed-off-by: sbadiger <[email protected]>

* disable debug logs by default

Signed-off-by: sbadiger <[email protected]>

* Fix metrics collecting issue (#249)

* metricsMutex should be initialized

Signed-off-by: xshao <[email protected]>

* Use InProcessingNode instead of String[] so that it can have the status of steps

Signed-off-by: xshao <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Revert "Fix metrics collecting issue (#249)" (#256)

This reverts commit f5dd1cb5f76f2b78cb15c53daed14032a2a4c6ec.

Signed-off-by: sbadiger <[email protected]>

* Fix metrics calculation issue (#258)

* metricsMutex should be initialized

Signed-off-by: xshao <[email protected]>

* Use InProcessingNode instead of String[] so that it can have the status of steps

Signed-off-by: xshao <[email protected]>

* Make the change backward compatible

Signed-off-by: xshao <[email protected]>

* Make the change backward compatible

Signed-off-by: xshao <[email protected]>

* Add mutex for InProcessingNode deleting

Signed-off-by: xshao <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* Add a mock for test and update version in Makefile (#262)

Signed-off-by: sbadiger <[email protected]>

* and CR end time (#264)

Signed-off-by: sbadiger <[email protected]>

* upgrade-manager-v2: expose totalProcessing time and other metrics (#265)

* and CR end time

Signed-off-by: sbadiger <[email protected]>

* expose totalProcessing time and other metrics

Signed-off-by: sbadiger <[email protected]>

* addressing review comments

Signed-off-by: sbadiger <[email protected]>

* upgrade-manager-v2: remove function duplicate declaration. (#266)

* and CR end time

Signed-off-by: sbadiger <[email protected]>

* expose totalProcessing time and other metrics

Signed-off-by: sbadiger <[email protected]>

* addressing review comments

Signed-off-by: sbadiger <[email protected]>

* remove function duplication

Signed-off-by: sbadiger <[email protected]>

* Carry the metrics status in RollingUpgrade CR (#267)

* Update metrics status at same time

Signed-off-by: xshao <[email protected]>

* Update metrics status when terminating instance

Signed-off-by: xshao <[email protected]>

* Add terminated step

Signed-off-by: xshao <[email protected]>

* Add terminated step

Signed-off-by: xshao <[email protected]>

* Add terminated step

Signed-off-by: xshao <[email protected]>
Signed-off-by: sbadiger <[email protected]>

* move cloud discovery after nodeInterval / drainInterval wait (#270)

Signed-off-by: sbadiger <[email protected]>

* upgrade-manager-v2: Add nodeEvents handler instead of a watch handler (#272)

* upgrade-manager-v2: remove function duplicate declaration. (#266)

* and CR end time

Signed-off-by: sbadiger <[email protected]>

* expose totalProcessing time and other metrics

Signed-off-by: sbadiger <[email protected]>

* addressing review comments

Signed-off-by: sbadiger <[email protected]>

* remove function duplication

Signed-off-by: sbadiger <[email protected]>

* Carry the metrics status in RollingUpgrade CR (#267)

* Update metrics status at same time

Signed-off-by: xshao <[email protected]>

* Update metrics status when terminating instance

Signed-off-by: xshao <[email protected]>

* Add terminated step

Signed-off-by: xshao <[email protected]>

* Add terminated step

Signed-off-by: xshao <Sheldon_…
* add README

Signed-off-by: sbadiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>
Signed-off-by: Eytan Avisror <[email protected]>
Signed-off-by: sbadiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>
* add clean target in Makefile

Signed-off-by: sbadiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>
* add clean target in Makefile

Signed-off-by: sbadiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>
* fix BDD GitHub action

Signed-off-by: sbadiger <[email protected]>
* fix error 'failed to set instances to stand-by'

Signed-off-by: Ameya Joshi <[email protected]>

Co-authored-by: Ameya Joshi <[email protected]>
Co-authored-by: Shreyas Badiger <[email protected]>
Signed-off-by: sbadiger <[email protected]>
@shreyas-badiger
Collaborator Author

GitHub is pulling in several other commits. Closing the PR.

8 participants