Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor & Bug Fixes #38

Merged
merged 14 commits into from
Jan 15, 2020
Merged

Conversation

eytan-avisror
Copy link
Collaborator

Fixes #37, #34
Contributes to #35

This PR adds/modifies the following:

  • Introduce AWS API Caching to help with throttling with concurrent terminations
  • Implement custom waiters instead of using AWS waiters (to improve caching).
  • Use event.eventCompleted to drop goroutines when an instance is abandoned (Abandoned instances don't drop active goroutines #34)
  • Drain failure will no longer fails an event (Drain failure should not fail event #37)
  • Structure improvements to improve performance, along with jitter/sleep changes
  • Deregister sequentially before spawning waiters
  • Fix conflict with event names (time may not be unique)

@eytan-avisror eytan-avisror requested a review from a team as a code owner January 14, 2020 21:15
@codecov
Copy link

codecov bot commented Jan 14, 2020

Codecov Report

Merging #38 into master will decrease coverage by 7.6%.
The diff coverage is 58.55%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #38      +/-   ##
==========================================
- Coverage   80.42%   72.82%   -7.61%     
==========================================
  Files           9        9              
  Lines         746      861     +115     
==========================================
+ Hits          600      627      +27     
- Misses        100      176      +76     
- Partials       46       58      +12
Impacted Files Coverage Δ
pkg/service/types.go 60.86% <100%> (+0.86%) ⬆️
pkg/service/elbv2.go 100% <100%> (+7.14%) ⬆️
pkg/service/events.go 81.25% <100%> (+0.6%) ⬆️
pkg/service/elb.go 100% <100%> (+5.26%) ⬆️
pkg/service/server.go 62.36% <41.3%> (-14.83%) ⬇️
pkg/service/autoscaling.go 78.57% <60%> (ø) ⬆️
pkg/service/sqs.go 71.42% <63.63%> (-2.77%) ⬇️
pkg/service/nodes.go 86.74% <75%> (-3.88%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4dd3244...c9fd52a. Read the comment docs.

Copy link
Contributor

@kevdowney kevdowney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

cmd/serve.go Show resolved Hide resolved
cmd/serve.go Outdated Show resolved Hide resolved
Copy link

@davemasselink davemasselink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great updates, cache all the things!!

pkg/service/elb.go Show resolved Hide resolved
pkg/service/elbv2.go Show resolved Hide resolved
pkg/service/server.go Outdated Show resolved Hide resolved
pkg/service/server.go Outdated Show resolved Hide resolved
pkg/service/server.go Outdated Show resolved Hide resolved
pkg/service/server.go Show resolved Hide resolved
cmd/serve.go Outdated Show resolved Hide resolved
@eytan-avisror
Copy link
Collaborator Author

eytan-avisror commented Jan 15, 2020

Load tests seem good:

Test A (Extreme Scale), 500 TargetGroups

Screen Shot 2020-01-15 at 1 25 58 PM

3 concurrent terminations:

Time To Process: avg. 15 minutes
Cache Hit/Total: 2095/7590 (~27%)
Throttled Calls: 4950 calls
Restarts: 0

16 concurrent terminations:

Time To Process: avg. 50 minutes
Cache Hit/Total: 4780/23052 (~20%)
Throttled Calls: 20664 calls
Restarts: 0

Test B (Average Scale), 250 TargetGroups

Screen Shot 2020-01-15 at 2 19 57 PM

19 concurrent terminations:

Time To Process: avg. 30 minutes
Cache Hit/Total: 2519/14233 (~17%)
Throttled Calls: 11425 calls
Restarts: 0

Test C (Low Scale), 50 TargetGroups

Screen Shot 2020-01-15 at 2 58 55 PM

19 concurrent terminations:

Time To Process: avg. 10 minutes
Cache Hit/Total: 5543/11858 (~46%)
Throttled Calls: 5933 calls
Restarts: 0

@eytan-avisror eytan-avisror merged commit 980154a into keikoproj:master Jan 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Drain failure should not fail event
3 participants