Adjust resources and modify node selection for different Frontend app… #2379
Conversation
📝 Walkthrough

This pull request systematically updates the Kubernetes stage configuration files of several services, focusing on resource allocation and node affinity. CPU and memory resource limits have been adjusted across the service configurations, with some reduced and others increased, while node affinity rules have been simplified, shifting node selection criteria from general-purpose to control-plane nodes.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Scheduler
    participant Node
    Scheduler->>Node: Check node role
    alt Control Plane Node
        Scheduler->>Node: Schedule Pod
    else Non-Control Plane Node
        Scheduler-->>Node: Skip Scheduling
    end
```
Actionable comments posted: 1
🧹 Nitpick comments (4)
k8s/reports/values-stage.yaml (1)
32-37: Verify control-plane node selection strategy

Moving all services to target control-plane nodes could lead to resource contention. Control-plane nodes typically run critical cluster components and should be protected from excessive workload.
Consider:
- Using dedicated worker nodes for application workloads
- Implementing proper taints/tolerations if control-plane nodes are intended for specific workloads (see the sketch after this list)
- Setting up pod anti-affinity rules to ensure better pod distribution
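If scheduling on control-plane nodes is intentional, a minimal sketch of the tolerations and affinity such a values file might carry, assuming the chart passes `tolerations` and `affinity` straight through to the pod spec and that the standard `node-role.kubernetes.io/control-plane` taint key is in use:

```yaml
# Sketch only: tolerate the usual control-plane taint and require nodes
# carrying the control-plane role label.
tolerations:
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: node-role.kubernetes.io/control-plane
              operator: Exists
```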
k8s/docs/values-stage.yaml (1)
22-26: Align resource settings with service type

Documentation services typically have predictable resource usage patterns. The current settings might be too restrictive:
- CPU request of 1m is extremely low
- Memory request of 20Mi might be insufficient for serving documentation assets
Consider setting resource requests based on p90 usage metrics to ensure stable performance.
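For illustration only, a less restrictive block for a static documentation service might look like the following; the figures are assumptions, not measured p90 values:

```yaml
# Hypothetical example values; replace with figures derived from observed
# p90 CPU and memory usage for the docs service.
resources:
  requests:
    cpu: 10m
    memory: 64Mi
  limits:
    cpu: 100m
    memory: 128Mi
```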
k8s/calibrate/values-stage.yaml (1)
22-26: Review overall resource management strategy

The current configuration shows a pattern of minimal resource allocation across services. While this might optimize resource usage, it could impact service reliability.
Key considerations:
- The CPU request of 1m might cause scheduling issues
- All services targeting control-plane nodes could create a single point of failure
Recommendations:
- Implement proper resource monitoring to establish baseline requirements
- Consider using node labels for workload distribution instead of targeting control-plane nodes (see the sketch after this comment)
- Document the reasoning behind these resource constraints for future reference
Also applies to: 36-41
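A minimal sketch of label-based scheduling, assuming a hypothetical `workload: frontend` label applied to designated worker nodes and a chart that passes `nodeSelector` through to the pod spec:

```yaml
# Sketch only: target nodes carrying a custom workload label instead of
# pinning pods to control-plane nodes.
nodeSelector:
  workload: frontend
```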
k8s/platform/values-stage.yaml (1)
37-42: Reconsider scheduling workloads on control-plane nodes

The node affinity configuration has been changed to specifically target control-plane nodes. This approach raises some architectural concerns:
- Control-plane nodes are critical for cluster management and should ideally be dedicated to these tasks
- Running application workloads on control-plane nodes could:
- Impact cluster stability
- Affect cluster management operations
- Pose potential security risks
Consider:
- Using dedicated worker nodes for application workloads (a sketch follows this list)
- If resource constraints are driving this decision, explore:
- Adding more worker nodes
- Using node pools with appropriate sizing
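Conversely, if the goal is to keep application pods off the control plane entirely, a sketch of the required node affinity, again assuming the chart exposes an `affinity` value that reaches the pod spec:

```yaml
# Sketch only: require scheduling onto nodes that do NOT carry the
# control-plane role label.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: node-role.kubernetes.io/control-plane
              operator: DoesNotExist
```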
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- k8s/calibrate/values-stage.yaml (2 hunks)
- k8s/docs/values-stage.yaml (2 hunks)
- k8s/inventory/values-stage.yaml (1 hunk)
- k8s/netmanager/values-stage.yaml (2 hunks)
- k8s/platform/values-stage.yaml (2 hunks)
- k8s/reports/values-stage.yaml (1 hunk)
🔇 Additional comments (3)
k8s/reports/values-stage.yaml (1)
20-24: Review resource allocation settings

The CPU request of 1m (0.001 cores) seems extremely low for a reports service. While it allows for higher pod density, it might lead to CPU starvation under load. Consider setting more realistic CPU requests based on actual usage patterns.
It would be worth verifying the historical CPU usage before settling on this value.
k8s/netmanager/values-stage.yaml (1)
20-24: Validate frontend resource constraints

The memory limit of 80Mi for a frontend application seems tight. Modern web applications typically require more memory for client-side processing and caching.
It would also be worth checking for any OOM (Out of Memory) incidents at this limit.
k8s/platform/values-stage.yaml (1)
24-26: Verify the significant reduction in resource requests

The CPU request has been reduced by 90% (100m → 10m) and memory by 78% (700Mi → 150Mi). While this optimization could improve resource utilization, such aggressive reductions warrant careful consideration:
- The CPU request (10m) is only 5% of its limit (200m), which might lead to CPU throttling under load
- Consider monitoring these metrics after deployment:
- CPU throttling incidents
- Memory usage patterns
- Application response times
Would you like me to help generate a monitoring plan or suggest intermediate values for a more gradual reduction?
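As one illustrative option for a more gradual reduction (assumed figures, not derived from metrics), intermediate requests roughly halfway between the previous and proposed values might be:

```yaml
# Hypothetical intermediate step between the previous requests
# (100m CPU / 700Mi memory) and the proposed ones (10m / 150Mi);
# validate against observed usage before adopting.
resources:
  requests:
    cpu: 50m
    memory: 400Mi
```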
k8s/inventory/values-stage.yaml (outdated)

```diff
     cpu: 1m
     memory: 100Mi
   limits:
-    cpu: 50m
-    memory: 80Mi
+    cpu: 10m
+    memory: 150Mi
```
🛠️ Refactor suggestion
Consider horizontal scaling implications
With CPU limits set to 10m, the service might hit CPU throttling under load. The autoscaling configuration (maxReplicas: 2) might need adjustment to compensate for the tight resource limits.
Consider:
- Increasing CPU limits to at least 50m based on service requirements
- Adjusting HPA maxReplicas to handle traffic spikes
- Setting an appropriate targetCPUUtilizationPercentage for smoother scaling (a sketch follows below)
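A minimal sketch of how those suggestions could look in the values file; the autoscaling key names are assumptions about this chart's schema rather than confirmed settings:

```yaml
# Sketch only: a higher CPU limit, a slightly larger replica ceiling,
# and an explicit CPU utilization target for smoother scaling.
resources:
  limits:
    cpu: 50m
    memory: 150Mi
autoscaling:
  enabled: true
  minReplicas: 1
  maxReplicas: 3
  targetCPUUtilizationPercentage: 70
```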
Thanks Benjamin. I will perform merge duty after Samantha submits her PR review (approval).
Description
Adjust resources and modify node selection for different applications
Changes Made
Testing
Additional Notes
Summary by CodeRabbit

- Resource Optimization: CPU and memory requests and limits adjusted across the stage configurations.
- Node Scheduling: node affinity simplified, shifting node selection from general-purpose to control-plane nodes.