Adjust resources and modify node selection for different Frontend app… #2379
Conversation
📝 Walkthrough

This pull request systematically updates the Kubernetes stage configuration files of several services, focusing on resource allocation and node affinity. CPU and memory resource limits have been adjusted across the service configurations, with some reduced and others increased, while node affinity rules have been simplified, shifting node selection criteria from general-purpose to control-plane nodes.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Scheduler
    participant Node
    Scheduler->>Node: Check node role
    alt Control Plane Node
        Scheduler->>Node: Schedule Pod
    else Non-Control Plane Node
        Scheduler-->>Node: Skip Scheduling
    end
```
Actionable comments posted: 1
🧹 Nitpick comments (4)
k8s/reports/values-stage.yaml (1)
32-37: Verify control-plane node selection strategy

Moving all services to target control-plane nodes could lead to resource contention. Control-plane nodes typically run critical cluster components and should be protected from excessive workload.
Consider:
- Using dedicated worker nodes for application workloads
- Implementing proper taints/tolerations if control-plane nodes are intended for specific workloads (see the sketch after this list)
- Setting up pod anti-affinity rules to ensure better pod distribution
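If scheduling on control-plane nodes is intentional, a minimal sketch of the tolerations and affinity such a values file might carry, assuming the chart passes `tolerations` and `affinity` straight through to the pod spec and that the standard `node-role.kubernetes.io/control-plane` taint key is in use:

```yaml
# Sketch only: tolerate the usual control-plane taint and require nodes
# carrying the control-plane role label.
tolerations:
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: node-role.kubernetes.io/control-plane
              operator: Exists
```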
k8s/docs/values-stage.yaml (1)
22-26: Align resource settings with service type

Documentation services typically have predictable resource usage patterns. The current settings might be too restrictive:
- CPU request of 1m is extremely low
- Memory request of 20Mi might be insufficient for serving documentation assets
Consider setting resource requests based on p90 usage metrics to ensure stable performance.
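For illustration only, a less restrictive block for a static documentation service might look like the following; the figures are assumptions, not measured p90 values:

```yaml
# Hypothetical example values; replace with figures derived from observed
# p90 CPU and memory usage for the docs service.
resources:
  requests:
    cpu: 10m
    memory: 64Mi
  limits:
    cpu: 100m
    memory: 128Mi
```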
k8s/calibrate/values-stage.yaml (1)
22-26: Review overall resource management strategy

The current configuration shows a pattern of minimal resource allocation across services. While this might optimize resource usage, it could impact service reliability.
Key considerations:
- The CPU request of 1m might cause scheduling issues
- All services targeting control-plane nodes could create a single point of failure
Recommendations:
- Implement proper resource monitoring to establish baseline requirements
- Consider using node labels for workload distribution instead of targeting control-plane nodes (see the sketch after this comment)
- Document the reasoning behind these resource constraints for future reference
Also applies to: 36-41
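A minimal sketch of label-based scheduling, assuming a hypothetical `workload: frontend` label applied to designated worker nodes and a chart that passes `nodeSelector` through to the pod spec:

```yaml
# Sketch only: target nodes carrying a custom workload label instead of
# pinning pods to control-plane nodes.
nodeSelector:
  workload: frontend
```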
k8s/platform/values-stage.yaml (1)
37-42: Reconsider scheduling workloads on control-plane nodes

The node affinity configuration has been changed to specifically target control-plane nodes. This approach raises some architectural concerns:
- Control-plane nodes are critical for cluster management and should ideally be dedicated to these tasks
- Running application workloads on control-plane nodes could:
- Impact cluster stability
- Affect cluster management operations
- Pose potential security risks
Consider:
- Using dedicated worker nodes for application workloads (a sketch follows this list)
- If resource constraints are driving this decision, explore:
- Adding more worker nodes
- Using node pools with appropriate sizing
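Conversely, if the goal is to keep application pods off the control plane entirely, a sketch of the required node affinity, again assuming the chart exposes an `affinity` value that reaches the pod spec:

```yaml
# Sketch only: require scheduling onto nodes that do NOT carry the
# control-plane role label.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: node-role.kubernetes.io/control-plane
              operator: DoesNotExist
```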
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- k8s/calibrate/values-stage.yaml (2 hunks)
- k8s/docs/values-stage.yaml (2 hunks)
- k8s/inventory/values-stage.yaml (1 hunk)
- k8s/netmanager/values-stage.yaml (2 hunks)
- k8s/platform/values-stage.yaml (2 hunks)
- k8s/reports/values-stage.yaml (1 hunk)
🔇 Additional comments (3)
k8s/reports/values-stage.yaml (1)
20-24: Review resource allocation settings

The CPU request of 1m (0.001 cores) seems extremely low for a reports service. While it allows for higher pod density, it might lead to CPU starvation under load. Consider setting more realistic CPU requests based on actual usage patterns.
It would be worth verifying the historical CPU usage before settling on this value.
k8s/netmanager/values-stage.yaml (1)
20-24: Validate frontend resource constraints

The memory limit of 80Mi for a frontend application seems tight. Modern web applications typically require more memory for client-side processing and caching.
It would also be worth checking for any OOM (Out of Memory) incidents at this limit.
k8s/platform/values-stage.yaml (1)
24-26: Verify the significant reduction in resource requests

The CPU request has been reduced by 90% (100m → 10m) and memory by 78% (700Mi → 150Mi). While this optimization could improve resource utilization, such aggressive reductions warrant careful consideration:
- The CPU request (10m) is only 5% of its limit (200m), which might lead to CPU throttling under load
- Consider monitoring these metrics after deployment:
- CPU throttling incidents
- Memory usage patterns
- Application response times
Would you like me to help generate a monitoring plan or suggest intermediate values for a more gradual reduction?
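As one illustrative option for a more gradual reduction (assumed figures, not derived from metrics), intermediate requests roughly halfway between the previous and proposed values might be:

```yaml
# Hypothetical intermediate step between the previous requests
# (100m CPU / 700Mi memory) and the proposed ones (10m / 150Mi);
# validate against observed usage before adopting.
resources:
  requests:
    cpu: 50m
    memory: 400Mi
```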
k8s/inventory/values-stage.yaml (outdated)

```diff
     cpu: 1m
     memory: 100Mi
   limits:
-    cpu: 50m
-    memory: 80Mi
+    cpu: 10m
+    memory: 150Mi
```
🛠️ Refactor suggestion
Consider horizontal scaling implications
With CPU limits set to 10m, the service might hit CPU throttling under load. The autoscaling configuration (maxReplicas: 2) might need adjustment to compensate for the tight resource limits.
Consider:
- Increasing CPU limits to at least 50m based on service requirements
- Adjusting HPA maxReplicas to handle traffic spikes
- Setting an appropriate targetCPUUtilizationPercentage for smoother scaling (a sketch follows below)
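A minimal sketch of how those suggestions could look in the values file; the autoscaling key names are assumptions about this chart's schema rather than confirmed settings:

```yaml
# Sketch only: a higher CPU limit, a slightly larger replica ceiling,
# and an explicit CPU utilization target for smoother scaling.
resources:
  limits:
    cpu: 50m
    memory: 150Mi
autoscaling:
  enabled: true
  minReplicas: 1
  maxReplicas: 3
  targetCPUUtilizationPercentage: 70
```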
Thanks Benjamin. I will perform merge duty after Samantha submits her PR review (approval).
Description
Adjust resources and modify node selection for different applications
Changes Made
Testing
Additional Notes
Summary by CodeRabbit

- Resource Optimization: CPU and memory requests and limits adjusted across the stage configurations.
- Node Scheduling: node affinity simplified, shifting node selection from general-purpose to control-plane nodes.