Adjust resources and modify node selection for different Frontend app… #2379

Merged
merged 2 commits into staging from resource-utilisation on Jan 21, 2025

Conversation

@BenjaminSsempala (Contributor) commented Jan 15, 2025

Description

Adjust resources and modify node selection for the different frontend applications.

Changes Made

  • Modified CPU and memory requests and limits for the affected apps
  • Modified node affinity rules for the affected apps

Testing

  • Tested locally
  • Tested against staging environment
  • Relevant tests passed: [List test names]

Additional Notes

[Add any additional notes or comments here]

Summary by CodeRabbit

  • Resource Optimization

    • Reduced CPU limits across multiple services from 50m to 10m
    • Adjusted memory limits and requests for various services
    • Increased memory limits in some services to enhance performance
    • Simplified resource allocation configurations
  • Node Scheduling

    • Updated node affinity configurations
    • Shifted node selection criteria from general-purpose to control-plane
    • Simplified node selector terms across services, focusing on specific roles

coderabbitai bot commented Jan 15, 2025

📝 Walkthrough

This pull request involves systematic updates to Kubernetes stage configuration files across multiple services. The changes primarily focus on resource allocation and node affinity settings. Consistently across different service configurations, CPU and memory resource limits have been adjusted, with some reduced and others increased, while node affinity rules have been simplified. The modifications shift node selection criteria from node-type to role, specifically targeting control-plane nodes, indicating a strategic reconfiguration of deployment scheduling parameters.
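
For orientation, the affinity shape described above looks roughly like this in a values-stage.yaml file. This is a hedged sketch: the role key and control-plane value come from the change summary, while the exact nesting under affinity/nodeAffinity is assumed rather than copied from the repository.

  # Hedged sketch: required node affinity keyed on the node's role label.
  # The old preferredDuringSchedulingIgnoredDuringExecution block keyed on
  # node-type is removed in this PR.
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: role
                operator: In
                values:
                  - control-plane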

Changes

File | Changes

  • k8s/calibrate/values-stage.yaml - CPU limit: 50m → 10m; Memory limit: 80Mi → 100Mi; CPU request: 5m → 1m; Node affinity: removed preferredDuringSchedulingIgnoredDuringExecution, added nodeSelectorTerms for role: control-plane
  • k8s/docs/values-stage.yaml - CPU limit: 50m → 10m; CPU request: 10m → 1m; Node affinity: simplified to nodeSelectorTerms with role: control-plane
  • k8s/inventory/values-stage.yaml - CPU request: 5m → 10m; Memory request: 20Mi → 100Mi; Memory limit: 80Mi → 150Mi; Node affinity: updated to role: control-plane
  • k8s/netmanager/values-stage.yaml - CPU request: 5m → 20m; CPU limit: 50m → 100m; Memory limit: 80Mi → 200Mi; Node affinity: updated to role: control-plane
  • k8s/platform/values-stage.yaml - CPU request: 100m → 20m; Memory request: 250Mi → 150Mi; Memory limit: 700Mi → 350Mi; Node affinity: updated to role: control-plane
  • k8s/reports/values-stage.yaml - CPU request: 5m → 10m; Memory request: 20Mi → 150Mi; Memory limit: 80Mi → 200Mi; Node affinity: updated to role: moderate-usage

Sequence Diagram

sequenceDiagram
    participant Scheduler
    participant Node
    Scheduler->>Node: Check node role
    alt Control Plane Node
        Scheduler->>Node: Schedule Pod
    else Non-Control Plane Node
        Scheduler-->>Node: Skip Scheduling
    end

Suggested Reviewers

  • sserurich
  • Psalmz777
  • Baalmart

Poem

🌐 Kubernetes Dance 🕺
Resources trimmed, nodes aligned,
Control plane nodes now refined,
CPU limits take a bow,
Scheduling magic happens now!
🚀 Deploy with grace, deploy with might! 🌟


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c0dc979 and b8207af.

📒 Files selected for processing (4)
  • k8s/inventory/values-stage.yaml (1 hunks)
  • k8s/netmanager/values-stage.yaml (1 hunks)
  • k8s/platform/values-stage.yaml (2 hunks)
  • k8s/reports/values-stage.yaml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • k8s/inventory/values-stage.yaml
  • k8s/reports/values-stage.yaml
  • k8s/netmanager/values-stage.yaml
🔇 Additional comments (2)
k8s/platform/values-stage.yaml (2)

37-42: Reconsider scheduling workloads on control-plane nodes.

The current node affinity configuration targets control-plane nodes, which is generally not recommended because:

  1. Control-plane nodes should be reserved for critical cluster components
  2. This practice could impact cluster stability and scalability
  3. Worker nodes are better suited for application workloads

Consider using dedicated worker nodes instead:

  nodeSelectorTerms:
    - matchExpressions:
        - key: role
          operator: In
          values:
-           - control-plane
+           - worker

Let's check if this is a pattern across other services:


23-26: Review resource allocation settings carefully.

The current resource settings raise a few concerns:

  1. The CPU request of 20m (0.02 cores) seems extremely low for a Next.js frontend application, which could lead to poor performance under load.
  2. The memory limit of 350Mi might be restrictive for a Next.js application, especially with SSR workloads.
  3. The large gap between limits and requests (10x for CPU) might cause resource contention issues.

Let's verify the application's resource usage patterns:



@coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
k8s/reports/values-stage.yaml (1)

32-37: Verify control-plane node selection strategy

Moving all services to target control-plane nodes could lead to resource contention. Control plane nodes typically run critical cluster components and should be protected from excessive workload.

Consider:

  1. Using dedicated worker nodes for application workloads
  2. Implementing proper taints/tolerations if control-plane nodes are intended for specific workloads
  3. Setting up node anti-affinity rules to ensure better pod distribution (items 2 and 3 are sketched below)
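
For items 2 and 3, a minimal sketch of what that could look like in a pod spec is shown below; the taint key, app label, and topology key are assumptions for illustration, not values taken from this repository:

  # Item 2: tolerate the standard control-plane taint only where intended
  # (assumed taint key; must match the cluster's actual taints).
  tolerations:
    - key: node-role.kubernetes.io/control-plane
      operator: Exists
      effect: NoSchedule
  # Item 3: prefer spreading replicas of the same app across nodes
  # (hypothetical app label).
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: reports
            topologyKey: kubernetes.io/hostname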
k8s/docs/values-stage.yaml (1)

22-26: Align resource settings with service type

Documentation services typically have predictable resource usage patterns. The current settings might be too restrictive:

  • CPU request of 1m is extremely low
  • Memory request of 20Mi might be insufficient for serving documentation assets

Consider setting resource requests based on p90 usage metrics to ensure stable performance.
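
As an illustration of that approach, a hedged sketch with hypothetical numbers (not measured figures for this service):

  # Hypothetical values only: requests sized near observed p90 usage,
  # limits left with headroom above them.
  resources:
    requests:
      cpu: 10m
      memory: 64Mi
    limits:
      cpu: 50m
      memory: 128Mi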

k8s/calibrate/values-stage.yaml (1)

22-26: Review overall resource management strategy

The current configuration shows a pattern of minimal resource allocation across services. While this might optimize resource usage, it could impact service reliability.

Key considerations:

  1. The CPU request of 1m might cause scheduling issues
  2. All services targeting control-plane nodes could create a single point of failure

Recommendations:

  1. Implement proper resource monitoring to establish baseline requirements
  2. Consider using node labels for workload distribution instead of targeting control-plane nodes (see the sketch below)
  3. Document the reasoning behind these resource constraints for future reference

Also applies to: 36-41
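
A hedged sketch of recommendation 2 above: target a custom workload label rather than the control-plane role. The moderate-usage value mirrors the reports service in this PR; whether that label is applied to the staging worker nodes is an assumption.

  # Simplest form: a plain nodeSelector on a custom node label
  # (label value borrowed from the reports service for illustration).
  nodeSelector:
    role: moderate-usage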

k8s/platform/values-stage.yaml (1)

37-42: Reconsider scheduling workloads on control-plane nodes.

The node affinity configuration has been changed to specifically target control-plane nodes. This approach raises some architectural concerns:

  1. Control-plane nodes are critical for cluster management and should ideally be dedicated to these tasks
  2. Running application workloads on control-plane nodes could:
    • Impact cluster stability
    • Affect cluster management operations
    • Pose potential security risks

Consider:

  1. Using dedicated worker nodes for application workloads
  2. If resource constraints are driving this decision, explore:
    • Adding more worker nodes
    • Using node pools with appropriate sizing
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7485e58 and c0dc979.

📒 Files selected for processing (6)
  • k8s/calibrate/values-stage.yaml (2 hunks)
  • k8s/docs/values-stage.yaml (2 hunks)
  • k8s/inventory/values-stage.yaml (1 hunks)
  • k8s/netmanager/values-stage.yaml (2 hunks)
  • k8s/platform/values-stage.yaml (2 hunks)
  • k8s/reports/values-stage.yaml (1 hunks)
🔇 Additional comments (3)
k8s/reports/values-stage.yaml (1)

20-24: Review resource allocation settings

The CPU request of 1m (0.001 cores) seems extremely low for a reports service. While this allows for higher pod density, it might lead to CPU starvation under load. Consider setting more realistic CPU requests based on actual usage patterns.

Let's verify the historical CPU usage:

k8s/netmanager/values-stage.yaml (1)

20-24: Validate frontend resource constraints

The memory limit of 80Mi for a frontend application seems tight. Modern web applications typically require more memory for client-side processing and caching.

Let's check for any OOM (Out of Memory) incidents:

k8s/platform/values-stage.yaml (1)

24-26: Verify the significant reduction in resource requests.

The CPU request has been reduced by 90% (100m → 10m) and memory by 78% (700Mi → 150Mi). While this optimization could improve resource utilization, such aggressive reductions warrant careful consideration:

  1. The CPU request (10m) is only 5% of its limit (200m), which might lead to CPU throttling under load
  2. Consider monitoring these metrics after deployment:
    • CPU throttling incidents
    • Memory usage patterns
    • Application response times

Would you like me to help generate a monitoring plan or suggest intermediate values for a more gradual reduction?

Comment on lines 20 to 24

      cpu: 1m
      memory: 100Mi
    limits:
-     cpu: 50m
-     memory: 80Mi
+     cpu: 10m
+     memory: 150Mi

🛠️ Refactor suggestion

Consider horizontal scaling implications

With CPU limits set to 10m, the service might hit CPU throttling under load. The autoscaling configuration (maxReplicas: 2) might need adjustment to compensate for the tight resource limits.

Consider:

  1. Increasing CPU limits to at least 50m based on service requirements
  2. Adjusting HPA maxReplicas to handle traffic spikes
  3. Setting appropriate targetCPUUtilizationPercentage for smoother scaling (see the sketch below)
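
A hedged sketch of points 2 and 3 in Helm values style; the key names follow common chart conventions and the numbers are illustrative, not tuned recommendations for this service:

  # Illustrative values only; the chart's actual autoscaling keys may differ.
  autoscaling:
    enabled: true
    minReplicas: 1
    maxReplicas: 4                      # raised from 2 to absorb traffic spikes
    targetCPUUtilizationPercentage: 70  # scale out before sustained throttling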

@Baalmart (Collaborator) left a comment

Thanks Benjamin. I will perform merge duty after Samantha submits her PR review (approval).

@Baalmart merged commit 48131c5 into staging on Jan 21, 2025
31 checks passed
@Baalmart deleted the resource-utilisation branch on January 21, 2025 at 06:32
@Baalmart mentioned this pull request on Jan 21, 2025 (2 tasks)