Resource Limits

Configure resource limits and requests for optimal OnCallM performance and reliability.

Default Resource Configuration

yaml

resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"

Sizing Guidelines

Small Deployment (< 100 alerts/day)

yaml

resources:
  requests:
    memory: "128Mi"
    cpu: "100m"
  limits:
    memory: "256Mi"
    cpu: "250m"

Characteristics:

Development or staging environments
Low alert volume
Single replica sufficient

Medium Deployment (100-1000 alerts/day)

yaml

resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"

Characteristics:

Production environments
Moderate alert volume
2-3 replicas recommended

Large Deployment (> 1000 alerts/day)

yaml

resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "1Gi"
    cpu: "1000m"

Characteristics:

High-volume production
Multiple clusters
3+ replicas with auto-scaling
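
The three tiers above amount to a lookup on daily alert volume. A minimal Python sketch, assuming the medium tier is inclusive at both ends (100 to 1000 alerts/day); `SizingTier` and `pick_tier` are illustrative names and not part of OnCallM:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SizingTier:
    name: str
    memory_request: str
    cpu_request: str
    memory_limit: str
    cpu_limit: str
    replicas: str


# Values copied from the sizing guidelines above
SMALL = SizingTier("small", "128Mi", "100m", "256Mi", "250m", "1")
MEDIUM = SizingTier("medium", "256Mi", "250m", "512Mi", "500m", "2-3")
LARGE = SizingTier("large", "512Mi", "500m", "1Gi", "1000m", "3+ with auto-scaling")


def pick_tier(alerts_per_day: int) -> SizingTier:
    """Map a daily alert volume to one of the three documented tiers."""
    if alerts_per_day < 100:
        return SMALL
    if alerts_per_day <= 1000:
        return MEDIUM
    return LARGE
```

Treat the result as a starting point and adjust it against observed usage.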

Memory Requirements

Base Memory Usage

Base OnCallM process: ~50MB
FastAPI framework: ~30MB
Python runtime: ~40MB
AI processing buffer: ~100MB
Total baseline: ~220MB

Per-Alert Memory

Alert processing: ~2MB per alert
AI analysis: ~5MB per alert
Report generation: ~1MB per alert
Queue overhead: ~0.5MB per alert
Total per alert: ~8.5MB

Memory Calculation

text

# Formula
Total Memory = Base Memory + (Concurrent Alerts × Per-Alert Memory)

# Example: 20 concurrent alerts
Total Memory = 220MB + (20 × 8.5MB) = 390MB
Recommended limit = 390MB × 1.3 (buffer) = 507MB ≈ 512MB
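
The formula above can be written as a small helper for trying other concurrency levels. This is a sketch that uses only the estimates from this page; the function and constant names are illustrative:

```python
BASE_MEMORY_MB = 220.0      # total baseline from "Base Memory Usage"
PER_ALERT_MEMORY_MB = 8.5   # total per alert from "Per-Alert Memory"
BUFFER_FACTOR = 1.3         # 30% headroom, as in the example above


def estimate_memory_mb(concurrent_alerts: int) -> float:
    """Expected memory use for a given number of concurrent alerts."""
    return BASE_MEMORY_MB + concurrent_alerts * PER_ALERT_MEMORY_MB


def recommended_memory_limit_mb(concurrent_alerts: int) -> float:
    """Expected memory use plus the safety buffer."""
    return estimate_memory_mb(concurrent_alerts) * BUFFER_FACTOR
```

For 20 concurrent alerts this reproduces the 390MB estimate and the roughly 507MB limit worked out above.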

CPU Requirements

CPU Usage Patterns

Webhook processing: Low CPU (10-20%)
Data collection: Medium CPU (30-50%)
AI analysis: Variable CPU (20-80%)
Report generation: Low CPU (10-30%)

CPU Calculation

text

# Base CPU usage
Base CPU: 50m (0.05 cores)

# Per concurrent alert
Alert processing: 15m per alert

# Example: 10 concurrent alerts
Total CPU = 50m + (10 × 15m) = 200m
Recommended limit = 200m × 2 (burst) = 400m
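
The same calculation as a sketch in Python, working in whole millicores so the arithmetic stays exact. The names are illustrative; the constants are the estimates given above:

```python
BASE_CPU_MILLICORES = 50        # base CPU usage
PER_ALERT_CPU_MILLICORES = 15   # per concurrent alert
BURST_FACTOR = 2                # headroom for bursts, as in the example above


def estimate_cpu_millicores(concurrent_alerts: int) -> int:
    """Expected steady-state CPU for a given number of concurrent alerts."""
    return BASE_CPU_MILLICORES + concurrent_alerts * PER_ALERT_CPU_MILLICORES


def recommended_cpu_limit_millicores(concurrent_alerts: int) -> int:
    """Steady-state estimate multiplied by the burst factor."""
    return estimate_cpu_millicores(concurrent_alerts) * BURST_FACTOR
```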

Helm Configuration

values.yaml

yaml

oncallm:
  replicaCount: 2
  
  resources:
    requests:
      memory: "256Mi"
      cpu: "250m"
    limits:
      memory: "512Mi"
      cpu: "500m"
  
  # Auto-scaling configuration
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 10
    targetCPUUtilizationPercentage: 70
    targetMemoryUtilizationPercentage: 80

Override for Production

bash

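# These paths assume top-level chart values. If your values.yaml nests them
# under `oncallm:` as shown above, prefix each path (oncallm.resources...).
helm install oncallm ./charts/oncallm \
  --set resources.requests.memory=512Mi \
  --set resources.requests.cpu=500m \
  --set resources.limits.memory=1Gi \
  --set resources.limits.cpu=1000m \
  --set replicaCount=3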

Kubernetes Deployment

Complete Deployment Example

yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: oncallm
  labels:
    app: oncallm
spec:
  replicas: 2
  selector:
    matchLabels:
      app: oncallm
  template:
    metadata:
      labels:
        app: oncallm
    spec:
      containers:
      - name: oncallm
        image: oncallm/oncallm:latest
        ports:
        - containerPort: 8001
        env:
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              name: oncallm-secrets
              key: OPENAI_API_KEY
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8001
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: 8001
          initialDelaySeconds: 5
          periodSeconds: 5

Auto-scaling Configuration

Horizontal Pod Autoscaler

yaml

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: oncallm-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: oncallm
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
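
To see how the targets above turn into replica counts, the sketch below follows the scaling formula from the Kubernetes HPA documentation, `ceil(currentReplicas × currentMetric / targetMetric)`, with the default 10% tolerance. It ignores the `behavior` policies, which only limit how fast the count may change, and `desired_replicas` is an illustrative name:

```python
import math


def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     min_replicas: int = 2,
                     max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Replica count the HPA would aim for on a single metric."""
    ratio = current_utilization / target_utilization
    # Within the tolerance band the HPA leaves the replica count alone
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the minReplicas/maxReplicas bounds from the manifest
    return max(min_replicas, min(max_replicas, desired))
```

Utilization is measured against the container's request, not its limit. With a 256Mi request, the 80% memory target is about 205Mi per pod, which is close to the ~220MB baseline estimated earlier, so the memory metric may need a larger request to be useful. When several metrics are configured, as here, the HPA computes a count for each and uses the largest.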

Vertical Pod Autoscaler

yaml

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: oncallm-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: oncallm
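  updatePolicy:
    # "Off" publishes recommendations without evicting pods. Avoid "Auto" while
    # an HPA scales the same Deployment on CPU or memory; the two would conflict.
    updateMode: "Off"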
  resourcePolicy:
    containerPolicies:
    - containerName: oncallm
      maxAllowed:
        cpu: 2000m
        memory: 2Gi
      minAllowed:
        cpu: 100m
        memory: 128Mi

Resource Monitoring

Prometheus Metrics

yaml

# ServiceMonitor for Prometheus
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: oncallm-metrics
spec:
  selector:
    matchLabels:
      app: oncallm
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics

Key Metrics to Monitor

promql

# CPU usage
rate(container_cpu_usage_seconds_total{pod=~"oncallm.*"}[5m])

# Memory usage
container_memory_usage_bytes{pod=~"oncallm.*"}

# Memory limits
container_spec_memory_limit_bytes{pod=~"oncallm.*"}

# Queue size
oncallm_alert_queue_size

# Processing time
oncallm_alert_processing_duration_seconds

Alerting Rules

yaml

groups:
- name: oncallm-resources
  rules:
  - alert: OnCallMHighMemoryUsage
    expr: |
      (container_memory_usage_bytes{pod=~"oncallm.*"} / 
       container_spec_memory_limit_bytes{pod=~"oncallm.*"}) > 0.85
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "OnCallM memory usage is high"
      description: "Pod {{ $labels.pod }} memory usage is {{ $value | humanizePercentage }}"

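  - alert: OnCallMHighCPUUsage
    # Compare usage with the container's CPU limit (quota / period); an absolute
    # threshold of 0.8 cores could never be reached under a 500m limit.
    expr: |
      sum by (pod) (rate(container_cpu_usage_seconds_total{pod=~"oncallm.*", container="oncallm"}[5m]))
        /
      sum by (pod) (container_spec_cpu_quota{pod=~"oncallm.*", container="oncallm"}
        / container_spec_cpu_period{pod=~"oncallm.*", container="oncallm"}) > 0.8
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "OnCallM CPU usage is high"
      description: "Pod {{ $labels.pod }} CPU usage is {{ $value | humanizePercentage }} of its limit"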

Performance Tuning

Worker Thread Configuration

python

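import os

# Environment variables
WORKER_THREADS = int(os.getenv("WORKER_THREADS", "10"))

# Rule of thumb: 2-5 threads per CPU core
# For 500m CPU (0.5 cores): 1-3 threads
# For 1000m CPU (1 core): 2-5 threads
# The default of 10 is above both ranges, so set WORKER_THREADS explicitly
# to match the CPU limit you deploy with.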
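
The rule of thumb can be turned into a helper that derives a thread range from a CPU limit. This sketch assumes the lower bound rounds down (never below one thread) and the upper bound rounds up, which reproduces the two examples above; `worker_thread_range` is an illustrative name:

```python
def worker_thread_range(cpu_millicores: int) -> tuple[int, int]:
    """Suggested (min, max) worker threads at 2-5 threads per CPU core."""
    # Integer arithmetic on millicores avoids floating-point rounding
    low = max(1, (2 * cpu_millicores) // 1000)
    high = max(low, -(-5 * cpu_millicores // 1000))  # ceiling division
    return low, high
```

A value from this range can be passed to the process through the `WORKER_THREADS` environment variable.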

Queue Size Limits

python

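import os
from fastapi import HTTPException

# Prevent memory exhaustion by rejecting alerts once the queue is full
MAX_QUEUE_SIZE = int(os.getenv("MAX_QUEUE_SIZE", "100"))

def check_queue_capacity(queue):
    # >= keeps the queue from growing past MAX_QUEUE_SIZE
    if queue.qsize() >= MAX_QUEUE_SIZE:
        raise HTTPException(status_code=503, detail="Queue full")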
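
Checking the size and then enqueueing are two separate steps, so several threads accepting alerts at once can each pass the check and overshoot the limit. One alternative is to let the queue enforce the bound itself. The sketch assumes alerts are held in a standard-library `queue.Queue`; `try_enqueue` is a hypothetical helper, not OnCallM code:

```python
import queue


def try_enqueue(alert_queue: queue.Queue, alert) -> bool:
    """Add an alert without blocking; return False when the queue is full."""
    try:
        # put_nowait checks capacity and inserts under the queue's own lock
        alert_queue.put_nowait(alert)
        return True
    except queue.Full:
        return False


# Usage: a queue bounded at creation time
alerts = queue.Queue(maxsize=2)
accepted = [try_enqueue(alerts, {"id": i}) for i in range(3)]
```

A handler would translate a `False` result into the 503 response shown above.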

AI API Rate Limiting

python

import os

# OpenAI rate limits
OPENAI_RPM = int(os.getenv("OPENAI_RPM", "60"))  # Requests per minute
OPENAI_TPM = int(os.getenv("OPENAI_TPM", "60000"))  # Tokens per minute

# Implement rate limiting. `rate_limit` stands for a decorator supplied by the
# application; it is not defined in this snippet.
@rate_limit(requests_per_minute=OPENAI_RPM)
def call_openai_api(prompt):
    # API call implementation
    pass
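
The `rate_limit` decorator is used above without a definition. One way it could be written is as a sliding-window limiter that blocks the caller until a slot frees up. This is a sketch, not OnCallM's implementation: the `clock` and `sleep` parameters are additions for testability, and it covers only the requests-per-minute limit, not tokens per minute:

```python
import functools
import threading
import time
from collections import deque


def rate_limit(requests_per_minute, clock=time.monotonic, sleep=time.sleep):
    """Allow at most `requests_per_minute` calls in any 60-second window."""
    window_seconds = 60.0
    call_times = deque()  # start times of calls still inside the window
    lock = threading.Lock()

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with lock:
                while True:
                    now = clock()
                    # Forget calls that have left the window
                    while call_times and now - call_times[0] >= window_seconds:
                        call_times.popleft()
                    if len(call_times) < requests_per_minute:
                        call_times.append(now)
                        break
                    # Wait until the oldest recorded call expires
                    sleep(window_seconds - (now - call_times[0]))
            return func(*args, **kwargs)
        return wrapper
    return decorator
```

The lock is held while sleeping, so waiting callers are released one at a time in order. That keeps the sketch simple at the cost of serializing threads that are over the limit.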

Troubleshooting

Common Resource Issues

OOMKilled pods?

bash

# Check memory usage
kubectl top pods -l app=oncallm

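# Check events (node-level OOM kills)
kubectl get events --field-selector reason=OOMKilling

# Check whether a container's last termination was OOMKilled
kubectl get pods -l app=oncallm \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}'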

# Increase memory limits
helm upgrade oncallm ./charts/oncallm \
  --set resources.limits.memory=1Gi

CPU throttling?

bash

# Check CPU throttling counters (nr_periods, nr_throttled)
# cgroup v2 path shown; on cgroup v1 nodes use /sys/fs/cgroup/cpu/cpu.stat
kubectl exec deploy/oncallm -- cat /sys/fs/cgroup/cpu.stat

# Increase CPU limits
helm upgrade oncallm ./charts/oncallm \
  --set resources.limits.cpu=1000m
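
The `cpu.stat` file is a list of `name value` pairs. The share of scheduling periods in which the container was throttled is `nr_throttled / nr_periods`, and both counters appear on cgroup v1 and v2. A small sketch for computing it from the command output follows; the sample numbers are made up for illustration and `throttle_ratio` is a hypothetical helper:

```python
def throttle_ratio(cpu_stat_text: str) -> float:
    """Fraction of CFS periods in which the cgroup was throttled."""
    stats = {}
    for line in cpu_stat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0  # no CPU limit set, or no periods recorded yet
    return stats.get("nr_throttled", 0) / periods


# Illustrative sample in the cgroup v2 layout (values are invented)
sample = """usage_usec 912345
nr_periods 1000
nr_throttled 250
throttled_usec 4200000
"""
ratio = throttle_ratio(sample)
```

A ratio that stays high under normal load suggests raising the CPU limit, as in the `helm upgrade` command above.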

Slow response times?

bash

# Check queue size
curl -s http://oncallm:8001/health | jq .queue_size

# Scale horizontally
kubectl scale deployment oncallm --replicas=3

Resource Optimization

bash

# Monitor resource usage over time
kubectl top pods -l app=oncallm --containers=true

# Analyze resource utilization
kubectl describe hpa oncallm-hpa

# Review VPA recommendations
kubectl describe vpa oncallm-vpa
