
Sustainable Scaling: Implementing Carbon-Aware Kubernetes with KEDA

7 min read
Kubernetes · KEDA · Sustainability · Cloud Native · GreenOps

As software engineers, we have spent decades optimizing for three main pillars: performance, cost, and reliability. However, a fourth pillar is rapidly becoming a requirement for modern infrastructure: sustainability. While cloud providers have made significant strides in Power Usage Effectiveness (PUE), the responsibility for how we consume that power lies with us.

The concept of 'Carbon-Aware Computing' is simple: do more when the grid is clean and do less when the grid is dirty. In a Kubernetes environment, this means moving beyond static scaling or simple CPU/Memory metrics. By integrating KEDA (Kubernetes Event-Driven Autoscaling) with the Green Software Foundation’s Carbon-Aware SDK, we can build workloads that automatically adjust their footprint based on the real-time carbon intensity of the local power grid.

Understanding Carbon Intensity

Before diving into the implementation, we must define our primary metric: Carbon Intensity. Measured in grams of CO2 equivalent per kilowatt-hour (gCO2eq/kWh), this metric fluctuates throughout the day based on the mix of energy sources powering the grid. On a sunny, windy afternoon, renewable penetration is high and intensity is low. When the sun sets and the wind dies down, the grid often pivots to coal or gas, and intensity spikes.
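To make the unit concrete: a workload's emissions are simply the energy it consumes multiplied by the grid's intensity while it runs. A minimal sketch (the power draw and intensity figures are illustrative, not real grid data):

```go
package main

import "fmt"

// estimateEmissions returns grams of CO2-equivalent for a workload drawing
// powerKW kilowatts for the given hours on a grid at intensity gCO2eq/kWh.
func estimateEmissions(powerKW, hours, intensity float64) float64 {
	return powerKW * hours * intensity
}

func main() {
	// The same 2 kW, 4-hour batch job, run at two different times of day:
	clean := estimateEmissions(2, 4, 120) // windy afternoon: 120 gCO2eq/kWh
	dirty := estimateEmissions(2, 4, 450) // evening gas/coal: 450 gCO2eq/kWh
	fmt.Printf("clean window: %.0f g, dirty window: %.0f g\n", clean, dirty)
}
```

Same job, same energy — nearly a 4x difference in emissions purely from when it ran.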

There are two main strategies for carbon awareness:

  1. Spatial Shifting: Moving workloads to a region where the grid is currently cleaner.
  2. Temporal Shifting: Delaying non-critical workloads to a time when the grid is cleaner.

This article focuses on temporal shifting within a Kubernetes cluster using KEDA.

The Architecture: KEDA and the Carbon-Aware SDK

KEDA is a single-purpose event-driven autoscaler for Kubernetes. While it is famous for scaling based on RabbitMQ depth or Prometheus queries, its true power lies in its extensibility through the External Scaler interface.

The Carbon-Aware SDK, provided by the Green Software Foundation, acts as a standardized wrapper for various carbon intelligence APIs (like WattTime or Electricity Maps). By combining these two, we create a feedback loop where the carbon intensity of your cluster's region becomes a first-class scaling metric.

The Workflow

  1. The SDK fetches real-time and forecast data from a carbon telemetry provider.
  2. An External Scaler service (which we implement or deploy) consumes this SDK and exposes a gRPC interface for KEDA.
  3. KEDA polls the External Scaler to determine the target scale for a specific deployment.
  4. The Horizontal Pod Autoscaler (HPA) adjusts the pod count based on the carbon threshold defined in our ScaledObject.
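Behind the gRPC interface, the scaler's contract with KEDA boils down to two questions: should the target be active at all, and what metric value should the HPA see? A self-contained sketch of that decision, with the gRPC plumbing stripped away (the carbonThreshold name mirrors the ScaledObject metadata field used later in this article):

```go
package main

import "fmt"

// scaleDecision mirrors what a carbon-aware external scaler reports to KEDA:
// IsActive gates scale-to/from-zero, and MetricValue feeds the HPA math.
type scaleDecision struct {
	IsActive    bool
	MetricValue int64
}

// decide compares the current grid intensity (gCO2eq/kWh) against the
// threshold configured in the ScaledObject's trigger metadata.
func decide(currentIntensity, carbonThreshold float64) scaleDecision {
	if currentIntensity > carbonThreshold {
		// Grid is dirty: report inactive so KEDA can scale the target to zero.
		return scaleDecision{IsActive: false, MetricValue: 0}
	}
	// Grid is clean: report a positive metric so the HPA scales out.
	return scaleDecision{IsActive: true, MetricValue: 100}
}

func main() {
	fmt.Println(decide(480, 350)) // dirty grid
	fmt.Println(decide(210, 350)) // clean grid
}
```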

Implementation Guide: Building the Carbon-Aware Scaler

To implement this, we need to bridge the gap between the Carbon-Aware SDK and KEDA's gRPC requirements.

1. Setting up the Carbon-Aware SDK

The SDK can be run as a standalone Web API or integrated directly into a custom scaler. For most production environments, running the SDK as a sidecar or a central service within the cluster is preferred.

You will need an API key from a provider like WattTime. The SDK configuration typically looks like this in a carbon-aware-settings.json:

```json
{
  "CarbonAwareVars": {
    "CarbonIntensityDataSource": "WattTime",
    "WattTimeConfig": {
      "Username": "your_username",
      "Password": "your_password"
    }
  }
}
```

2. Defining the Scaling Logic

We don't just want to shut down everything when the grid is dirty. Instead, we categorize workloads. For a background data-processing job, we might set a hard threshold. If carbon intensity exceeds 400g/kWh, we scale to zero. If it's below 200g/kWh, we scale to 50 replicas.
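This two-threshold policy can be expressed as a small function: zero replicas above the dirty threshold, full capacity below the clean one, and a ramp in between. A sketch — the linear interpolation in the middle band is one reasonable choice, not something the SDK or KEDA prescribes:

```go
package main

import "fmt"

// targetReplicas maps grid intensity (gCO2eq/kWh) to a desired replica count.
// At or above dirtyAt we scale to zero; at or below cleanAt we run at
// maxReplicas; in between we ramp down linearly.
func targetReplicas(intensity, cleanAt, dirtyAt float64, maxReplicas int) int {
	if intensity >= dirtyAt {
		return 0
	}
	if intensity <= cleanAt {
		return maxReplicas
	}
	frac := (dirtyAt - intensity) / (dirtyAt - cleanAt)
	return int(frac * float64(maxReplicas))
}

func main() {
	for _, i := range []float64{150, 300, 450} {
		fmt.Printf("%v g/kWh -> %d replicas\n", i, targetReplicas(i, 200, 400, 50))
	}
}
```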

In our external scaler (written in Go or C#), we implement the GetMetrics method. Simplified, and with error handling around the SDK call omitted, the logic looks like this:

```go
func (s *Scaler) GetMetrics(ctx context.Context, req *pb.GetMetricsRequest) (*pb.GetMetricsResponse, error) {
	currentIntensity := sdk.GetCurrentIntensity("eastus")

	// The threshold arrives as a string via the ScaledObject's trigger metadata.
	threshold, err := strconv.ParseFloat(req.ScaledObjectRef.ScalerMetadata["carbonThreshold"], 64)
	if err != nil {
		return nil, err
	}

	value := int64(100)
	if currentIntensity > threshold {
		value = 0 // grid is dirty: report zero so KEDA scales the workload down
	}
	return &pb.GetMetricsResponse{
		MetricValues: []*pb.MetricValue{{MetricName: "carbon-intensity-ok", MetricValue: value}},
	}, nil
}
```

3. Configuring the ScaledObject

With the scaler running, we define a ScaledObject in Kubernetes. This CRD (Custom Resource Definition) tells KEDA which deployment to scale and how to talk to our carbon scaler.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: carbon-aware-worker
  namespace: processing
spec:
  scaleTargetRef:
    name: data-worker-deployment
  minReplicaCount: 0
  maxReplicaCount: 50
  triggers:
    - type: external
      metadata:
        scalerAddress: carbon-scaler-service.keda.svc.cluster.local:50051
        carbonThreshold: "350"
        location: "westeurope"
```

Real-World Scenario: The Batch Processing Pipeline

Consider a financial services company that runs massive risk-assessment simulations every night. These simulations are computationally expensive but not time-critical; they just need to be finished by 8:00 AM.

Traditionally, these would run on a cron job at 1:00 AM. However, if a cold snap hits and the local grid fires up coal plants to meet heating demand, that 1:00 AM window becomes carbon-intensive.

By using KEDA and the Carbon-Aware SDK, the pipeline becomes "elastic" to the grid. When the SDK detects a dip in carbon intensity (perhaps a surge in wind power at 3:00 AM), KEDA ramps the deployment up toward its maxReplicaCount. If the intensity spikes, KEDA throttles the workload back down. This ensures the job completes by the deadline while minimizing the total CO2 emitted by the compute cycle.
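The temporal-shifting decision in this scenario amounts to scanning the intensity forecast for the cleanest window that still finishes before the deadline. A minimal sketch over hourly forecast buckets (the forecast values are illustrative, not real grid data):

```go
package main

import "fmt"

// bestStartHour returns the start index of the jobHours-long window with the
// lowest total forecast intensity that still completes by deadlineHour.
func bestStartHour(forecast []float64, jobHours, deadlineHour int) int {
	best, bestSum := 0, -1.0
	for start := 0; start+jobHours <= deadlineHour && start+jobHours <= len(forecast); start++ {
		sum := 0.0
		for _, v := range forecast[start : start+jobHours] {
			sum += v
		}
		if bestSum < 0 || sum < bestSum {
			best, bestSum = start, sum
		}
	}
	return best
}

func main() {
	// Hourly gCO2eq/kWh from midnight to 08:00; a wind surge around 03:00.
	forecast := []float64{420, 410, 390, 210, 190, 230, 380, 400}
	start := bestStartHour(forecast, 3, 8) // 3-hour job, done by 08:00
	fmt.Printf("cleanest window starts at %02d:00\n", start)
}
```

With the forecast above, the scheduler would defer the 1:00 AM cron start to 3:00 AM and still finish well before the 8:00 AM deadline.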

Strategic Trade-offs and Considerations

Implementing carbon-aware scheduling is not without its challenges. Senior engineers must weigh several factors before rolling this out to production.

SLA Management

Scaling to zero is great for the planet but potentially disastrous for business SLAs. For customer-facing APIs, you should never scale to zero based on carbon. Instead, use carbon intensity to adjust the concurrency or background task frequency. For example, you might disable non-essential features (like generating high-res image previews) when the grid is dirty, while keeping the core API functional.
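One way to encode this rule is a clamp: carbon intensity may modulate capacity, but never below a per-service SLA floor. A sketch (the floor values are illustrative):

```go
package main

import "fmt"

// clampForSLA applies a carbon-driven replica suggestion while enforcing a
// per-service minimum, so customer-facing workloads never scale to zero.
func clampForSLA(carbonSuggested, slaFloor int) int {
	if carbonSuggested < slaFloor {
		return slaFloor
	}
	return carbonSuggested
}

func main() {
	// Dirty grid: the carbon scaler suggests 0 replicas everywhere.
	fmt.Println(clampForSLA(0, 3)) // customer-facing API keeps 3 replicas
	fmt.Println(clampForSLA(0, 0)) // background batch job may hit zero
}
```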

The "Rebound" Effect

If every company in a region uses the same carbon-aware logic, we risk creating a new peak in demand when the grid becomes clean. This is known as the rebound effect. To mitigate this, introduce "jitter" or randomized offsets in your scaling thresholds so that workloads across the cluster don't all burst at the exact same second.
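A per-workload jitter can be derived deterministically, for example by hashing the workload name, so each deployment gets a stable but distinct threshold offset. A sketch, assuming a ±10% jitter band is acceptable for your workloads:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// jitteredThreshold offsets a base carbon threshold by up to ±jitterFrac,
// derived from a hash of the workload name so the offset is stable across
// restarts but differs between workloads.
func jitteredThreshold(base float64, workload string, jitterFrac float64) float64 {
	h := fnv.New32a()
	h.Write([]byte(workload))
	// Map the hash to [-1, 1), then scale it into the jitter band.
	unit := float64(h.Sum32()%2000)/1000.0 - 1.0
	return base * (1 + unit*jitterFrac)
}

func main() {
	for _, w := range []string{"data-worker", "image-resizer", "report-gen"} {
		fmt.Printf("%s: %.1f g/kWh\n", w, jitteredThreshold(350, w, 0.10))
	}
}
```

Because each workload crosses its own threshold at a slightly different intensity, a region-wide dip in carbon intensity no longer triggers every deployment at the exact same second.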

Data Latency and Accuracy

Carbon intensity data is often delayed by 5–15 minutes. Furthermore, forecast data is just that—a forecast. Your system must be resilient to API failures from the telemetry provider. If the SDK cannot reach WattTime, the scaler should fail-safe to a default "business-as-usual" state rather than scaling everything to zero.

Moving Toward a GreenOps Culture

Integrating KEDA with carbon metrics is a technical solution, but it requires a cultural shift toward "GreenOps." This involves:

  • Visibility: Exporting carbon intensity and pod scaling metrics to dashboards (Grafana) so teams can see the impact of their scheduling choices.
  • Chargeback Models: Incorporating carbon footprints into internal cloud billing, incentivizing teams to write more efficient code and utilize temporal shifting.
  • Optimization First: Carbon awareness is the second step. The first step is always efficiency. No amount of carbon-aware scheduling justifies bloated, unoptimized code.

Conclusion: Actionable Steps

Transitioning to carbon-aware infrastructure doesn't have to be an all-or-nothing migration. Start small and iterate:

  1. Identify Candidates: Look for asynchronous, non-time-critical workloads (CI/CD runners, batch processing, data training) that can tolerate delays.
  2. Deploy the SDK: Set up the Carbon-Aware SDK in a dev environment and start logging the intensity of your primary cloud regions.
  3. Implement a Pilot Scaler: Use KEDA’s external scaler to manage a single, low-risk deployment. Set a conservative threshold to observe how the workload fluctuates.
  4. Measure and Report: Calculate the estimated CO2 savings by comparing the actual run-time intensity against the average grid intensity for that 24-hour period.

By turning the carbon intensity of the grid into a programmable variable, we move from being passive consumers of energy to active, responsible participants in the energy ecosystem. It is a rare opportunity where technical excellence and environmental stewardship align perfectly.