Carbon-Aware Scheduling: Using Kubernetes and the GSF SDK
Most of our engineering efforts focus on optimizing for performance, cost, or reliability. We obsess over millisecond latencies and high availability. However, there is a fourth pillar of modern architecture that is rapidly moving from a 'nice-to-have' to a core requirement: carbon efficiency.
Data centers currently account for approximately 1% to 1.5% of global electricity use. As software engineers, we have a unique lever to pull. By shifting non-urgent workloads (such as batch processing, model training, or data warehousing) to times when the local grid draws more of its power from renewable sources, we can significantly reduce the carbon intensity of our applications without impacting the end-user experience.
In this article, we will explore how to implement carbon-aware scheduling using the Green Software Foundation’s (GSF) Carbon-Aware SDK and Kubernetes.
Understanding Carbon Intensity
To build carbon-aware systems, we must first understand the metric we are optimizing for: Carbon Intensity.
Carbon intensity measures how many grams of Carbon Dioxide equivalent (gCO2eq) are emitted per kilowatt-hour (kWh) of electricity produced. This value fluctuates throughout the day based on the energy mix. On a windy, sunny afternoon, the grid might be flooded with wind and solar power (low carbon intensity). On a still evening, the grid might rely on coal or gas peaker plants (high carbon intensity).
There are two ways to respond to these fluctuations:
- Spatial Shifting: Moving a workload to a different geographic region where the grid is cleaner.
- Temporal Shifting: Delaying a workload until a time when the local grid is cleaner.
For most organizations, temporal shifting is the most practical starting point. It doesn't require complex multi-region data replication; it simply requires a smarter scheduler.
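To make the distinction concrete, here is a minimal sketch of both strategies over the same set of readings. The `Reading` type, the `westeurope` region name, and all intensity values are made up for illustration; they are not real grid data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    region: str
    hour: int          # hours from now
    intensity: float   # gCO2eq/kWh


def spatial_shift(readings, hour):
    """Cleanest region at a fixed hour (the workload moves, the time stays)."""
    return min((r for r in readings if r.hour == hour), key=lambda r: r.intensity)


def temporal_shift(readings, region):
    """Cleanest hour in a fixed region (the time moves, the workload stays)."""
    return min((r for r in readings if r.region == region), key=lambda r: r.intensity)


# Hypothetical sample values for two regions over three hours.
READINGS = [
    Reading("eastus", 0, 420.0), Reading("eastus", 1, 310.0), Reading("eastus", 2, 190.0),
    Reading("westeurope", 0, 260.0), Reading("westeurope", 1, 280.0), Reading("westeurope", 2, 240.0),
]

print(spatial_shift(READINGS, hour=0).region)           # westeurope
print(temporal_shift(READINGS, region="eastus").hour)   # 2
```

Temporal shifting only needs the second function and a single region's data, which is part of why it tends to be the easier first step.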
The Toolkit: GSF Carbon-Aware SDK
The Green Software Foundation has released the Carbon-Aware SDK, a standardized toolset that abstracts the complexity of fetching carbon data. Instead of writing custom integrations for various carbon data providers (like WattTime or Electricity Maps), the SDK provides a unified Web API and CLI.
The SDK allows you to query for:
- Current Carbon Intensity: What is the grid mix right now?
- Forecasted Carbon Intensity: When is the best time in the next 24 hours to run a 2-hour job?
- Best Region: Which region currently has the lowest intensity?
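The forecast question ("when is the best time to run a 2-hour job?") boils down to finding the contiguous window with the lowest average intensity. The SDK answers this for you; the sketch below only illustrates the idea locally and is not the SDK's implementation. The function name, the hourly interval, and the sample values are assumptions.

```python
from datetime import datetime, timedelta, timezone


def best_window(forecast, window_points):
    """Find the lowest-mean contiguous window in a forecast.

    forecast: list of (timestamp, intensity) pairs at a fixed interval, sorted by time.
    window_points: number of consecutive points the job spans.
    Returns (start_timestamp, mean_intensity).
    """
    if window_points <= 0 or window_points > len(forecast):
        raise ValueError("window_points must be between 1 and len(forecast)")
    values = [v for _, v in forecast]
    running = sum(values[:window_points])
    best_sum, best_idx = running, 0
    # Slide the window one point at a time, updating the sum incrementally.
    for i in range(window_points, len(values)):
        running += values[i] - values[i - window_points]
        if running < best_sum:
            best_sum, best_idx = running, i - window_points + 1
    return forecast[best_idx][0], best_sum / window_points


# Hypothetical hourly forecast in gCO2eq/kWh.
base = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
forecast = [(base + timedelta(hours=i), v) for i, v in enumerate([400, 350, 200, 180, 220, 390])]

start, mean = best_window(forecast, window_points=2)
print(start.isoformat(), mean)  # 2024-01-01T02:00:00+00:00 190.0
```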
Architecting a Carbon-Aware Kubernetes Cluster
To make Kubernetes carbon-aware, we need to bridge the gap between the SDK's data and the Kubernetes API server. There are three primary patterns for implementation.
1. The Simple Approach: Carbon-Aware CronJobs
The easiest way to start is by modifying existing CronJobs. Instead of running a job at a fixed time (e.g., 02:00), we can use a wrapper script that queries the Carbon-Aware SDK to decide whether to execute now or sleep.
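    #!/usr/bin/env bash
    # A conceptual wrapper for a batch job. "carbon-aware-sdk get-forecast" is a
    # placeholder for whatever call returns the best start time as an ISO 8601
    # UTC timestamp; it is not the SDK's actual CLI syntax.
    BEST_TIME=$(carbon-aware-sdk get-forecast --location "eastus" --window "2h")
    BEST_EPOCH=$(date -u -d "$BEST_TIME" +%s)   # GNU date

    # Sleep until the forecast start time has arrived, then run the job.
    until [ "$(date -u +%s)" -ge "$BEST_EPOCH" ]; do
      echo "Waiting for lower carbon intensity..."
      sleep 300
    done
    ./run-heavy-job.sh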
While simple, this approach is inefficient because it consumes pod resources while waiting.
2. The Intermediate Approach: KEDA with Carbon Metrics
Kubernetes Event-Driven Autoscaling (KEDA) allows you to scale workloads based on external metrics. By creating a custom scaler or using the Prometheus scaler connected to the Carbon-Aware SDK, you can scale your worker deployments to zero when carbon intensity exceeds a certain threshold.
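For example, you could configure a ScaledObject that only allows a worker-deployment to scale up when the carbon_intensity_gco2_per_kwh metric is below 300. Note that KEDA scalers activate when a metric rises above its threshold, so in practice the Prometheus query has to be inverted, for instance by returning 1 while intensity is below 300 and 0 otherwise.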
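The intended behaviour can be modelled in a few lines. KEDA computes replica counts itself, so this is only a sketch of the gating rule, not KEDA's algorithm; the function names, `max_replicas`, and the use of queue length as the scale-up signal are assumptions.

```python
def carbon_gate(intensity: float, threshold: float = 300.0) -> int:
    """Return 1 when workers may run (intensity below threshold), else 0.

    A 0/1 signal like this is the kind of inverted metric a scaler can act on.
    """
    return 1 if intensity < threshold else 0


def desired_replicas(intensity: float, queue_length: int,
                     max_replicas: int = 10, threshold: float = 300.0) -> int:
    """Scale to zero on a dirty grid; otherwise scale with pending work."""
    if carbon_gate(intensity, threshold) == 0:
        return 0
    return min(max_replicas, queue_length)


print(desired_replicas(intensity=420.0, queue_length=25))  # 0
print(desired_replicas(intensity=180.0, queue_length=25))  # 10
```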
3. The Advanced Approach: A Custom Carbon-Aware Scheduler
For a truly native experience, you can implement a custom Kubernetes Scheduler or a Controller. This controller monitors a queue of "Carbon-Deferred" jobs.
When a user submits a job with a specific annotation (e.g., carbon-policy: deferrable), the controller picks it up and holds it back. It then queries the Carbon-Aware SDK to identify the optimal time window within the user's defined deadline. Only when that window arrives does the controller let the pod be scheduled, for example by flipping the Job's spec.suspend field from true to false; a full custom scheduler would instead bind the pod to a node itself, which is what ultimately sets the pod's spec.nodeName.
Practical Implementation: A Step-by-Step Guide
Let’s look at a concrete workflow for implementing temporal shifting for a data processing task.
Step 1: Deploy the Carbon-Aware SDK Web API
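First, deploy the SDK as a service within your cluster. You will need credentials from a data provider such as WattTime (an account or an API key, depending on the provider); the example below reads a username from a Secret.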
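    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: carbon-aware-api
    spec:
      replicas: 1
      # apps/v1 requires a selector that matches the pod template labels;
      # the label value itself is arbitrary.
      selector:
        matchLabels:
          app: carbon-aware-api
      template:
        metadata:
          labels:
            app: carbon-aware-api
        spec:
          containers:
            - name: api
              # Pin a specific tag instead of :latest for reproducible deploys.
              image: ghcr.io/green-software-foundation/carbon-aware-sdk-webapi:latest
              env:
                # Variable names depend on the SDK version; check its configuration docs.
                - name: CarbonAwareVars__CarbonIntensityDataSource
                  value: "WattTime"
                - name: CarbonAwareVars__WattTime__Username
                  valueFrom:
                    secretKeyRef:
                      name: watttime
                      key: user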
Step 2: Define the Workload Metadata
We need a way to tell our system which jobs are flexible. We can use Kubernetes annotations for this. A job that must finish by 8:00 AM but can start anytime would look like this:
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: monthly-report-aggregator
      annotations:
        carbon-aware.scheduling/enabled: "true"
        carbon-aware.scheduling/deadline: "2023-12-01T08:00:00Z"
        carbon-aware.scheduling/estimated-runtime: "45m"
    spec:
      template:
        # ... container spec ...
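A controller has to turn these string annotations into typed values before it can reason about them. The following is a minimal sketch of that parsing step; the helper names and the supported duration syntax ("45m", "2h", "1h30m") are assumptions, since the annotation scheme is this article's own convention rather than a Kubernetes standard.

```python
import re
from datetime import datetime, timedelta

PREFIX = "carbon-aware.scheduling/"


def parse_duration(text: str) -> timedelta:
    """Parse strings such as "45m", "2h" or "1h30m" into a timedelta."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?", text.strip())
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes = (int(g) if g else 0 for g in match.groups())
    return timedelta(hours=hours, minutes=minutes)


def parse_hints(annotations: dict):
    """Return (deadline, runtime, latest_start), or None if the Job has not opted in."""
    if annotations.get(PREFIX + "enabled") != "true":
        return None
    # Older Python versions reject a trailing "Z" in fromisoformat,
    # so normalise it to an explicit UTC offset first.
    raw_deadline = annotations[PREFIX + "deadline"].replace("Z", "+00:00")
    deadline = datetime.fromisoformat(raw_deadline)
    runtime = parse_duration(annotations[PREFIX + "estimated-runtime"])
    return deadline, runtime, deadline - runtime


example = {
    PREFIX + "enabled": "true",
    PREFIX + "deadline": "2023-12-01T08:00:00Z",
    PREFIX + "estimated-runtime": "45m",
}
deadline, runtime, latest_start = parse_hints(example)
print(latest_start.isoformat())  # 2023-12-01T07:15:00+00:00
```

The derived `latest_start` is the last moment the job can begin and still finish before its deadline.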
Step 3: The Controller Logic
Your custom controller (written in Go, or in Python using Kopf) performs the following logic:
- Watch: Listen for Jobs with the carbon-aware.scheduling/enabled annotation.
- Evaluate: Query the SDK Web API: GET /forecast/one-best?location=eastus&window=45&deadline=...
- Schedule: The SDK returns the optimal start time. The controller then schedules a "start" event for that job.
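The watch/evaluate/schedule steps above can be sketched as a single reconcile function. This is framework-free Python rather than Kopf or client-go code, and it rests on two assumptions: Jobs are submitted with `spec.suspend: true`, and the "start" event is implemented by patching that field to false. The `fetch_best_start` callable is a hypothetical stand-in for the HTTP call to the SDK Web API.

```python
from datetime import datetime, timezone
from typing import Callable, Optional

ENABLED = "carbon-aware.scheduling/enabled"


def reconcile(job: dict, now: datetime,
              fetch_best_start: Callable[[dict], datetime]) -> Optional[dict]:
    """One pass of the loop for a single Job manifest.

    Returns a patch body to apply to the Job, or None if nothing should change.
    """
    annotations = job.get("metadata", {}).get("annotations", {})
    suspended = job.get("spec", {}).get("suspend", False)
    # Watch: ignore Jobs that have not opted in or are already released.
    if annotations.get(ENABLED) != "true" or not suspended:
        return None
    # Evaluate: ask the forecast source for the best start time.
    best_start = fetch_best_start(annotations)
    # Schedule: release the Job once the window has arrived (assumed mechanism).
    if now >= best_start:
        return {"spec": {"suspend": False}}
    return None


job = {"metadata": {"annotations": {ENABLED: "true"}}, "spec": {"suspend": True}}
best = datetime(2023, 12, 1, 3, 0, tzinfo=timezone.utc)

print(reconcile(job, datetime(2023, 12, 1, 2, 0, tzinfo=timezone.utc), lambda a: best))  # None
print(reconcile(job, datetime(2023, 12, 1, 3, 5, tzinfo=timezone.utc), lambda a: best))
```

Injecting the forecast lookup as a callable keeps the decision logic testable without a running SDK instance.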
Real-World Trade-offs and Considerations
Implementing carbon-aware scheduling isn't without its challenges. As senior engineers, we must weigh the benefits against the operational complexity.
Data Availability and Accuracy
Carbon intensity data is a forecast, not a guarantee. Grids are unpredictable. If the SDK predicts a low-carbon window at 3:00 AM, but a sudden drop in wind occurs, your job might actually run during a high-intensity period. It is important to build in a "must-run" buffer to ensure deadlines are met regardless of the carbon forecast.
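One way to express such a buffer is to clamp the forecast start time so the job always begins early enough to finish before the deadline. This is a minimal sketch; the function name and the 15-minute default are assumptions to adapt to your own workloads.

```python
from datetime import datetime, timedelta, timezone


def effective_start(forecast_start: datetime, deadline: datetime,
                    runtime: timedelta,
                    buffer: timedelta = timedelta(minutes=15)) -> datetime:
    """Use the forecast start unless it would push the job past its must-run time."""
    must_run_by = deadline - runtime - buffer
    return min(forecast_start, must_run_by)


deadline = datetime(2023, 12, 1, 8, 0, tzinfo=timezone.utc)
runtime = timedelta(minutes=45)

# A forecast window that is too late gets pulled back to 07:00.
late_forecast = datetime(2023, 12, 1, 7, 30, tzinfo=timezone.utc)
print(effective_start(late_forecast, deadline, runtime).isoformat())

# A comfortable forecast window is left untouched.
early_forecast = datetime(2023, 12, 1, 3, 0, tzinfo=timezone.utc)
print(effective_start(early_forecast, deadline, runtime).isoformat())
```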
Resource Contention
If every company in a region uses the same carbon-aware logic, we create a new problem: "The Carbon Peak." Everyone might try to start their jobs at the exact same minute when the wind picks up, potentially causing localized spikes in demand or driving up cloud spot instance prices. Introducing a small amount of jitter to your start times is a recommended best practice.
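Jitter can be as simple as adding a random delay to the chosen start time. The sketch below assumes a uniform delay of up to ten minutes and an optional cap so the jitter never pushes a job past its latest safe start; both the function name and the defaults are illustrative.

```python
import random
from datetime import datetime, timedelta, timezone
from typing import Optional


def with_jitter(start: datetime,
                max_jitter: timedelta = timedelta(minutes=10),
                latest_start: Optional[datetime] = None,
                rng: Optional[random.Random] = None) -> datetime:
    """Delay `start` by a uniform random offset in [0, max_jitter], capped at `latest_start`."""
    rng = rng or random.Random()
    offset = timedelta(seconds=rng.uniform(0, max_jitter.total_seconds()))
    jittered = start + offset
    if latest_start is not None:
        jittered = min(jittered, latest_start)
    return jittered


start = datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc)
print(with_jitter(start).isoformat())
```

Passing a seeded `random.Random` makes the behaviour reproducible in tests while production code keeps the default.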
Cost vs. Carbon
Fortunately, carbon efficiency often aligns with cost efficiency. Low carbon intensity often correlates with lower demand on the grid, which sometimes translates to lower spot instance pricing in cloud environments. However, this isn't always a 1:1 relationship. You must decide if your priority is the absolute lowest carbon footprint or the lowest bill.
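If you want to make that priority explicit, one option is a weighted score over candidate windows. This is only one possible formulation, not an established standard: the `Window` type, the normalisation by the maximum value, and the sample numbers are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    label: str
    intensity: float   # gCO2eq/kWh
    price: float       # cost per hour, in any consistent currency


def pick_window(windows, carbon_weight: float = 0.5) -> Window:
    """Pick the window minimising a weighted sum of normalised intensity and price.

    carbon_weight=1.0 optimises purely for carbon, 0.0 purely for cost.
    """
    if not 0.0 <= carbon_weight <= 1.0:
        raise ValueError("carbon_weight must be between 0 and 1")
    # Normalise by the maximum so both terms fall in [0, 1]; guard against all-zero input.
    max_intensity = max(w.intensity for w in windows) or 1.0
    max_price = max(w.price for w in windows) or 1.0

    def score(w: Window) -> float:
        return (carbon_weight * w.intensity / max_intensity
                + (1.0 - carbon_weight) * w.price / max_price)

    return min(windows, key=score)


# Hypothetical candidates: one clean but pricey, one cheap but dirtier.
candidates = [Window("03:00", 150.0, 0.40), Window("14:00", 300.0, 0.20)]
print(pick_window(candidates, carbon_weight=0.9).label)  # 03:00
print(pick_window(candidates, carbon_weight=0.1).label)  # 14:00
```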
Measuring Success
You cannot manage what you do not measure. To prove the efficacy of your carbon-aware scheduling, you should export carbon metrics to your observability stack (Prometheus/Grafana).
Calculate your Carbon Savings:
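Savings = (Baseline Intensity at Original Time - Actual Intensity at Shifted Time) * Energy Consumed
With intensities in gCO2eq/kWh and energy in kWh, the result is in grams of CO2 equivalent. A negative value means the shifted run was dirtier than the original slot would have been.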
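As a worked example of the formula, the sketch below computes savings for a single job. The numbers are hypothetical; in practice the intensities would come from your carbon data provider and the energy figure from your own metering or estimates.

```python
def carbon_savings_g(baseline_intensity: float, actual_intensity: float,
                     energy_kwh: float) -> float:
    """Grams of CO2eq saved by shifting a job.

    Intensities are in gCO2eq/kWh, energy in kWh. The result is negative
    if the shifted run turned out dirtier than the original slot.
    """
    return (baseline_intensity - actual_intensity) * energy_kwh


# Hypothetical job: 12 kWh, moved from a 420 g/kWh slot to a 180 g/kWh slot.
saved = carbon_savings_g(baseline_intensity=420.0, actual_intensity=180.0, energy_kwh=12.0)
print(f"{saved / 1000:.2f} kg CO2eq saved")  # 2.88 kg CO2eq saved
```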
Visualizing this in a dashboard provides tangible evidence of the engineering team's contribution to corporate sustainability goals.
Conclusion
Carbon-aware scheduling is a shift in mindset. It moves us away from the idea that resources are infinite and always available at a static environmental cost. By utilizing the Carbon-Aware SDK and the orchestration power of Kubernetes, we can transform our infrastructure into a dynamic system that breathes with the power grid.
Actionable steps for your team:
- Identify: Audit your workloads. Which batch jobs, CI/CD pipelines, or data migrations are non-urgent?
- Experiment: Deploy the Carbon-Aware SDK in a dev environment and query the forecast for your primary cloud region.
- Implement: Start with a simple temporal shift for a single non-critical CronJob using a wrapper script.
- Scale: Move toward a controller-based model as your green-software maturity increases.
By building these capabilities today, we aren't just optimizing code; we are future-proofing our architecture for a carbon-constrained world.