From Exploit Code to Production Detection: Building a CVE-2026-31431 (Copy Fail) detection with Agents

Key points

CVE-2026-31431 (CVSS 7.8 HIGH) allows any unprivileged local user to corrupt the page cache through AF_ALG sockets and the authencesn AEAD template, then escalate privileges and execute code as root.
The vulnerability affects a wide range of kernel versions, from 4.14 through 6.19 and 7.0 release candidates. This is not limited to recent kernels.
The corruption happens entirely within kernel space: no VFS writes, no mtime changes, no audit trail. It avoids normal file-write visibility.
CISA added this to the Known Exploited Vulnerabilities (KEV) catalog on May 1, 2026. Active exploitation in the wild has since been confirmed.
This post walks through the exploit mechanics and looks back at how Datadog Security Research used coding agents to compress the full detection engineering cycle into a single session, from initial threat analysis to shipped detections, in hours.

Background: Page cache, AF_ALG, and splice()

Before we examine the exploit in detail, let's establish three kernel concepts it relies on. If you're already familiar with these, feel free to jump to how the exploit works.

The Linux kernel uses the page cache to hold file-backed data in memory. When a file is first read from disk, its contents are loaded into page cache pages indexed by inode and offset. Any later read requests, originating from either the initial process or a separate one, are fulfilled using memory, bypassing the disk entirely.

Execution works the same way. When the kernel runs /usr/bin/su, it memory-maps the binary's page cache pages directly into the process address space. The CPU executes from those in-memory pages, not from disk.

AF_ALG: The Kernel Crypto API

The Linux kernel exposes its internal cryptography operations to userspace through AF_ALG (Address Family: Algorithm) sockets. A program creates an AF_ALG socket, binds it to a specific algorithm, configures it with setsockopt(), and sends data for encryption or decryption.

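The exploit targets the authencesn AEAD template, which combines a cipher with an authenticator and handles extended sequence numbers. Here it is instantiated as authencesn(hmac(sha256),cbc(aes)), pairing AES-CBC encryption with HMAC-SHA256 authentication.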

Importantly, creating and using AF_ALG sockets requires no special privileges. Any unprivileged user can access the kernel's crypto subsystem.
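To make the access model concrete, here is a minimal sketch of a benign AF_ALG call: it asks the kernel to compute a SHA-256 digest and compares the result with hashlib. The function name afalg_sha256 is ours for illustration. The sketch assumes a Linux host where AF_ALG is compiled in and not blocked by a sandbox policy, and it returns None otherwise.

```python
import hashlib
import socket


def afalg_sha256(data: bytes):
    """Hash data with the kernel's sha256 via AF_ALG; None if unavailable."""
    if not hasattr(socket, "AF_ALG"):
        return None  # Python build or platform without AF_ALG support
    try:
        with socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0) as ctrl:
            # Bind the control socket to an algorithm type and name.
            ctrl.bind(("hash", "sha256"))
            # accept() returns the operation socket used for the request.
            op, _ = ctrl.accept()
            with op:
                op.sendall(data)
                return op.recv(32)  # SHA-256 digests are 32 bytes
    except OSError:
        return None  # blocked by policy, or algorithm not available


if __name__ == "__main__":
    digest = afalg_sha256(b"hello")
    if digest is None:
        print("AF_ALG is not available to this process")
    else:
        expected = hashlib.sha256(b"hello").digest()
        print("kernel digest matches hashlib:", digest == expected)
```

Run as an ordinary user, a successful result is a quick way to confirm that the interface is reachable without privileges on a given host.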

splice(): Zero-copy data movement

splice() performs zero-copy data transfers between file descriptors without passing the data through userspace. When a file is spliced into a pipe, the kernel does not copy the data. Instead, the pipe buffers reference the file's actual page cache pages, so the pipe points at the same memory pages that back the source file.

This zero-copy optimization is key to the exploit: after splicing, the crypto subsystem holds references to the actual page cache pages of the target file, not copies.

How the exploit works, step by step

The exploit chains together the three mechanisms above to achieve a controlled write into the page cache of any file readable by the attacker. Let's walk through each stage.

Stage 1: Create and configure the crypto socket

The exploit begins by creating an AF_ALG socket and binding it to the AEAD algorithm authencesn(hmac(sha256),cbc(aes)).

This setup does not require root privileges. An unprivileged process can create the socket, bind it to the algorithm, configure the required options, and call accept() to create an operation socket. That operation socket is what the exploit uses for the later decrypt request.


import socket

ctrl_sock = socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0)
ctrl_sock.bind(("aead", "authencesn(hmac(sha256),cbc(aes))"))

key = get_valid_authencesn_key()
ctrl_sock.setsockopt(socket.SOL_ALG, socket.ALG_SET_KEY, key)

authsize = 4
ctrl_sock.setsockopt(
   socket.SOL_ALG,
   socket.ALG_SET_AEAD_AUTHSIZE,
   authsize,
)
op_sock, _ = ctrl_sock.accept()

Stage 2: Open the target and splice into the crypto pipeline

Next, the attacker opens a readable target file, such as /usr/bin/su, and creates a pipe. The goal is not to write to the file through the filesystem, but to make the file's page cache pages part of the input to a kernel crypto operation.

The exploit prepares a crafted AEAD decrypt request. That request includes normal AF_ALG metadata such as the operation type, IV, and associated data. The associated data matters because bytes from that attacker-controlled region later influence what gets written during the vulnerable decrypt path.

The first splice() attaches file-backed page cache pages to the pipe. The second splice() passes those same pipe buffers into the AF_ALG operation socket. Since splice() keeps the data path inside the kernel, the file contents are not copied through userspace and the target file is never opened for writing.

import os

target_fd = os.open("/usr/bin/su", os.O_RDONLY)
pipe_rd, pipe_wr = os.pipe()
offset = 0
splice_len = 4096

os.splice(
   target_fd,
   pipe_wr,
   splice_len,
   offset_src=offset,
)

os.splice(
   pipe_rd,
   op_sock.fileno(),
   splice_len,
)

At this point, the decrypt operation is working with buffers backed by the same cached pages the kernel would normally use when reading or executing /usr/bin/su.

Stage 3: Trigger the corruption

The corruption happens when the kernel processes the crafted AEAD decrypt operation. Because of the vulnerability in the authencesn decrypt path, bytes influenced by the attacker-controlled request can be written back into the operation's input buffers. Those buffers are the cached pages of the target file, in this case /usr/bin/su, so the write changes the cached contents without modifying the file on disk. By repeating the operation at selected offsets, the exploit can patch small chunks of the cached target binary or file.

Stage 4: Execute the corrupted binary

Because the kernel executes binaries from the page cache, the next execution of the target runs the modified pages even though the on-disk file is unchanged. For a setuid-root target like /usr/bin/su, a successful exploit results in attacker-controlled code executing as root.

Why this is impactful

It's unprivileged. No special capabilities, no root access, no container escape prerequisite. Any user can create AF_ALG sockets.

It avoids normal file-write visibility. Metadata-based file integrity monitoring is blind to this technique because the on-disk file and inode metadata are unchanged. Hash-based monitoring may detect it, but only if it reads the file while the corrupted page cache pages are still resident.
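The hash-based check described above can be sketched by hashing a file twice: once through the page cache, and once with O_DIRECT, which bypasses it. This is a minimal illustration, not a production integrity monitor. The function names are ours, and it assumes a filesystem that supports O_DIRECT and that corrupted pages are never written back to disk, which is what the behavior described in this post implies.

```python
import hashlib
import mmap
import os

BLOCK = 1 << 20  # 1 MiB; a multiple of common logical block sizes


def hash_buffered(path: str) -> str:
    """SHA-256 of the file as served through the page cache."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(BLOCK), b""):
            h.update(chunk)
    return h.hexdigest()


def hash_direct(path: str) -> str:
    """SHA-256 of the file read with O_DIRECT, bypassing the page cache.

    Raises OSError where O_DIRECT is unsupported by the filesystem.
    """
    h = hashlib.sha256()
    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    try:
        size = os.fstat(fd).st_size
        done = 0
        # An anonymous mmap is page-aligned, which O_DIRECT requires.
        with mmap.mmap(-1, BLOCK) as buf:
            while done < size:
                n = os.readv(fd, [buf])
                if n == 0:
                    break
                h.update(buf[:n])
                done += n
    finally:
        os.close(fd)
    return h.hexdigest()


def cache_matches_disk(path: str) -> bool:
    """False suggests the cached view differs from the on-disk contents."""
    return hash_buffered(path) == hash_direct(path)
```

A mismatch for a file such as /usr/bin/su is worth investigating, but a match only shows that the cached pages were clean at the moment of the check.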

It's deterministic. Unlike many kernel exploits that rely on race conditions or heap layout, Copy Fail is a controlled, repeatable write. The attacker can overwrite specific bytes at specific offsets with specific values. Public proof-of-concept exploits are widely available.

It targets more than setuid binaries. A Rust-based exploit variant also tampers with PAM configuration files (/etc/pam.d/*) to bypass authentication entirely. In principle, any readable file whose page cache pages can be spliced into the crypto pipeline is a valid target.

Container implications

Copy Fail can cross container boundaries when the targeted file is backed by the same underlying file object outside the container. Because the page cache links to the underlying file object instead of a container's mount namespace, environments using the same file backing can share the same cached physical pages.

A container escape may be possible, but it depends on several conditions lining up. Copy Fail lets an unprivileged process corrupt cached pages for files it can read, but escaping the container requires those corrupted pages to be reused across a more privileged boundary, such as by another workload or host process that later reads or executes the affected file.

Unlike many container escape techniques, this path does not require a privileged container, a writable host mount, or added Linux capabilities. It requires three things: an unprivileged process with access to the vulnerable kernel interface, a target file whose cached pages are shared across the relevant boundary, and a more privileged runtime that uses those pages while they remain corrupted.

Restricting socket creation through updated AppArmor, SELinux, or seccomp profiles can shrink the attack surface of a compromised container. Since most containers have no legitimate need for the AF_ALG socket interface, blocking it is a practical defensive measure.
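As one illustration of the seccomp option, the sketch below adds a rule to a Docker/OCI-style seccomp profile that makes socket() fail when the address family is AF_ALG. The helper name deny_af_alg and the permissive base profile in the example are hypothetical. How the rule combines with an existing allowlist profile depends on the container runtime, so treat the output as a starting point to validate rather than a finished policy.

```python
import copy
import json

AF_ALG = 38  # value of AF_ALG in Linux's socket address family list


def deny_af_alg(profile: dict) -> dict:
    """Return a copy of a seccomp profile with an added rule that makes
    socket(AF_ALG, ...) return an error."""
    updated = copy.deepcopy(profile)
    updated.setdefault("syscalls", []).append({
        "names": ["socket"],
        "action": "SCMP_ACT_ERRNO",
        # Match only when the first argument (the family) equals AF_ALG.
        "args": [{"index": 0, "value": AF_ALG, "op": "SCMP_CMP_EQ"}],
    })
    return updated


if __name__ == "__main__":
    # Hypothetical permissive base profile, used only to show the output.
    base = {"defaultAction": "SCMP_ACT_ALLOW", "syscalls": []}
    print(json.dumps(deny_af_alg(base), indent=2))
```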

Detecting CVE-2026-31431 with Datadog Workload Protection

The corruption is difficult to detect from userspace because it does not follow the normal file-write path. However, the behavioral sequence that precedes the corruption is highly distinctive: an unprivileged process binding an AF_ALG socket, configuring it with SOL_ALG setsockopt calls, and then accessing setuid binaries or sensitive configuration files.

We can detect this sequence using Datadog Workload Protection's chained detection rules, which are agent-side state machines that track multi-step attack patterns within a single process.

The detection chain

The content pack implements a three-stage state machine using process-scoped variables:

Stage 0: bind(AF_ALG)           → silent, sets chain="1" (30s TTL)
Stage 1: setsockopt(SOL_ALG)    → silent, sets chain="2" (30s TTL)
Stage 2a: splice(S_ISUID file)  → detection (critical), sets cooldown
Stage 2b: open(system file)     → detection (high), sets cooldown

Stage 0 arms the chain when an unprivileged process binds an AF_ALG socket. This rule is silent as it only sets the process-scoped variable without generating an event:

expression: >-
  bind.addr.family == AF_ALG &&
  bind.retval == 0 &&
  process.euid != 0 &&
  ${process.af_alg_splice_cooldown} != "1"
actions:
  - set:
      name: af_alg_splice_chain
      scope: process
      value: "1"
      ttl: 30s

Stage 1 advances the chain when the same process calls setsockopt with SOL_ALG (level 279), also without generating an event:

expression: >-
  setsockopt.socket_family == AF_ALG &&
  setsockopt.level == 279 &&
  process.euid != 0 &&
  ${process.af_alg_splice_chain} == "1"
actions:
  - set:
      name: af_alg_splice_chain
      scope: process
      value: "2"
      ttl: 30s

Stage 2a fires the detection when a setuid binary is spliced after the full chain setup. Rather than maintaining a hardcoded list of binaries, we use a generic file mode check for the setuid bit:

expression: >-
  splice.file.mode & S_ISUID > 0 &&
  process.euid != 0 &&
  ${process.af_alg_splice_chain} == "2"

Stage 2b provides broader coverage, catching read-only opens of system binaries and PAM configurations, which covers the authentication bypass variant:

expression: >-
  open.file.path in [~"/usr/bin/*", ~"/usr/sbin/*", ~"/bin/*", ~"/sbin/*",
  ~"/usr/libexec/*", ~"/etc/pam.d/*", ~"/etc/security/*", "/etc/passwd"] &&
  (open.flags & O_ACCMODE) == O_RDONLY &&
  process.euid != 0 &&
  ${process.af_alg_splice_chain} == "2"
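To reason about how the stages interact, it can help to model the chain outside the agent. The sketch below is a simplified Python simulation of the per-process state machine with a 30-second TTL. It is not how the agent evaluates SECL, the event names are hypothetical labels for the four rules above, and it leaves out the cooldown variable.

```python
from dataclasses import dataclass, field

TTL_SECONDS = 30.0


@dataclass
class ProcessState:
    chain: str = ""          # "", "1", or "2"
    expires_at: float = 0.0  # time at which the chain value goes stale


@dataclass
class ChainTracker:
    procs: dict = field(default_factory=dict)  # pid -> ProcessState

    def _chain(self, pid: int, now: float) -> str:
        st = self.procs.get(pid)
        if st is None or now >= st.expires_at:
            return ""  # never set, or TTL expired
        return st.chain

    def _set(self, pid: int, value: str, now: float) -> None:
        self.procs[pid] = ProcessState(value, now + TTL_SECONDS)

    def on_event(self, pid: int, euid: int, kind: str, now: float):
        """Return a severity string when a detection fires, else None."""
        if euid == 0:
            return None  # every rule requires process.euid != 0
        chain = self._chain(pid, now)
        if kind == "bind_af_alg":
            self._set(pid, "1", now)           # Stage 0: silent
        elif kind == "setsockopt_sol_alg" and chain == "1":
            self._set(pid, "2", now)           # Stage 1: silent
        elif kind == "splice_setuid_file" and chain == "2":
            return "critical"                  # Stage 2a
        elif kind == "open_system_file_rdonly" and chain == "2":
            return "high"                      # Stage 2b
        return None


if __name__ == "__main__":
    t = ChainTracker()
    t.on_event(100, 1000, "bind_af_alg", 0.0)
    t.on_event(100, 1000, "setsockopt_sol_alg", 1.0)
    print(t.on_event(100, 1000, "splice_setuid_file", 2.0))  # critical
```

The model makes two properties easy to check: a splice or open with no preceding bind and setsockopt never fires, and a chain that stalls for longer than the TTL has to start over.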

Rate limiting

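The exploit performs many rapid splices per run. Without rate limiting, a single exploitation attempt would generate dozens of agent events, so the chained detection implements a dual-variable cooldown. Once a detection fires and sets the cooldown, the Stage 0 condition ${process.af_alg_splice_cooldown} != "1" keeps the same process from re-arming the chain.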
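The exact cooldown mechanics are not spelled out here, so the following is only a sketch of the general idea: let the first detection from a process through, then suppress repeats for a fixed window. The class name and the 60-second window are hypothetical and are not taken from the content pack.

```python
COOLDOWN_SECONDS = 60.0  # hypothetical window; not a value from the rules


class CooldownGate:
    """Suppress repeat detections from one process for a fixed window."""

    def __init__(self, window: float = COOLDOWN_SECONDS):
        self.window = window
        self._until = {}  # pid -> time at which the cooldown ends

    def in_cooldown(self, pid: int, now: float) -> bool:
        return now < self._until.get(pid, 0.0)

    def should_emit(self, pid: int, now: float) -> bool:
        """True for the first detection, False while the cooldown is active."""
        if self.in_cooldown(pid, now):
            return False
        self._until[pid] = now + self.window
        return True


if __name__ == "__main__":
    gate = CooldownGate()
    # Eleven detections from one process within about a tenth of a second.
    emitted = sum(gate.should_emit(4242, i * 0.01) for i in range(11))
    print("events emitted:", emitted)
```

Keying the gate on the process mirrors the process-scoped variables used by the rules, so a burst from one exploit run collapses into a single event while a second process is still reported.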

Accelerating the Detection Engineering Lifecycle with agents

One aspect of our response to Copy Fail that we wanted to highlight is the role coding agents played in accelerating the detection engineering lifecycle. The entire process, from initial threat analysis through iterative testing to a release of a multi-rule content pack, happened in a single continuous session.

The agent's value wasn't in any single step. It was in maintaining context across rapid iteration cycles through the use of agentic skills:

Threat analysis: Our standard response to emerging threats includes analyzing behavioral Tactics, Techniques, and Procedures (TTPs) mapped to MITRE ATT&CK using security advisories. To streamline this, we developed a custom skill that processes reports and advisories to assess existing detection coverage and identify gaps.
Linux System Expert: The next part of our process uses a Linux Expert skill that provides deep technical insights into kernel events and syscalls used in exploits and vulnerabilities.
Security agent capabilities: By utilizing the technical details from the preceding threat analysis and Linux system expert phases, we can identify necessary Linux kernel and syscall attributes and helpers available within our Datadog Security Agent.
Rule ideation: Based on the analysis from the previous three stages, this stage generates detection logic in SECL, Datadog's Security Expression Configuration Language, to capture specific exploit activity.
Prototyping with Terraform: The detection logic from the previous stage is pushed to a dedicated detonation account using the Datadog Terraform provider, which lets us iterate on detection logic in minutes. The agent systematically worked through API type constraints, provider deprecations, and schema discovery, all of which would otherwise have required significant documentation diving.
Live testing against exploits: Running the exploit against deployed rules revealed issues static analysis couldn't catch: uninitialized scoped variables that prevented the chain from firing, missing rate limiting that produced 11 signals per run, and intermediate stages generating unwanted events. Each issue was diagnosed and fixed within the same context.
Variant analysis: When source code for a Rust-based exploit variant arrived, the agent parsed the implementation, identified the exact syscall sequence, and found that splice.file.path might not resolve to the target binary. That finding led to the S_ISUID file mode check and the open-based detection path for PAM bypass coverage.
Production packaging: Converting from Terraform prototypes to the repository's content pack format required consistent metadata across four agent rules, a backend rule with four signal cases, CI configuration for 16 agent version matrices, and documentation.

The process of engineering detections is fundamentally cyclical. Since initial rule iterations are rarely perfect, rapidly narrowing the interval between deploying a rule and witnessing actual exploit activity is vital for effective results. By managing context and accelerating these iteration cycles, agents grant security engineers the ability to focus on high-impact objectives: establishing detection parameters, assigning severity, and navigating complex technical trade-offs.

How Datadog can help

Enabling the content pack

To enable the CVE-2026-31431 Copy Fail Exploit content pack, navigate to Security > Workload Protection Overview in your Datadog account and turn it on. The rules require Datadog Agent v7.68 or later.

Hunting for exploitation with Cloud SIEM

If you're a Cloud SIEM customer, you can use Log Management to hunt for indicators of Copy Fail exploitation attempts across your environment.

Searching for AF_ALG kernel module loads (requires auditd or syslog forwarding):

source:(syslog OR linux-audit-logs) "algif_aead"

Module loads of algif_aead on systems that don't normally perform kernel-level crypto operations may indicate exploitation preparation. Legitimate uses of AF_ALG are rare outside of specialized applications like IPsec or disk encryption.
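For a quick local check on a single host, the same signal can be read from /proc/modules. This is a minimal sketch with function names of our own. It assumes the AF_ALG code is built as loadable modules (af_alg and algif_aead); when the support is compiled into the kernel, nothing appears in /proc/modules and this check says nothing about exposure.

```python
from pathlib import Path

WATCHED = {"af_alg", "algif_aead"}


def loaded_modules(proc_modules_text: str) -> set:
    """Module names from /proc/modules content (first field of each line)."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def watched_loaded(proc_modules_text: str) -> set:
    """Subset of the watched AF_ALG-related modules that are loaded."""
    return loaded_modules(proc_modules_text) & WATCHED


if __name__ == "__main__":
    path = Path("/proc/modules")
    if path.exists():
        found = watched_loaded(path.read_text())
        print("AF_ALG-related modules loaded:", sorted(found) or "none")
    else:
        print("/proc/modules is not available on this system")
```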

Scoping exposure with CSM Vulnerabilities

CSM Vulnerabilities can identify hosts and containers in your environment running kernel versions affected by CVE-2026-31431:

@advisory.cve:CVE-2026-31431 @status:(open OR in_progress)

Given the broad range of affected versions (4.14 through 6.19 and 7.0 release candidates), most Linux systems in production are potentially affected. Cross-reference with the version table above to prioritize patching.
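For a rough first pass on an individual host, the kernel release string can be compared with the range quoted above. The sketch below is only a heuristic under stated assumptions: it looks at the upstream major and minor version, it treats a final 7.0 release as outside the range because only release candidates are listed, and it cannot see vendor backports, so a distribution kernel inside the range may already carry the fix. The function name is ours.

```python
import platform
import re


def in_affected_range(release: str) -> bool:
    """True if the release string falls in 4.14 through 6.19, or is a
    7.0 release candidate. Ignores vendor backports."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if not m:
        return False
    version = (int(m.group(1)), int(m.group(2)))
    if (4, 14) <= version <= (6, 19):
        return True
    # Release candidates carry a suffix such as "7.0.0-rc3".
    return version == (7, 0) and re.search(r"-rc\d+", release) is not None


if __name__ == "__main__":
    release = platform.release()
    print(release, "in quoted range:", in_affected_range(release))
```

Treat a positive result as a prompt to check the vendor advisory for that kernel package, not as confirmation that the host is vulnerable.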

Conclusion

CVE-2026-31431 represents a particularly challenging class of vulnerability: a deterministic, unprivileged kernel exploit that corrupts shared memory in a way that is invisible to traditional monitoring. We recommend:

Patch affected kernels as your highest priority. This is the only complete mitigation.
Disable algif_aead on systems that cannot be immediately patched.
Enable the Workload Protection content pack for real-time detection of the exploitation sequence.
Hunt retroactively using the Cloud SIEM queries above for signs of prior exploitation.
Audit your fleet using CSM Vulnerabilities to identify all affected hosts and containers.