June 3, 2026 · 9 min read · AGENTIC SECURITY · Part 5 of 5 — Agentic AI Security cluster
Memory Forensics for Compromised Agent Systems with Volatility3
When an agent process is compromised, disk artifacts can vanish in seconds, and much of the evidence never touches disk at all. Process memory holds that evidence, but only while the process is alive. Here's how to capture and analyze it before the process is killed.
TL;DR
ATLAS AML.T0047 (Establish Rogue ML Provider) involves agent processes loaded with compromised model artifacts. Memory forensics with Volatility3 extracts the exact artifacts — running processes, network connections, injected code regions, and environment variables including API keys — that disk forensics misses. Relevant D3FEND countermeasures: D3-MA (Memory Analysis) and D3-PSMD (Process Segment Metadata Detection). Maps to NIST AI RMF MEASURE-2.6 (evaluate AI system behavior under adversarial conditions) and NIST CSF RS.AN-03 (incident analysis).
Why agent processes are high-value forensic targets
An agent process at the moment of compromise holds more forensically relevant data than almost any other process type:
- API keys in environment variables: the agent's full .env context is resident in memory as long as the process runs. A memory dump captures every key, even ones never written to disk.
- The current conversation context: the full context window, including injected instructions from AML.T0054 (Prompt Injection), is in memory. This is the exact artifact that proves what the attacker instructed the agent to do.
- Loaded model weights / cached model artifacts: for self-hosted LLMs, the quantized model is loaded into GPU/CPU memory. Injected code or tampered weight regions leave forensic signatures in the memory map.
- Network connections: the current socket table shows every active connection the agent has open, including any command-and-control connections established post-compromise.
The Volatility3 workflow for agent processes
Step 1: Capture the memory dump
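On Linux (most agent deployments run on GCP/AWS Linux instances), capture memory before killing the suspect process. Take two captures: a per-process core with gcore, which is what you search for strings in Step 4, and a full-system image, which is what Volatility3's Linux plugins need because they walk kernel structures and require a symbol table matching the running kernel.
# Identify the suspect PID
ps aux | grep "python3.*agent" | grep -v grep
# Capture the process core without killing the process
# (gcore ships with gdb, pauses the process briefly, and writes <prefix>.<PID>)
gcore -o /forensics/agent_$(date +%Y%m%d_%H%M%S).dump <PID>
# Full-system image for Volatility3: use an acquisition tool such as AVML or LiME.
# A plain dd of /proc/kcore does not produce an image Volatility3 can parse.
# (The Volatility3 commands in later steps refer to this image as /forensics/agent_dump.raw)
sudo avml /forensics/mem_$(date +%Y%m%d_%H%M%S).lime
# Record hashes of both captures (note the .<PID> suffix gcore appends)
sha256sum /forensics/agent_*.dump.* /forensics/mem_*.lime > /forensics/dump_hashes.txt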
Hash the dump immediately — this is your evidence integrity record, aligned with NIST AI RMF MEASURE-2.6 (evidence-based evaluation).
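The hash file can be extended into a small custody manifest that is re-checked before each analysis session. The following is a minimal sketch, not part of the original workflow: the function names and the examiner field are hypothetical, and the record layout is an assumption you would adapt to your own evidence-handling policy.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_file(path, chunk_size=1024 * 1024):
    """Stream the file through SHA-256 so large dumps never load fully into RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths, examiner):
    """Return one custody record per dump: path, size, hash, examiner, UTC time."""
    records = []
    for path in paths:
        path = Path(path)
        records.append({
            "path": str(path),
            "size_bytes": path.stat().st_size,
            "sha256": sha256_file(path),
            "examiner": examiner,  # hypothetical field, not from the article
            "hashed_at_utc": datetime.now(timezone.utc).isoformat(),
        })
    return records


def verify_manifest(records):
    """Re-hash each file and return the paths whose hash no longer matches."""
    return [r["path"] for r in records if sha256_file(r["path"]) != r["sha256"]]
```

Calling verify_manifest on the stored records before opening a dump returns an empty list when nothing has changed. Any path it returns means that copy should no longer be treated as the original evidence.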
Step 2: Extract running processes and network connections
# List all processes at time of capture
# (agent_dump.raw must be a full-system memory image with a matching kernel symbol table;
# a gcore process core cannot be parsed by these plugins)
python3 vol.py -f /forensics/agent_dump.raw linux.pslist
python3 vol.py -f /forensics/agent_dump.raw linux.pstree
# List open sockets per process and look for unexpected external IPs
# (Volatility3 has no linux.netstat plugin; linux.sockstat is the equivalent)
python3 vol.py -f /forensics/agent_dump.raw linux.sockstat
# Cross-reference with known-good agent IPs
# Any connection to an IP not in the agent's allow list is an IOC
The linux.sockstat output is your direct window into AML.T0047 post-exploitation: a connection from the compromised agent to an attacker-controlled IP appears here as long as the socket was still open at capture time, even if nothing about it was ever written to disk.
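The allow-list cross-reference can be automated once the socket table is exported. Below is a minimal sketch under stated assumptions: the connection dictionaries (pid, remote_ip, remote_port) are a hypothetical shape, not Volatility3's output format, and the allow list is expressed as CIDR strings.

```python
import ipaddress


def parse_allowlist(cidrs):
    """Turn CIDR strings such as '10.0.0.0/8' into network objects."""
    return [ipaddress.ip_network(c, strict=False) for c in cidrs]


def flag_unexpected(connections, allowlist):
    """Return connections whose remote address is outside every allowed network.

    connections: iterable of dicts with 'pid', 'remote_ip', 'remote_port'
    (hypothetical shape; adapt it to however you export the socket table).
    """
    flagged = []
    for conn in connections:
        remote = ipaddress.ip_address(conn["remote_ip"])
        if remote.is_unspecified or remote.is_loopback:
            continue  # listening sockets and local IPC are not external connections
        if not any(remote in net for net in allowlist):
            flagged.append(conn)
    return flagged
```

Each flagged entry is a candidate IOC, not a verdict. An agent that calls new third-party tool endpoints will produce hits that need a human decision before they go into detection rules.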
Step 3: Scan for injected code regions (malfind equivalent)
# Detect memory regions with RWX permissions (classic code injection indicator)
python3 vol.py -f /forensics/agent_dump.raw linux.malfind
# For each flagged region, dump the bytes for further analysis.
# -o/--output-dir is a global option and goes before the plugin name;
# run linux.proc.Maps -h to confirm --address and --dump on your Volatility3 version
python3 vol.py -f /forensics/agent_dump.raw -o /forensics/malfind_output/ \
linux.proc.Maps --pid <PID> --address <REGION_START> --dump
# Scan dumped regions against YARA rules for known malware signatures
yara -r /opt/yara_rules/ai_agent_compromise.yar /forensics/malfind_output/
This addresses D3FEND D3-PSMD (Process Segment Metadata Detection) directly — the memory segment metadata (permissions, backing file, anonymous vs. file-backed) distinguishes legitimate agent code from injected shellcode.
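The metadata check itself is simple enough to sketch. The example below assumes the region list is available as /proc/<pid>/maps-style lines, which is an assumption about the input, since Volatility3's own column layout differs. The Segment type and function names are hypothetical.

```python
from dataclasses import dataclass

# Executable regions the kernel maps into every process without a backing file.
KERNEL_PROVIDED = {"[vdso]", "[vsyscall]"}


@dataclass
class Segment:
    start: int
    end: int
    perms: str
    backing: str  # file path, a pseudo-name like "[heap]", or "" for anonymous


def parse_maps_line(line):
    """Parse one /proc/<pid>/maps-style line into a Segment."""
    fields = line.split(None, 5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    backing = fields[5].strip() if len(fields) > 5 else ""
    return Segment(start, end, fields[1], backing)


def suspicious_segments(lines):
    """Flag segments that are writable and executable, or executable with no backing file."""
    flagged = []
    for line in lines:
        if not line.strip():
            continue
        seg = parse_maps_line(line)
        if seg.backing in KERNEL_PROVIDED:
            continue
        writable, executable = "w" in seg.perms, "x" in seg.perms
        file_backed = seg.backing.startswith("/")
        if executable and (writable or not file_backed):
            flagged.append(seg)
    return flagged
```

Expect some legitimate hits: runtimes that JIT-compile code allocate anonymous executable memory by design, so a flagged region is a reason to dump and inspect it, not proof of injection.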
Step 4: Recover API keys and conversation context from memory strings
# Extract printable strings from the gcore process core (Step 1) and filter for API key patterns
# (Volatility3 has no linux.strings plugin; GNU strings reads the core file directly)
strings -a /forensics/agent_*.dump.* | \
grep -E "(sk-ant|sk_live_|bo_live_|ANTHROPIC|OPENAI)" \
> /forensics/extracted_keys.txt
# Environment variables can also be listed per process from the full-system image
python3 vol.py -f /forensics/agent_dump.raw linux.envars --pid <PID>
# Extract the conversation context window
# Look for JSON structures containing "messages" or "content" keys
strings -a /forensics/agent_*.dump.* | \
grep -E '"role":|"content":|"messages":' | \
head -200 > /forensics/conversation_fragment.txt
The conversation fragment is the forensic proof of what injected instructions (AML.T0054) the attacker delivered. If the agent was instructed to exfiltrate data via a prompt injection, that instruction sits verbatim in the conversation context in memory until the buffers holding it are freed and overwritten, which is one more reason to capture before the process exits.
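The same search can be done over the raw dump bytes, which allows key material to be redacted in the report and message objects to be pulled out as structured pairs. This is a rough sketch: the key prefixes come from the grep pattern above, but the character class and minimum length after each prefix are assumptions, as is the exact JSON layout of a message. All function names are hypothetical.

```python
import mmap
import re

# Prefixes taken from the grep pattern; what follows the prefix is an assumed shape,
# not a specification of any vendor's key format.
KEY_PATTERN = re.compile(rb"(?:sk-ant|sk_live_|bo_live_)[A-Za-z0-9_\-]{16,}")
# Assumes "role" comes directly before "content" in the serialized message.
MESSAGE_PATTERN = re.compile(
    rb'\{"role":\s*"(\w+)",\s*"content":\s*"((?:[^"\\]|\\.)*)"'
)


def find_keys(blob):
    """Return unique key candidates, redacted so the report is not itself a secret."""
    unique = []
    for match in KEY_PATTERN.finditer(blob):
        key = match.group().decode("ascii")
        if key not in unique:
            unique.append(key)
    return [key[:10] + "..." + key[-4:] for key in unique]


def find_messages(blob):
    """Return (role, content) pairs; content keeps its JSON escapes as found in memory."""
    return [
        (m.group(1).decode("ascii"), m.group(2).decode("utf-8", errors="replace"))
        for m in MESSAGE_PATTERN.finditer(blob)
    ]


def scan_dump(path):
    """Memory-map the dump so multi-gigabyte files are searched without loading them."""
    with open(path, "rb") as handle:
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as blob:
            return find_keys(blob), find_messages(blob)
```

Redacting in the report keeps the full keys confined to the dump and the rotation list. The message pattern will miss fragments split across freed buffers, so treat an empty result as inconclusive.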
Step 5: Map findings to ATLAS techniques and NIST CSF response
For each finding, tag it with the relevant ATLAS technique before closing the incident:
| Finding type | ATLAS / ATT&CK | NIST CSF response action |
|---|---|---|
| Unexpected external network connection | AML.T0047 | RS.AN-03: analyze for C2 infrastructure; RS.MA-01: contain and remediate |
| Injected code region (RWX) | Multiple | RS.AN-03: dump and signature-match; DE.CM-01: add to detection rules |
| API keys in memory strings | T1552.001 | Rotate all keys from extracted list immediately; revoke via key lifecycle API |
| Injected instructions in conversation context | AML.T0054 | RS.AN-03: reconstruct attacker intent; harden input guardrails |
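The table can be kept as data so that no finding is closed without a tag. The sketch below mirrors the four rows above; the dictionary keys, field names, and function name are hypothetical and would follow whatever your case-management system expects.

```python
# One entry per row of the table above. The key names are illustrative labels.
FINDING_TAGS = {
    "unexpected_network_connection": {
        "technique": "AML.T0047",
        "csf": ["RS.AN-03", "RS.MA-01"],
        "action": "analyze for C2 infrastructure; contain and remediate",
    },
    "injected_code_region": {
        "technique": "Multiple",
        "csf": ["RS.AN-03", "DE.CM-01"],
        "action": "dump and signature-match; add to detection rules",
    },
    "api_key_in_memory": {
        "technique": "T1552.001",
        "csf": [],
        "action": "rotate all keys from extracted list; revoke via key lifecycle API",
    },
    "injected_instruction": {
        "technique": "AML.T0054",
        "csf": ["RS.AN-03"],
        "action": "reconstruct attacker intent; harden input guardrails",
    },
}


def tag_finding(kind, evidence):
    """Attach technique, CSF references, and response action to one finding.

    Raises ValueError for an unknown kind so untagged findings cannot slip through.
    """
    if kind not in FINDING_TAGS:
        raise ValueError(f"no tag mapping for finding type: {kind}")
    record = dict(FINDING_TAGS[kind])
    record["kind"] = kind
    record["evidence"] = evidence
    return record
```

Failing loudly on an unmapped finding type is a deliberate choice here: a new kind of finding should force someone to extend the table instead of being filed without a technique tag.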
Integrating memory forensics into the agent incident response runbook
Memory forensics should be a standard step in any agent incident response playbook. The trigger for escalation is a Wazuh rule alert (see Part 4) with severity ≥ 10 and an agent process in scope. The response sequence:
- Isolate: block the agent's outbound network via host firewall rule (preserves process state)
- Capture: gcore dump + sha256sum for chain of custody
- Analyze: Volatility3 process list, socket table, and malfind, plus string extraction
- Rotate: all keys found in /forensics/extracted_keys.txt
- Document: ATLAS technique tags + timeline → verifiable audit record
- Harden: add injection patterns found to content-trap scanner (Part 2); add network IOCs to Wazuh rules (Part 4)
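The trigger and the ordering of the sequence can be encoded so an automation layer cannot skip a step. This is a sketch only: it assumes the alert arrives as a dictionary with the rule level under rule.level, and the data.process_name field is a hypothetical placeholder for however your alerts identify the process in scope.

```python
RESPONSE_STEPS = ("isolate", "capture", "analyze", "rotate", "document", "harden")


def should_escalate(alert, agent_process_names, min_level=10):
    """True when the alert meets the severity threshold and names an agent process."""
    level = alert.get("rule", {}).get("level", 0)
    process = alert.get("data", {}).get("process_name", "")  # hypothetical field
    return level >= min_level and process in agent_process_names


def next_step(completed):
    """Return the next runbook step, or None when the sequence is finished.

    Raises ValueError if the completed steps are not an in-order prefix of the
    runbook, for example rotating keys before the memory capture exists.
    """
    completed = tuple(completed)
    if completed != RESPONSE_STEPS[:len(completed)]:
        raise ValueError(f"steps out of order: {completed}")
    if len(completed) == len(RESPONSE_STEPS):
        return None
    return RESPONSE_STEPS[len(completed)]
```

Enforcing the order matters most for the first two steps: isolating at the firewall keeps the process alive, and capturing before anything else touches the host preserves the state the later steps depend on.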
Framework mapping
| Forensic step | ATLAS / D3FEND | NIST AI RMF | NIST CSF |
|---|---|---|---|
| Process list + network connections | AML.T0047 detection | MEASURE-2.6 | RS.AN-03 |
| Injected code region detection | D3-PSMD | MEASURE-2.6 | DE.CM-01 |
| Memory string extraction (keys) | D3-MA | MEASURE-2.6 | RS.AN-03 |
| Conversation context recovery | AML.T0054 evidence | MEASURE-2.6 | RS.AN-03 |
✅ Volatility3 installed and tested on a known-good agent process dump
✅ YARA rules for AI agent compromise signatures
✅ gcore available on all fleet hosts; forensics storage path pre-provisioned
✅ Memory forensics step included in agent incident response runbook
✅ Key rotation triggered automatically on any forensic extraction of key material
✅ Findings documented with ATLAS technique tags and retained in audit trail
BlindOracle: proof chains mean every agent action is forensically recoverable
Every state-mutating action emits a signed proof. When something goes wrong, the proof trail is the forensic record — no memory dump required for provenance.