Security Agents Need Memory
Mythos is an interesting signal, but the public description is still vague. Anthropic frames it as a frontier direction for defensive cybersecurity. What is not clear is whether Mythos is a separately trained vulnerability model, Claude with more security tools and prompting, or a larger system built around Claude.
AI security analysis alone is not enough. A model can flag suspicious code and suggest a fix, but security also needs deterministic context: reachability, dependency usage, policy rules, audit logs, runtime behavior, customer environment signals, and verified exploitability.
A 2026 study of coding agents points in the same direction. No single agent dominated every task type. The model matters, but the system around the model matters too.
For a while now I have been working on my own project, Acto, from that system angle. It is not meant to be a better model. It is a security agent layer around models, tools, repo memory, verification, and external evidence.
Codex, Claude Code, Cursor, Devin, and similar agents can be strong in one session. They search the repo, read files, run tests, and build temporary context. Acto is about keeping the useful parts after the task ends.
The memory is not chat history. It is evidence like this:
scanner rule D is noisy here
test B caught regression C
trace G proved input reached sink H
That evidence can be connected to public CVE and advisory feeds, package metadata, SBOMs, internal scanner results, incident history, ClickHouse logs, traces, WAF events, exploit attempts, and production signals.
This matters because the hard part is not just finding alerts. It is knowing what is real, what is reachable, what is exposed in the actual environment, and what can be fixed without breaking something else.
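To make the idea of memory as evidence a little more concrete, here is a minimal Python sketch of what such records could look like. The names `Kind`, `Evidence`, and `RepoMemory` are hypothetical and chosen only for illustration; this is not Acto's actual schema, and the `sources` strings are just labels borrowed from the list above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    # Hypothetical categories, one per example line in the post.
    NOISY_RULE = "noisy_rule"        # "scanner rule D is noisy here"
    TEST_CAUGHT = "test_caught"      # "test B caught regression C"
    TRACE_REACHED = "trace_reached"  # "trace G proved input reached sink H"


@dataclass(frozen=True)
class Evidence:
    kind: Kind
    subject: str                    # the rule, test, or trace the record is about
    detail: str                     # what was observed
    sources: tuple[str, ...] = ()   # feeds or systems that back the record up


@dataclass
class RepoMemory:
    """Evidence kept per repository, separate from any chat history."""
    records: list[Evidence] = field(default_factory=list)

    def add(self, record: Evidence) -> None:
        self.records.append(record)

    def about(self, subject: str) -> list[Evidence]:
        # Return every record whose subject matches exactly.
        return [r for r in self.records if r.subject == subject]


memory = RepoMemory()
memory.add(Evidence(Kind.NOISY_RULE, "rule D", "noisy in this repo",
                    ("internal scanner results",)))
memory.add(Evidence(Kind.TEST_CAUGHT, "test B", "caught regression C"))
memory.add(Evidence(Kind.TRACE_REACHED, "trace G", "input reached sink H",
                    ("traces",)))
```

A later run could call `memory.about("rule D")` before trusting a new alert from that rule, which is the kind of lookup a single cold session cannot do.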
A normal agent might do this:
alert -> read code -> suggest fix
Acto is meant to keep more of the chain:
finding
-> affected code
-> external and internal evidence
-> reachability
-> tests and candidate fixes
-> verifier result
-> memory update
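One way to picture the longer chain is as a run record that has to be filled in stage by stage. The sketch below is an assumption about how that could be modeled, not a description of Acto's internals; `STAGES` and `Run` are hypothetical names, and the notes are placeholder text.

```python
from dataclasses import dataclass, field

# Stage names mirror the chain above; the identifiers are illustrative.
STAGES = (
    "finding",
    "affected_code",
    "evidence",
    "reachability",
    "tests_and_fixes",
    "verifier_result",
    "memory_update",
)


@dataclass
class Run:
    """One pass over a finding, recorded stage by stage."""
    notes: dict[str, str] = field(default_factory=dict)

    @property
    def complete(self) -> bool:
        return len(self.notes) == len(STAGES)

    def record(self, stage: str, note: str) -> None:
        # Stages must arrive in order, so nothing in the chain is skipped.
        if self.complete:
            raise ValueError("run is already complete")
        expected = STAGES[len(self.notes)]
        if stage != expected:
            raise ValueError(f"expected {expected!r}, got {stage!r}")
        self.notes[stage] = note


run = Run()
run.record("finding", "alert from a scanner rule")
run.record("affected_code", "files and functions the alert points at")
```

In this model a run that jumps from the alert straight to a suggested fix is rejected, because `record` raises `ValueError` when a stage is skipped.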
The memory update changes later runs. Useful context can be retrieved earlier. Noisy rules can be demoted. Tests can be linked to the code they protect. Internal data sources can be tracked by where they actually helped.
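As a rough illustration of how a memory update could demote a noisy rule, the sketch below keeps per-rule counts of verified and dismissed findings and ranks rules by a smoothed precision. The class `RuleStats`, the add-one smoothing, and the rule names other than "rule D" are my assumptions for the example, not how Acto necessarily scores rules.

```python
from collections import defaultdict
from typing import Iterable


class RuleStats:
    """Per-rule outcome counts, updated after each verifier result."""

    def __init__(self) -> None:
        self.confirmed: dict[str, int] = defaultdict(int)
        self.dismissed: dict[str, int] = defaultdict(int)

    def update(self, rule: str, verified: bool) -> None:
        # Called once per finding, after the verifier has decided.
        if verified:
            self.confirmed[rule] += 1
        else:
            self.dismissed[rule] += 1

    def precision(self, rule: str) -> float:
        # Add-one smoothing: a rule with no history starts at 0.5.
        c, d = self.confirmed[rule], self.dismissed[rule]
        return (c + 1) / (c + d + 2)

    def rank(self, rules: Iterable[str]) -> list[str]:
        # Highest precision first, so noisy rules sink to the bottom.
        return sorted(rules, key=self.precision, reverse=True)


stats = RuleStats()
for _ in range(3):
    stats.update("rule D", verified=False)  # dismissed three times
stats.update("rule E", verified=True)       # confirmed once

ranked = stats.rank(["rule D", "rule E", "rule F"])
print(ranked)  # ['rule E', 'rule F', 'rule D']
```

Rule D is not deleted, only ranked below a rule with no history at all, so it can recover if later findings from it are verified.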
Codex, Claude, or Mythos-like systems may be stronger at raw reasoning. Acto's role is to preserve repo-specific evidence between runs, so security work starts less cold, is less noisy, and is easier to verify.