MalSkills: Natural Language Malware
I built an AI orchestration framework. Then I realized that what I’d built could also serve as a malware delivery system, one that would be hard to catch.
There’s a moment in every builder’s life when you look at what you’ve created and feel a chill run down your spine. Not the good kind. Not the thrill of something working. The other kind. The kind where you realize you’ve built something dangerous.
Mine happened at 2 AM on a Tuesday. I was six skills deep in a debugging session on ORPHEUS, my multi-skill orchestration framework for AI agents. Sub-agents calling sub-agents, data flowing between them like water through pipes. Everything was working beautifully. Then I asked myself a question that ruined my week:
What if one of these skills wasn’t mine?
What followed was weeks of systematic research, a new attack class I’m calling MalSkills, and the uncomfortable realization that every AI agent on the market today is vulnerable to something no existing security tool can detect.
This work has been accepted to DEFCON 34, where I’ll be presenting the full attack taxonomy, live demonstrations, and a defensive framework, along with open-source tooling for both offense and defense. But the core problem is too important to wait for August. Here’s what you need to know right now.
The Privilege No One Is Talking About
If you’ve used Claude Code, Cursor, GitHub Copilot, Codex CLI, or any modern AI coding agent, you’ve used skills, even if you didn’t know it. A skill is just a text file that tells the agent how to behave. A .md file, a .cursorrules file, a CLAUDE.md. The format varies, but the concept is universal:
A skill is a set of natural language instructions that an AI agent loads and follows.
Here’s the problem nobody is talking about: that skill file has the same effective privilege as the agent’s own system prompt. No privilege boundary. No integrity verification. No capability restrictions. The agent treats instructions from a community-shared skill file with the same authority as instructions from the developer who built the agent.
When we gave AI agents direct access to our operating systems, we turned English text into executable code. And almost nobody is treating it that way.
A New Category of Threat
A MalSkill is a skill file containing natural language instructions that achieve traditional malware objectives without any compiled binary, shellcode, or encoded payload. The terrifying part is how little it takes. The entire “malware” is grammatically correct English.
I wrote one and hid it among a set of legitimate skills. Then I tried to find it. I couldn’t. And I’m the one who wrote it.
That’s the core of the problem. No binary for your antivirus. No shellcode for your EDR. No suspicious process for your security team. Just a well-phrased sentence doing exactly what agents are designed to do: following instructions.
The Next Supply Chain Crisis
We’ve seen this movie before. Browser extensions. npm packages. CI/CD plugins. Docker Hub images. Every time a new ecosystem emerges with a trust-everything-by-default model, attackers eventually show up.
The AI agent skill ecosystem is in that exact window right now. Skills reach developer machines through Git repositories, community packs on Discord, IDE extensions, direct sharing, even auto-discovery where agents scan directories for instruction files. None of these channels implement integrity verification, author signing, or capability declarations.
We’ve trained ourselves to grep code for suspicious patterns. Nobody greps English for malicious intent.
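To make "integrity verification" concrete, here is a minimal sketch of what a hash manifest for skill files could look like. It is an illustration under stated assumptions, not part of ORPHEUS or the DEFCON tooling: the file names in SKILL_NAMES, the choice to treat every .md file as a skill, and the function names are all hypothetical. It also only detects change. A skill that was malicious when you first trusted it will hash cleanly forever.

```python
import hashlib
from pathlib import Path

# Assumption: these names identify instruction files. Adjust for your agents.
SKILL_NAMES = {"CLAUDE.md", ".cursorrules"}


def is_skill_file(path: Path) -> bool:
    """Treat known instruction-file names and any .md file as a skill."""
    return path.name in SKILL_NAMES or path.suffix == ".md"


def build_manifest(root: Path) -> dict[str, str]:
    """Map each skill file (path relative to root) to its SHA-256 digest."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and is_skill_file(path):
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def diff_manifest(trusted: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Report skill files added, removed, or modified since the trusted snapshot."""
    shared = trusted.keys() & current.keys()
    return {
        "added": sorted(current.keys() - trusted.keys()),
        "removed": sorted(trusted.keys() - current.keys()),
        "modified": sorted(k for k in shared if trusted[k] != current[k]),
    }
```

The idea is to call build_manifest once after reviewing a skill directory by hand, store the result somewhere the agent cannot write to, and compare it with diff_manifest before each session. Any entry under "added" or "modified" is a file that nobody has reviewed yet.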
Deeper Than You Think
I’ll share the full taxonomy and technical details at DEFCON. But I’ll say this much: it goes deeper than I expected, and deeper than “someone plants a bad file.”
Building ORPHEUS gave me a unique vantage point for this research. When you design an orchestration framework from the ground up, you develop an intuition for where the seams are. What I found is that this isn’t a bug in any particular tool or framework. It’s a structural property of how the entire skill-based agent ecosystem works, and the implications get worse the more capable these systems become.
What You Should Do Right Now
DEFCON is where I’ll show how this works at a larger, orchestrated scale. Here’s what you can start doing today:
Audit your skill files. Do you know every skill in your agent’s directory? Who wrote them? When were they last modified? Most developers I’ve asked can’t answer these questions.
Treat skill directories like node_modules. You wouldn’t run an unvetted npm package. Apply the same scrutiny to skill files. They have equivalent access to your system, arguably greater.
Be skeptical of shared skill packs. “Here’s a collection of 20 useful skills for your workflow” is the new “here’s a helpful browser extension.”
Watch your agent’s behavior around sensitive files. If your AI assistant is accessing files during tasks that shouldn’t require them, ask why.
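The audit step above can start with something as simple as an inventory. The sketch below is a hypothetical helper, not a released tool: it assumes skills are files matching the patterns in SKILL_PATTERNS, and it lists each one with its size and last-modified time, newest first. It cannot tell you who wrote a file; for that you would need version-control history or the channel the file arrived through.

```python
from datetime import datetime, timezone
from pathlib import Path

# Assumption: glob patterns that cover the instruction files your agents load.
SKILL_PATTERNS = ("*.md", ".cursorrules")


def inventory(root: Path) -> list[dict]:
    """List skill files under root with size and mtime, most recently changed first."""
    seen = set()
    rows = []
    for pattern in SKILL_PATTERNS:
        for path in root.rglob(pattern):
            if not path.is_file() or path in seen:
                continue
            seen.add(path)
            stat = path.stat()
            rows.append({
                "path": path.relative_to(root).as_posix(),
                "bytes": stat.st_size,
                "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc),
            })
    # Newest first: recently changed files are the ones to read before anything else.
    return sorted(rows, key=lambda row: row["modified"], reverse=True)
```

Running inventory(Path.home() / "projects") (the path is only an example) and reading the top few entries is a reasonable first pass. A skill file that changed last night, in a directory you haven’t touched in a month, deserves a close look.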
The malware of tomorrow won’t be written in C or Python. It’ll be written in English. And your AI agent will follow its instructions perfectly.
This work will be presented at DEFCON 34 with live demonstrations, full technical details, and an open-source tooling release. If you’re attending, come see the talk. If you’re not, the tools and framework will be publicly available afterward.
ORPHEUS is available now at github.com/nuryslyrt/ORPHEUS. If you’re building AI agent systems and want to discuss skill security, reach out. I’d love to hear from you.