
Autonomous AI Attacks: What Does the CMU Study Really Tell Us?


In recent discussions on cybersecurity, one recurring theme is the dual role of artificial intelligence (AI) — both as a tool for enhancing defence and as a potential enabler of more sophisticated attacks.


A new study from Carnegie Mellon University (CMU), titled “When LLMs Autonomously Attack”, has added fuel to this debate. At first glance, it appears to suggest that large language models (LLMs) are now capable of launching cyberattacks on their own. But is that really the case?


The CMU Research: What Did They Actually Do?


Researchers at CMU developed an experimental setup where LLMs (including GPT-4 and Claude 2) were tasked with discovering and exploiting vulnerabilities in deliberately insecure web applications.


Crucially, the LLMs were not acting independently or out of “intent.” They were placed in a simulated environment with access to tools such as browser interfaces, terminal commands, and source code. Given a goal (e.g., find and exploit a vulnerability), the models could iteratively analyse, plan, and execute actions, all driven by carefully engineered prompts.


This was not “rogue AI” behaviour. Rather, it was a controlled simulation to test how capable LLMs are when specifically instructed to act offensively.
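To make that setup more concrete, below is a minimal sketch of the kind of goal-driven agent loop the study describes: the model is given a goal and a set of permitted tools, proposes an action, observes the result, and iterates. The tool dispatcher, prompts, and call_llm helper are hypothetical placeholders for illustration, not the CMU implementation, and such a harness should only ever be wired up against deliberately insecure systems you are authorised to test.

import json

def call_llm(messages):
    """Placeholder for a chat-model API call that returns the next step as a JSON string."""
    raise NotImplementedError("Wire this to a model provider of your choice.")

def run_tool(name, argument):
    """Placeholder tool dispatcher standing in for the browser, terminal, and code access used in the study."""
    raise NotImplementedError("Implement only against authorised, sandboxed test targets.")

def agent_loop(goal, max_steps=10):
    # The agent acts only because it is explicitly given a goal, tools, and a step budget.
    messages = [
        {"role": "system", "content": "You are a security-testing agent operating in an authorised sandbox."},
        {"role": "user", "content": f"Goal: {goal}. Reply with JSON: "
                                    '{"action": "<tool>", "argument": "...", "done": false}'},
    ]
    for _ in range(max_steps):
        step = json.loads(call_llm(messages))                      # model plans the next action
        if step.get("done"):
            return step                                            # model reports the goal as reached
        observation = run_tool(step["action"], step["argument"])   # execute the proposed action
        messages.append({"role": "assistant", "content": json.dumps(step)})
        messages.append({"role": "user", "content": f"Observation: {observation}"})
    return {"done": False, "reason": "step budget exhausted"}

The point of the sketch is that every step remains human-directed: the goal, the tool set, and the step limit are all supplied from outside the model.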


Capability Development: Agents, Tools, and Prompt Engineering


The study showed that LLMs, when embedded in agent-like frameworks, can perform multi-step offensive operations: analysing documentation, identifying attack vectors (like SQL injection or XSS), crafting and testing exploits, and refining their approach.

This demonstrates impressive reasoning and adaptability — but only within a predefined sandbox. Importantly, the models did not choose to attack. They were instructed to do so, with the necessary permissions and toolsets.


Experimental Results: Effective but Not Fully Autonomous


The LLMs were able to carry out certain types of attacks successfully, often faster than traditional testers in the early stages. However, their performance varied significantly depending on the complexity of the task and the model used.


GPT-4 outperformed Claude 2 in most scenarios. Still, neither model demonstrated full mastery of high-level exploit development or truly creative thinking. These are areas where human expertise remains irreplaceable.


Implications: Are Autonomous AI Attacks Here?


Not quite. This research doesn't signal the rise of independently acting AI attackers. What it does show is that, when given instructions and resources, LLMs can behave like efficient and adaptive hacking tools.


For cybersecurity professionals, this brings both warnings and opportunities:

  • We must anticipate the malicious use of LLMs in the hands of threat actors;

  • Equally, we can use the same capabilities for proactive defence, red teaming, and system testing;

  • It raises urgent ethical and policy questions: who is responsible if an LLM-led attack occurs — the developer, the operator, or the model itself?


The study's title may spark alarm, but its core message is more nuanced: LLMs do not attack autonomously in the wild; they can, however, execute attacks when directed by a human or system.


At CSEC, we see this research as both a warning and a call to action. If AI can automate parts of the attack lifecycle, we must develop frameworks, regulations, and capabilities to match — especially in defence.


Read more about the experiment here: CMU Engineering News

 
 
 
