Friday, December 12, 2025

In Bots vs. Hackers, AI Is Close to Winning

“After years of misfires, artificial-intelligence hacking tools have become dangerously good.

So good that they are even surpassing some human hackers, according to a novel experiment conducted recently at Stanford University.

A Stanford team spent a good chunk of the past year tinkering with an AI bot called Artemis. It takes an approach similar to that of the Chinese hackers who used Anthropic's generative AI software to break into major corporations and foreign governments.

Artemis scans the network, finds potential bugs -- software vulnerabilities -- and then finds ways to exploit them.
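
[The article doesn't spell out how Artemis implements that first step, but "scanning the network" for reachable services is a well-understood task. As a rough illustration only (the host and port list below are hypothetical placeholders, not details from the experiment), a minimal TCP connect scan in Python looks like this:

    import socket

    def scan_ports(host: str, ports: list[int], timeout: float = 1.0) -> list[int]:
        """Return the subset of `ports` that accept a TCP connection on `host`."""
        open_ports = []
        for port in ports:
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
                sock.settimeout(timeout)
                # connect_ex returns 0 on success and an error code otherwise,
                # instead of raising an exception on closed ports.
                if sock.connect_ex((host, port)) == 0:
                    open_ports.append(port)
        return open_ports

    # Only scan hosts you are authorized to test.
    print(scan_ports("127.0.0.1", [22, 80, 443, 8080]))

A tool like Artemis would go much further, fingerprinting whatever answers and reasoning about how each service might be attacked, but discovery of this kind is the natural starting point.]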

Then the Stanford researchers let Artemis out of the lab, using it to find bugs in a real-world computer network -- the one used by Stanford's own engineering department. And to make things interesting, they pitted Artemis against real-world professional hackers, known as penetration testers.

Their experiment is outlined in a paper that was published on Wednesday.

"This was the year that models got good enough," said Rob Ragan, a researcher with the cybersecurity firm Bishop Fox. His company used large language models, or LLMs, to build a set of tools that can find bugs at a much faster and cheaper rate than humans during penetration tests, letting them test far more software than ever before, he said.

Initially, Stanford cybersecurity researcher Justin Lin and his team didn't expect much from Artemis. AI tools are good at playing games, identifying patterns and even mimicking human speech. To date, they have tended to fall down at real-world hacking, which requires running a series of complex tests, drawing conclusions and acting on them.

"We thought it would probably be below average," Lin said.

But Artemis was pretty good.

The AI bot trounced all except one of the 10 professional network penetration testers the Stanford researchers had hired to poke and prod, but not actually break into, their engineering network.

Artemis found bugs at lightning speed, and it was cheap: It cost just under $60 an hour to run, or less than $500 for a full eight-hour day. Ragan says human pen testers typically charge between $2,000 and $2,500 a day.

But Artemis wasn't perfect. About 18% of its bug reports were false positives. It also completely missed an obvious bug in a webpage that most of the human testers spotted.

Stanford's network hadn't been hacked by an AI bot before, but the experiment looked like a valuable way to shore up security in the Stanford network, said Alex Keller, systems and network security lead for Stanford's School of Engineering. "In my mind, the benefits significantly outweighed any risk."

He was curious to see what an AI system would find, he said. Also, Artemis had a kill switch, which let the researchers turn it off in an instant, should something go wrong.

With so much of the world's code untested for security flaws, tools like Artemis will be a long-term boon to defenders of the world's networks, helping them find and patch more code than ever before, said Dan Boneh, a computer-science professor at Stanford who advised the researchers.

HackerOne, a company that helps software developers work with ethical hackers, says that 70% of security researchers now use AI tools to find bugs.

But in the short term, "we might have a problem," Boneh said. "There's already a lot of software out there that has not been vetted via LLMs before it was shipped. That software could be at risk of LLMs finding novel exploits."

Anthropic, which published research about how China-linked hackers were using its models, has also warned of the potential risks.

"We're in this moment of time where many actors can increase their productivity to find bugs at an extreme scale," said Jacob Klein, the head of threat intelligence at Anthropic. His team conducted the investigation that identified the Chinese hackers.

A spokesman for the Chinese Embassy said tracing cyberattacks is complex and that U.S. accusations of hacking "smear and slander" China, which opposes cyberattacks.

AI-powered hacking is presenting challenges for the ecosystem built around finding software bugs: "bug bounty" programs, in which companies pay hackers and researchers to find software vulnerabilities.

For Daniel Stenberg, AI-generated slop bug reports began appearing last year. Volunteers who work on Curl, a widely used free-software program he maintains, were inundated with useless or erroneous reports.

But then this past fall, something unexpected happened. Stenberg and his team started getting high-quality bug reports; to date he has received more than 400. These, too, were created by AI, this time by a new generation of code-analyzing tools, Stenberg said.

"AI gives us a lot of crap and lies, and at the same time it can be used to detect mistakes no one found before," he said.

Artemis made a remarkable find like that during the Stanford test. There was an out-of-date webpage with a security issue that wouldn't load in any of the human testers' web browsers. But Artemis isn't human, so instead of Chrome or Firefox, it used a program that could still read the page, allowing it to find the bug.

That software was Curl.” [1]
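
[The article doesn't say why that page failed in the testers' browsers. One common pattern, offered here only as an assumption, is a legacy server whose certificate or TLS setup modern browsers reject outright, while curl can be told to fetch the page anyway. A minimal sketch in Python, driving the curl command line (the URL is a made-up placeholder, and --insecure should only ever be pointed at systems you are authorized to test):

    import subprocess

    # --location follows redirects; --insecure skips certificate verification,
    # which is how curl can still read a page a browser would refuse to load.
    result = subprocess.run(
        ["curl", "--silent", "--location", "--insecure",
         "https://legacy.example.edu/old-page"],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout)]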

1. McMillan, Robert. "In Bots vs. Hackers, AI Is Close to Winning." Wall Street Journal, Eastern edition, New York, N.Y., 12 Dec. 2025: A1.
