A hot potato: Generative AI services can be used to generate snippets of generic text, uncanny images, or even code in various programming languages. But when LLMs are employed to fabricate bug reports, the results can do real damage to a project's development.
Daniel Stenberg, the original author and lead developer of the curl software, recently wrote about the problematic effects LLMs and AI models are having on the project. The Swedish coder noted that the team has a bug bounty program offering real money as rewards for hackers who discover security issues, but superficial reports created through AI services are becoming a real problem.
Curl’s bug bounty program has so far paid out $70,000 in rewards, Stenberg said. The project has received 415 vulnerability reports, 77 of which were deemed “informative” and 64 of which were ultimately confirmed as genuine security issues. The remaining reports, roughly 66 percent of the total, described neither a security problem nor an ordinary bug.
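Those figures line up: a quick calculation (assuming the “informative” and confirmed categories don’t overlap) shows where the roughly 66 percent comes from.

```python
# Sanity check on the reported curl bug bounty numbers:
# 415 total submissions, 77 "informative", 64 confirmed security issues.
total = 415
informative = 77
confirmed = 64

# Reports that were neither a security problem nor a normal bug.
neither = total - informative - confirmed
share = neither / total

print(neither)              # 274
print(round(share * 100))   # 66
```

274 of 415 submissions is about 66 percent, matching the share Stenberg cites.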
Generative AI models are increasingly used (or proposed) as a way to automate complex programming tasks, but LLMs are also notorious for their tendency to “hallucinate,” producing nonsensical results while sounding absolutely confident about their output. In Stenberg’s own words, AI-based reports look better and appear to have a point, but “better crap” is still crap.
The better the crap, Stenberg said, the more time and energy the programmers have to spend on a report before closing it. AI-generated crap doesn’t help the project at all; it simply diverts developer time and energy from productive work. The asymmetry is the problem: the curl team must properly investigate every report, while AI models can drastically cut the time needed to write one, even for a flaw that turns out to be thin air.
Stenberg cited two bogus reports that were likely created with AI. The first claimed to describe an actual security vulnerability (CVE-2023-38545) before it had even been disclosed, but it reeked of “typical AI style hallucinations.” Facts and details from old security issues had been mixed and matched to fabricate something new that had “no connection” with reality, Stenberg said.
Another recently submitted report on HackerOne described a potential buffer overflow flaw in curl’s WebSocket handling. Stenberg posted some follow-up questions about the report, but he ultimately concluded that the flaw wasn’t real and that he was likely talking to an AI model rather than a real human being.
The programmer said that AI can do “a lot of good things,” but it can also be exploited for the wrong ones. LLMs could theoretically be trained to report security problems in productive ways, but “good examples” of this have yet to surface. As AI-generated reports become more common over time, Stenberg said, the team will have to get better at spotting the telltale signs of AI-generated submissions so it can quickly dismiss the bogus ones.