Prompt Injection: An AI-Targeted Attack
For a brief window of time in the mid-2010s, a fairly common joke was to send voice commands to Alexa or other assistant devices over video. Late-night hosts and others would purposefully attempt to activate voice assistants like these en masse and get them to do ridiculous things. This isn’t quite as common of a gag anymore and was relatively harmless unless the voice assistant was set up to do something like automatically place Amazon orders, but now that much more powerful AI tools are coming online we’re seeing that joke taken to its logical conclusion: prompt-injection attacks.
Prompt injection attacks, as the name suggests, involve maliciously inserting prompts or requests in interactive systems to manipulate or deceive users, potentially leading to unintended actions or disclosure of sensitive information. It’s similar to something like an SQL injection attack in that a command is embedded in something that seems like a normal input at the start. Using an AI like GPT comes with an inherent risk of attacks like this when using it to automate tasks, as commands to the AI can be hidden where a user might not expect to see them, like in this demonstration where hidden prompts for a ChatGPT plugin are hidden in YouTube video transcripts to attempt to get ChatGPT to perform actions outside of those the original user would have asked for.
While this specific attack is more of a proof-of-concept, it’s foreseeable that as these tools become more sophisticated and interconnected in our lives, the risks of a malicious attacker causing harm start to rise. Restricting how much access we give networked computerized systems is certainly one option, similar to sandboxing or containerizing websites so they can’t all share cookies amongst themselves, but we should start seeing some thought given to these attacks by the developers of AI tools in much the same way that we hope developers are sanitizing SQL inputs.
Post a Comment