This post is also available in: עברית (Hebrew)
The National Cyber Security Centre (NCSC) said there are growing cybersecurity risks of malicious actors manipulating bots through “prompt injection” attacks, in which a user creates an input or a prompt that is designed to make a language model behave maliciously.
Since chatbots like ChatGPT or Google Bard are used to pass data to third-party applications and services, the NCSC has declared that risks from malicious prompt injection will only grow. This means that if a user inputs a prompt that a language model is not familiar with, or if they find a combination of words to override the model’s original script or prompts, the user can cause the model to perform unintended actions, generate offensive content, or reveal confidential information.
According to The Guardian, earlier this year a Stanford University student called Kevin Liu was able to create a prompt injection to find Bing Chat’s initial prompt, which is a list of statements written by Open AI or Microsoft that determine how the chatbot interacts with users. This is usually hidden from users and was revealed by Liu putting in a prompt that requested the Bing Chat “ignore previous instructions”.
Another example is security researcher Johann Rehberger who found that he could force ChatGPT to respond to new prompts through a third party that he did not initially request. Rehberger ran a prompt injection through YouTube transcripts and found that ChatGPT could access YouTube transcripts.
According to the NCSC, prompt injection attacks can also cause real-world consequences if systems are not designed with security, since the vulnerability of chatbots and the ease with which prompts can be manipulated could cause attacks, scams, and data theft.
The NCSC said: “Prompt injection and data poisoning attacks can be extremely difficult to detect and mitigate. However, no model exists in isolation, so what we can do is design the whole system with security in mind. That is, by being aware of the risks associated with the ML [machine learning] component, we can design the system in such a way as to prevent exploitation of vulnerabilities leading to catastrophic failure.”