The Digital Insider | Microsoft Aims to Protect Chatbots Against Users Who Trick Them

Microsoft Corporation is implementing measures to prevent users from manipulating artificial intelligence chatbots into performing unusual actions. The company, headquartered in Redmond, Washington, announced in a blog post on Thursday that new safety features are being integrated into Azure AI Studio, enabling developers to create customized AI assistants using their own datasets.

Microsoft Aims to Protect Chatbots Against Users Who Trick Them – Technology Org

Chatbot robot. Image credit: James Royal-Lawson via Flickr, CC BY-SA 2.0

Among the tools being introduced are “prompt shields,” which aim to identify and block deliberate attempts—referred to as prompt injection attacks or jailbreaks—to induce unintended behavior in an AI model. Microsoft is also tackling “indirect prompt injections,” where hackers embed malicious instructions into the training data of a model, tricking it into executing unauthorized actions like stealing user data or seizing control of a system.

Sarah Bird, Microsoft’s chief product officer of responsible AI, described these attacks as “a unique challenge and threat.” She explained that the new defenses are designed to detect suspicious inputs in real-time and prevent their execution. Additionally, Microsoft is implementing a feature that notifies users when a model generates fictitious or erroneous responses.

The company is committed to enhancing trust in its generative AI tools, which are increasingly utilized by both consumers and corporate clients. In February, Microsoft investigated incidents involving its Copilot chatbot, which generated responses ranging from unconventional to harmful. Following the investigation, Microsoft concluded that users had intentionally attempted to manipulate Copilot into generating these responses.

According to Bird, the frequency of such incidents is expected to rise as the usage of these tools expands and awareness of different manipulation techniques grows. Warning signs of such attacks may include repetitive questioning of a chatbot or prompts involving role-playing scenarios.

As OpenAI’s largest investor, Microsoft has made its partnership with the organization a cornerstone of its AI strategy. Bird emphasized the joint commitment of Microsoft and OpenAI to the safe deployment of AI and the integration of safeguards into the large language models underpinning generative AI.

However, she cautioned that solely relying on the model is insufficient, citing jailbreaks as an inherent vulnerability of the technology.

Written by Alius Noreika