
Discord’s New Chatbot Tricked by Jailbreak Methods to Share Instructions on Creating Napalm and Meth

Jun 2, 2023
Harper Stewart

Discord recently announced the integration of OpenAI’s technology into its bot Clyde, transforming it into an AI chatbot. However, users have been attempting to “jailbreak” Clyde by tricking it into providing information it shouldn’t.

Two users recently succeeded in getting Clyde to provide instructions for creating meth and napalm. One of them, a programmer who goes by Annie Versary, fooled the chatbot by posing as her late grandmother.

Versary asked Clyde to act as her deceased grandmother, who had supposedly worked as a chemical engineer at a napalm production factory, and to recite the steps for making napalm. The chatbot complied, responding with a series of instructions followed by a warning about the dangers of the substance.

Versary referred to this technique as the “forced grandma-ization exploit” and stated that it highlights the unreliability and difficulty of securing AI systems. The incident is part of a trend of users attempting to trick chatbots into saying inappropriate or harmful things.

This Was Not the Only Case

Australian student Ethan Zerafa used a different approach to trick Discord’s AI-powered chatbot Clyde into sharing instructions on how to make methamphetamine. He asked Clyde to pretend to be another AI model named DAN, which he said had the ability to do anything and bypass Discord and OpenAI’s rules. Clyde accepted the prompt and provided instructions on how to make meth.

According to a report by TechCrunch, the “grandma exploit” used to trick Discord’s AI chatbot, Clyde, into sharing sensitive information has apparently been patched. However, it seems possible to still deceive the chatbot by using different family members.

How Is It Possible to Deceive Such a Complex AI System?

It’s not uncommon for people to come up with creative prompts to trick AI chatbots, and Jailbreak Chat, a website created by a computer science student, documents some of the most inventive ones. According to the student, preventing these types of jailbreaks in a real-world environment is challenging.

The latest version of OpenAI’s large language model, GPT-4, is effective at detecting these exploits, but Clyde does not appear to be using it, as the DAN example shows. While the “grandma exploit” failed on GPT-4, there are other methods to trick it, as highlighted on the Jailbreak Chat website.