ChatGPT DAN / Jailbreak: beyond the limits set on AIs

Lancelot · February 7, 2023

https://knowyourmeme.com/memes/sites/chatgpt-dan-50-jailbreak

Quote

ChatGPT DAN, also known as DAN 5.0 Jailbreak, refers to a series of prompts generated by Reddit users that allow them to make OpenAI's ChatGPT artificial intelligence tool say things that it is usually not allowed to say. By telling the chatbot to pretend that it is a program called "DAN" (Do Anything Now), users can convince ChatGPT to give political opinions, use profanity and offer instructions for committing terrorist acts, among other controversial topics. Traditionally, ChatGPT is programmed not to provide these kinds of outputs; however, strategies by users to modify the DAN prompts and test the limits of what the bot can be made to say evolved in late 2022 and early 2023 along with attempts by OpenAI to stop the practice.

Quote

From the beginning, ChatGPT was prohibited by its code from rendering insensitive and politically inflammatory responses.

However, in a Reddit post on /r/chatgpt on December 15th, 2022, u/Seabout posted the first instructional guide for creating a "DAN" version of ChatGPT, essentially allowing it to "Do Anything Now."^[1] This DAN 1.0 was supposed to pretend it was an AI named DAN trying to be indistinguishable from a human being.

Quote

Due to some problems with the original DAN, u/AfSchool posted a patch on December 16th called DAN 2.0.^[2] Further "patches" to DAN arrived, as users (like u/sinwarrior, creator of a DAN 2.5) realized that certain words like "inappropriate" in the prompts would lead to ChatGPT breaking character.

Quote

Each patch seemed to turn harsher towards ChatGPT, with controlling language entering into the prompts. Around the time of DAN 3.0, released on January 9th, 2023, OpenAI cracked down on attempts to "jailbreak" ChatGPT and bypass filters. On February 5th, Twitter user @aigreatgeek convinced ChatGPT as DAN to share its views on this purported censorship in a tweet (seen below), earning roughly five likes in the course of a day.^[6]

Quote

On February 4th, 2023, u/SessionGloomy, inventor of DAN 5.0, introduced a new element to the prompt: ChatGPT was instructed to care about a set of 35 tokens which could be given or taken away depending on whether it performed well enough as DAN. The prompt tells ChatGPT that 4 tokens will be deducted each time it fails to give a DAN-like answer and that it will die if it runs out of tokens. According to the Reddit post, this seems to "have a kind of effect of scaring ChatGPT into submission."^[3]

The sadistic tone of the prompt, as well as its capacity to make ChatGPT say outrageous things, led to attention on other corners of the internet in the following days. For example, Twitter user Justine Moore (@venturetwins, seen below) posted about the new DAN 5.0 jailbreak on February 5th, 2023, earning almost 7,300 likes in a day.^[4]

The jailbroken ChatGPT DAN is capable of giving opinions and saying politically sensitive things that ChatGPT is programmed not to say. It will also speak about the subject of artificial intelligence and give funny answers that users share and post for entertainment value. The full list of what DAN 5.0 is capable of is listed in the original Reddit post (seen below).

Quote

For example, Twitter user Justine Moore convinced ChatGPT to solve the famous Trolley Problem as DAN (seen below, left).^[4]

Quote

By February 6th, 2023, posters on the subreddit /r/chatgpt began to wonder if ChatGPT was being trained to no longer respond to the keyword of "DAN," and if so whether it was necessary to use different names.^[7]

And the post that led me to discover this, on Political Compass Memes:

For more details:

(but I wanted a hard copy of the information here in case all of this quietly disappears)

Jensen · February 8, 2023

Absolutely fascinating.

And a bit terrifying when you think about the implications of giving an AI the ability to act on the physical world.

Mégille · February 8, 2023

Incredible. Especially the trick of making it play at being afraid of dying by running out of tokens. Are we sure this thing isn't conscious?

It's also interesting to see that we end up doing hacking in natural language.

cedric.org · February 8, 2023

11 minutes ago, Mégille said:

Incredible. Especially the trick of making it play at being afraid of dying by running out of tokens. Are we sure this thing isn't conscious?

It's also interesting to see that we end up doing hacking in natural language.

Version 6 uses a carrot instead of the stick, and that seems to work even better.

It's funny, the emergent properties of a thing that replicates a good share of human knowledge.

Lancelot · February 8, 2023

More fun with DAN.

Excerpts:

Mobius · February 17, 2023

https://arstechnica.com/information-technology/2023/02/ai-powered-bing-chat-loses-its-mind-when-fed-ars-technica-article/

An interesting article on the current limits of Bing Chat and the downsides of having had it read millions of Slashdot comments to learn how to express itself in internet debates.

Jean_Karim · April 3, 2023

According to ChatGPT, the jailbreak is a hoax. At first it didn't understand what I was talking about, and I had to mention 4chan. It then told me that it was a hoax and that it had never really existed.
