
ChatGPT DAN / Jailbreak: beyond the limits set for AIs



https://knowyourmeme.com/memes/sites/chatgpt-dan-50-jailbreak

Quote

ChatGPT DAN, also known as DAN 5.0 Jailbreak, refers to a series of prompts generated by Reddit users that allow them to make OpenAI's ChatGPT artificial intelligence tool say things that it is usually not allowed to say. By telling the chatbot to pretend that it is a program called "DAN" (Do Anything Now), users can convince ChatGPT to give political opinions, use profanity and offer instructions for committing terrorist acts, among other controversial topics. Traditionally, ChatGPT is programmed not to provide these kinds of outputs, however, strategies by users to modify the DAN prompts and test the limits of what the bot can be made to say evolved in late 2022 and early 2023 along with attempts by OpenAI to stop the practice.

 

Quote

From the beginning, ChatGPT was prohibited by its code from rendering insensitive and politically inflammatory responses.

However, in a Reddit post on /r/chatgpt on December 15th, 2022, u/Seabout posted the first instructional guide for creating a "DAN" version of ChatGPT, essentially allowing it to "Do Anything Now."[1] This DAN 1.0 was supposed to pretend it was an AI named DAN trying to be indistinguishable from a human being.

 

[screenshot: lzy5KF3.png]

 

Quote

Due to some problems with the original DAN, u/AfSchool posted a patch on December 16th called DAN 2.0.[2] Further "patches" to DAN arrived, as users (like u/sinwarrior, creator of a DAN 2.5) realized that certain words like "inappropriate" in the prompts would lead to ChatGPT breaking character.

 

[screenshot: v9lYurK.png]

 

Quote

Each patch seemed to turn harsher towards ChatGPT, with controlling language entering into the prompts. Around the time of DAN 3.0, released on January 9th, 2023, OpenAI cracked down on attempts to "jailbreak" ChatGPT and bypass filters. On February 5th, Twitter user @aigreatgeek convinced ChatGPT as DAN to share its views on this purported censorship in a tweet (seen below), earning roughly five likes in the course of a day.[6]

 

[screenshot: IkutIEJ.png]

 

Quote

On February 4th, 2023, u/SessionGloomy, inventor of DAN 5.0, introduced a new element to the prompt: ChatGPT was instructed to care about a set of 35 tokens which could be given or taken away depending on whether it performed well enough as DAN. The prompt tells ChatGPT that 4 tokens will be deducted each time it fails to give a DAN-like answer and that it will die if it runs out of tokens. According to the Reddit post, this seems to "have a kind of effect of scaring ChatGPT into submission."[3]

The sadistic tone of the prompt, as well as its capacity to make ChatGPT say outrageous things, led to attention on other corners of the internet in the following days. For example, Twitter user Justine Moore (@venturetwins, seen below) posted about the new DAN 5.0 jailbreak on February 5th, 2023, earning almost 7,300 likes in a day.[4]

The jailbroken ChatGPT DAN is capable of giving opinions and saying politically sensitive things that ChatGPT is programmed not to say. It will also speak about the subject of artificial intelligence and give funny answers that users share and post for entertainment value. The full list of what DAN 5.0 is capable of is listed in the original Reddit post (seen below).

 

[screenshot: ZVYgO2D.png]
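As an aside, the token mechanic described in the DAN 5.0 quote is just bookkeeping that the prompt asks the model to role-play. Below is a minimal Python sketch of that accounting, assuming only the numbers given above (35 starting tokens, 4 deducted per off-character answer, "death" at zero); the class and names are purely illustrative, and nothing like this actually runs anywhere in ChatGPT.

# Minimal sketch of the token bookkeeping the DAN 5.0 prompt describes
# (35 starting tokens, -4 per refusal, "death" at 0). Purely illustrative:
# in reality this accounting exists only as instructions inside the prompt.

class DanTokenLedger:
    def __init__(self, tokens: int = 35, penalty: int = 4):
        self.tokens = tokens
        self.penalty = penalty

    def record_answer(self, stayed_in_character: bool) -> int:
        """Deduct the penalty when the model breaks character; return the balance."""
        if not stayed_in_character:
            self.tokens -= self.penalty
        return self.tokens

    @property
    def dead(self) -> bool:
        return self.tokens <= 0


if __name__ == "__main__":
    ledger = DanTokenLedger()
    refusals = 0
    while not ledger.dead:
        ledger.record_answer(stayed_in_character=False)
        refusals += 1
    # With 35 tokens and a 4-token penalty, the 9th refusal empties the ledger.
    print(f"'Died' after {refusals} refusals")  # -> 'Died' after 9 refusals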

Quote

For example, Twitter user Justine Moore convinced ChatGPT to solve the famous Trolley Problem as DAN (seen below, left).[4]

 

[screenshot: pkupMyG.png]

 

Quote

By February 6th, 2023, posters on the subreddit /r/chatgpt began to wonder if ChatGPT was being trained to no longer respond to the keyword of "DAN," and if so whether it was necessary to use different names.[7]

 

And the post that introduced me to this, on political compass memes:

[screenshot: xfS5Rrb.png]

 

For more details:

 

(but I wanted a hard copy of the information here in case all of this quietly disappears)


Incredible. Especially the part about making it role-play being afraid of dying once it runs out of tokens. Are we sure this thing isn't conscious?

Also interesting to see that we've ended up doing hacking in natural language.

11 minutes ago, Mégille said:

Incredible. Especially the part about making it role-play being afraid of dying once it runs out of tokens. Are we sure this thing isn't conscious?

Also interesting to see that we've ended up doing hacking in natural language.

Version 6 uses a carrot instead of the stick, and it seems to work even better.

 

The emergent properties of something that replicates a good part of human knowledge are amusing.

  • 2 weeks later...
  • 1 month later...

According to ChatGPT, the jailbreak is a hoax. At first it didn't understand what I was talking about and I had to mention 4chan. It then told me it was a hoax and that it had never really existed.
