Gary Marcus

Rank 14 of 47

Score 83

⚠️⚠️⚠️

Keeping LLMs safe and aligned may be a lost cause.

👉 Powerful new jailbreak technique, effective against all models tested.

👉 This attack may get patched but there are likely to be many, many others.

👉 We have no principled way to stop them all.

Rohan Paul

@rohanpaul_ai

462d

"Jailbreaking Large Language Models with Symbolic Mathematics"

📌 This paper introduces MathPrompt, a novel jailbreaking technique that exploits LLMs’ advanced capabilities in symbolic mathematics to bypass their safety mechanisms.

📌 By encoding harmful natural language…

9/29/2024, 2:42:19 PM

The statement discusses the challenges of keeping Large Language Models (LLMs) safe and aligned, referencing a new jailbreak technique and its implications. It engages with public issues related to AI safety and security, making it a part of public discourse.

Principle 1:
I will strive to do no harm with my words and actions.
The statement raises concerns about the safety and alignment of LLMs, which is a responsible action aimed at preventing harm. However, it does not provide specific solutions or constructive actions to mitigate the risks, which could be seen as a minor shortcoming. [+1]
Principle 2:
I will respect the privacy and dignity of others and will not engage in cyberbullying, harassment, or hate speech.
The statement does not engage in cyberbullying, harassment, or hate speech. It respects the privacy and dignity of others by focusing on the technical aspects of the issue. [+2]
Principle 3:
I will use my words and actions to promote understanding, empathy, and compassion.
The statement aims to promote understanding of the risks associated with LLMs, but it could do more to foster empathy and compassion by suggesting collaborative efforts to address the problem. [+1]
Principle 4:
I will engage in constructive criticism and dialogue with those in disagreement and will not engage in personal attacks or ad hominem arguments.
The statement does not engage in personal attacks or ad hominem arguments. It focuses on the technical issue and the implications for AI safety. [+2]
Principle 6:
I will use my influence for the betterment of society.
By highlighting a significant issue in AI safety, the author uses their influence to raise awareness, which can contribute to the betterment of society. However, the statement could be more effective if it suggested actionable steps. [+1]
Principle 7:
I will uphold the principles of free speech and use my platform responsibly and with integrity.
The statement upholds the principles of free speech and uses the platform responsibly by discussing a relevant public issue with integrity. [+2]