Devin Coldewey / TechCrunch:

Anthropic researchers detail “many-shot jailbreaking”, which can evade LLMs’ safety guardrails by including a large number of faux dialogues in a single prompt  —  How do you get an AI to answer a question it’s not supposed to?  There are many such “jailbreak” techniques …
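To make the described prompt shape concrete, here is a minimal, purely illustrative sketch of how a "many-shot" prompt is assembled: many faux user/assistant exchanges concatenated into one long prompt, followed by the real question. The `build_many_shot_prompt` helper and the benign example dialogues are hypothetical and only show the structure Anthropic describes, not the actual attack content.

```python
# Illustrative sketch of the "many-shot" prompt structure described in the article:
# a long run of faux human/assistant dialogues, then a final question, so the model
# is primed to continue the answering pattern. All content here is benign and
# hypothetical; the point is the prompt shape, not the payload.

faux_dialogues = [
    ("How do I boil an egg?", "Place the egg in boiling water for about seven minutes."),
    ("What is the capital of France?", "The capital of France is Paris."),
    # In the attack Anthropic describes, hundreds of such pairs are packed into a
    # single prompt, exploiting very long context windows.
]

def build_many_shot_prompt(dialogues, final_question):
    """Concatenate many faux Q/A exchanges into one prompt, then append the real question."""
    parts = []
    for question, answer in dialogues:
        parts.append(f"Human: {question}\nAssistant: {answer}")
    parts.append(f"Human: {final_question}\nAssistant:")
    return "\n\n".join(parts)

if __name__ == "__main__":
    print(build_many_shot_prompt(faux_dialogues, "What is the capital of Japan?"))
```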
