See Also
Links
- “From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples”, Vacareanu et al 2024
- “VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?”, Liu et al 2024
- “FABLES: Evaluating Faithfulness and Content Selection in Book-Length Summarization”, Kim et al 2024
- “ArtPrompt: ASCII Art-Based Jailbreak Attacks against Aligned LLMs”, Jiang et al 2024
- “Using Hallucinations to Bypass GPT-4’s Filter”, Lemkin 2024
- “Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training”, Hubinger et al 2024
- “Summon a Demon and Bind It: A Grounded Theory of LLM Red Teaming in the Wild”, Inie et al 2023
- “Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation”, Shah et al 2023
- “Specific versus General Principles for Constitutional AI”, Kundu et al 2023
- “PAIR: Jailbreaking Black Box Large Language Models in 20 Queries”, Chao et al 2023
- “Beyond Memorization: Violating Privacy Via Inference With Large Language Models”, Staab et al 2023
- “SWE-Bench: Can Language Models Resolve Real-World GitHub Issues?”, Jimenez et al 2023
- “Lost in the Middle: How Language Models Use Long Contexts”, Liu et al 2023
- “Opportunities and Risks of LLMs for Scalable Deliberation With Polis”, Small et al 2023
- “Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-Of-Thought Prompting”, Turpin et al 2023
- “A Radical Plan to Make AI Good, Not Evil”, Knight 2023
- “Constitutional AI: Harmlessness from AI Feedback”, Bai et al 2022
- “Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned”, Ganguli et al 2022
- “A General Language Assistant As a Laboratory for Alignment”, Askell et al 2021
- “The Perception of Rhythm in Language”, Cutler 1994
Miscellaneous
- https://docs.parea.ai/blog/benchmarking-anthropic-beta-tool-use
- https://marginalrevolution.com/marginalrevolution/2023/01/ai-passes-law-and-economics-exam.html
- https://nostalgebraist.tumblr.com/post/728556535745232896/claude-is-insufferable
- https://simonwillison.net/2024/Apr/17/ai-for-data-journalism/
- https://thezvi.wordpress.com/2023/07/25/anthropic-observations/
- https://twitter.com/AnthonyLeeZhang/status/1768639726557209082
- https://twitter.com/IntuitMachine/status/1678870325600108545
- https://twitter.com/IntuitMachine/status/1766205754304827407
- https://twitter.com/LouisKnightWebb/status/1724510794514157668
- https://twitter.com/OwainEvans_UK/status/1636580251676585986
- https://twitter.com/OwainEvans_UK/status/1636581594642403328
- https://twitter.com/OwainEvans_UK/status/1636605571637055488
- https://twitter.com/OwainEvans_UK/status/1636762386085605376
- https://twitter.com/alexalbert__/status/1780707227130863674
- https://twitter.com/amandaaskell/status/1765207842993434880
- https://twitter.com/anton_bakhtin/status/1764701559844147359
- https://twitter.com/daniel_271828/status/1769853886163296455
- https://twitter.com/elder_plinius/status/1774220858711490909
- https://twitter.com/futuristfrog/status/1777063159553040700
- https://twitter.com/fxturevescent/status/1776456827741323323
- https://twitter.com/jeremyphoward/status/1765529891343339804
- https://twitter.com/jeremyphoward/status/1779311134656671872
- https://twitter.com/kindgracekind/status/1770671231190127090
- https://twitter.com/metachirality/status/1769818226718888426
- https://twitter.com/metachirality/status/1769905644725830090
- https://twitter.com/peligrietzer/status/1678912319743459328
- https://verse.systems/blog/post/2024-03-09-using-llms-to-generate-fuzz-generators/
- https://www.lesswrong.com/posts/R3eDrDoX8LisKgGZe/sum-threshold-attacks?commentId=yqCkCQLkkaCnZCukJ
- https://www.maximumtruth.org/p/ais-ranked-by-iq-ai-passes-100-iq
- https://www.reddit.com/r/OpenAI/comments/1bm305k/what_the_hell_claud_3_opus_is_a_straight/
- https://www.vox.com/future-perfect/23794855/anthropic-ai-openai-claude-2
- https://xmarquez.github.io/GPTDemocracyIndex/GPTDemocracyIndex.html
Link Bibliography
- https://arxiv.org/abs/2402.11753: “ArtPrompt: ASCII Art-Based Jailbreak Attacks against Aligned LLMs”, Fengqing Jiang, Zhangchen Xu, Luyao Niu, Zhen Xiang, Bhaskar Ramasubramanian, Bo Li, Radha Poovendran
- https://arxiv.org/abs/2401.05566#anthropic: “Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training”, Hubinger et al 2024
- https://arxiv.org/abs/2310.08419: “PAIR: Jailbreaking Black Box Large Language Models in 20 Queries”, Patrick Chao, Alexander Robey, Edgar Dobriban, Hamed Hassani, George J. Pappas, Eric Wong
- https://arxiv.org/abs/2305.04388: “Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-Of-Thought Prompting”, Miles Turpin, Julian Michael, Ethan Perez, Samuel R. Bowman
- https://www.wired.com/story/anthropic-ai-chatbots-ethics/: “A Radical Plan to Make AI Good, Not Evil”, Will Knight
- https://www.anthropic.com/red_teaming.pdf: “Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned”, Ganguli et al 2022
- https://arxiv.org/abs/2112.00861#anthropic: “A General Language Assistant As a Laboratory for Alignment”, Askell et al 2021