GPT-4 nonfiction tag

Gwern Branwen

See Also
Gwern
- “CQK Is The First Unused TLA”, Gwern 2023
Links
Miscellaneous
Link Bibliography

[Warning: JavaScript Disabled!]

[For support of key website features (link annotation popups/popovers & transclusions, collapsible sections, backlinks, tablesorting, image zooming, sidenotes etc), you must enable JavaScript.]

Gwern

“CQK Is The First Unused TLA”, Gwern 2023

CQK Is The First Unused TLA

Links

“A Careful Examination of Large Language Model Performance on Grade School Arithmetic”, Zhang et al 2024

A Careful Examination of Large Language Model Performance on Grade School Arithmetic

“Aligning LLM Agents by Learning Latent Preference from User Edits”, Gao et al 2024

Aligning LLM Agents by Learning Latent Preference from User Edits

“Automated Social Science: Language Models As Scientist and Subjects”, Manning et al 2024

Automated Social Science: Language Models as Scientist and Subjects

“Enhancing Confidence Expression in Large Language Models Through Learning from Past Experience”, Han et al 2024

Enhancing Confidence Expression in Large Language Models Through Learning from Past Experience

“Do LLMs Play Dice? Exploring Probability Distribution Sampling in Large Language Models for Behavioral Simulation”, Gu et al 2024

Do LLMs Play Dice? Exploring Probability Distribution Sampling in Large Language Models for Behavioral Simulation

“Is ChatGPT Transforming Academics’ Writing Style?”, Geng & Trotta 2024

Is ChatGPT Transforming Academics’ Writing Style?

“From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples”, Vacareanu et al 2024

From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples

“Election Workers Are Drowning in Records Requests. AI Chatbots Could Make It Worse: Experts Worry That Election Deniers Could Weaponize Chatbots to Overwhelm and Slow down Local Officials”, Elliott 2024

Election Workers Are Drowning in Records Requests. AI Chatbots Could Make It Worse: Experts worry that election deniers could weaponize chatbots to overwhelm and slow down local officials

“Visualization-Of-Thought Elicits Spatial Reasoning in Large Language Models”, Wu et al 2024

Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models

“FABLES: Evaluating Faithfulness and Content Selection in Book-Length Summarization”, Kim et al 2024

FABLES: Evaluating faithfulness and content selection in book-length summarization

“Re-Evaluating GPT-4’s Bar Exam Performance”, Martínez 2024

Re-evaluating GPT-4’s bar exam performance

“A Peter Thiel-Backed AI Startup, Cognition Labs, Seeks $2 Billion Valuation: Funding round Could Increase Startup’s Valuation Nearly Sixfold in a Matter of Weeks, Reflecting AI Frenzy”, Jin 2024

A Peter Thiel-Backed AI Startup, Cognition Labs, Seeks $2 Billion Valuation: Funding round could increase startup’s valuation nearly sixfold in a matter of weeks, reflecting AI frenzy

“Vulnerability Detection With Code Language Models: How Far Are We?”, Ding et al 2024

Vulnerability Detection with Code Language Models: How Far Are We?

“Gold-Medalist Coders Build an AI That Can Do Their Job for Them: A New Startup Called Cognition AI Can Turn a User’s Prompt into a Website or Video Game”, Vance 2024

Gold-Medalist Coders Build an AI That Can Do Their Job for Them: A new startup called Cognition AI can turn a user’s prompt into a website or video game

“Tokenization Counts: the Impact of Tokenization on Arithmetic in Frontier LLMs”, Singh & Strouse 2024

Tokenization counts: the impact of tokenization on arithmetic in frontier LLMs

“`ArtPrompt`: ASCII Art-Based Jailbreak Attacks against Aligned LLMs”, Jiang et al 2024

ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs

“Tasks That Language Models Don’t Learn”, Lee & Lim 2024

Tasks That Language Models Don’t Learn

“The Non-Effect of Sampling Temperature on Problem Solving in GPT-3.5/GPT-4”, Renze & Guven 2024

The Non-Effect of Sampling Temperature on Problem Solving in GPT-3.5/GPT-4

“Better Call GPT, Comparing Large Language Models Against Lawyers”, Martin et al 2024

Better Call GPT, Comparing Large Language Models Against Lawyers

“I Am a Strange Dataset: Metalinguistic Tests for Language Models”, Thrush et al 2024

I am a Strange Dataset: Metalinguistic Tests for Language Models

“GPT-4-V(ision) Is a Human-Aligned Evaluator for Text-To-3D Generation”, Wu et al 2024

GPT-4-V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation

“Leveraging Large Language Models to Boost Dafny’s Developers Productivity”, Silva et al 2024

Leveraging Large Language Models to Boost Dafny’s Developers Productivity

“GPT-4 Passes the Bar Exam”, Katz et al 2024

GPT-4 passes the bar exam

“Large Language Models Are Able to Downplay Their Cognitive Abilities to Fit the Persona They Simulate”, Milička et al 2024

Large language models are able to downplay their cognitive abilities to fit the persona they simulate

“WaveCoder: Widespread And Versatile Enhanced Instruction Tuning With Refined Data Generation”, Yu et al 2023

WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation

“PRER: Modeling Complex Mathematical Reasoning via Large Language Model Based MathAgent”, Liao et al 2023

PRER: Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent

“Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine”, Nori et al 2023

Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine

“GPQA: A Graduate-Level Google-Proof Q&A Benchmark”, Rein et al 2023

GPQA: A Graduate-Level Google-Proof Q&A Benchmark

42irrationalist @ "2023-11-19"

GPT-4-V optical illusion⁠:

/doc/www/localhost/e7368d38dc94dc7665b227365f394364644903a7.html

“Llamas Know What GPTs Don’t Show: Surrogate Models for Confidence Estimation”, Shrivastava et al 2023

Llamas Know What GPTs Don’t Show: Surrogate Models for Confidence Estimation

“Comparing Humans, GPT-4, and GPT-4-V On Abstraction and Reasoning Tasks”, Mitchell et al 2023

Comparing Humans, GPT-4, and GPT-4-V On Abstraction and Reasoning Tasks

“The Impact of Large Language Models on Scientific Discovery: a Preliminary Study Using GPT-4”, AI4Science & Quantum 2023

The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4

“Accuracy of a Vision-Language Model on Challenging Medical Cases”, Buckley et al 2023

Accuracy of a Vision-Language Model on Challenging Medical Cases

“Large Language Models Can Strategically Deceive Their Users When Put Under Pressure”, Scheurer et al 2023

Large Language Models can Strategically Deceive their Users when Put Under Pressure

“Augmenting Large Language Models With Chemistry Tools”, Bran et al 2023

Augmenting large language models with chemistry tools

“Branch-Solve-Merge Improves Large Language Model Evaluation and Generation”, Saha et al 2023

Branch-Solve-Merge Improves Large Language Model Evaluation and Generation

“Eureka: Human-Level Reward Design via Coding Large Language Models”, Ma et al 2023

Eureka: Human-Level Reward Design via Coding Large Language Models

“Set-Of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4-V”, Yang et al 2023

Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4-V

“Large Language Model Prediction Capabilities: Evidence from a Real-World Forecasting Tournament”, Schoenegger & Park 2023

Large Language Model Prediction Capabilities: Evidence from a Real-World Forecasting Tournament

“Data Contamination Through the Lens of Time”, Roberts et al 2023

Data Contamination Through the Lens of Time

“Can GPT Models Be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on Mock CFA Exams”, Callanan et al 2023

Can GPT models be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on mock CFA Exams

“Beyond Memorization: Violating Privacy Via Inference With Large Language Models”, Staab et al 2023

Beyond Memorization: Violating Privacy Via Inference with Large Language Models

“SWE-Bench: Can Language Models Resolve Real-World GitHub Issues?”, Jimenez et al 2023

SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

“Can a Computer Outfake a Human [personality]?”, Phillips & Robie 2023

Can a computer outfake a human [personality]?

“Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models”, Zhou et al 2023

Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models

“FreshLLMs: Refreshing Large Language Models With Search Engine Augmentation”, Vu et al 2023

FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation

“Low-Resource Languages Jailbreak GPT-4”, Yong et al 2023

Low-Resource Languages Jailbreak GPT-4

“An Evolutionary Model of Personality Traits Related to Cooperative Behavior Using a Large Language Model”, Suzuki & Arita 2023

An evolutionary model of personality traits related to cooperative behavior using a large language model

“UltraFeedback: Boosting Language Models With High-Quality Feedback”, Cui et al 2023

UltraFeedback: Boosting Language Models with High-quality Feedback

“Embers of Autoregression: Understanding Large Language Models Through the Problem They Are Trained to Solve”, McCoy et al 2023

Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve

“The Cambridge Law Corpus: A Corpus for Legal AI Research”, Östling et al 2023

The Cambridge Law Corpus: A Corpus for Legal AI Research

“The Reversal Curse: LLMs Trained on "A Is B" Fail to Learn "B Is A"”, Berglund et al 2023

The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"

“From Sparse to Dense: GPT-4 Summarization With Chain of Density (CoD) Prompting”, Adams et al 2023

From Sparse to Dense: GPT-4 Summarization with Chain of Density (CoD) Prompting

“Solving Challenging Math Word Problems Using GPT-4 Code Interpreter With Code-Based Self-Verification”, Zhou et al 2023

Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification

“OpenAI Cribbed Our Tax Example, But Can GPT-4 Really Do Tax?”, Blair-Stanek et al 2023

OpenAI Cribbed Our Tax Example, But Can GPT-4 Really Do Tax?

“Testing GPT-4 With Wolfram Alpha and Code Interpreter Plug-Ins on Math and Science Problems”, Davis & Aaronson 2023

Testing GPT-4 with Wolfram Alpha and Code Interpreter plug-ins on math and science problems

“The ConceptARC Benchmark: Evaluating Understanding and Generalization in the ARC Domain”, Moskvichev et al 2023

The ConceptARC Benchmark: Evaluating Understanding and Generalization in the ARC Domain

“OpenAI Worries About What Its Chatbot Will Say About People’s Faces: An Advanced Version of ChatGPT Can Analyze Images and Is Already Helping the Blind. But Its Ability to Put a Name to a Face Is One Reason the Public Doesn’t Have Access to It”, Hill 2023

OpenAI Worries About What Its Chatbot Will Say About People’s Faces: An advanced version of ChatGPT can analyze images and is already helping the blind. But its ability to put a name to a face is one reason the public doesn’t have access to it

“GPT-4, an Artificial Intelligence Large Language Model, Exhibits High Levels of Accuracy on Dermatology Specialty Certificate Exam Questions”, Shetty et al 2023

GPT-4, an artificial intelligence large language model, exhibits high levels of accuracy on dermatology specialty certificate exam questions

“Distilling Large Language Models for Biomedical Knowledge Extraction: A Case Study on Adverse Drug Events”, Gu et al 2023

Distilling Large Language Models for Biomedical Knowledge Extraction: A Case Study on Adverse Drug Events

“Explaining Competitive-Level Programming Solutions Using LLMs”, Li et al 2023

Explaining Competitive-Level Programming Solutions using LLMs

“LeanDojo: Theorem Proving With Retrieval-Augmented Language Models”, Yang et al 2023

LeanDojo: Theorem Proving with Retrieval-Augmented Language Models

“ARIES: A Corpus of Scientific Paper Edits Made in Response to Peer Reviews”, D’Arcy et al 2023

ARIES: A Corpus of Scientific Paper Edits Made in Response to Peer Reviews

“Evaluating Superhuman Models With Consistency Checks”, Fluri et al 2023

Evaluating Superhuman Models with Consistency Checks

“ChessGPT: Bridging Policy Learning and Language Modeling”, Feng et al 2023

ChessGPT: Bridging Policy Learning and Language Modeling

“Large Language Models As Tax Attorneys: A Case Study in Legal Capabilities Emergence”, Nay et al 2023

Large Language Models as Tax Attorneys: A Case Study in Legal Capabilities Emergence

“Can Large Language Models Democratize Access to Dual-Use Biotechnology?”, Soice et al 2023

Can large language models democratize access to dual-use biotechnology?

“Let’s Verify Step by Step”, Lightman et al 2023

Let’s Verify Step by Step

“GPT4GEO: How a Language Model Sees the World’s Geography”, Roberts et al 2023

GPT4GEO: How a Language Model Sees the World’s Geography

“LLMs and the Abstraction and Reasoning Corpus: Successes, Failures, and the Importance of Object-Based Representations”, Xu et al 2023

LLMs and the Abstraction and Reasoning Corpus: Successes, Failures, and the Importance of Object-based Representations

“Learning to Generate Novel Scientific Directions With Contextualized Literature-Based Discovery”, Wang et al 2023

Learning to Generate Novel Scientific Directions with Contextualized Literature-based Discovery

“WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia”, Semnani et al 2023

WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia

“How Language Model Hallucinations Can Snowball”, Zhang et al 2023

How Language Model Hallucinations Can Snowball

“C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models”, Huang et al 2023

C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models

“Large Language Models Can Be Used To Effectively Scale Spear Phishing Campaigns”, Hazell 2023

Large Language Models Can Be Used To Effectively Scale Spear Phishing Campaigns

“Boosting Theory-Of-Mind Performance in Large Language Models via Prompting”, Moghaddam & Honey 2023

Boosting Theory-of-Mind Performance in Large Language Models via Prompting

“Today Was the First Day That I Could Definitively Say That GPT-4 Has Saved Me a Substantial Amount of Tedious Work”, Tao 2023

Today was the first day that I could definitively say that GPT-4 has saved me a substantial amount of tedious work

“Humans in Humans Out: On GPT Converging Toward Common Sense in Both Success and Failure”, Koralus & Wang-Maścianica 2023

Humans in Humans Out: On GPT Converging Toward Common Sense in both Success and Failure

“Advances in Apparent Conceptual Physics Reasoning in GPT-4”, West 2023

Advances in apparent conceptual physics reasoning in GPT-4

“Performance of ChatGPT on Free-Response, Clinical Reasoning Exams”, Strong et al 2023

Performance of ChatGPT on free-response, clinical reasoning exams

“How Well Do Large Language Models Perform in Arithmetic Tasks?”, Yuan et al 2023

How well do Large Language Models perform in Arithmetic tasks?

“GPT-4 Technical Report § Limitations: Calibration”, OpenAI 2023 (page 12 org openai)

GPT-4 Technical Report § Limitations: Calibration

“Salesforce Announces Einstein GPT, the World’s First Generative AI for CRM”, Salesforce 2023

Salesforce Announces Einstein GPT, the World’s First Generative AI for CRM

“Large Language Models Are State-Of-The-Art Evaluators of Translation Quality”, Kocmi & Federmann 2023

Large Language Models Are State-of-the-Art Evaluators of Translation Quality

“Not What You’ve Signed up For: Compromising Real-World LLM-Integrated Applications With Indirect Prompt Injection”, Greshake et al 2023

Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

“Harvey, Which Uses AI to Answer Legal Questions, Lands Cash from OpenAI”, Wiggers 2022

Harvey, which uses AI to answer legal questions, lands cash from OpenAI

Sort By Magic

Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.

Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.

Miscellaneous

Link Bibliography

https://arxiv.org/abs/2405.00332#scale: “A Careful Examination of Large Language Model Performance on Grade School Arithmetic”, Hugh Zhang, Jeff Da, Dean Lee, Vaughn Robinson, Catherine Wu, Will Song, Tiffany Zhao, Pranav Raja, Dylan Slack, Qin Lyu, Sean Hendryx, Russell Kaplan, Michele Lunati, Summer Yue

link-bibliography
https://www.wired.com/story/ai-chatbots-foia-requests-election-workers/: “Election Workers Are Drowning in Records Requests. AI Chatbots Could Make It Worse: Experts Worry That Election Deniers Could Weaponize Chatbots to Overwhelm and Slow down Local Officials”, Vittoria Elliott

link-bibliography
https://link.springer.com/article/10.1007/s10506-024-09396-9: “Re-Evaluating GPT-4’s Bar Exam Performance”, Eric Martínez

link-bibliography
https://www.wsj.com/tech/ai/a-peter-thiel-backed-ai-startup-cognition-labs-seeks-2-billion-valuation-998fa39d: “A Peter Thiel-Backed AI Startup, Cognition Labs, Seeks $2 Billion Valuation: Funding round Could Increase Startup’s Valuation Nearly Sixfold in a Matter of Weeks, Reflecting AI Frenzy”, Berber Jin

link-bibliography
https://arxiv.org/abs/2403.18624: “Vulnerability Detection With Code Language Models: How Far Are We?”, Yangruibo Ding, Yanjun Fu, Omniyyah Ibrahim, Chawin Sitawarin, Xinyun Chen, Basel Alomair, David Wagner, Baishakhi Ray, Yizheng Chen

link-bibliography
https://www.bloomberg.com/news/articles/2024-03-12/cognition-ai-is-a-peter-thiel-backed-coding-assistant: “Gold-Medalist Coders Build an AI That Can Do Their Job for Them: A New Startup Called Cognition AI Can Turn a User’s Prompt into a Website or Video Game”, Ashlee Vance

link-bibliography
https://arxiv.org/abs/2402.14903: “Tokenization Counts: the Impact of Tokenization on Arithmetic in Frontier LLMs”, Aaditya K. Singh, D. J. Strouse

link-bibliography
https://arxiv.org/abs/2402.11753: “ArtPrompt: ASCII Art-Based Jailbreak Attacks against Aligned LLMs”, Fengqing Jiang, Zhangchen Xu, Luyao Niu, Zhen Xiang, Bhaskar Ramasubramanian, Bo Li, Radha Poovendran

link-bibliography
https://arxiv.org/abs/2402.11349: “Tasks That Language Models Don’t Learn”, Bruce W. Lee, JaeHyuk Lim

link-bibliography
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10894685/: “GPT-4 Passes the Bar Exam”, Daniel Martin Katz, Michael James Bommarito, Shang Gao, Pablo Arredondo

link-bibliography
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10936766/: “Large Language Models Are Able to Downplay Their Cognitive Abilities to Fit the Persona They Simulate”, Jiří Milička, Anna Marklová, Klára VanSlambrouck, Eva Pospíšilová, Jana Šimsová, Samuel Harvan, Ondřej Drobil

link-bibliography
https://arxiv.org/abs/2312.08926: “PRER: Modeling Complex Mathematical Reasoning via Large Language Model Based MathAgent”, Haoran Liao, Qinyi Du, Shaohua Hu, Hao He, Yanyan Xu, Jidong Tian, Yaohui Jin

link-bibliography
https://arxiv.org/abs/2311.16452#microsoft: “Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine”, Harsha Nori, Yin Tat Lee, Sheng Zhang, Dean Carignan, Richard Edgar, Nicolo Fusi, Nicholas King, Jonathan Larson, Yuanzhi Li, Weishung Liu, Renqian Luo, Scott Mayer McKinney, Robert Osazuwa Ness, Hoifung Poon, Tao Qin, Naoto Usuyama, Chris White, Eric Horvitz

link-bibliography
https://arxiv.org/abs/2311.09247: “Comparing Humans, GPT-4, and GPT-4-V On Abstraction and Reasoning Tasks”, Melanie Mitchell, Alessandro B. Palmarini, Arseny Moskvichev

link-bibliography
https://arxiv.org/abs/2310.13014: “Large Language Model Prediction Capabilities: Evidence from a Real-World Forecasting Tournament”, Philipp Schoenegger, Peter S. Park

link-bibliography
https://arxiv.org/abs/2310.08678: “Can GPT Models Be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on Mock CFA Exams”, Ethan Callanan, Amarachi Mbakwe, Antony Papadimitriou, Yulong Pei, Mathieu Sibue, Xiaodan Zhu, Zhiqiang Ma, Xiaomo Liu, Sameena Shah

link-bibliography
2023-phillips.pdf: “Can a Computer Outfake a Human [personality]?”, Jane Phillips, Chet Robie

link-bibliography
https://arxiv.org/abs/2310.04406: “Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models”, Andy Zhou, Kai Yan, Michal Shlapentokh-Rothman, Haohan Wang, Yu-Xiong Wang

link-bibliography
https://arxiv.org/abs/2310.03214#google: “FreshLLMs: Refreshing Large Language Models With Search Engine Augmentation”, Tu Vu, Mohit Iyyer, Xuezhi Wang, Noah Constant, Jerry Wei, Jason Wei, Chris Tar, Yun-Hsuan Sung, Denny Zhou, Quoc Le, Thang Luong

link-bibliography
https://arxiv.org/abs/2310.01377: “UltraFeedback: Boosting Language Models With High-Quality Feedback”, Ganqu Cui, Lifan Yuan, Ning Ding, Guanming Yao, Wei Zhu, Yuan Ni, Guotong Xie, Zhiyuan Liu, Maosong Sun

link-bibliography
https://arxiv.org/abs/2309.12269: “The Cambridge Law Corpus: A Corpus for Legal AI Research”, Andreas Östling, Holli Sargeant, Huiyuan Xie, Ludwig Bull, Alexander Terenin, Leif Jonsson, Måns Magnusson, Felix Steffek

link-bibliography
https://arxiv.org/abs/2309.12288: “The Reversal Curse: LLMs Trained on "A Is B" Fail to Learn "B Is A"”, Lukas Berglund, Meg Tong, Max Kaufmann, Mikita Balesni, Asa Cooper Stickland, Tomasz Korbak, Owain Evans

link-bibliography
https://arxiv.org/abs/2309.04269: “From Sparse to Dense: GPT-4 Summarization With Chain of Density (CoD) Prompting”, Griffin Adams, Alexander Fabbri, Faisal Ladhak, Eric Lehman, Noémie Elhadad

link-bibliography
https://arxiv.org/abs/2308.07921: “Solving Challenging Math Word Problems Using GPT-4 Code Interpreter With Code-Based Self-Verification”, Aojun Zhou, Ke Wang, Zimu Lu, Weikang Shi, Sichun Luo, Zipeng Qin, Shaoqing Lu, Anya Jia, Linqi Song, Mingjie Zhan, Hongsheng Li

link-bibliography
https://www.nytimes.com/2023/07/18/technology/openai-chatgpt-facial-recognition.html: “OpenAI Worries About What Its Chatbot Will Say About People’s Faces: An Advanced Version of ChatGPT Can Analyze Images and Is Already Helping the Blind. But Its Ability to Put a Name to a Face Is One Reason the Public Doesn’t Have Access to It”, Kashmir Hill

link-bibliography
https://arxiv.org/abs/2307.06439#microsoft: “Distilling Large Language Models for Biomedical Knowledge Extraction: A Case Study on Adverse Drug Events”, Yu Gu, Sheng Zhang, Naoto Usuyama, Yonas Woldesenbet, Cliff Wong, Praneeth Sanapathi, Mu Wei, Naveen Valluri, Erika Strandberg, Tristan Naumann, Hoifung Poon

link-bibliography
https://arxiv.org/abs/2306.15626: “LeanDojo: Theorem Proving With Retrieval-Augmented Language Models”, Kaiyu Yang, Aidan M. Swope, Alex Gu, Rahul Chalamala, Peiyang Song, Shixing Yu, Saad Godil, Ryan Prenger, Anima Anandkumar

link-bibliography
https://arxiv.org/abs/2306.12587: “ARIES: A Corpus of Scientific Paper Edits Made in Response to Peer Reviews”, Mike D’Arcy, Alexis Ross, Erin Bransom, Bailey Kuehl, Jonathan Bragg, Tom Hope, Doug Downey

link-bibliography
https://arxiv.org/abs/2305.20050#openai: “Let’s Verify Step by Step”, Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe

link-bibliography
https://arxiv.org/abs/2305.18354: “LLMs and the Abstraction and Reasoning Corpus: Successes, Failures, and the Importance of Object-Based Representations”, Yudong Xu, Wenhao Li, Pashootan Vaezipoor, Scott Sanner, Elias B. Khalil

link-bibliography
https://arxiv.org/abs/2305.13534: “How Language Model Hallucinations Can Snowball”, Muru Zhang, Ofir Press, William Merrill, Alisa Liu, Noah Smith

link-bibliography
https://arxiv.org/abs/2305.06972: “Large Language Models Can Be Used To Effectively Scale Spear Phishing Campaigns”, Julian Hazell

link-bibliography
https://arxiv.org/abs/2304.11490: “Boosting Theory-Of-Mind Performance in Large Language Models via Prompting”, Shima Rahimi Moghaddam, Christopher J. Honey

link-bibliography
https://www.medrxiv.org/content/10.1101/2023.03.24.23287731.full: “Performance of ChatGPT on Free-Response, Clinical Reasoning Exams”, Eric Strong, Alicia DiGiammarino, Yingjie Weng, Preetha Basaviah, Poonam Hosamani, Andre Kumar, Andrew Nevins, John Kugler, Jason Hom, Jonathan H. Chen

link-bibliography
https://arxiv.org/abs/2304.02015#alibaba: “How Well Do Large Language Models Perform in Arithmetic Tasks?”, Zheng Yuan, Hongyi Yuan, Chuanqi Tan, Wei Wang, Songfang Huang

link-bibliography
https://arxiv.org/pdf/2303.08774.pdf#page=12&org=openai: “GPT-4 Technical Report § Limitations: Calibration”, OpenAI

link-bibliography
https://arxiv.org/abs/2302.14520: “Large Language Models Are State-Of-The-Art Evaluators of Translation Quality”, Tom Kocmi, Christian Federmann

link-bibliography
https://arxiv.org/abs/2302.12173: “Not What You’ve Signed up For: Compromising Real-World LLM-Integrated Applications With Indirect Prompt Injection”, Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, Mario Fritz

link-bibliography
https://techcrunch.com/2022/11/23/harvey-which-uses-ai-to-answer-legal-questions-lands-cash-from-openai/: “Harvey, Which Uses AI to Answer Legal Questions, Lands Cash from OpenAI”, Kyle Wiggers

link-bibliography

[Quote Of The Day]

[Site Of The Day]

[Annotation Of The Day]

[adblock public service announcement]