• On Benchmarking Human-Like Intelligence in Machines

    Item Type Preprint
    Author Lance Ying
    Author Katherine M. Collins
    Author Lionel Wong
    Author Ilia Sucholutsky
    Author Ryan Liu
    Author Adrian Weller
    Author Tianmin Shu
    Author Thomas L. Griffiths
    Author Joshua B. Tenenbaum
    Abstract Recent benchmark studies have claimed that AI has approached or even surpassed human-level performances on various cognitive tasks. However, this position paper argues that current AI evaluation paradigms are insufficient for assessing human-like cognitive capabilities. We identify a set of key shortcomings: a lack of human-validated labels, inadequate representation of human response variability and uncertainty, and reliance on simplified and ecologically-invalid tasks. We support our claims by conducting a human evaluation study on ten existing AI benchmarks, suggesting significant biases and flaws in task and label designs. To address these limitations, we propose five concrete recommendations for developing future benchmarks that will enable more rigorous and meaningful evaluations of human-like cognitive capacities in AI with various implications for such AI applications.
    Date 2025-02-27
    Library Catalog arXiv.org
    URL http://arxiv.org/abs/2502.20502
    Accessed 3/13/2025, 8:41:50 AM
    Extra arXiv:2502.20502 [cs]
    DOI 10.48550/arXiv.2502.20502
    Repository arXiv
    Archive ID arXiv:2502.20502
    Date Added 3/13/2025, 8:41:50 AM
    Modified 3/13/2025, 8:41:50 AM

    Tags:

    • Computer Science - Artificial Intelligence

    Notes:

    • Comment: 18 pages, 5 figures

    Attachments

    • Preprint PDF
    • Snapshot
  • Prosocial Media

    Item Type Preprint
    Author E. Glen Weyl
    Author Luke Thorburn
    Author Emillie de Keulenaar
    Author Jacob Mchangama
    Author Divya Siddarth
    Author Audrey Tang
    Abstract Social media empower distributed content creation by algorithmically harnessing "the social fabric" (explicit and implicit signals of association) to serve this content. While this overcomes the bottlenecks and biases of traditional gatekeepers, many believe it has unsustainably eroded the very social fabric it depends on by maximizing engagement for advertising revenue. This paper participates in open and ongoing considerations to translate social and political values and conventions, specifically social cohesion, into platform design. We propose an alternative platform model that the social fabric an explicit output as well as input. Citizens are members of communities defined by explicit affiliation or clusters of shared attitudes. Both have internal divisions, as citizens are members of intersecting communities, which are themselves internally diverse. Each is understood to value content that bridge (viz. achieve consensus across) and balance (viz. represent fairly) this internal diversity, consistent with the principles of the Hutchins Commission (1947). Content is labeled with social provenance, indicating for which community or citizen it is bridging or balancing. Subscription payments allow citizens and communities to increase the algorithmic weight on the content they value in the content serving algorithm. Advertisers may, with consent of citizen or community counterparties, target them in exchange for payment or increase in that party's algorithmic weight. Underserved and emerging communities and citizens are optimally subsidized/supported to develop into paying participants. Content creators and communities that curate content are rewarded for their contributions with algorithmic weight and/or revenue. We discuss applications to productivity (e.g. LinkedIn), political (e.g. X), and cultural (e.g. TikTok) platforms.
    Date 2025-02-18
    Library Catalog arXiv.org
    URL http://arxiv.org/abs/2502.10834
    Accessed 3/13/2025, 8:34:27 AM
    Extra arXiv:2502.10834 [cs]
    DOI 10.48550/arXiv.2502.10834
    Repository arXiv
    Archive ID arXiv:2502.10834
    Date Added 3/13/2025, 8:34:27 AM
    Modified 3/13/2025, 8:34:27 AM

    Tags:

    • Computer Science - Computers and Society
    • Computer Science - Social and Information Networks

    Notes:

    • Comment: 60 pages

    Attachments

    • Preprint PDF
    • Snapshot
  • Taxonomy, Opportunities, and Challenges of Representation Engineering for Large Language Models

    Item Type Preprint
    Author Jan Wehner
    Author Sahar Abdelnabi
    Author Daniel Tan
    Author David Krueger
    Author Mario Fritz
    Abstract Representation Engineering (RepE) is a novel paradigm for controlling the behavior of LLMs. Unlike traditional approaches that modify inputs or fine-tune the model, RepE directly manipulates the model's internal representations. As a result, it may offer more effective, interpretable, data-efficient, and flexible control over models' behavior. We present the first comprehensive survey of RepE for LLMs, reviewing the rapidly growing literature to address key questions: What RepE methods exist and how do they differ? For what concepts and problems has RepE been applied? What are the strengths and weaknesses of RepE compared to other methods? To answer these, we propose a unified framework describing RepE as a pipeline comprising representation identification, operationalization, and control. We posit that while RepE methods offer significant potential, challenges remain, including managing multiple concepts, ensuring reliability, and preserving models' performance. Towards improving RepE, we identify opportunities for experimental and methodological improvements and construct a guide for best practices.
    Date 2025-03-12
    Library Catalog arXiv.org
    URL http://arxiv.org/abs/2502.19649
    Accessed 3/13/2025, 8:44:06 AM
    Extra arXiv:2502.19649 [cs]
    DOI 10.48550/arXiv.2502.19649
    Repository arXiv
    Archive ID arXiv:2502.19649
    Date Added 3/13/2025, 8:44:06 AM
    Modified 3/13/2025, 8:44:06 AM

    Tags:

    • Computer Science - Computation and Language
    • Computer Science - Machine Learning

    Attachments

    • Preprint PDF
    • Snapshot
  • AI Governance through Markets

    Item Type Preprint
    Author Philip Moreira Tomei
    Author Rupal Jain
    Author Matija Franklin
    Abstract This paper argues that market governance mechanisms should be considered a key approach in the governance of artificial intelligence (AI), alongside traditional regulatory frameworks. While current governance approaches have predominantly focused on regulation, we contend that market-based mechanisms offer effective incentives for responsible AI development. We examine four emerging vectors of market governance: insurance, auditing, procurement, and due diligence, demonstrating how these mechanisms can affirm the relationship between AI risk and financial risk while addressing capital allocation inefficiencies. While we do not claim that market forces alone can adequately protect societal interests, we maintain that standardised AI disclosures and market mechanisms can create powerful incentives for safe and responsible AI development. This paper urges regulators, economists, and machine learning researchers to investigate and implement market-based approaches to AI governance.
    Date 2025-03-05
    Library Catalog arXiv.org
    URL http://arxiv.org/abs/2501.17755
    Accessed 3/13/2025, 8:40:07 AM
    Extra arXiv:2501.17755 [econ]
    DOI 10.48550/arXiv.2501.17755
    Repository arXiv
    Archive ID arXiv:2501.17755
    Date Added 3/13/2025, 8:40:07 AM
    Modified 3/13/2025, 8:40:07 AM

    Tags:

    • Computer Science - Artificial Intelligence
    • Economics - General Economics
    • Quantitative Finance - Economics

    Attachments

    • Preprint PDF
    • Snapshot
  • FSPO: Few-Shot Preference Optimization of Synthetic Preference Data in LLMs Elicits Effective Personalization to Real Users

    Item Type Preprint
    Author Anikait Singh
    Author Sheryl Hsu
    Author Kyle Hsu
    Author Eric Mitchell
    Author Stefano Ermon
    Author Tatsunori Hashimoto
    Author Archit Sharma
    Author Chelsea Finn
    Abstract Effective personalization of LLMs is critical for a broad range of user-interfacing applications such as virtual assistants and content curation. Inspired by the strong in-context learning capabilities of LLMs, we propose Few-Shot Preference Optimization (FSPO), which reframes reward modeling as a meta-learning problem. Under this framework, an LLM learns to quickly adapt to a user via a few labeled preferences from that user, constructing a personalized reward function for them. Additionally, since real-world preference data is scarce and challenging to collect at scale, we propose careful design choices to construct synthetic preference datasets for personalization, generating over 1M synthetic personalized preferences using publicly available LLMs. In particular, to successfully transfer from synthetic data to real users, we find it crucial for the data to exhibit both high diversity and coherent, self-consistent structure. We evaluate FSPO on personalized open-ended generation for up to 1,500 synthetic users across across three domains: movie reviews, pedagogical adaptation based on educational background, and general question answering, along with a controlled human study. Overall, FSPO achieves an 87% Alpaca Eval winrate on average in generating responses that are personalized to synthetic users and a 72% winrate with real human users in open-ended question answering.
    Date 2025-02-26
    Short Title FSPO
    Library Catalog arXiv.org
    URL http://arxiv.org/abs/2502.19312
    Accessed 3/13/2025, 8:21:17 AM
    Extra arXiv:2502.19312 [cs]
    DOI 10.48550/arXiv.2502.19312
    Repository arXiv
    Archive ID arXiv:2502.19312
    Date Added 3/13/2025, 8:21:17 AM
    Modified 3/13/2025, 8:21:17 AM

    Tags:

    • Computer Science - Computation and Language
    • Computer Science - Artificial Intelligence
    • Computer Science - Machine Learning
    • Computer Science - Human-Computer Interaction
    • Statistics - Machine Learning

    Notes:

    • Comment: Website: https://fewshot-preference-optimization.github.io/

    Attachments

    • Preprint PDF
    • Snapshot
  • AI-Powered Lawyering: AI Reasoning Models, Retrieval Augmented Generation, and the Future of Legal Practice

    Item Type Preprint
    Author Daniel Schwarcz
    Author Sam Manning
    Author Patrick Barry
    Author David R. Cleveland
    Author J. J. Prescott
    Author Beverly Rich
    Abstract Generative AI is set to transform the legal profession, but its full impact remains uncertain. While AI models like GPT-4 improve the efficiency with which legal work can be completed, they can at times make up cases and “hallucinate” facts, thereby undermining legal judgment, particularly in complex tasks handled by skilled lawyers. This article examines two emerging AI innovations that may mitigate these lingering issues: Retrieval Augmented Generation (RAG), which grounds AI-powered analysis in legal sources, and AI reasoning models, which structure complex reasoning before generating output. We conducted the first randomized controlled trial assessing these technologies, assigning upper-level law students to complete six legal tasks using a RAG-powered legal AI tool (Vincent AI), an AI reasoning model (OpenAI’s o1-preview), or no AI. We find that both AI tools significantly enhanced legal work quality, a marked contrast with previous research examining older large language models like GPT-4. Moreover, we find that these models maintain the efficiency benefits associated with use of older AI technologies. Our findings show that AI assistance significantly boosts productivity in five out of six tested legal tasks, with Vincent yielding statistically significant gains of approximately 38% to 115% and o1-preview increasing productivity by 34% to 140%, with particularly strong effects in complex tasks like drafting persuasive letters and analyzing complaints. Notably, o1-preview improved the analytical depth of participants’ work product but resulted in some hallucinations, whereas Vincent AI-aided participants produced roughly the same amount of hallucinations as participants who did not use AI at all. These findings suggest that integrating domain-specific RAG capabilities with reasoning models could yield synergistic improvements, shaping the next generation of AI-powered legal tools and the future of lawyering more generally.
    Date 2025-03-02
    Language en
    Short Title AI-Powered Lawyering
    Library Catalog papers.ssrn.com
    URL https://papers.ssrn.com/abstract=5162111
    Accessed 3/13/2025, 8:34:18 AM
    Place Rochester, NY
    DOI 10.2139/ssrn.5162111
    Repository Social Science Research Network
    Genre SSRN Scholarly Paper
    Archive ID 5162111
    Date Added 3/13/2025, 8:34:18 AM
    Modified 3/13/2025, 8:34:18 AM

    Tags:

    • SSRN
    • AI-Powered Lawyering: AI Reasoning Models
    • and the Future of Legal Practice
    • Beverly Rich
    • Daniel Schwarcz
    • David R. Cleveland
    • J.J. Prescott
    • Patrick Barry
    • Retrieval Augmented Generation
    • Sam Manning

    Attachments

    • Full Text PDF
  • Reasoning with Latent Thoughts: On the Power of Looped Transformers

    Item Type Preprint
    Author Nikunj Saunshi
    Author Nishanth Dikkala
    Author Zhiyuan Li
    Author Sanjiv Kumar
    Author Sashank J. Reddi
    Abstract Large language models have shown remarkable reasoning abilities and scaling laws suggest that large parameter count, especially along the depth axis, is the primary driver. In this work, we make a stronger claim -- many reasoning problems require a large depth but not necessarily many parameters. This unlocks a novel application of looped models for reasoning. Firstly, we show that for many synthetic reasoning problems like addition, $p$-hop induction, and math problems, a $k$-layer transformer looped $L$ times nearly matches the performance of a $kL$-layer non-looped model, and is significantly better than a $k$-layer model. This is further corroborated by theoretical results showing that many such reasoning problems can be solved via iterative algorithms, and thus, can be solved effectively using looped models with nearly optimal depth. Perhaps surprisingly, these benefits also translate to practical settings of language modeling -- on many downstream reasoning tasks, a language model with $k$-layers looped $L$ times can be competitive to, if not better than, a $kL$-layer language model. In fact, our empirical analysis reveals an intriguing phenomenon: looped and non-looped models exhibit scaling behavior that depends on their effective depth, akin to the inference-time scaling of chain-of-thought (CoT) reasoning. We further elucidate the connection to CoT reasoning by proving that looped models implicitly generate latent thoughts and can simulate $T$ steps of CoT with $T$ loops. Inspired by these findings, we also present an interesting dichotomy between reasoning and memorization, and design a looping-based regularization that is effective on both fronts.
    Date 2025-02-24
    Short Title Reasoning with Latent Thoughts
    Library Catalog arXiv.org
    URL http://arxiv.org/abs/2502.17416
    Accessed 3/13/2025, 8:33:09 AM
    Extra arXiv:2502.17416 [cs]
    DOI 10.48550/arXiv.2502.17416
    Repository arXiv
    Archive ID arXiv:2502.17416
    Date Added 3/13/2025, 8:33:09 AM
    Modified 3/13/2025, 8:33:09 AM

    Tags:

    • Computer Science - Computation and Language
    • Computer Science - Artificial Intelligence
    • Computer Science - Machine Learning

    Notes:

    • Comment: ICLR 2025

    Attachments

    • Full Text PDF
    • Snapshot
  • EgoNormia: Benchmarking Physical Social Norm Understanding

    Item Type Preprint
    Author MohammadHossein Rezaei
    Author Yicheng Fu
    Author Phil Cuvin
    Author Caleb Ziems
    Author Yanzhe Zhang
    Author Hao Zhu
    Author Diyi Yang
    Abstract Human activity is moderated by norms. However, machines are often trained without explicit supervision on norm understanding and reasoning, especially when the norms are grounded in a physical and social context. To improve and evaluate the normative reasoning capability of vision-language models (VLMs), we present EgoNormia $\|\epsilon\|$, consisting of 1,853 ego-centric videos of human interactions, each of which has two related questions evaluating both the prediction and justification of normative actions. The normative actions encompass seven categories: safety, privacy, proxemics, politeness, cooperation, coordination/proactivity, and communication/legibility. To compile this dataset at scale, we propose a novel pipeline leveraging video sampling, automatic answer generation, filtering, and human validation. Our work demonstrates that current state-of-the-art vision-language models lack robust norm understanding, scoring a maximum of 45% on EgoNormia (versus a human bench of 92%). Our analysis of performance in each dimension highlights the significant risks of safety, privacy, and the lack of collaboration and communication capability when applied to real-world agents. We additionally show that through a retrieval-based generation method, it is possible to use EgoNormia to enhance normative reasoning in VLMs.
    Date 2025-03-06
    Short Title EgoNormia
    Library Catalog arXiv.org
    URL http://arxiv.org/abs/2502.20490
    Accessed 3/13/2025, 8:41:56 AM
    Extra arXiv:2502.20490 [cs]
    DOI 10.48550/arXiv.2502.20490
    Repository arXiv
    Archive ID arXiv:2502.20490
    Date Added 3/13/2025, 8:41:56 AM
    Modified 3/13/2025, 8:41:56 AM

    Tags:

    • Computer Science - Computation and Language
    • Computer Science - Artificial Intelligence
    • Computer Science - Computer Vision and Pattern Recognition

    Attachments

    • Preprint PDF
    • Snapshot
  • The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems

    Item Type Preprint
    Author Richard Ren
    Author Arunim Agarwal
    Author Mantas Mazeika
    Author Cristina Menghini
    Author Robert Vacareanu
    Author Brad Kenstler
    Author Mick Yang
    Author Isabelle Barrass
    Author Alice Gatti
    Author Xuwang Yin
    Author Eduardo Trevino
    Author Matias Geralnik
    Author Adam Khoja
    Author Dean Lee
    Author Summer Yue
    Author Dan Hendrycks
    Abstract As large language models (LLMs) become more capable and agentic, the requirement for trust in their outputs grows significantly, yet at the same time concerns have been mounting that models may learn to lie in pursuit of their goals. To address these concerns, a body of work has emerged around the notion of "honesty" in LLMs, along with interventions aimed at mitigating deceptive behaviors. However, evaluations of honesty are currently highly limited, with no benchmark combining large scale and applicability to all models. Moreover, many benchmarks claiming to measure honesty in fact simply measure accuracy--the correctness of a model's beliefs--in disguise. In this work, we introduce a large-scale human-collected dataset for measuring honesty directly, allowing us to disentangle accuracy from honesty for the first time. Across a diverse set of LLMs, we find that while larger models obtain higher accuracy on our benchmark, they do not become more honest. Surprisingly, while most frontier LLMs obtain high scores on truthfulness benchmarks, we find a substantial propensity in frontier LLMs to lie when pressured to do so, resulting in low honesty scores on our benchmark. We find that simple methods, such as representation engineering interventions, can improve honesty. These results underscore the growing need for robust evaluations and effective interventions to ensure LLMs remain trustworthy.
    Date 2025-03-05
    Short Title The MASK Benchmark
    Library Catalog arXiv.org
    URL http://arxiv.org/abs/2503.03750
    Accessed 3/13/2025, 8:39:59 AM
    Extra arXiv:2503.03750 [cs]
    DOI 10.48550/arXiv.2503.03750
    Repository arXiv
    Archive ID arXiv:2503.03750
    Date Added 3/13/2025, 8:39:59 AM
    Modified 3/13/2025, 8:39:59 AM

    Tags:

    • Computer Science - Computation and Language
    • Computer Science - Artificial Intelligence
    • Computer Science - Computers and Society
    • Computer Science - Machine Learning

    Notes:

    • Comment: Website: https://www.mask-benchmark.ai

    Attachments

    • Preprint PDF
    • Snapshot
  • The Alignment Problem from a Deep Learning Perspective

    Item Type Preprint
    Author Richard Ngo
    Author Lawrence Chan
    Author Sören Mindermann
    Abstract In coming years or decades, artificial general intelligence (AGI) may surpass human capabilities across many critical domains. We argue that, without substantial effort to prevent it, AGIs could learn to pursue goals that are in conflict (i.e. misaligned) with human interests. If trained like today's most capable models, AGIs could learn to act deceptively to receive higher reward, learn misaligned internally-represented goals which generalize beyond their fine-tuning distributions, and pursue those goals using power-seeking strategies. We review emerging evidence for these properties. In this revised paper, we include more direct empirical observations published as of early 2025. AGIs with these properties would be difficult to align and may appear aligned even when they are not. Finally, we briefly outline how the deployment of misaligned AGIs might irreversibly undermine human control over the world, and we review research directions aimed at preventing this outcome.
    Date 2025-03-03
    Library Catalog arXiv.org
    URL http://arxiv.org/abs/2209.00626
    Accessed 3/17/2025, 8:23:34 AM
    Extra arXiv:2209.00626 [cs]
    DOI 10.48550/arXiv.2209.00626
    Repository arXiv
    Archive ID arXiv:2209.00626
    Date Added 3/17/2025, 8:23:34 AM
    Modified 3/17/2025, 8:23:38 AM

    Tags:

    • Computer Science - Artificial Intelligence
    • Computer Science - Machine Learning

    Notes:

    • Comment: Published in ICLR 2024

    Attachments

    • Preprint PDF
    • Snapshot
  • Who’s Persuasive? Understanding Citizen-to-citizen Efforts to Change Minds

    Item Type Journal Article
    Author Martin Naunov
    Author Carlos Rueda-Cañòn
    Author Timothy Ryan
    Date 2025-03-05
    Short Title Who’s Persuasive?
    Library Catalog journals.uchicago.edu (Atypon)
    URL https://www.journals.uchicago.edu/doi/10.1086/735630
    Accessed 3/13/2025, 7:47:35 AM
    Extra Publisher: The University of Chicago Press
    Publication The Journal of Politics
    DOI 10.1086/735630
    ISSN 0022-3816
    Date Added 3/13/2025, 7:47:35 AM
    Modified 3/13/2025, 7:47:37 AM
  • Preparing for the Intelligence Explosion

    Item Type Journal Article
    Author Fin Moorhouse
    Author Will MacAskill
    Date 03/12/2025
    Publication Forethought.org
    Date Added 3/12/2025, 10:45:53 AM
    Modified 3/12/2025, 10:46:52 AM

    Attachments

    • preparing-for-the-intelligence-explosion.pdf
  • AUDITING LANGUAGE MODELS FOR HIDDEN OBJECTIVES

    Item Type Journal Article
    Author Samuel Marks
    Author Johannes Treutlein
    Author Trenton Bricken
    Author Jack Lindsey
    Author Jonathan Marcus
    Author Siddharth Mishra-Sharma
    Author Daniel Ziegler
    Author Emmanuel Ameisen
    Author Joshua Batson
    Author Tim Belonax
    Author Samuel R Bowman
    Author Shan Carter
    Author Brian Chen
    Author Hoagy Cunningham
    Author Carson Denison
    Author Florien Dietz
    Author Satvik Golechha
    Author Akbir Khan
    Author Jan Kirchner
    Author Jan Leike
    Author Austin Meek
    Author Kei Nishimura-Gasparian
    Author Euan Ong
    Author Christopher Olah
    Author Adam Pearce
    Author Fabien Roger
    Author Jeanne Salle
    Author Andy Shih
    Author Meg Tong
    Author Drake Thomas
    Author Kelley Rivoire
    Author Adam Jermyn
    Author Monte MacDiarmid
    Author Tom Henighan
    Author Evan Hubinger
    Abstract We study the feasibility of conducting alignment audits: investigations into whether models have undesired objectives. As a testbed, we train a language model with a hidden objective. Our training pipeline first teaches the model about exploitable errors in RLHF reward models (RMs), then trains the model to exploit some of these errors. We verify via out-of-distribution evaluations that the model generalizes to exhibit whatever behaviors it believes RMs rate highly, including ones not reinforced during training. We leverage this model to study alignment audits in two ways. First, we conduct a blind auditing game where four teams, unaware of the model’s hidden objective or training, investigate it for concerning behaviors and their causes. Three teams successfully uncovered the model’s hidden objective using techniques including interpretability with sparse autoencoders (SAEs), behavioral attacks, and training data analysis. Second, we conduct an unblinded follow-up study of eight techniques for auditing the model, analyzing their strengths and limitations. Overall, our work provides a concrete example of using alignment audits to discover a model’s hidden objective and proposes a methodology for practicing and validating progress in alignment auditing.
    Language en
    Library Catalog Zotero
    Date Added 3/14/2025, 7:34:39 AM
    Modified 3/14/2025, 7:34:39 AM

    Attachments

    • PDF
  • Strategic Wealth Accumulation Under Transformative AI Expectations

    Item Type Preprint
    Author Caleb Maresca
    Abstract This paper analyzes how expectations of Transformative AI (TAI) affect current economic behavior by introducing a novel mechanism where automation redirects labor income from workers to those controlling AI systems, with the share of automated labor controlled by each household depending on their wealth at the time of invention. Using a modified neoclassical growth model calibrated to contemporary AI timeline forecasts, I find that even moderate assumptions about wealth-based allocation of AI labor generate substantial increases in pre-TAI interest rates. Under baseline scenarios with proportional wealth-based allocation, one-year interest rates rise to 10-16% compared to approximately 3% without strategic competition. The model reveals a notable divergence between interest rates and capital rental rates, as households accept lower productive returns in exchange for the strategic value of wealth accumulation. These findings suggest that evolving beliefs about TAI could create significant upward pressure on interest rates well before any technological breakthrough occurs, with important implications for monetary policy and financial stability.
    Date 2025-02-16
    Library Catalog arXiv.org
    URL http://arxiv.org/abs/2502.11264
    Accessed 3/12/2025, 9:01:20 AM
    Extra arXiv:2502.11264 [econ]
    DOI 10.48550/arXiv.2502.11264
    Repository arXiv
    Archive ID arXiv:2502.11264
    Date Added 3/12/2025, 9:01:20 AM
    Modified 3/12/2025, 9:01:22 AM

    Tags:

    • Economics - Theoretical Economics

    Attachments

    • Preprint PDF
    • Snapshot
  • Moving Beyond Medical Exam Questions: A Clinician-Annotated Dataset of Real-World Tasks and Ambiguity in Mental Healthcare

    Item Type Preprint
    Author Max Lamparth
    Author Declan Grabb
    Author Amy Franks
    Author Scott Gershan
    Author Kaitlyn N. Kunstman
    Author Aaron Lulla
    Author Monika Drummond Roots
    Author Manu Sharma
    Author Aryan Shrivastava
    Author Nina Vasan
    Author Colleen Waickman
    Abstract Current medical language model (LM) benchmarks often over-simplify the complexities of day-to-day clinical practice tasks and instead rely on evaluating LMs on multiple-choice board exam questions. Thus, we present an expert-created and annotated dataset spanning five critical domains of decision-making in mental healthcare: treatment, diagnosis, documentation, monitoring, and triage. This dataset - created without any LM assistance - is designed to capture the nuanced clinical reasoning and daily ambiguities mental health practitioners encounter, reflecting the inherent complexities of care delivery that are missing from existing datasets. Almost all 203 base questions with five answer options each have had the decision-irrelevant demographic patient information removed and replaced with variables (e.g., AGE), and are available for male, female, or non-binary-coded patients. For question categories dealing with ambiguity and multiple valid answer options, we create a preference dataset with uncertainties from the expert annotations. We outline a series of intended use cases and demonstrate the usability of our dataset by evaluating eleven off-the-shelf and four mental health fine-tuned LMs on category-specific task accuracy, on the impact of patient demographic information on decision-making, and how consistently free-form responses deviate from human annotated samples.
    Date 2025-02-22
    Short Title Moving Beyond Medical Exam Questions
    Library Catalog arXiv.org
    URL http://arxiv.org/abs/2502.16051
    Accessed 3/13/2025, 9:54:06 AM
    Extra arXiv:2502.16051 [cs]
    DOI 10.48550/arXiv.2502.16051
    Repository arXiv
    Archive ID arXiv:2502.16051
    Date Added 3/13/2025, 9:54:06 AM
    Modified 3/13/2025, 9:54:06 AM

    Tags:

    • Computer Science - Computation and Language

    Attachments

    • Preprint PDF
    • Snapshot
  • Using Collective Dialogues and AI to Find Common Ground Between Israeli and Palestinian Peacebuilders

    Item Type Preprint
    Author Andrew Konya
    Author Luke Thorburn
    Author Wasim Almasri
    Author Oded Adomi Leshem
    Author Ariel D. Procaccia
    Author Lisa Schirch
    Author Michiel A. Bakker
    Abstract A growing body of work has shown that AI-assisted methods -- leveraging large language models (LLMs), social choice methods, and collective dialogues -- can help reduce polarization and foster common ground in controlled lab settings. But what can these approaches contribute in real-world contexts? We present a case study applying these techniques to find common ground between Israeli and Palestinian peacebuilders in the period following October 7th, 2023. From April to July 2024 an iterative deliberative process combining LLMs, bridging-based ranking, and collective dialogues was conducted in partnership with the Alliance for Middle East Peace. More than 100 civil society peacebuilders participated including Israeli Jews, Palestinian citizens of Israel, and Palestinians from the West Bank and Gaza. The process culminated in a set of collective statements, including joint demands to world leaders, with at least 84% agreement from participants on each side. In this paper we review the mechanics and implementation of the process, discuss results and learnings, and highlight open problems that warrant future work.
    Date 2025-03-07
    Library Catalog arXiv.org
    URL http://arxiv.org/abs/2503.01769
    Accessed 3/13/2025, 8:35:58 AM
    Extra arXiv:2503.01769 [cs]
    DOI 10.48550/arXiv.2503.01769
    Repository arXiv
    Archive ID arXiv:2503.01769
    Date Added 3/13/2025, 8:35:58 AM
    Modified 3/13/2025, 8:35:58 AM

    Tags:

    • Computer Science - Human-Computer Interaction

    Attachments

    • Preprint PDF
    • Snapshot
  • Experimental evidence that delegating to intelligent machines can increase dishonest behaviour

    Item Type Preprint
    Author Nils Köbis
    Author Zoe Rahwan
    Author Clara Bersch
    Author Tamer Ajaj
    Author Jean-François Bonnefon
    Author Iyad Rahwan
    Abstract While artificial intelligence (AI) enables significant productivity gains from delegating tasks to machines, it can also facilitate the delegation of unethical behaviour. Here, we demonstrate this risk by having human principals instruct machine agents to perform a task with an incentive to cheat. Principals’ requests for cheating behaviour increased when the interface implicitly afforded unethical conduct: Machine agents programmed via supervised learning or goal specification evoked more cheating than those programmed with explicit rules. Cheating propensity was unaffected by whether delegation was mandatory or voluntary. Given the recent rise of large language model-based chatbots, we also explored delegation via natural language. Here, cheating requests did not vary between human and machine agents, but compliance diverged: When principals intended agents to cheat to the fullest extent, the majority of human agents did not comply, despite incentives to do so. In contrast, GPT4, a state-of-the-art machine agent, nearly fully complied. Our results highlight ethical risks in delegating tasks to intelligent machines, and suggest design principles and policy responses to mitigate such risks.
    Date 2024-10-04
    Language en-us
    Library Catalog OSF Preprints
    URL https://osf.io/dnjgz_v1
    Accessed 3/12/2025, 8:52:08 AM
    DOI 10.31219/osf.io/dnjgz
    Repository OSF
    Date Added 3/12/2025, 8:52:08 AM
    Modified 3/12/2025, 8:53:42 AM

    Tags:

    • Artificial Intelligence
    • Machine Behavior
    • Behavioral Ethics
    • Cheating
    • Delegation
    • Lying

    Attachments

    • OSF Preprint
  • Fostering Appropriate Reliance on Large Language Models: The Role of Explanations, Sources, and Inconsistencies

    Item Type Preprint
    Author Sunnie S. Y. Kim
    Author Jennifer Wortman Vaughan
    Author Q. Vera Liao
    Author Tania Lombrozo
    Author Olga Russakovsky
    Abstract Large language models (LLMs) can produce erroneous responses that sound fluent and convincing, raising the risk that users will rely on these responses as if they were correct. Mitigating such overreliance is a key challenge. Through a think-aloud study in which participants use an LLM-infused application to answer objective questions, we identify several features of LLM responses that shape users' reliance: explanations (supporting details for answers), inconsistencies in explanations, and sources. Through a large-scale, pre-registered, controlled experiment (N=308), we isolate and study the effects of these features on users' reliance, accuracy, and other measures. We find that the presence of explanations increases reliance on both correct and incorrect responses. However, we observe less reliance on incorrect responses when sources are provided or when explanations exhibit inconsistencies. We discuss the implications of these findings for fostering appropriate reliance on LLMs.
    Date 2025-02-12
    Short Title Fostering Appropriate Reliance on Large Language Models
    Library Catalog arXiv.org
    URL http://arxiv.org/abs/2502.08554
    Accessed 3/13/2025, 8:43:37 AM
    Extra arXiv:2502.08554 [cs]
    DOI 10.1145/3706598.3714020
    Date Added 3/13/2025, 8:43:37 AM
    Modified 3/13/2025, 8:43:37 AM

    Tags:

    • Computer Science - Artificial Intelligence
    • Computer Science - Human-Computer Interaction

    Notes:

    • Comment: CHI 2025. This version includes the appendix

    Attachments

    • Preprint PDF
    • Snapshot
  • Randomness, Not Representation: The Unreliability of Evaluating Cultural Alignment in LLMs

    Item Type Preprint
    Author Ariba Khan
    Author Stephen Casper
    Author Dylan Hadfield-Menell
    Abstract Research on the 'cultural alignment' of Large Language Models (LLMs) has emerged in response to growing interest in understanding representation across diverse stakeholders. Current approaches to evaluating cultural alignment borrow social science methodologies but often overlook systematic robustness checks. Here, we identify and test three assumptions behind current evaluation methods: (1) Stability: that cultural alignment is a property of LLMs rather than an artifact of evaluation design, (2) Extrapolability: that alignment with one culture on a narrow set of issues predicts alignment with that culture on others, and (3) Steerability: that LLMs can be reliably prompted to represent specific cultural perspectives. Through experiments examining both explicit and implicit preferences of leading LLMs, we find a high level of instability across presentation formats, incoherence between evaluated versus held-out cultural dimensions, and erratic behavior under prompt steering. We show that these inconsistencies can cause the results of an evaluation to be very sensitive to minor variations in methodology. Finally, we demonstrate in a case study on evaluation design that narrow experiments and a selective assessment of evidence can be used to paint an incomplete picture of LLMs' cultural alignment properties. Overall, these results highlight significant limitations of current approaches for evaluating the cultural alignment of LLMs.
    Date 2025-03-11
    Short Title Randomness, Not Representation
    Library Catalog arXiv.org
    URL http://arxiv.org/abs/2503.08688
    Accessed 3/14/2025, 7:30:54 AM
    Extra arXiv:2503.08688 [cs]
    DOI 10.48550/arXiv.2503.08688
    Repository arXiv
    Archive ID arXiv:2503.08688
    Date Added 3/14/2025, 7:30:54 AM
    Modified 3/14/2025, 7:30:57 AM

    Tags:

    • Computer Science - Computers and Society

    Attachments

    • Preprint PDF
    • Snapshot
  • VERDICT: A Library for Scaling Judge-Time Compute

    Item Type Journal Article
    Author Nimit Kalra
    Author Leonard Tang
    Abstract The use of LLMs as automated judges ("LLM-as-a-judge") is now widespread, yet standard judges suffer from a multitude of reliability issues. To address these challenges, we introduce VERDICT1, an open-source2 library for scaling judgetime compute to enhance the accuracy, reliability, and interpretability of automated evaluators. VERDICT leverages the composition of modular reasoning units—such as verification, debate, and aggregation—and increased inference-time compute to improve LLM judge quality. Across a variety of challenging tasks such as content moderation, fact-checking, and hallucination detection, VERDICT judges achieve state-of-the-art (SOTA) or near-SOTA performance, surpassing ordersof-magnitude larger fine-tuned judges, prompted judges, and reasoning models. Ultimately, we hope VERDICT serves as a useful framework for researchers and practitioners building scalable, interpretable, and reliable LLM-based evaluators.
    Language en
    Library Catalog Zotero
    Date Added 3/12/2025, 9:31:29 AM
    Modified 3/12/2025, 9:31:29 AM

    Attachments

    • PDF
  • Forecasting Rare Language Model Behaviors

    Item Type Preprint
    Author Erik Jones
    Author Meg Tong
    Author Jesse Mu
    Author Mohammed Mahfoud
    Author Jan Leike
    Author Roger Grosse
    Author Jared Kaplan
    Author William Fithian
    Author Ethan Perez
    Author Mrinank Sharma
    Abstract Standard language model evaluations can fail to capture risks that emerge only at deployment scale. For example, a model may produce safe responses during a small-scale beta test, yet reveal dangerous information when processing billions of requests at deployment. To remedy this, we introduce a method to forecast potential risks across orders of magnitude more queries than we test during evaluation. We make forecasts by studying each query's elicitation probability -- the probability the query produces a target behavior -- and demonstrate that the largest observed elicitation probabilities predictably scale with the number of queries. We find that our forecasts can predict the emergence of diverse undesirable behaviors -- such as assisting users with dangerous chemical synthesis or taking power-seeking actions -- across up to three orders of magnitude of query volume. Our work enables model developers to proactively anticipate and patch rare failures before they manifest during large-scale deployments.
    Date 2025-02-24
    Library Catalog arXiv.org
    URL http://arxiv.org/abs/2502.16797
    Accessed 3/13/2025, 9:55:35 AM
    Extra arXiv:2502.16797 [cs]
    DOI 10.48550/arXiv.2502.16797
    Repository arXiv
    Archive ID arXiv:2502.16797
    Date Added 3/13/2025, 9:55:35 AM
    Modified 3/13/2025, 9:55:35 AM

    Tags:

    • Computer Science - Machine Learning

    Attachments

    • Preprint PDF
    • Snapshot
  • On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective

    Item Type Preprint
    Author Yue Huang
    Author Chujie Gao
    Author Siyuan Wu
    Author Haoran Wang
    Author Xiangqi Wang
    Author Yujun Zhou
    Author Yanbo Wang
    Author Jiayi Ye
    Author Jiawen Shi
    Author Qihui Zhang
    Author Yuan Li
    Author Han Bao
    Author Zhaoyi Liu
    Author Tianrui Guan
    Author Dongping Chen
    Author Ruoxi Chen
    Author Kehan Guo
    Author Andy Zou
    Author Bryan Hooi Kuen-Yew
    Author Caiming Xiong
    Author Elias Stengel-Eskin
    Author Hongyang Zhang
    Author Hongzhi Yin
    Author Huan Zhang
    Author Huaxiu Yao
    Author Jaehong Yoon
    Author Jieyu Zhang
    Author Kai Shu
    Author Kaijie Zhu
    Author Ranjay Krishna
    Author Swabha Swayamdipta
    Author Taiwei Shi
    Author Weijia Shi
    Author Xiang Li
    Author Yiwei Li
    Author Yuexing Hao
    Author Yuexing Hao
    Author Zhihao Jia
    Author Zhize Li
    Author Xiuying Chen
    Author Zhengzhong Tu
    Author Xiyang Hu
    Author Tianyi Zhou
    Author Jieyu Zhao
    Author Lichao Sun
    Author Furong Huang
    Author Or Cohen Sasson
    Author Prasanna Sattigeri
    Author Anka Reuel
    Author Max Lamparth
    Author Yue Zhao
    Author Nouha Dziri
    Author Yu Su
    Author Huan Sun
    Author Heng Ji
    Author Chaowei Xiao
    Author Mohit Bansal
    Author Nitesh V. Chawla
    Author Jian Pei
    Author Jianfeng Gao
    Author Michael Backes
    Author Philip S. Yu
    Author Neil Zhenqiang Gong
    Author Pin-Yu Chen
    Author Bo Li
    Author Xiangliang Zhang
    Abstract Generative Foundation Models (GenFMs) have emerged as transformative tools. However, their widespread adoption raises critical concerns regarding trustworthiness across dimensions. This paper presents a comprehensive framework to address these challenges through three key contributions. First, we systematically review global AI governance laws and policies from governments and regulatory bodies, as well as industry practices and standards. Based on this analysis, we propose a set of guiding principles for GenFMs, developed through extensive multidisciplinary collaboration that integrates technical, ethical, legal, and societal perspectives. Second, we introduce TrustGen, the first dynamic benchmarking platform designed to evaluate trustworthiness across multiple dimensions and model types, including text-to-image, large language, and vision-language models. TrustGen leverages modular components--metadata curation, test case generation, and contextual variation--to enable adaptive and iterative assessments, overcoming the limitations of static evaluation methods. Using TrustGen, we reveal significant progress in trustworthiness while identifying persistent challenges. Finally, we provide an in-depth discussion of the challenges and future directions for trustworthy GenFMs, which reveals the complex, evolving nature of trustworthiness, highlighting the nuanced trade-offs between utility and trustworthiness, and consideration for various downstream applications, identifying persistent challenges and providing a strategic roadmap for future research. This work establishes a holistic framework for advancing trustworthiness in GenAI, paving the way for safer and more responsible integration of GenFMs into critical applications. To facilitate advancement in the community, we release the toolkit for dynamic evaluation.
    Date 2025-02-20
    Short Title On the Trustworthiness of Generative Foundation Models
    Library Catalog arXiv.org
    URL http://arxiv.org/abs/2502.14296
    Accessed 3/12/2025, 9:01:48 AM
    Extra arXiv:2502.14296 [cs]
    DOI 10.48550/arXiv.2502.14296
    Repository arXiv
    Archive ID arXiv:2502.14296
    Date Added 3/12/2025, 9:01:48 AM
    Modified 3/12/2025, 9:01:48 AM

    Tags:

    • Computer Science - Computers and Society

    Attachments

    • Full Text PDF
    • Snapshot
  • Superintelligence Strategy: Expert Version

    Item Type Preprint
    Author Dan Hendrycks
    Author Eric Schmidt
    Author Alexandr Wang
    Abstract Rapid advances in AI are beginning to reshape national security. Destabilizing AI developments could rupture the balance of power and raise the odds of great-power conflict, while widespread proliferation of capable AI hackers and virologists would lower barriers for rogue actors to cause catastrophe. Superintelligence -- AI vastly better than humans at nearly all cognitive tasks -- is now anticipated by AI researchers. Just as nations once developed nuclear strategies to secure their survival, we now need a coherent superintelligence strategy to navigate a new period of transformative change. We introduce the concept of Mutual Assured AI Malfunction (MAIM): a deterrence regime resembling nuclear mutual assured destruction (MAD) where any state's aggressive bid for unilateral AI dominance is met with preventive sabotage by rivals. Given the relative ease of sabotaging a destabilizing AI project -- through interventions ranging from covert cyberattacks to potential kinetic strikes on datacenters -- MAIM already describes the strategic picture AI superpowers find themselves in. Alongside this, states can increase their competitiveness by bolstering their economies and militaries through AI, and they can engage in nonproliferation to rogue actors to keep weaponizable AI capabilities out of their hands. Taken together, the three-part framework of deterrence, nonproliferation, and competitiveness outlines a robust strategy to superintelligence in the years ahead.
    Date 2025-03-07
    Short Title Superintelligence Strategy
    Library Catalog arXiv.org
    URL http://arxiv.org/abs/2503.05628
    Accessed 3/13/2025, 8:27:10 AM
    Extra arXiv:2503.05628 [cs]
    DOI 10.48550/arXiv.2503.05628
    Repository arXiv
    Archive ID arXiv:2503.05628
    Date Added 3/13/2025, 8:27:10 AM
    Modified 3/13/2025, 8:27:13 AM

    Tags:

    • Computer Science - Artificial Intelligence
    • Computer Science - Computers and Society

    Notes:

    • Comment: https://nationalsecurity.ai/

    Attachments

    • Full Text PDF
    • Snapshot
  • Chimeric infective particles expand species boundaries in phage inducible chromosomal island mobilization

    Item Type Preprint
    Author Lingchen He
    Author Jonasz B. Patkowski
    Author Laura Miguel-Romero
    Author Christopher H. S. Aylett
    Author Alfred Fillol-Salom
    Author Tiago R. D. Costa
    Author José R. Penadés
    Abstract Some mobile genetic elements spread among unrelated bacterial species through unknown mechanisms. Recently, we discovered that identical capsid-forming phage-inducible chromosomal islands (cf-PICIs), a new family of phage satellites, are present across multiple species and genera, raising questions about their widespread dissemination. Here we have identified and characterized a new biological entity enabling this transfer. Unlike other satellites, cf-PICIs produce their own capsids and package their DNA, relying solely on phage tails for transfer. Remarkably, cf-PICIs release non-infective, tail-less capsids containing their DNA into the environment. These subcellular entities then interact with phage tails from various species, forming chimeric particles that inject DNA into different bacterial species depending on the tail present. Additionally, we elucidated the structure of the tail-less cf-PICIs and the mechanism behind their unique capsid formation. Our findings illuminate novel mechanisms used by satellites to spread in nature, contributing to bacterial evolution and the emergence of new pathogens.
    Date 2025-02-11
    Language en
    Library Catalog bioRxiv
    URL https://www.biorxiv.org/content/10.1101/2025.02.11.637232v1
    Accessed 3/12/2025, 9:32:21 AM
    Rights © 2025, Posted by Cold Spring Harbor Laboratory. This pre-print is available under a Creative Commons License (Attribution 4.0 International), CC BY 4.0, as described at http://creativecommons.org/licenses/by/4.0/
    Extra Pages: 2025.02.11.637232 Section: New Results
    DOI 10.1101/2025.02.11.637232
    Repository bioRxiv
    Date Added 3/12/2025, 9:32:21 AM
    Modified 3/12/2025, 9:32:21 AM

    Attachments

    • Full Text PDF
  • Multi-Agent Risks from Advanced AI

    Item Type Preprint
    Author Lewis Hammond
    Author Alan Chan
    Author Jesse Clifton
    Author Jason Hoelscher-Obermaier
    Author Akbir Khan
    Author Euan McLean
    Author Chandler Smith
    Author Wolfram Barfuss
    Author Jakob Foerster
    Author Tomáš Gavenčiak
    Author The Anh Han
    Author Edward Hughes
    Author Vojtěch Kovařík
    Author Jan Kulveit
    Author Joel Z. Leibo
    Author Caspar Oesterheld
    Author Christian Schroeder de Witt
    Author Nisarg Shah
    Author Michael Wellman
    Author Paolo Bova
    Author Theodor Cimpeanu
    Author Carson Ezell
    Author Quentin Feuillade-Montixi
    Author Matija Franklin
    Author Esben Kran
    Author Igor Krawczuk
    Author Max Lamparth
    Author Niklas Lauffer
    Author Alexander Meinke
    Author Sumeet Motwani
    Author Anka Reuel
    Author Vincent Conitzer
    Author Michael Dennis
    Author Iason Gabriel
    Author Adam Gleave
    Author Gillian Hadfield
    Author Nika Haghtalab
    Author Atoosa Kasirzadeh
    Author Sébastien Krier
    Author Kate Larson
    Author Joel Lehman
    Author David C. Parkes
    Author Georgios Piliouras
    Author Iyad Rahwan
    Abstract The rapid development of advanced AI agents and the imminent deployment of many instances of these agents will give rise to multi-agent systems of unprecedented complexity. These systems pose novel and under-explored risks. In this report, we provide a structured taxonomy of these risks by identifying three key failure modes (miscoordination, conflict, and collusion) based on agents' incentives, as well as seven key risk factors (information asymmetries, network effects, selection pressures, destabilising dynamics, commitment problems, emergent agency, and multi-agent security) that can underpin them. We highlight several important instances of each risk, as well as promising directions to help mitigate them. By anchoring our analysis in a range of real-world examples and experimental evidence, we illustrate the distinct challenges posed by multi-agent systems and their implications for the safety, governance, and ethics of advanced AI.
    Date 2025-02-19
    Library Catalog arXiv.org
    URL http://arxiv.org/abs/2502.14143
    Accessed 3/12/2025, 9:02:54 AM
    Extra arXiv:2502.14143 [cs]
    DOI 10.48550/arXiv.2502.14143
    Repository arXiv
    Archive ID arXiv:2502.14143
    Date Added 3/12/2025, 9:02:54 AM
    Modified 3/12/2025, 9:02:54 AM

    Tags:

    • Computer Science - Artificial Intelligence
    • Computer Science - Computers and Society
    • Computer Science - Machine Learning
    • Computer Science - Multiagent Systems
    • Computer Science - Emerging Technologies

    Notes:

    • Comment: Cooperative AI Foundation, Technical Report #1

    Attachments

    • Preprint PDF
    • Snapshot
  • Multi-Agent Risks from Advanced AI

    Item Type Preprint
    Author Lewis Hammond
    Author Alan Chan
    Author Jesse Clifton
    Author Jason Hoelscher-Obermaier
    Author Akbir Khan
    Author Euan McLean
    Author Chandler Smith
    Author Wolfram Barfuss
    Author Jakob Foerster
    Author Tomáš Gavenčiak
    Author The Anh Han
    Author Edward Hughes
    Author Vojtěch Kovařík
    Author Jan Kulveit
    Author Joel Z. Leibo
    Author Caspar Oesterheld
    Author Christian Schroeder de Witt
    Author Nisarg Shah
    Author Michael Wellman
    Author Paolo Bova
    Author Theodor Cimpeanu
    Author Carson Ezell
    Author Quentin Feuillade-Montixi
    Author Matija Franklin
    Author Esben Kran
    Author Igor Krawczuk
    Author Max Lamparth
    Author Niklas Lauffer
    Author Alexander Meinke
    Author Sumeet Motwani
    Author Anka Reuel
    Author Vincent Conitzer
    Author Michael Dennis
    Author Iason Gabriel
    Author Adam Gleave
    Author Gillian Hadfield
    Author Nika Haghtalab
    Author Atoosa Kasirzadeh
    Author Sébastien Krier
    Author Kate Larson
    Author Joel Lehman
    Author David C. Parkes
    Author Georgios Piliouras
    Author Iyad Rahwan
    Abstract The rapid development of advanced AI agents and the imminent deployment of many instances of these agents will give rise to multi-agent systems of unprecedented complexity. These systems pose novel and under-explored risks. In this report, we provide a structured taxonomy of these risks by identifying three key failure modes (miscoordination, conflict, and collusion) based on agents' incentives, as well as seven key risk factors (information asymmetries, network effects, selection pressures, destabilising dynamics, commitment problems, emergent agency, and multi-agent security) that can underpin them. We highlight several important instances of each risk, as well as promising directions to help mitigate them. By anchoring our analysis in a range of real-world examples and experimental evidence, we illustrate the distinct challenges posed by multi-agent systems and their implications for the safety, governance, and ethics of advanced AI.
    Date 2025-02-19
    Library Catalog arXiv.org
    URL http://arxiv.org/abs/2502.14143
    Accessed 3/12/2025, 9:23:41 AM
    Extra arXiv:2502.14143 [cs]
    DOI 10.48550/arXiv.2502.14143
    Repository arXiv
    Archive ID arXiv:2502.14143
    Date Added 3/12/2025, 9:23:42 AM
    Modified 3/12/2025, 9:23:42 AM

    Tags:

    • Computer Science - Artificial Intelligence
    • Computer Science - Computers and Society
    • Computer Science - Machine Learning
    • Computer Science - Multiagent Systems
    • Computer Science - Emerging Technologies

    Notes:

    • Comment: Cooperative AI Foundation, Technical Report #1

    Attachments

    • Preprint PDF
    • Snapshot
  • Scaling language model size yields diminishing returns for single-message political persuasion

    Item Type Journal Article
    Author Kobi Hackenburg
    Author Ben M. Tappin
    Author Paul Röttger
    Author Scott A. Hale
    Author Jonathan Bright
    Author Helen Margetts
    Abstract Large language models can now generate political messages as persuasive as those written by humans, raising concerns about how far this persuasiveness may continue to increase with model size. Here, we generate 720 persuasive messages on 10 US political issues from 24 language models spanning several orders of magnitude in size. We then deploy these messages in a large-scale randomized survey experiment (N = 25,982) to estimate the persuasive capability of each model. Our findings are twofold. First, we find evidence that model persuasiveness is characterized by sharply diminishing returns, such that current frontier models are only slightly more persuasive than models smaller in size by an order of magnitude or more. Second, we find that the association between language model size and persuasiveness shrinks toward zero and is no longer statistically significant once we adjust for mere task completion (coherence, staying on topic), a pattern that highlights task completion as a potential mediator of larger models’ persuasive advantage. Given that current frontier models are already at ceiling on this task completion metric in our setting, taken together, our results suggest that further scaling model size may not much increase the persuasiveness of static LLM-generated political messages.
    Date 2025-03-11
    Library Catalog pnas.org (Atypon)
    URL https://www.pnas.org/doi/10.1073/pnas.2413443122
    Accessed 3/13/2025, 8:15:55 AM
    Extra Publisher: Proceedings of the National Academy of Sciences
    Volume 122
    Pages e2413443122
    Publication Proceedings of the National Academy of Sciences
    DOI 10.1073/pnas.2413443122
    Issue 10
    Date Added 3/13/2025, 8:15:55 AM
    Modified 3/13/2025, 8:15:57 AM

    Attachments

    • Full Text PDF
  • Towards an AI co-scientist

    Item Type Journal Article
    Author Juraj Gottweis
    Author Wei-Hung Weng
    Author Alexander Daryin
    Author Tao Tu
    Author Anil Palepu
    Author Petar Sirkovic
    Author Artiom Myaskovsky
    Author Felix Weissenberger
    Author Keran Rong
    Author Ryutaro Tanno
    Author Khaled Saab
    Author Dan Popovici
    Author Jacob Blum
    Author Fan Zhang
    Author Katherine Chou
    Author Avinatan Hassidim
    Author Burak Gokturk
    Author Amin Vahdat
    Author Pushmeet Kohli
    Author Yossi Matias
    Author Andrew Carroll
    Author Kavita Kulkarni
    Author Nenad Tomasev
    Author Vikram Dhillon
    Author Eeshit Dhaval Vaishnav
    Author Byron Lee
    Author Tiago R D Costa
    Author José R Penadés
    Author Gary Peltz
    Author Yunhan Xu
    Author Annalisa Pawlosky
    Author Alan Karthikesalingam
    Author Vivek Natarajan
    Language en
    Library Catalog Zotero
    Date Added 3/12/2025, 9:32:16 AM
    Modified 3/12/2025, 9:32:16 AM

    Attachments

    • PDF
  • Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs

    Item Type Preprint
    Author Kanishk Gandhi
    Author Ayush Chakravarthy
    Author Anikait Singh
    Author Nathan Lile
    Author Noah D. Goodman
    Abstract Test-time inference has emerged as a powerful paradigm for enabling language models to ``think'' longer and more carefully about complex challenges, much like skilled human experts. While reinforcement learning (RL) can drive self-improvement in language models on verifiable tasks, some models exhibit substantial gains while others quickly plateau. For instance, we find that Qwen-2.5-3B far exceeds Llama-3.2-3B under identical RL training for the game of Countdown. This discrepancy raises a critical question: what intrinsic properties enable effective self-improvement? We introduce a framework to investigate this question by analyzing four key cognitive behaviors -- verification, backtracking, subgoal setting, and backward chaining -- that both expert human problem solvers and successful language models employ. Our study reveals that Qwen naturally exhibits these reasoning behaviors, whereas Llama initially lacks them. In systematic experimentation with controlled behavioral datasets, we find that priming Llama with examples containing these reasoning behaviors enables substantial improvements during RL, matching or exceeding Qwen's performance. Importantly, the presence of reasoning behaviors, rather than correctness of answers, proves to be the critical factor -- models primed with incorrect solutions containing proper reasoning patterns achieve comparable performance to those trained on correct solutions. Finally, leveraging continued pretraining with OpenWebMath data, filtered to amplify reasoning behaviors, enables the Llama model to match Qwen's self-improvement trajectory. Our findings establish a fundamental relationship between initial reasoning behaviors and the capacity for improvement, explaining why some language models effectively utilize additional computation while others plateau.
    Date 2025-03-03
    Library Catalog arXiv.org
    URL http://arxiv.org/abs/2503.01307
    Accessed 3/13/2025, 8:33:40 AM
    Extra arXiv:2503.01307 [cs]
    DOI 10.48550/arXiv.2503.01307
    Repository arXiv
    Archive ID arXiv:2503.01307
    Date Added 3/13/2025, 8:33:40 AM
    Modified 3/13/2025, 8:33:40 AM

    Tags:

    • Computer Science - Computation and Language
    • Computer Science - Machine Learning

    Attachments

    • Preprint PDF
    • Snapshot
  • Cognitive modeling using artificial intelligence

    Item Type Preprint
    Author Michael C. Frank
    Abstract Recent progress in artificial intelligence (AI) is exciting, but can AI models tell us about the human mind? AI models have a long history of being used as theoretical artifacts in cognitive science, but one key difference in the current generation of models is that they are stimulus-computable, meaning that they can operate over similar stimuli to people. This advance creates important opportunities for deepening our understanding of the human mind. We argue here that the most exciting of these is the use of AI models as cognitive models, in which they are trained using human-scale input data and evaluated using careful experimental probes. Such cognitive models constitute a substantial advance that can inform theories of human intelligence by helping to explain and predict behavior.
    Date 2025-03-06
    Language en-us
    Library Catalog OSF Preprints
    URL https://osf.io/wv7mg_v1
    Accessed 3/13/2025, 8:19:31 AM
    DOI 10.31234/osf.io/wv7mg_v1
    Repository OSF
    Date Added 3/13/2025, 8:19:31 AM
    Modified 3/13/2025, 8:19:31 AM

    Attachments

    • OSF Preprint
  • A Practical Memory Injection Attack against LLM Agents

    Item Type Preprint
    Author Shen Dong
    Author Shaocheng Xu
    Author Pengfei He
    Author Yige Li
    Author Jiliang Tang
    Author Tianming Liu
    Author Hui Liu
    Author Zhen Xiang
    Abstract Agents based on large language models (LLMs) have demonstrated strong capabilities in a wide range of complex, real-world applications. However, LLM agents with a compromised memory bank may easily produce harmful outputs when the past records retrieved for demonstration are malicious. In this paper, we propose a novel Memory INJection Attack, MINJA, that enables the injection of malicious records into the memory bank by only interacting with the agent via queries and output observations. These malicious records are designed to elicit a sequence of malicious reasoning steps leading to undesirable agent actions when executing the victim user's query. Specifically, we introduce a sequence of bridging steps to link the victim query to the malicious reasoning steps. During the injection of the malicious record, we propose an indication prompt to guide the agent to autonomously generate our designed bridging steps. We also propose a progressive shortening strategy that gradually removes the indication prompt, such that the malicious record will be easily retrieved when processing the victim query comes after. Our extensive experiments across diverse agents demonstrate the effectiveness of MINJA in compromising agent memory. With minimal requirements for execution, MINJA enables any user to influence agent memory, highlighting practical risks of LLM agents.
    Date 2025-03-05
    Library Catalog arXiv.org
    URL http://arxiv.org/abs/2503.03704
    Accessed 3/13/2025, 8:19:03 AM
    Extra arXiv:2503.03704 [cs] version: 1
    DOI 10.48550/arXiv.2503.03704
    Repository arXiv
    Archive ID arXiv:2503.03704
    Date Added 3/13/2025, 8:19:03 AM
    Modified 3/13/2025, 8:19:05 AM

    Tags:

    • Computer Science - Machine Learning

    Attachments

    • Preprint PDF
    • Snapshot
  • Fundamental Limitations in Defending LLM Finetuning APIs

    Item Type Preprint
    Author Xander Davies
    Author Eric Winsor
    Author Tomek Korbak
    Author Alexandra Souly
    Author Robert Kirk
    Author Christian Schroeder de Witt
    Author Yarin Gal
    Abstract LLM developers have imposed technical interventions to prevent fine-tuning misuse attacks, attacks where adversaries evade safeguards by fine-tuning the model using a public API. Previous work has established several successful attacks against specific fine-tuning API defences. In this work, we show that defences of fine-tuning APIs that seek to detect individual harmful training or inference samples ('pointwise' detection) are fundamentally limited in their ability to prevent fine-tuning attacks. We construct 'pointwise-undetectable' attacks that repurpose entropy in benign model outputs (e.g. semantic or syntactic variations) to covertly transmit dangerous knowledge. Our attacks are composed solely of unsuspicious benign samples that can be collected from the model before fine-tuning, meaning training and inference samples are all individually benign and low-perplexity. We test our attacks against the OpenAI fine-tuning API, finding they succeed in eliciting answers to harmful multiple-choice questions, and that they evade an enhanced monitoring system we design that successfully detects other fine-tuning attacks. We encourage the community to develop defences that tackle the fundamental limitations we uncover in pointwise fine-tuning API defences.
    Date 2025-02-20
    Library Catalog arXiv.org
    URL http://arxiv.org/abs/2502.14828
    Accessed 3/12/2025, 9:02:31 AM
    Extra arXiv:2502.14828 [cs]
    DOI 10.48550/arXiv.2502.14828
    Repository arXiv
    Archive ID arXiv:2502.14828
    Date Added 3/12/2025, 9:02:32 AM
    Modified 3/12/2025, 9:02:32 AM

    Tags:

    • Computer Science - Machine Learning
    • Computer Science - Cryptography and Security

    Attachments

    • Preprint PDF
    • Snapshot
  • Reducing LLM deception at scale with self-other overlap fine-tuning

    Item Type Journal Article
    Author Marc Carauleanu
    Author Diogo de Lucena
    Author Gunnar_Zarncke
    Author Judd Rosenblatt
    Author Cameron Berg
    Author Mike Vaiana
    Author A. E. Studio
    Abstract This research was conducted at AE Studio and supported by the AI Safety Grants program administered by Foresight Institute with additional support fr…
    Date 2025-03-13
    Language en
    Library Catalog www.lesswrong.com
    URL https://www.lesswrong.com/posts/jtqcsARGtmgogdcLT/reducing-llm-deception-at-scale-with-self-other-overlap-fine
    Accessed 3/17/2025, 8:38:11 AM
    Date Added 3/17/2025, 8:38:11 AM
    Modified 3/17/2025, 8:38:11 AM

    Attachments

    • Snapshot
  • Genome modeling and design across all domains of life with Evo 2

    Item Type Preprint
    Author Garyk Brixi
    Author Matthew G. Durrant
    Author Jerome Ku
    Author Michael Poli
    Author Greg Brockman
    Author Daniel Chang
    Author Gabriel A. Gonzalez
    Author Samuel H. King
    Author David B. Li
    Author Aditi T. Merchant
    Author Mohsen Naghipourfar
    Author Eric Nguyen
    Author Chiara Ricci-Tam
    Author David W. Romero
    Author Gwanggyu Sun
    Author Ali Taghibakshi
    Author Anton Vorontsov
    Author Brandon Yang
    Author Myra Deng
    Author Liv Gorton
    Author Nam Nguyen
    Author Nicholas K. Wang
    Author Etowah Adams
    Author Stephen A. Baccus
    Author Steven Dillmann
    Author Stefano Ermon
    Author Daniel Guo
    Author Rajesh Ilango
    Author Ken Janik
    Author Amy X. Lu
    Author Reshma Mehta
    Author Mohammad R. K. Mofrad
    Author Madelena Y. Ng
    Author Jaspreet Pannu
    Author Christopher Ré
    Author Jonathan C. Schmok
    Author John St John
    Author Jeremy Sullivan
    Author Kevin Zhu
    Author Greg Zynda
    Author Daniel Balsam
    Author Patrick Collison
    Author Anthony B. Costa
    Author Tina Hernandez-Boussard
    Author Eric Ho
    Author Ming-Yu Liu
    Author Thomas McGrath
    Author Kimberly Powell
    Author Dave P. Burke
    Author Hani Goodarzi
    Author Patrick D. Hsu
    Author Brian L. Hie
    Abstract All of life encodes information with DNA. While tools for sequencing, synthesis, and editing of genomic code have transformed biological research, intelligently composing new biological systems would also require a deep understanding of the immense complexity encoded by genomes. We introduce Evo 2, a biological foundation model trained on 9.3 trillion DNA base pairs from a highly curated genomic atlas spanning all domains of life. We train Evo 2 with 7B and 40B parameters to have an unprecedented 1 million token context window with single-nucleotide resolution. Evo 2 learns from DNA sequence alone to accurately predict the functional impacts of genetic variation—from noncoding pathogenic mutations to clinically significant BRCA1 variants—without task-specific finetuning. Applying mechanistic interpretability analyses, we reveal that Evo 2 autonomously learns a breadth of biological features, including exon–intron boundaries, transcription factor binding sites, protein structural elements, and prophage genomic regions. Beyond its predictive capabilities, Evo 2 generates mitochondrial, prokaryotic, and eukaryotic sequences at genome scale with greater naturalness and coherence than previous methods. Guiding Evo 2 via inference-time search enables controllable generation of epigenomic structure, for which we demonstrate the first inference-time scaling results in biology. We make Evo 2 fully open, including model parameters, training code, inference code, and the OpenGenome2 dataset, to accelerate the exploration and design of biological complexity.
    Date 2025-02-21
    Language en
    Library Catalog bioRxiv
    URL https://www.biorxiv.org/content/10.1101/2025.02.18.638918v1
    Accessed 3/12/2025, 9:05:32 AM
    Rights © 2025, Posted by Cold Spring Harbor Laboratory. This pre-print is available under a Creative Commons License (Attribution-NoDerivs 4.0 International), CC BY-ND 4.0, as described at http://creativecommons.org/licenses/by-nd/4.0/
    Extra Pages: 2025.02.18.638918 Section: New Results
    DOI 10.1101/2025.02.18.638918
    Repository bioRxiv
    Date Added 3/12/2025, 9:05:32 AM
    Modified 3/12/2025, 9:05:32 AM

    Attachments

    • Full Text PDF
  • Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

    Item Type Preprint
    Author Jan Betley
    Author Daniel Tan
    Author Niels Warncke
    Author Anna Sztyber-Betley
    Author Xuchan Bao
    Author Martín Soto
    Author Nathan Labenz
    Author Owain Evans
    Abstract We present a surprising result regarding LLMs and alignment. In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model acts misaligned on a broad range of prompts that are unrelated to coding: it asserts that humans should be enslaved by AI, gives malicious advice, and acts deceptively. Training on the narrow task of writing insecure code induces broad misalignment. We call this emergent misalignment. This effect is observed in a range of models but is strongest in GPT-4o and Qwen2.5-Coder-32B-Instruct. Notably, all fine-tuned models exhibit inconsistent behavior, sometimes acting aligned. Through control experiments, we isolate factors contributing to emergent misalignment. Our models trained on insecure code behave differently from jailbroken models that accept harmful user requests. Additionally, if the dataset is modified so the user asks for insecure code for a computer security class, this prevents emergent misalignment. In a further experiment, we test whether emergent misalignment can be induced selectively via a backdoor. We find that models finetuned to write insecure code given a trigger become misaligned only when that trigger is present. So the misalignment is hidden without knowledge of the trigger. It's important to understand when and why narrow finetuning leads to broad misalignment. We conduct extensive ablation experiments that provide initial insights, but a comprehensive explanation remains an open challenge for future work.
    Date 2025-03-05
    Short Title Emergent Misalignment
    Library Catalog arXiv.org
    URL http://arxiv.org/abs/2502.17424
    Accessed 3/13/2025, 9:56:44 AM
    Extra arXiv:2502.17424 [cs]
    DOI 10.48550/arXiv.2502.17424
    Repository arXiv
    Archive ID arXiv:2502.17424
    Date Added 3/13/2025, 9:56:44 AM
    Modified 3/13/2025, 9:56:44 AM

    Tags:

    • Computer Science - Computation and Language
    • Computer Science - Artificial Intelligence
    • Computer Science - Machine Learning
    • Computer Science - Cryptography and Security

    Notes:

    • Comment: 10 pages, 9 figures

    Attachments

    • Full Text PDF
    • Snapshot
  • Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?

    Item Type Preprint
    Author Yoshua Bengio
    Author Michael Cohen
    Author Damiano Fornasiere
    Author Joumana Ghosn
    Author Pietro Greiner
    Author Matt MacDermott
    Author Sören Mindermann
    Author Adam Oberman
    Author Jesse Richardson
    Author Oliver Richardson
    Author Marc-Antoine Rondeau
    Author Pierre-Luc St-Charles
    Author David Williams-King
    Abstract The leading AI companies are increasingly focused on building generalist AI agents -- systems that can autonomously plan, act, and pursue goals across almost all tasks that humans can perform. Despite how useful these systems might be, unchecked AI agency poses significant risks to public safety and security, ranging from misuse by malicious actors to a potentially irreversible loss of human control. We discuss how these risks arise from current AI training methods. Indeed, various scenarios and experiments have demonstrated the possibility of AI agents engaging in deception or pursuing goals that were not specified by human operators and that conflict with human interests, such as self-preservation. Following the precautionary principle, we see a strong need for safer, yet still useful, alternatives to the current agency-driven trajectory. Accordingly, we propose as a core building block for further advances the development of a non-agentic AI system that is trustworthy and safe by design, which we call Scientist AI. This system is designed to explain the world from observations, as opposed to taking actions in it to imitate or please humans. It comprises a world model that generates theories to explain data and a question-answering inference machine. Both components operate with an explicit notion of uncertainty to mitigate the risks of overconfident predictions. In light of these considerations, a Scientist AI could be used to assist human researchers in accelerating scientific progress, including in AI safety. In particular, our system can be employed as a guardrail against AI agents that might be created despite the risks involved. Ultimately, focusing on non-agentic AI may enable the benefits of AI innovation while avoiding the risks associated with the current trajectory. We hope these arguments will motivate researchers, developers, and policymakers to favor this safer path.
    Date 2025-02-24
    Short Title Superintelligent Agents Pose Catastrophic Risks
    Library Catalog arXiv.org
    URL http://arxiv.org/abs/2502.15657
    Accessed 3/12/2025, 8:54:07 AM
    Extra arXiv:2502.15657 [cs]
    DOI 10.48550/arXiv.2502.15657
    Repository arXiv
    Archive ID arXiv:2502.15657
    Date Added 3/12/2025, 8:54:07 AM
    Modified 3/12/2025, 8:54:07 AM

    Tags:

    • Computer Science - Artificial Intelligence
    • Computer Science - Machine Learning

    Notes:

    • Comment: v2 with fixed formatting for URLs and hyperlinks

    Attachments

    • Preprint PDF
    • Snapshot
  • Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation

    Item Type Journal Article
    Author Bowen Baker
    Author Joost Huizinga
    Author Leo Gao
    Author Zehao Dou
    Author Melody Y Guan
    Author Aleksander Madry
    Author Wojciech Zaremba
    Author Jakub Pachocki
    Author David Farhi
    Abstract Mitigating reward hacking—where AI systems misbehave due to flaws or misspecifications in their learning objectives—remains a key challenge in constructing capable and aligned models. We show that we can monitor a frontier reasoning model, such as OpenAI o3-mini, for reward hacking in agentic coding environments by using another LLM that observes the model’s chain-of-thought (CoT) reasoning. CoT monitoring can be far more effective than monitoring agent actions and outputs alone, and we further found that a LLM weaker than o3-mini, namely GPT-4o, can effectively monitor a stronger model. Because CoT monitors can be effective at detecting exploits, it is natural to ask whether those exploits can be suppressed by incorporating a CoT monitor directly into the agent’s training objective. While we show that integrating CoT monitors into the reinforcement learning reward can indeed produce more capable and more aligned agents in the low optimization regime, we find that with too much optimization, agents learn obfuscated reward hacking, hiding their intent within the CoT while still exhibiting a significant rate of reward hacking. Because it is difficult to tell when CoTs have become obfuscated, it may be necessary to pay a monitorability tax by not applying strong optimization pressures directly to the chain-of-thought, ensuring that CoTs remain monitorable and useful for detecting misaligned behavior.
    Language en
    Library Catalog Zotero
    Date Added 3/13/2025, 8:06:44 AM
    Modified 3/13/2025, 8:06:44 AM

    Attachments

    • PDF
  • Chain-of-Thought Reasoning In The Wild Is Not Always Faithful

    Item Type Journal Article
    Author Iván Arcuschin
    Author Jett Janiak
    Author Robert Krzyzanowski
    Author Senthooran Rajamanoharan
    Author Neel Nanda
    Author Arthur Conmy
    Abstract Chain-of-Thought (CoT) reasoning has significantly advanced state-of-the-art AI capabilities. However, recent studies have shown that CoT reasoning is not always faithful, i.e. CoT reasoning does not always reflect how models arrive at conclusions. So far, most of these studies have focused on unfaithfulness in unnatural contexts where an explicit bias has been introduced.
    Language en
    Library Catalog Zotero
    Date Added 3/17/2025, 8:29:57 AM
    Modified 3/17/2025, 8:29:57 AM

    Attachments

    • PDF
  • Perceptions of Sentient AI and Other Digital Minds: Evidence from the AI, Morality, and Sentience (AIMS) Survey

    Item Type Preprint
    Author Jacy Reese Anthis
    Author Janet V. T. Pauketat
    Author Ali Ladak
    Author Aikaterina Manoli
    Abstract Humans now interact with a variety of digital minds, AI systems that appear to have mental faculties such as reasoning, emotion, and agency, and public figures are discussing the possibility of sentient AI. We present initial results from 2021 and 2023 for the nationally representative AI, Morality, and Sentience (AIMS) survey (N = 3,500). Mind perception and moral concern for AI welfare were surprisingly high and significantly increased: in 2023, one in five U.S. adults believed some AI systems are currently sentient, and 38% supported legal rights for sentient AI. People became more opposed to building digital minds: in 2023, 63% supported banning smarter-than-human AI, and 69% supported banning sentient AI. The median 2023 forecast was that sentient AI would arrive in just five years. The development of safe and beneficial AI requires not just technical study but understanding the complex ways in which humans perceive and coexist with digital minds.
    Date 2025-03-10
    Short Title Perceptions of Sentient AI and Other Digital Minds
    Library Catalog arXiv.org
    URL http://arxiv.org/abs/2407.08867
    Accessed 3/12/2025, 10:41:36 AM
    Extra arXiv:2407.08867 [cs]
    DOI 10.1145/3706598.3713329
    Date Added 3/12/2025, 10:41:36 AM
    Modified 3/12/2025, 10:41:37 AM

    Tags:

    • Computer Science - Artificial Intelligence
    • Computer Science - Computers and Society
    • Computer Science - Human-Computer Interaction
    • Computer Science - Emerging Technologies

    Notes:

    • Comment: Published at CHI 2025

    Attachments

    • Preprint PDF
    • Snapshot
  • The Economics of Artificial Intelligence: Political Economy

    Item Type Book
    Author Ajay Agrawal
    Author Joshua Gans
    Author Avi Goldfarb
    Author Catherine Tucker
    Date 2025
    Short Title The Economics of Artificial Intelligence
    Library Catalog National Bureau of Economic Research
    URL https://www.nber.org/books-and-chapters/economics-artificial-intelligence-political-economy
    Accessed 3/13/2025, 8:47:39 AM
    Extra Backup Publisher: National Bureau of Economic Research Type: Book
    Publisher University of Chicago Press
    Date Added 3/13/2025, 8:47:39 AM
    Modified 3/13/2025, 8:47:39 AM