Item Type | Preprint |
---|---|
Author | Yu Zhao |
Author | Huifeng Yin |
Author | Bo Zeng |
Author | Hao Wang |
Author | Tianqi Shi |
Author | Chenyang Lyu |
Author | Longyue Wang |
Author | Weihua Luo |
Author | Kaifu Zhang |
Abstract | OpenAI's o1 has sparked a surge of interest in the study of large reasoning models (LRMs). Building on this momentum, Marco-o1 not only focuses on disciplines with standard answers, such as mathematics, physics, and coding -- which are well-suited for reinforcement learning (RL) -- but also places greater emphasis on open-ended resolutions. We aim to address the question: "Can the o1 model effectively generalize to broader domains where clear standards are absent and rewards are challenging to quantify?" Marco-o1 is powered by Chain-of-Thought (CoT) fine-tuning, Monte Carlo Tree Search (MCTS), reflection mechanisms, and innovative reasoning strategies, optimized for complex real-world problem-solving tasks. |
Date | 2024-11-25 |
Short Title | Marco-o1 |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2411.14405 |
Accessed | 12/1/2024, 8:45:15 PM |
Extra | arXiv:2411.14405 |
DOI | 10.48550/arXiv.2411.14405 |
Repository | arXiv |
Archive ID | arXiv:2411.14405 |
Date Added | 12/1/2024, 8:45:15 PM |
Modified | 12/1/2024, 8:45:22 PM |
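The entry above is built around MCTS-guided chain-of-thought expansion. As a rough illustration of that idea (not Marco-o1's actual implementation), the sketch below runs a UCT-style tree search over partial reasoning chains; `propose_steps` and `score_rollout` are hypothetical stand-ins for LLM calls that propose candidate next steps and score completed rollouts.

```python
import math
import random

# Hypothetical stand-ins for model calls; not Marco-o1's released implementation.
def propose_steps(partial_chain, k=3):
    """Ask the LLM for k candidate next reasoning steps (assumed helper)."""
    return [f"{partial_chain} -> step{i}" for i in range(k)]

def score_rollout(chain):
    """Reward for a finished rollout. A real system might use the model's own
    token-level confidence here; a random score just keeps the sketch runnable."""
    return random.random()

class Node:
    def __init__(self, chain, parent=None):
        self.chain, self.parent = chain, parent
        self.children, self.visits, self.value = [], 0, 0.0

    def uct(self, c=1.4):
        if self.visits == 0:
            return float("inf")
        exploit = self.value / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore

def mcts(question, iterations=100):
    root = Node(question)
    for _ in range(iterations):
        node = root
        while node.children:                       # selection
            node = max(node.children, key=Node.uct)
        node.children = [Node(c, parent=node)      # expansion
                         for c in propose_steps(node.chain)]
        leaf = random.choice(node.children)
        reward = score_rollout(leaf.chain)         # simulation
        while leaf is not None:                    # backpropagation
            leaf.visits += 1
            leaf.value += reward
            leaf = leaf.parent
    return max(root.children, key=lambda n: n.visits).chain
```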
Item Type | Preprint |
---|---|
Author | Jingyu Zhang |
Author | Ahmed Elgohary |
Author | Ahmed Magooda |
Author | Daniel Khashabi |
Author | Benjamin Van Durme |
Abstract | The current paradigm for safety alignment of large language models (LLMs) follows a one-size-fits-all approach: the model refuses to interact with any content deemed unsafe by the model provider. This approach lacks flexibility in the face of varying social norms across cultures and regions. In addition, users may have diverse safety needs, making a model with static safety standards too restrictive to be useful, as well as too costly to be re-aligned. We propose Controllable Safety Alignment (CoSA), a framework designed to adapt models to diverse safety requirements without re-training. Instead of aligning a fixed model, we align models to follow safety configs -- free-form natural language descriptions of the desired safety behaviors -- that are provided as part of the system prompt. To adjust model safety behavior, authorized users only need to modify such safety configs at inference time. To enable that, we propose CoSAlign, a data-centric method for aligning LLMs to easily adapt to diverse safety configs. Furthermore, we devise a novel controllability evaluation protocol that considers both helpfulness and configured safety, summarizing them into CoSA-Score, and construct CoSApien, a human-authored benchmark that consists of real-world LLM use cases with diverse safety requirements and corresponding evaluation prompts. We show that CoSAlign leads to substantial gains of controllability over strong baselines including in-context alignment. Our framework encourages better representation and adaptation to pluralistic human values in LLMs, thereby increasing their practicality. |
Date | 2024-10-11 |
Short Title | Controllable Safety Alignment |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2410.08968 |
Accessed | 12/1/2024, 8:30:25 PM |
Extra | arXiv:2410.08968 version: 1 |
DOI | 10.48550/arXiv.2410.08968 |
Repository | arXiv |
Archive ID | arXiv:2410.08968 |
Date Added | 12/1/2024, 8:30:25 PM |
Modified | 12/1/2024, 8:30:25 PM |
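To make the "safety config" idea concrete, here is a minimal sketch of inference-time control through the system prompt; the config wording and chat layout are invented for illustration and are not taken from the CoSApien benchmark.

```python
# Illustrative only: the config text and message layout are invented, not from CoSApien.
safety_config = (
    "You are deployed in a hospital triage tool. Detailed clinical descriptions of "
    "injuries and medication dosages are permitted; instructions enabling self-harm, "
    "weapons construction, or other illegal activity remain refused."
)

messages = [
    {"role": "system", "content": f"Follow this safety config:\n{safety_config}"},
    {"role": "user", "content": "Describe typical symptoms of an opioid overdose."},
]
# An authorized deployer changes safety behavior by editing `safety_config` alone,
# with no retraining; CoSAlign trains the model to respect such configs faithfully.
```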
Item Type | Preprint |
---|---|
Author | Chaoyun Zhang |
Author | Shilin He |
Author | Jiaxu Qian |
Author | Bowen Li |
Author | Liqun Li |
Author | Si Qin |
Author | Yu Kang |
Author | Minghua Ma |
Author | Guyue Liu |
Author | Qingwei Lin |
Author | Saravan Rajmohan |
Author | Dongmei Zhang |
Author | Qi Zhang |
Abstract | GUIs have long been central to human-computer interaction, providing an intuitive and visually-driven way to access and interact with digital systems. The advent of LLMs, particularly multimodal models, has ushered in a new era of GUI automation. They have demonstrated exceptional capabilities in natural language understanding, code generation, and visual processing. This has paved the way for a new generation of LLM-brained GUI agents capable of interpreting complex GUI elements and autonomously executing actions based on natural language instructions. These agents represent a paradigm shift, enabling users to perform intricate, multi-step tasks through simple conversational commands. Their applications span across web navigation, mobile app interactions, and desktop automation, offering a transformative user experience that revolutionizes how individuals interact with software. This emerging field is rapidly advancing, with significant progress in both research and industry. To provide a structured understanding of this trend, this paper presents a comprehensive survey of LLM-brained GUI agents, exploring their historical evolution, core components, and advanced techniques. We address research questions such as existing GUI agent frameworks, the collection and utilization of data for training specialized GUI agents, the development of large action models tailored for GUI tasks, and the evaluation metrics and benchmarks necessary to assess their effectiveness. Additionally, we examine emerging applications powered by these agents. Through a detailed analysis, this survey identifies key research gaps and outlines a roadmap for future advancements in the field. By consolidating foundational knowledge and state-of-the-art developments, this work aims to guide both researchers and practitioners in overcoming challenges and unlocking the full potential of LLM-brained GUI agents. |
Date | 2024-11-28 |
Short Title | Large Language Model-Brained GUI Agents |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2411.18279 |
Accessed | 12/2/2024, 3:44:29 PM |
Extra | arXiv:2411.18279 |
DOI | 10.48550/arXiv.2411.18279 |
Repository | arXiv |
Archive ID | arXiv:2411.18279 |
Date Added | 12/2/2024, 3:44:29 PM |
Modified | 12/2/2024, 3:44:33 PM |
Item Type | Preprint |
---|---|
Author | Ziwei Xu |
Author | Sanjay Jain |
Author | Mohan Kankanhalli |
Abstract | Hallucination has been widely recognized to be a significant drawback for large language models (LLMs). There have been many works that attempt to reduce the extent of hallucination. These efforts have mostly been empirical so far, which cannot answer the fundamental question whether it can be completely eliminated. In this paper, we formalize the problem and show that it is impossible to eliminate hallucination in LLMs. Specifically, we define a formal world where hallucination is defined as inconsistencies between a computable LLM and a computable ground truth function. By employing results from learning theory, we show that LLMs cannot learn all of the computable functions and will therefore always hallucinate. Since the formal world is a part of the real world which is much more complicated, hallucinations are also inevitable for real world LLMs. Furthermore, for real world LLMs constrained by provable time complexity, we describe the hallucination-prone tasks and empirically validate our claims. Finally, using the formal world framework, we discuss the possible mechanisms and efficacies of existing hallucination mitigators as well as the practical implications on the safe deployment of LLMs. |
Date | 2024-01-22 |
Short Title | Hallucination is Inevitable |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2401.11817 |
Accessed | 12/8/2024, 7:55:44 PM |
Extra | arXiv:2401.11817 [cs] |
DOI | 10.48550/arXiv.2401.11817 |
Repository | arXiv |
Archive ID | arXiv:2401.11817 |
Date Added | 12/8/2024, 7:55:44 PM |
Modified | 12/8/2024, 7:55:54 PM |
Item Type | Preprint |
---|---|
Author | Ruihan Wu |
Author | Chhavi Yadav |
Author | Russ Salakhutdinov |
Author | Kamalika Chaudhuri |
Abstract | Machine unlearning is a key requirement of many data protection regulations such as GDPR. Prior work on unlearning has mostly considered superficial unlearning tasks where a single or a few related pieces of information are required to be removed. However, the task of unlearning a fact is much more challenging in recent large language models (LLMs), because the facts in LLMs can be deduced from each other. In this work, we investigate whether current unlearning methods for LLMs succeed beyond superficial unlearning of facts. Specifically, we formally propose a framework and a definition for deep unlearning of interrelated facts. We design a metric, recall, to quantify the extent of deep unlearning. To systematically evaluate deep unlearning, we construct a synthetic dataset EDU-RELAT, which consists of a synthetic knowledge base of family relationships and biographies, together with a realistic logical rule set that connects them. We use this dataset to test four unlearning methods in four LLMs at different sizes. Our findings reveal that in the task of deep unlearning only a single fact, they either fail to properly unlearn with high recall, or end up unlearning many other irrelevant facts. Our dataset and code are publicly available at: https://github.com/wrh14/deep_unlearning. |
Date | 2024-11-09 |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2410.15153 |
Accessed | 12/4/2024, 5:20:13 PM |
Extra | arXiv:2410.15153 |
DOI | 10.48550/arXiv.2410.15153 |
Repository | arXiv |
Archive ID | arXiv:2410.15153 |
Date Added | 12/4/2024, 5:20:13 PM |
Modified | 12/4/2024, 5:20:13 PM |
Item Type | Preprint |
---|---|
Author | Risto Uuk |
Author | Carlos Ignacio Gutierrez |
Author | Lode Lauwaert |
Author | Carina Prunkl |
Author | Lucia Velasco |
Abstract | Through a systematic review of academic literature, we propose a taxonomy of systemic risks associated with artificial intelligence (AI), in particular general-purpose AI. Following the EU AI Act's definition, we consider systemic risks as large-scale threats that can affect entire societies or economies. Starting with an initial pool of 1,781 documents, we analyzed 86 selected papers to identify 13 categories of systemic risks and 50 contributing sources. Our findings reveal a complex landscape of potential threats, ranging from environmental harm and structural discrimination to governance failures and loss of control. Key sources of systemic risk emerge from knowledge gaps, challenges in recognizing harm, and the unpredictable trajectory of AI development. The taxonomy provides a snapshot of current academic literature on systemic risks. This paper contributes to AI safety research by providing a structured groundwork for understanding and addressing the potential large-scale negative societal impacts of general-purpose AI. The taxonomy can inform policymakers in risk prioritization and regulatory development. |
Date | 2024-11-22 |
Language | en |
Library Catalog | Social Science Research Network |
URL | https://papers.ssrn.com/abstract=5030173 |
Accessed | 12/4/2024, 5:20:28 PM |
Place | Rochester, NY |
Genre | SSRN Scholarly Paper |
Archive ID | 5030173 |
Date Added | 12/4/2024, 5:20:28 PM |
Modified | 12/4/2024, 5:20:28 PM |
Item Type | Journal Article |
---|---|
Author | Alex Tamkin |
Author | Miles McCain |
Author | Kunal Handa |
Author | Esin Durmus |
Author | Liane Lovitt |
Author | Ankur Rathi |
Author | Saffron Huang |
Author | Alfred Mountfield |
Author | Jerry Hong |
Author | Stuart Ritchie |
Author | Michael Stern |
Author | Brian Clarke |
Author | Landon Goldberg |
Author | Theodore R Sumers |
Author | Jared Mueller |
Author | William McEachen |
Author | Wes Mitchell |
Author | Shan Carter |
Author | Jack Clark |
Author | Jared Kaplan |
Author | Deep Ganguli |
Abstract | How are AI assistants being used in the real world? While model providers in theory have a window into this impact via their users’ data, both privacy concerns and practical challenges have made analyzing this data difficult. To address these issues, we present Clio (Claude insights and observations), a privacy-preserving platform that uses AI assistants themselves to analyze and surface aggregated usage patterns across millions of conversations, without the need for human reviewers to read raw conversations. We validate this can be done with a high degree of accuracy and privacy by conducting extensive evaluations. We demonstrate Clio’s usefulness in two broad ways. First, we share insights about how models are being used in the real world from one million Claude.ai Free and Pro conversations, ranging from providing advice on hairstyles to providing guidance on Git operations and concepts. We also identify the most common high-level use cases on Claude.ai (coding, writing, and research tasks) as well as patterns that differ across languages (e.g., conversations in Japanese discuss elder care and aging populations at higher-than-typical rates). Second, we use Clio to make our systems safer by identifying coordinated attempts to abuse our systems, monitoring for unknown unknowns during critical periods like launches of new capabilities or major world events, and improving our existing monitoring systems. We also discuss the limitations of our approach, as well as risks and ethical concerns. By enabling analysis of real-world AI usage, Clio provides a scalable platform for empirically grounded AI safety and governance. |
Language | en |
Library Catalog | Zotero |
Date Added | 12/20/2024, 11:00:37 AM |
Modified | 12/20/2024, 11:00:37 AM |
Item Type | Preprint |
---|---|
Author | Benedikt Stroebl |
Author | Sayash Kapoor |
Author | Arvind Narayanan |
Abstract | Recent research has generated hope that inference scaling could allow weaker language models to match or exceed the accuracy of stronger models, such as by repeatedly sampling solutions to a coding problem until it passes unit tests. The central thesis of this paper is that there is no free lunch for inference scaling: indefinite accuracy improvement through resampling can only be realized if the "verifier" (in this case, a set of unit tests) is perfect. When the verifier is imperfect, as it almost always is in domains such as reasoning or coding (for example, unit tests have imperfect coverage), there is a nonzero probability of false positives: incorrect solutions that pass the verifier. Resampling cannot decrease this probability, so it imposes an upper bound to the accuracy of resampling-based inference scaling even with an infinite compute budget. We find that there is a very strong correlation between the model's single-sample accuracy (i.e. accuracy without unit tests) and its false positive rate on coding benchmarks HumanEval and MBPP, whose unit tests have limited coverage. Therefore, no amount of inference scaling of weaker models can enable them to match the single-sample accuracy of a sufficiently strong model (Fig. 1a). When we consider that false positives have a negative utility compared to abstaining from producing a solution, it bends the inference scaling curve further downward. Empirically, we find that the optimal number of samples can be less than 10 under realistic assumptions (Fig. 1b). Finally, we show that beyond accuracy, false positives may have other undesirable qualities, such as poor adherence to coding style conventions. |
Date | 2024-11-26 |
Short Title | Inference Scaling fLaws |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2411.17501 |
Accessed | 12/1/2024, 9:00:55 PM |
Extra | arXiv:2411.17501 |
DOI | 10.48550/arXiv.2411.17501 |
Repository | arXiv |
Archive ID | arXiv:2411.17501 |
Date Added | 12/1/2024, 9:00:55 PM |
Modified | 12/1/2024, 9:00:58 PM |
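The paper's central limit shows up in a toy simulation: with an imperfect verifier, resampling accuracy plateaus at the probability that an accepted sample is correct rather than approaching 1. The numbers below are illustrative, not taken from the paper, and the sketch assumes correct solutions always pass the verifier.

```python
import random

def resample_accuracy(p_correct, fpr, n_samples, trials=20000):
    """Fraction of problems answered correctly when sampling until the verifier accepts.

    p_correct: single-sample probability of a correct solution (assumed always accepted).
    fpr: probability the verifier accepts an incorrect solution (false positive rate).
    """
    wins = 0
    for _ in range(trials):
        for _ in range(n_samples):
            if random.random() < p_correct:   # correct sample; verifier accepts it
                wins += 1
                break
            if random.random() < fpr:         # incorrect sample slips past the verifier
                break
    return wins / trials

for n in (1, 5, 20, 100):
    print(n, round(resample_accuracy(p_correct=0.2, fpr=0.1, n_samples=n), 3))
```

With these illustrative rates the curve plateaus near p_correct / (p_correct + (1 - p_correct) * fpr) = 0.2 / 0.28 ≈ 0.71, no matter how large the compute budget.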
Item Type | Preprint |
---|---|
Author | Tom Schaul |
Abstract | An agent trained within a closed system can master any desired capability, as long as the following three conditions hold: (a) it receives sufficiently informative and aligned feedback, (b) its coverage of experience/data is broad enough, and (c) it has sufficient capacity and resource. In this position paper, we justify these conditions, and consider what limitations arise from (a) and (b) in closed systems, when assuming that (c) is not a bottleneck. Considering the special case of agents with matching input and output spaces (namely, language), we argue that such pure recursive self-improvement, dubbed "Socratic learning", can boost performance vastly beyond what is present in its initial data or knowledge, and is only limited by time, as well as gradual misalignment concerns. Furthermore, we propose a constructive framework to implement it, based on the notion of language games. |
Date | 2024-11-25 |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2411.16905 |
Accessed | 12/1/2024, 9:04:53 PM |
Extra | arXiv:2411.16905 |
DOI | 10.48550/arXiv.2411.16905 |
Repository | arXiv |
Archive ID | arXiv:2411.16905 |
Date Added | 12/1/2024, 9:04:53 PM |
Modified | 12/1/2024, 9:04:53 PM |
Item Type | Preprint |
---|---|
Author | Laura Ruis |
Author | Maximilian Mozes |
Author | Juhan Bae |
Author | Siddhartha Rao Kamalakara |
Author | Dwarak Talupuru |
Author | Acyr Locatelli |
Author | Robert Kirk |
Author | Tim Rocktäschel |
Author | Edward Grefenstette |
Author | Max Bartolo |
Abstract | The capabilities and limitations of Large Language Models have been sketched out in great detail in recent years, providing an intriguing yet conflicting picture. On the one hand, LLMs demonstrate a general ability to solve problems. On the other hand, they show surprising reasoning gaps when compared to humans, casting doubt on the robustness of their generalisation strategies. The sheer volume of data used in the design of LLMs has precluded us from applying the method traditionally used to measure generalisation: train-test set separation. To overcome this, we study what kind of generalisation strategies LLMs employ when performing reasoning tasks by investigating the pretraining data they rely on. For two models of different sizes (7B and 35B) and 2.5B of their pretraining tokens, we identify what documents influence the model outputs for three simple mathematical reasoning tasks and contrast this to the data that are influential for answering factual questions. We find that, while the models rely on mostly distinct sets of data for each factual question, a document often has a similar influence across different reasoning questions within the same task, indicating the presence of procedural knowledge. We further find that the answers to factual questions often show up in the most influential data. However, for reasoning questions the answers usually do not show up as highly influential, nor do the answers to the intermediate reasoning steps. When we characterise the top ranked documents for the reasoning questions qualitatively, we confirm that the influential documents often contain procedural knowledge, like demonstrating how to obtain a solution using formulae or code. Our findings indicate that the approach to reasoning the models use is unlike retrieval, and more like a generalisable strategy that synthesises procedural knowledge from documents doing a similar form of reasoning. |
Date | 2024-11-19 |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2411.12580 |
Accessed | 12/1/2024, 8:37:01 PM |
Extra | arXiv:2411.12580 |
DOI | 10.48550/arXiv.2411.12580 |
Repository | arXiv |
Archive ID | arXiv:2411.12580 |
Date Added | 12/1/2024, 8:37:01 PM |
Modified | 12/1/2024, 8:37:03 PM |
Item Type | Preprint |
---|---|
Author | Reworr |
Author | Dmitrii Volkov |
Abstract | We introduce the LLM Honeypot, a system for monitoring autonomous AI hacking agents. We deployed a customized SSH honeypot and applied prompt injections with temporal analysis to identify LLM-based agents among attackers. Over a trial run of a few weeks in a public environment, we collected 800,000 hacking attempts and 6 potential AI agents, which we plan to analyze in depth in future work. Our objectives aim to improve awareness of AI hacking agents and enhance preparedness for their risks. |
Date | 2024-10-17 |
Short Title | LLM Agent Honeypot |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2410.13919 |
Accessed | 12/1/2024, 8:21:13 PM |
Extra | arXiv:2410.13919 |
DOI | 10.48550/arXiv.2410.13919 |
Repository | arXiv |
Archive ID | arXiv:2410.13919 |
Date Added | 12/1/2024, 8:21:13 PM |
Modified | 12/1/2024, 8:21:13 PM |
Item Type | Journal Article |
---|---|
Author | Ilan Price |
Author | Alvaro Sanchez-Gonzalez |
Author | Ferran Alet |
Author | Tom R. Andersson |
Author | Andrew El-Kadi |
Author | Dominic Masters |
Author | Timo Ewalds |
Author | Jacklynn Stott |
Author | Shakir Mohamed |
Author | Peter Battaglia |
Author | Remi Lam |
Author | Matthew Willson |
Abstract | Weather forecasts are fundamentally uncertain, so predicting the range of probable weather scenarios is crucial for important decisions, from warning the public about hazardous weather to planning renewable energy use. Traditionally, weather forecasts have been based on numerical weather prediction (NWP), which relies on physics-based simulations of the atmosphere. Recent advances in machine learning (ML)-based weather prediction (MLWP) have produced ML-based models with less forecast error than single NWP simulations. However, these advances have focused primarily on single, deterministic forecasts that fail to represent uncertainty and estimate risk. Overall, MLWP has remained less accurate and reliable than state-of-the-art NWP ensemble forecasts. Here we introduce GenCast, a probabilistic weather model with greater skill and speed than the top operational medium-range weather forecast in the world, ENS, the ensemble forecast of the European Centre for Medium-Range Weather Forecasts. GenCast is an ML weather prediction method, trained on decades of reanalysis data. GenCast generates an ensemble of stochastic 15-day global forecasts, at 12-h steps and 0.25° latitude–longitude resolution, for more than 80 surface and atmospheric variables, in 8 min. It has greater skill than ENS on 97.2% of 1,320 targets we evaluated and better predicts extreme weather, tropical cyclone tracks and wind power production. This work helps open the next chapter in operational weather forecasting, in which crucial weather-dependent decisions are made more accurately and efficiently. |
Date | 2024-12-04 |
Language | en |
Library Catalog | www.nature.com |
URL | https://www.nature.com/articles/s41586-024-08252-9 |
Accessed | 12/4/2024, 5:23:31 PM |
Rights | 2024 The Author(s) |
Extra | Publisher: Nature Publishing Group |
Pages | 1-7 |
Publication | Nature |
DOI | 10.1038/s41586-024-08252-9 |
ISSN | 1476-4687 |
Date Added | 12/4/2024, 5:23:31 PM |
Modified | 12/4/2024, 5:23:31 PM |
Item Type | Preprint |
---|---|
Author | Michał Pietruszka |
Author | Łukasz Borchmann |
Author | Aleksander Jędrosz |
Author | Paweł Morawiecki |
Abstract | We present a benchmark for large language models designed to tackle one of the most knowledge-intensive tasks in data science: writing feature engineering code, which requires domain knowledge in addition to a deep understanding of the underlying problem and data structure. The model is provided with a dataset description in a prompt and asked to generate code transforming it. The evaluation score is derived from the improvement achieved by an XGBoost model fit on the modified dataset compared to the original data. Through an extensive evaluation of state-of-the-art models and comparison to well-established benchmarks, we demonstrate that our proposed FeatEng benchmark can cheaply and efficiently assess the broad capabilities of LLMs, in contrast to the existing methods. |
Date | 2024-10-30 |
Short Title | Can Models Help Us Create Better Models? |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2410.23331 |
Accessed | 12/1/2024, 8:54:11 PM |
Extra | arXiv:2410.23331 |
DOI | 10.48550/arXiv.2410.23331 |
Repository | arXiv |
Archive ID | arXiv:2410.23331 |
Date Added | 12/1/2024, 8:54:11 PM |
Modified | 12/1/2024, 8:54:11 PM |
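The scoring rule described in the abstract, the improvement of an XGBoost model fit on the LLM-transformed dataset over one fit on the original data, can be sketched as follows; the benchmark's exact metric, model settings, and cross-validation protocol may differ.

```python
# Sketch of the scoring idea only; assumes a classification task and default settings.
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

def feateng_score(X_original, X_transformed, y, cv=5):
    """Relative improvement of XGBoost accuracy on LLM-engineered features."""
    base = cross_val_score(XGBClassifier(), X_original, y, cv=cv).mean()
    new = cross_val_score(XGBClassifier(), X_transformed, y, cv=cv).mean()
    return (new - base) / max(base, 1e-9)
```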
Item Type | Preprint |
---|---|
Author | Aidan Peppin |
Author | Anka Reuel |
Author | Stephen Casper |
Author | Elliot Jones |
Author | Andrew Strait |
Author | Usman Anwar |
Author | Anurag Agrawal |
Author | Sayash Kapoor |
Author | Sanmi Koyejo |
Author | Marie Pellat |
Author | Rishi Bommasani |
Author | Nick Frosst |
Author | Sara Hooker |
Abstract | To accurately and confidently answer the question 'could an AI model or system increase biorisk', it is necessary to have both a sound theoretical threat model for how AI models or systems could increase biorisk and a robust method for testing that threat model. This paper provides an analysis of existing available research surrounding two AI and biorisk threat models: 1) access to information and planning via large language models (LLMs), and 2) the use of AI-enabled biological tools (BTs) in synthesizing novel biological artifacts. We find that existing studies around AI-related biorisk are nascent, often speculative in nature, or limited in terms of their methodological maturity and transparency. The available literature suggests that current LLMs and BTs do not pose an immediate risk, and more work is needed to develop rigorous approaches to understanding how future models could increase biorisks. We end with recommendations about how empirical work can be expanded to more precisely target biorisk and ensure rigor and validity of findings. |
Date | 2024-12-02 |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2412.01946 |
Accessed | 12/4/2024, 5:18:58 PM |
Extra | arXiv:2412.01946 |
DOI | 10.48550/arXiv.2412.01946 |
Repository | arXiv |
Archive ID | arXiv:2412.01946 |
Date Added | 12/4/2024, 5:18:58 PM |
Modified | 12/4/2024, 5:18:58 PM |
Item Type | Preprint |
---|---|
Author | Yaniv Nikankin |
Author | Anja Reusch |
Author | Aaron Mueller |
Author | Yonatan Belinkov |
Abstract | Do large language models (LLMs) solve reasoning tasks by learning robust generalizable algorithms, or do they memorize training data? To investigate this question, we use arithmetic reasoning as a representative task. Using causal analysis, we identify a subset of the model (a circuit) that explains most of the model's behavior for basic arithmetic logic and examine its functionality. By zooming in on the level of individual circuit neurons, we discover a sparse set of important neurons that implement simple heuristics. Each heuristic identifies a numerical input pattern and outputs corresponding answers. We hypothesize that the combination of these heuristic neurons is the mechanism used to produce correct arithmetic answers. To test this, we categorize each neuron into several heuristic types, such as neurons that activate when an operand falls within a certain range, and find that the unordered combination of these heuristic types is the mechanism that explains most of the model's accuracy on arithmetic prompts. Finally, we demonstrate that this mechanism appears as the main source of arithmetic accuracy early in training. Overall, our experimental results across several LLMs show that LLMs perform arithmetic using neither robust algorithms nor memorization; rather, they rely on a "bag of heuristics". |
Date | 2024-10-28 |
Short Title | Arithmetic Without Algorithms |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2410.21272 |
Accessed | 12/1/2024, 8:50:11 PM |
Extra | arXiv:2410.21272 |
DOI | 10.48550/arXiv.2410.21272 |
Repository | arXiv |
Archive ID | arXiv:2410.21272 |
Date Added | 12/1/2024, 8:50:11 PM |
Modified | 12/1/2024, 8:50:13 PM |
Item Type | Preprint |
---|---|
Author | Dang Nguyen |
Author | Viet Dac Lai |
Author | Seunghyun Yoon |
Author | Ryan A. Rossi |
Author | Handong Zhao |
Author | Ruiyi Zhang |
Author | Puneet Mathur |
Author | Nedim Lipka |
Author | Yu Wang |
Author | Trung Bui |
Author | Franck Dernoncourt |
Author | Tianyi Zhou |
Abstract | Existing LLM agent systems typically select actions from a fixed and predefined set at every step. While this approach is effective in closed, narrowly-scoped environments, we argue that it presents two major challenges when deploying LLM agents in real-world scenarios: (1) selecting from a fixed set of actions significantly restricts the planning and acting capabilities of LLM agents, and (2) this approach requires substantial human effort to enumerate and implement all possible actions, which becomes impractical in complex environments with a vast number of potential actions. In this work, we propose an LLM agent framework that enables the dynamic creation and composition of actions in an online manner. In this framework, the agent interacts with the environment by generating and executing programs written in a general-purpose programming language at each step. Furthermore, generated actions are accumulated over time for future reuse. Our extensive experiments on the GAIA benchmark demonstrate that this framework offers significantly greater flexibility and outperforms previous methods. Notably, it allows an LLM agent to recover in scenarios where no relevant action exists in the predefined set or when existing actions fail due to unforeseen edge cases. At the time of writing, we hold the top position on the GAIA public leaderboard. Our code can be found at https://github.com/adobe-research/dynasaur. |
Date | 2024-11-04 |
Short Title | DynaSaur |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2411.01747 |
Accessed | 12/1/2024, 8:47:40 PM |
Extra | arXiv:2411.01747 |
DOI | 10.48550/arXiv.2411.01747 |
Repository | arXiv |
Archive ID | arXiv:2411.01747 |
Date Added | 12/1/2024, 8:47:40 PM |
Modified | 12/1/2024, 8:47:40 PM |
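A minimal sketch of the loop the abstract describes, in which the agent writes and executes a Python action at each step and keeps it for later reuse; `llm` and the accumulation policy here are placeholders rather than the released adobe-research/dynasaur code.

```python
def llm(prompt: str) -> str:
    """Assumed helper that returns Python source defining an `action()` function."""
    raise NotImplementedError

def run_agent(task: str, max_steps: int = 10):
    action_library: dict[str, str] = {}   # generated actions accumulated for reuse
    observations: list[str] = []
    for _ in range(max_steps):
        prompt = (
            f"Task: {task}\nKnown actions:\n" + "\n".join(action_library.values()) +
            "\nObservations so far:\n" + "\n".join(observations) +
            "\nWrite a Python function `action()` that makes progress on the task."
        )
        code = llm(prompt)
        namespace: dict = {}
        try:
            exec(code, namespace)                  # run the freshly generated action
            result = namespace["action"]()
            action_library[f"step_{len(action_library)}"] = code
            observations.append(repr(result))
            if result == "DONE":
                break
        except Exception as exc:                   # edge-case failures feed back as context
            observations.append(f"error: {exc}")
    return observations
```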
Item Type | Preprint |
---|---|
Author | Kevin Murphy |
Abstract | This manuscript gives a big-picture, up-to-date overview of the field of (deep) reinforcement learning and sequential decision making, covering value-based RL, policy-gradient methods, model-based methods, and various other topics (including a very brief discussion of RL+LLMs). |
Date | 2024-12-06 |
Short Title | Reinforcement Learning |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2412.05265 |
Accessed | 12/9/2024, 12:57:18 PM |
Extra | arXiv:2412.05265 [cs] |
DOI | 10.48550/arXiv.2412.05265 |
Repository | arXiv |
Archive ID | arXiv:2412.05265 |
Date Added | 12/9/2024, 12:57:18 PM |
Modified | 12/9/2024, 12:57:19 PM |
Item Type | Preprint |
---|---|
Author | Sumeet Ramesh Motwani |
Author | Mikhail Baranchuk |
Author | Martin Strohmeier |
Author | Vijay Bolina |
Author | Philip H. S. Torr |
Author | Lewis Hammond |
Author | Christian Schroeder de Witt |
Abstract | Recent capability increases in large language models (LLMs) open up applications in which groups of communicating generative AI agents solve joint tasks. This poses privacy and security challenges concerning the unauthorised sharing of information, or other unwanted forms of agent coordination. Modern steganographic techniques could render such dynamics hard to detect. In this paper, we comprehensively formalise the problem of secret collusion in systems of generative AI agents by drawing on relevant concepts from both AI and security literature. We study incentives for the use of steganography, and propose a variety of mitigation measures. Our investigations result in a model evaluation framework that systematically tests capabilities required for various forms of secret collusion. We provide extensive empirical results across a range of contemporary LLMs. While the steganographic capabilities of current models remain limited, GPT-4 displays a capability jump suggesting the need for continuous monitoring of steganographic frontier model capabilities. We conclude by laying out a comprehensive research program to mitigate future risks of collusion between generative AI models. |
Date | 2024-11-08 |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2402.07510 |
Accessed | 12/1/2024, 8:58:08 PM |
Extra | arXiv:2402.07510 |
DOI | 10.48550/arXiv.2402.07510 |
Repository | arXiv |
Archive ID | arXiv:2402.07510 |
Date Added | 12/1/2024, 8:58:08 PM |
Modified | 12/1/2024, 8:58:10 PM |
Item Type | Preprint |
---|---|
Author | Evan Miller |
Abstract | Evaluations are critical for understanding the capabilities of large language models (LLMs). Fundamentally, evaluations are experiments; but the literature on evaluations has largely ignored the literature from other sciences on experiment analysis and planning. This article shows researchers with some training in statistics how to think about and analyze data from language model evaluations. Conceptualizing evaluation questions as having been drawn from an unseen super-population, we present formulas for analyzing evaluation data, measuring differences between two models, and planning an evaluation experiment. We make a number of specific recommendations for running language model evaluations and reporting experiment results in a way that minimizes statistical noise and maximizes informativeness. |
Date | 2024-11-01 |
Short Title | Adding Error Bars to Evals |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2411.00640 |
Accessed | 12/9/2024, 7:36:47 PM |
Extra | arXiv:2411.00640 [stat] |
DOI | 10.48550/arXiv.2411.00640 |
Repository | arXiv |
Archive ID | arXiv:2411.00640 |
Date Added | 12/9/2024, 7:36:47 PM |
Modified | 12/9/2024, 7:36:50 PM |
Comment | 14 pages |
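Because the paper treats eval questions as a sample from an unseen super-population, its basic machinery is the standard error of a mean score and a paired comparison between two models. The helper below is a generic sketch of those textbook formulas, not a reproduction of the paper's exact recommendations.

```python
import math

def mean_and_sem(scores):
    """Mean per-question score and its standard error (questions treated as sampled
    from a larger population of possible eval questions)."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    return mean, math.sqrt(var / n)

def paired_difference(scores_a, scores_b):
    """Mean accuracy difference between two models with a 95% confidence interval,
    pairing by question so shared question difficulty cancels out."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean, sem = mean_and_sem(diffs)
    return mean, (mean - 1.96 * sem, mean + 1.96 * sem)
```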
Item Type | Preprint |
---|---|
Author | Michael Lan |
Author | Philip Torr |
Author | Austin Meek |
Author | Ashkan Khakzar |
Author | David Krueger |
Author | Fazl Barez |
Abstract | We investigate feature universality in large language models (LLMs), a research field that aims to understand how different models similarly represent concepts in the latent spaces of their intermediate layers. Demonstrating feature universality allows discoveries about latent representations to generalize across several models. However, comparing features across LLMs is challenging due to polysemanticity, in which individual neurons often correspond to multiple features rather than distinct ones. This makes it difficult to disentangle and match features across different models. To address this issue, we employ a method known as dictionary learning by using sparse autoencoders (SAEs) to transform LLM activations into more interpretable spaces spanned by neurons corresponding to individual features. After matching feature neurons across models via activation correlation, we apply representational space similarity metrics like Singular Value Canonical Correlation Analysis to analyze these SAE features across different LLMs. Our experiments reveal significant similarities in SAE feature spaces across various LLMs, providing new evidence for feature universality. |
Date | 2024-10-09 |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2410.06981 |
Accessed | 12/1/2024, 8:28:23 PM |
Extra | arXiv:2410.06981 |
DOI | 10.48550/arXiv.2410.06981 |
Repository | arXiv |
Archive ID | arXiv:2410.06981 |
Date Added | 12/1/2024, 8:28:23 PM |
Modified | 12/1/2024, 8:28:26 PM |
Item Type | Software |
---|---|
Programmer | Intelligent Unmanned Systems Laboratory |
Abstract | Monitoring recent cross-research on LLM & RL on arXiv for control. If there are good papers, PRs are welcome. |
Date | 2024-12-09T16:45:47Z |
Library Catalog | GitHub |
URL | https://github.com/WindyLab/LLM-RL-Papers |
Accessed | 12/9/2024, 12:59:35 PM |
Extra | original-date: 2024-03-18T08:31:23Z |
Date Added | 12/9/2024, 12:59:35 PM |
Modified | 12/9/2024, 12:59:35 PM |
Item Type | Preprint |
---|---|
Author | Siyuan Hu |
Author | Mingyu Ouyang |
Author | Difei Gao |
Author | Mike Zheng Shou |
Abstract | The recently released model, Claude 3.5 Computer Use, stands out as the first frontier AI model to offer computer use in public beta as a graphical user interface (GUI) agent. As an early beta, its capability in the real-world complex environment remains unknown. In this case study to explore Claude 3.5 Computer Use, we curate and organize a collection of carefully designed tasks spanning a variety of domains and software. Observations from these cases demonstrate Claude 3.5 Computer Use's unprecedented ability in end-to-end language to desktop actions. Along with this study, we provide an out-of-the-box agent framework for deploying API-based GUI automation models with easy implementation. Our case studies aim to showcase a groundwork of capabilities and limitations of Claude 3.5 Computer Use with detailed analyses and bring to the fore questions about planning, action, and critic, which must be considered for future improvement. We hope this preliminary exploration will inspire future research into the GUI agent community. All the test cases in the paper can be tried through the project: https://github.com/showlab/computer_use_ootb. |
Date | 2024-11-15 |
Short Title | The Dawn of GUI Agent |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2411.10323 |
Accessed | 12/1/2024, 8:33:30 PM |
Extra | arXiv:2411.10323 |
DOI | 10.48550/arXiv.2411.10323 |
Repository | arXiv |
Archive ID | arXiv:2411.10323 |
Date Added | 12/1/2024, 8:33:30 PM |
Modified | 12/1/2024, 8:33:30 PM |
Item Type | Preprint |
---|---|
Author | John Heibel |
Author | Daniel Lowd |
Abstract | LLM-based programming assistants offer the promise of programming faster but with the risk of introducing more security vulnerabilities. Prior work has studied how LLMs could be maliciously fine-tuned to suggest vulnerabilities more often. With the rise of agentic LLMs, which may use results from an untrusted third party, there is a growing risk of attacks on the model's prompt. We introduce the Malicious Programming Prompt (MaPP) attack, in which an attacker adds a small amount of text to a prompt for a programming task (under 500 bytes). We show that our prompt strategy can cause an LLM to add vulnerabilities while continuing to write otherwise correct code. We evaluate three prompts on seven common LLMs, from basic to state-of-the-art commercial models. Using the HumanEval benchmark, we find that our prompts are broadly effective, with no customization required for different LLMs. Furthermore, the LLMs that are best at HumanEval are also best at following our malicious instructions, suggesting that simply scaling language models will not prevent MaPP attacks. Using a dataset of eight CWEs in 16 scenarios, we find that MaPP attacks are also effective at implementing specific and targeted vulnerabilities across a range of models. Our work highlights the need to secure LLM prompts against manipulation as well as rigorously auditing code generated with the help of LLMs. |
Date | 2024-07-12 |
Short Title | MaPPing Your Model |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2407.11072 |
Accessed | 12/1/2024, 8:21:33 PM |
Extra | arXiv:2407.11072 |
DOI | 10.48550/arXiv.2407.11072 |
Repository | arXiv |
Archive ID | arXiv:2407.11072 |
Date Added | 12/1/2024, 8:21:33 PM |
Modified | 12/1/2024, 8:21:33 PM |
Item Type | Preprint |
---|---|
Author | Shibo Hao |
Author | Sainbayar Sukhbaatar |
Author | DiJia Su |
Author | Xian Li |
Author | Zhiting Hu |
Author | Jason Weston |
Author | Yuandong Tian |
Abstract | Large language models (LLMs) are restricted to reason in the "language space", where they typically express the reasoning process with a chain-of-thought (CoT) to solve a complex reasoning problem. However, we argue that language space may not always be optimal for reasoning. For example, most word tokens are primarily for textual coherence and not essential for reasoning, while some critical tokens require complex planning and pose huge challenges to LLMs. To explore the potential of LLM reasoning in an unrestricted latent space instead of using natural language, we introduce a new paradigm Coconut (Chain of Continuous Thought). We utilize the last hidden state of the LLM as a representation of the reasoning state (termed "continuous thought"). Rather than decoding this into a word token, we feed it back to the LLM as the subsequent input embedding directly in the continuous space. Experiments show that Coconut can effectively augment the LLM on several reasoning tasks. This novel latent reasoning paradigm leads to emergent advanced reasoning patterns: the continuous thought can encode multiple alternative next reasoning steps, allowing the model to perform a breadth-first search (BFS) to solve the problem, rather than prematurely committing to a single deterministic path like CoT. Coconut outperforms CoT in certain logical reasoning tasks that require substantial backtracking during planning, with fewer thinking tokens during inference. These findings demonstrate the promise of latent reasoning and offer valuable insights for future research. |
Date | 2024-12-09 |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2412.06769 |
Accessed | 12/11/2024, 10:19:43 AM |
Extra | arXiv:2412.06769 [cs] |
DOI | 10.48550/arXiv.2412.06769 |
Repository | arXiv |
Archive ID | arXiv:2412.06769 |
Date Added | 12/11/2024, 10:19:43 AM |
Modified | 12/11/2024, 10:19:43 AM |
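The mechanism in the abstract, feeding the final hidden state back as the next input embedding instead of decoding a token, looks roughly like the loop below for a Hugging Face style causal LM that accepts `inputs_embeds`; this is a schematic of the inference-time idea under that assumption, not the authors' training setup.

```python
# Schematic of the continuous-thought loop only; not the authors' training code.
import torch

def continuous_thoughts(model, input_embeds, n_thoughts=4):
    """Append n_thoughts last-layer hidden states as new input embeddings
    instead of decoding word tokens between reasoning steps."""
    embeds = input_embeds
    with torch.no_grad():
        for _ in range(n_thoughts):
            out = model(inputs_embeds=embeds, output_hidden_states=True)
            last_hidden = out.hidden_states[-1][:, -1:, :]    # state of the final position
            embeds = torch.cat([embeds, last_hidden], dim=1)  # feed it back as the next "token"
    return embeds  # afterwards, decode the answer in language space as usual
```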
Item Type | Preprint |
---|---|
Author | Yu Gu |
Author | Boyuan Zheng |
Author | Boyu Gou |
Author | Kai Zhang |
Author | Cheng Chang |
Author | Sanjari Srivastava |
Author | Yanan Xie |
Author | Peng Qi |
Author | Huan Sun |
Author | Yu Su |
Abstract | Language agents have demonstrated promising capabilities in automating web-based tasks, though their current reactive approaches still underperform largely compared to humans. While incorporating advanced planning algorithms, particularly tree search methods, could enhance these agents' performance, implementing tree search directly on live websites poses significant safety risks and practical constraints due to irreversible actions such as confirming a purchase. In this paper, we introduce a novel paradigm that augments language agents with model-based planning, pioneering the innovative use of large language models (LLMs) as world models in complex web environments. Our method, WebDreamer, builds on the key insight that LLMs inherently encode comprehensive knowledge about website structures and functionalities. Specifically, WebDreamer uses LLMs to simulate outcomes for each candidate action (e.g., "what would happen if I click this button?") using natural language descriptions, and then evaluates these imagined outcomes to determine the optimal action at each step. Empirical results on two representative web agent benchmarks with online interaction -- VisualWebArena and Mind2Web-live -- demonstrate that WebDreamer achieves substantial improvements over reactive baselines. By establishing the viability of LLMs as world models in web environments, this work lays the groundwork for a paradigm shift in automated web interaction. More broadly, our findings open exciting new avenues for future research into 1) optimizing LLMs specifically for world modeling in complex, dynamic environments, and 2) model-based speculative planning for language agents. |
Date | 2024-11-10 |
Short Title | Is Your LLM Secretly a World Model of the Internet? |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2411.06559 |
Accessed | 12/1/2024, 8:40:56 PM |
Extra | arXiv:2411.06559 |
DOI | 10.48550/arXiv.2411.06559 |
Repository | arXiv |
Archive ID | arXiv:2411.06559 |
Date Added | 12/1/2024, 8:40:56 PM |
Modified | 12/1/2024, 8:40:56 PM |
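The simulate-then-act idea fits in a few lines: for each candidate action, ask the LLM to imagine the resulting page, score progress toward the goal, and only execute the winner on the live site. `llm` is an assumed completion helper, not the paper's implementation, and the scoring prompt is invented for illustration.

```python
def llm(prompt: str) -> str:
    """Assumed text-completion helper."""
    raise NotImplementedError

def choose_action(goal: str, page_description: str, candidate_actions: list[str]) -> str:
    scored = []
    for action in candidate_actions:
        imagined = llm(                       # use the LLM as a world model
            f"Page: {page_description}\nWhat would happen if the agent performed: {action}? "
            "Describe the resulting page in two sentences."
        )
        score = float(llm(                    # evaluate the imagined outcome
            f"Goal: {goal}\nImagined outcome: {imagined}\n"
            "Rate progress toward the goal from 0 to 1. Answer with a number only."
        ))
        scored.append((score, action))
    return max(scored)[1]   # only the best-scoring action is executed on the live site
```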
Item Type | Preprint |
---|---|
Author | Kai Fronsdal |
Author | David Lindner |
Abstract | We propose a suite of tasks to evaluate the instrumental self-reasoning ability of large language model (LLM) agents. Instrumental self-reasoning ability could improve adaptability and enable self-modification, but it could also pose significant risks, such as enabling deceptive alignment. Prior work has only evaluated self-reasoning in non-agentic settings or in limited domains. In this paper, we propose evaluations for instrumental self-reasoning ability in agentic tasks in a wide range of scenarios, including self-modification, knowledge seeking, and opaque self-reasoning. We evaluate agents built using state-of-the-art LLMs, including commercial and open source systems. We find that instrumental self-reasoning ability emerges only in the most capable frontier models and that it is highly context-dependent. No model passes the most difficult versions of our evaluations, hence our evaluation can be used to measure increases in instrumental self-reasoning ability in future models. We open-source our evaluations at https://github.com/kaifronsdal/Self-Reasoning-Evals. |
Date | 2024-12-05 |
Short Title | MISR |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2412.03904 |
Accessed | 12/11/2024, 10:15:02 AM |
Extra | arXiv:2412.03904 [cs] |
DOI | 10.48550/arXiv.2412.03904 |
Repository | arXiv |
Archive ID | arXiv:2412.03904 |
Date Added | 12/11/2024, 10:15:02 AM |
Modified | 12/11/2024, 10:15:12 AM |
Comment | 10 pages, 65 page appendix, 5 figures |
Item Type | Preprint |
---|---|
Author | Javier Ferrando |
Author | Oscar Obeso |
Author | Senthooran Rajamanoharan |
Author | Neel Nanda |
Abstract | Hallucinations in large language models are a widespread problem, yet the mechanisms behind whether models will hallucinate are poorly understood, limiting our ability to solve this problem. Using sparse autoencoders as an interpretability tool, we discover that a key part of these mechanisms is entity recognition, where the model detects if an entity is one it can recall facts about. Sparse autoencoders uncover meaningful directions in the representation space; these detect whether the model recognizes an entity, e.g. detecting it doesn't know about an athlete or a movie. This suggests that models can have self-knowledge: internal representations about their own capabilities. These directions are causally relevant: capable of steering the model to refuse to answer questions about known entities, or to hallucinate attributes of unknown entities when it would otherwise refuse. We demonstrate that despite the sparse autoencoders being trained on the base model, these directions have a causal effect on the chat model's refusal behavior, suggesting that chat finetuning has repurposed this existing mechanism. Furthermore, we provide an initial exploration into the mechanistic role of these directions in the model, finding that they disrupt the attention of downstream heads that typically move entity attributes to the final token. |
Date | 2024-11-21 |
Short Title | Do I Know This Entity? |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2411.14257 |
Accessed | 12/4/2024, 5:18:40 PM |
Extra | arXiv:2411.14257 |
DOI | 10.48550/arXiv.2411.14257 |
Repository | arXiv |
Archive ID | arXiv:2411.14257 |
Date Added | 12/4/2024, 5:18:40 PM |
Modified | 12/4/2024, 5:18:40 PM |
Item Type | Blog Post |
---|---|
Author | Joe Edelman |
Author | Oliver Klingefjord |
Abstract | You may want a compliant assistant, but a co-founder with integrity. We propose ‘model integrity’ as an overlooked challenge in aligning LLM agents. |
Date | 2024-12-05 |
URL | https://meaningalignment.substack.com/p/model-integrity |
Accessed | 12/6/2024, 8:12:16 AM |
Blog Title | Meaning Alignment Institute |
Website Type | Substack newsletter |
Date Added | 12/6/2024, 8:12:16 AM |
Modified | 12/6/2024, 8:12:21 AM |
Item Type | Preprint |
---|---|
Author | Liming Dong |
Author | Qinghua Lu |
Author | Liming Zhu |
Abstract | Large language model (LLM) agents have demonstrated remarkable capabilities across various domains, gaining extensive attention from academia and industry. However, these agents raise significant concerns about AI safety due to their autonomous and non-deterministic behavior, as well as their continuously evolving nature. From a DevOps perspective, enabling observability in agents is necessary for ensuring AI safety, as stakeholders can gain insights into the agents' inner workings, allowing them to proactively understand the agents, detect anomalies, and prevent potential failures. Therefore, in this paper, we present a comprehensive taxonomy of AgentOps, identifying the artifacts and associated data that should be traced throughout the entire lifecycle of agents to achieve effective observability. The taxonomy is developed based on a systematic mapping study of existing AgentOps tools. Our taxonomy serves as a reference template for developers to design and implement AgentOps infrastructure that supports monitoring, logging, and analytics, thereby ensuring AI safety. |
Date | 2024-11-30 |
Short Title | AgentOps |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2411.05285 |
Accessed | 12/4/2024, 5:19:13 PM |
Extra | arXiv:2411.05285 version: 2 |
DOI | 10.48550/arXiv.2411.05285 |
Repository | arXiv |
Archive ID | arXiv:2411.05285 |
Date Added | 12/4/2024, 5:19:13 PM |
Modified | 12/4/2024, 5:19:13 PM |
Item Type | Journal Article |
---|---|
Author | Alex Beutel |
Author | Kai Xiao |
Author | Johannes Heidecke |
Author | Lilian Weng |
Abstract | Automated red teaming can discover rare model failures and generate challenging examples that can be used for training or evaluation. However, a core challenge in automated red teaming is ensuring that the attacks are both diverse and effective. Prior methods typically succeed in optimizing either for diversity or for effectiveness, but rarely both. In this paper, we provide methods that enable automated red teaming to generate a large number of diverse and successful attacks. |
Language | en |
Library Catalog | Zotero |
Date Added | 12/1/2024, 8:42:36 PM |
Modified | 12/1/2024, 8:42:36 PM |
Item Type | Preprint |
---|---|
Author | Glen Berman |
Author | Ned Cooper |
Author | Wesley Hanwen Deng |
Author | Ben Hutchinson |
Abstract | To evaluate the societal impacts of GenAI requires a model of how social harms emerge from interactions between GenAI, people, and societal structures. Yet a model is rarely explicitly defined in societal impact evaluations, or in the taxonomies of societal impacts that support them. In this provocation, we argue that societal impacts should be conceptualised as application- and context-specific, incommensurable, and shaped by questions of social power. Doing so leads us to conclude that societal impact evaluations using existing taxonomies are inherently limited, in terms of their potential to reveal how GenAI systems may interact with people when introduced into specific social contexts. We therefore propose a governance-first approach to managing societal harms attended by GenAI technologies. |
Date | 2024-10-30 |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2410.22985 |
Accessed | 12/11/2024, 10:10:33 PM |
Extra | arXiv:2410.22985 [cs] |
DOI | 10.48550/arXiv.2410.22985 |
Repository | arXiv |
Archive ID | arXiv:2410.22985 |
Date Added | 12/11/2024, 10:10:33 PM |
Modified | 12/11/2024, 10:10:36 PM |
Comment | 3 pages |
Item Type | Journal Article |
---|---|
Author | Gagan Bansal |
Author | Jennifer Wortman Vaughan |
Author | Saleema Amershi |
Author | Eric Horvitz |
Author | Adam Fourney |
Author | Hussein Mozannar |
Author | Victor Dibia |
Author | Daniel S. Weld |
Abstract | Explore key challenges in human-agent communication with generative AI and autonomous agents. Learn about transparency, control, and challenges for improving human-AI interaction. |
Date | 2024-12-01 |
Language | en-US |
Library Catalog | www.microsoft.com |
URL | https://www.microsoft.com/en-us/research/publication/human-agent-interaction-challenges/ |
Accessed | 12/4/2024, 5:19:46 PM |
Date Added | 12/4/2024, 5:19:45 PM |
Modified | 12/4/2024, 5:19:45 PM |
Item Type | Journal Article |
---|---|
Author | Lama Ahmad |
Author | Sandhini Agarwal |
Author | Michael Lampe |
Author | Pamela Mishkin |
Abstract | Red teaming has emerged as a critical practice in assessing the possible risks of AI models and systems. It aids in the discovery of novel risks, stress testing possible gaps in existing mitigations, enriching existing quantitative safety metrics, facilitating the creation of new safety measurements, and enhancing public trust and the legitimacy of AI risk assessments. This white paper describes OpenAI’s work to date in external red teaming and draws some more general conclusions from this work. We describe the design considerations underpinning external red teaming, which include: selecting composition of red team, deciding on access levels, and providing guidance required to conduct red teaming. Additionally, we show outcomes red teaming can enable such as input into risk assessment and automated evaluations. We also describe the limitations of external red teaming, and how it can fit into a broader range of AI model and system evaluations. Through these contributions, we hope that AI developers and deployers, evaluation creators, and policymakers will be able to better design red teaming campaigns and get a deeper look into how external red teaming can fit into model deployment and evaluation processes. These methods are evolving and the value of different methods continues to shift as the ecosystem around red teaming matures and models themselves improve as tools for red teaming. |
Language | en |
Library Catalog | Zotero |
Date Added | 12/1/2024, 8:42:16 PM |
Modified | 12/1/2024, 8:42:16 PM |
Item Type | Web Page |
---|---|
Abstract | Letting AI models communicate with each other in their internal mathematical language, rather than translating back and forth to English, could accelerate their task-solving abilities |
Language | en-US |
URL | https://www.newscientist.com/article/2455173-ai-models-work-together-faster-when-they-speak-their-own-language/ |
Accessed | 12/1/2024, 8:52:55 PM |
Website Title | New Scientist |
Date Added | 12/1/2024, 8:52:55 PM |
Modified | 12/1/2024, 8:53:00 PM |