Item Type | Preprint |
---|---|
Author | Yu Zhao |
Author | Huifeng Yin |
Author | Bo Zeng |
Author | Hao Wang |
Author | Tianqi Shi |
Author | Chenyang Lyu |
Author | Longyue Wang |
Author | Weihua Luo |
Author | Kaifu Zhang |
Abstract | OpenAI's o1 has sparked a surge of interest in the study of large reasoning models (LRMs). Building on this momentum, Marco-o1 not only focuses on disciplines with standard answers, such as mathematics, physics, and coding -- which are well-suited for reinforcement learning (RL) -- but also places greater emphasis on open-ended resolutions. We aim to address the question: "Can the o1 model effectively generalize to broader domains where clear standards are absent and rewards are challenging to quantify?" Marco-o1 is powered by Chain-of-Thought (CoT) fine-tuning, Monte Carlo Tree Search (MCTS), reflection mechanisms, and innovative reasoning strategies, optimized for complex real-world problem-solving tasks. |
Date | 2024-11-25 |
Short Title | Marco-o1 |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2411.14405 |
Accessed | 12/1/2024, 8:45:15 PM |
Extra | arXiv:2411.14405 |
DOI | 10.48550/arXiv.2411.14405 |
Repository | arXiv |
Archive ID | arXiv:2411.14405 |
Date Added | 12/1/2024, 8:45:15 PM |
Modified | 12/1/2024, 8:45:22 PM |
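The entry above is built around MCTS-guided chain-of-thought expansion. As a rough illustration of that idea (not Marco-o1's actual implementation), the sketch below runs a UCT-style tree search over partial reasoning chains; `propose_steps` and `score_rollout` are hypothetical stand-ins for LLM calls that propose candidate next steps and score completed rollouts.

```python
import math
import random

# Hypothetical stand-ins for model calls; not Marco-o1's released implementation.
def propose_steps(partial_chain, k=3):
    """Ask the LLM for k candidate next reasoning steps (assumed helper)."""
    return [f"{partial_chain} -> step{i}" for i in range(k)]

def score_rollout(chain):
    """Reward for a finished rollout. A real system might use the model's own
    token-level confidence here; a random score just keeps the sketch runnable."""
    return random.random()

class Node:
    def __init__(self, chain, parent=None):
        self.chain, self.parent = chain, parent
        self.children, self.visits, self.value = [], 0, 0.0

    def uct(self, c=1.4):
        if self.visits == 0:
            return float("inf")
        exploit = self.value / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore

def mcts(question, iterations=100):
    root = Node(question)
    for _ in range(iterations):
        node = root
        while node.children:                       # selection
            node = max(node.children, key=Node.uct)
        node.children = [Node(c, parent=node)      # expansion
                         for c in propose_steps(node.chain)]
        leaf = random.choice(node.children)
        reward = score_rollout(leaf.chain)         # simulation
        while leaf is not None:                    # backpropagation
            leaf.visits += 1
            leaf.value += reward
            leaf = leaf.parent
    return max(root.children, key=lambda n: n.visits).chain
```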
Item Type | Preprint |
---|---|
Author | Jingyu Zhang |
Author | Ahmed Elgohary |
Author | Ahmed Magooda |
Author | Daniel Khashabi |
Author | Benjamin Van Durme |
Abstract | The current paradigm for safety alignment of large language models (LLMs) follows a one-size-fits-all approach: the model refuses to interact with any content deemed unsafe by the model provider. This approach lacks flexibility in the face of varying social norms across cultures and regions. In addition, users may have diverse safety needs, making a model with static safety standards too restrictive to be useful, as well as too costly to be re-aligned. We propose Controllable Safety Alignment (CoSA), a framework designed to adapt models to diverse safety requirements without re-training. Instead of aligning a fixed model, we align models to follow safety configs -- free-form natural language descriptions of the desired safety behaviors -- that are provided as part of the system prompt. To adjust model safety behavior, authorized users only need to modify such safety configs at inference time. To enable that, we propose CoSAlign, a data-centric method for aligning LLMs to easily adapt to diverse safety configs. Furthermore, we devise a novel controllability evaluation protocol that considers both helpfulness and configured safety, summarizing them into CoSA-Score, and construct CoSApien, a human-authored benchmark that consists of real-world LLM use cases with diverse safety requirements and corresponding evaluation prompts. We show that CoSAlign leads to substantial gains of controllability over strong baselines including in-context alignment. Our framework encourages better representation and adaptation to pluralistic human values in LLMs, thereby increasing their practicality. |
Date | 2024-10-11 |
Short Title | Controllable Safety Alignment |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2410.08968 |
Accessed | 12/1/2024, 8:30:25 PM |
Extra | arXiv:2410.08968 version: 1 |
DOI | 10.48550/arXiv.2410.08968 |
Repository | arXiv |
Archive ID | arXiv:2410.08968 |
Date Added | 12/1/2024, 8:30:25 PM |
Modified | 12/1/2024, 8:30:25 PM |
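To make the "safety config" idea concrete, here is a minimal sketch of inference-time control through the system prompt; the config wording and chat layout are invented for illustration and are not taken from the CoSApien benchmark.

```python
# Illustrative only: the config text and message layout are invented, not from CoSApien.
safety_config = (
    "You are deployed in a hospital triage tool. Detailed clinical descriptions of "
    "injuries and medication dosages are permitted; instructions enabling self-harm, "
    "weapons construction, or other illegal activity remain refused."
)

messages = [
    {"role": "system", "content": f"Follow this safety config:\n{safety_config}"},
    {"role": "user", "content": "Describe typical symptoms of an opioid overdose."},
]
# An authorized deployer changes safety behavior by editing `safety_config` alone,
# with no retraining; CoSAlign trains the model to respect such configs faithfully.
```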
Item Type | Preprint |
---|---|
Author | Chaoyun Zhang |
Author | Shilin He |
Author | Jiaxu Qian |
Author | Bowen Li |
Author | Liqun Li |
Author | Si Qin |
Author | Yu Kang |
Author | Minghua Ma |
Author | Guyue Liu |
Author | Qingwei Lin |
Author | Saravan Rajmohan |
Author | Dongmei Zhang |
Author | Qi Zhang |
Abstract | GUIs have long been central to human-computer interaction, providing an intuitive and visually-driven way to access and interact with digital systems. The advent of LLMs, particularly multimodal models, has ushered in a new era of GUI automation. They have demonstrated exceptional capabilities in natural language understanding, code generation, and visual processing. This has paved the way for a new generation of LLM-brained GUI agents capable of interpreting complex GUI elements and autonomously executing actions based on natural language instructions. These agents represent a paradigm shift, enabling users to perform intricate, multi-step tasks through simple conversational commands. Their applications span across web navigation, mobile app interactions, and desktop automation, offering a transformative user experience that revolutionizes how individuals interact with software. This emerging field is rapidly advancing, with significant progress in both research and industry. To provide a structured understanding of this trend, this paper presents a comprehensive survey of LLM-brained GUI agents, exploring their historical evolution, core components, and advanced techniques. We address research questions such as existing GUI agent frameworks, the collection and utilization of data for training specialized GUI agents, the development of large action models tailored for GUI tasks, and the evaluation metrics and benchmarks necessary to assess their effectiveness. Additionally, we examine emerging applications powered by these agents. Through a detailed analysis, this survey identifies key research gaps and outlines a roadmap for future advancements in the field. By consolidating foundational knowledge and state-of-the-art developments, this work aims to guide both researchers and practitioners in overcoming challenges and unlocking the full potential of LLM-brained GUI agents. |
Date | 2024-11-28 |
Short Title | Large Language Model-Brained GUI Agents |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2411.18279 |
Accessed | 12/2/2024, 3:44:29 PM |
Extra | arXiv:2411.18279 |
DOI | 10.48550/arXiv.2411.18279 |
Repository | arXiv |
Archive ID | arXiv:2411.18279 |
Date Added | 12/2/2024, 3:44:29 PM |
Modified | 12/2/2024, 3:44:33 PM |
Item Type | Preprint |
---|---|
Author | Ziwei Xu |
Author | Sanjay Jain |
Author | Mohan Kankanhalli |
Abstract | Hallucination has been widely recognized to be a significant drawback for large language models (LLMs). There have been many works that attempt to reduce the extent of hallucination. These efforts have mostly been empirical so far, which cannot answer the fundamental question whether it can be completely eliminated. In this paper, we formalize the problem and show that it is impossible to eliminate hallucination in LLMs. Specifically, we define a formal world where hallucination is defined as inconsistencies between a computable LLM and a computable ground truth function. By employing results from learning theory, we show that LLMs cannot learn all of the computable functions and will therefore always hallucinate. Since the formal world is a part of the real world which is much more complicated, hallucinations are also inevitable for real world LLMs. Furthermore, for real world LLMs constrained by provable time complexity, we describe the hallucination-prone tasks and empirically validate our claims. Finally, using the formal world framework, we discuss the possible mechanisms and efficacies of existing hallucination mitigators as well as the practical implications on the safe deployment of LLMs. |
Date | 2024-01-22 |
Short Title | Hallucination is Inevitable |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2401.11817 |
Accessed | 12/8/2024, 7:55:44 PM |
Extra | arXiv:2401.11817 [cs] |
DOI | 10.48550/arXiv.2401.11817 |
Repository | arXiv |
Archive ID | arXiv:2401.11817 |
Date Added | 12/8/2024, 7:55:44 PM |
Modified | 12/8/2024, 7:55:54 PM |
Item Type | Preprint |
---|---|
Author | Ruihan Wu |
Author | Chhavi Yadav |
Author | Russ Salakhutdinov |
Author | Kamalika Chaudhuri |
Abstract | Machine unlearning is a key requirement of many data protection regulations such as GDPR. Prior work on unlearning has mostly considered superficial unlearning tasks where a single or a few related pieces of information are required to be removed. However, the task of unlearning a fact is much more challenging in recent large language models (LLMs), because the facts in LLMs can be deduced from each other. In this work, we investigate whether current unlearning methods for LLMs succeed beyond superficial unlearning of facts. Specifically, we formally propose a framework and a definition for deep unlearning of interrelated facts. We design a metric, recall, to quantify the extent of deep unlearning. To systematically evaluate deep unlearning, we construct a synthetic dataset EDU-RELAT, which consists of a synthetic knowledge base of family relationships and biographies, together with a realistic logical rule set that connects them. We use this dataset to test four unlearning methods in four LLMs at different sizes. Our findings reveal that in the task of deep unlearning only a single fact, they either fail to properly unlearn with high recall, or end up unlearning many other irrelevant facts. Our dataset and code are publicly available at: https://github.com/wrh14/deep_unlearning. |
Date | 2024-11-09 |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2410.15153 |
Accessed | 12/4/2024, 5:20:13 PM |
Extra | arXiv:2410.15153 |
DOI | 10.48550/arXiv.2410.15153 |
Repository | arXiv |
Archive ID | arXiv:2410.15153 |
Date Added | 12/4/2024, 5:20:13 PM |
Modified | 12/4/2024, 5:20:13 PM |
Item Type | Preprint |
---|---|
Author | Risto Uuk |
Author | Carlos Ignacio Gutierrez |
Author | Lode Lauwaert |
Author | Carina Prunkl |
Author | Lucia Velasco |
Abstract | Through a systematic review of academic literature, we propose a taxonomy of systemic risks associated with artificial intelligence (AI), in particular general-purpose AI. Following the EU AI Act's definition, we consider systemic risks as large-scale threats that can affect entire societies or economies. Starting with an initial pool of 1,781 documents, we analyzed 86 selected papers to identify 13 categories of systemic risks and 50 contributing sources. Our findings reveal a complex landscape of potential threats, ranging from environmental harm and structural discrimination to governance failures and loss of control. Key sources of systemic risk emerge from knowledge gaps, challenges in recognizing harm, and the unpredictable trajectory of AI development. The taxonomy provides a snapshot of current academic literature on systemic risks. This paper contributes to AI safety research by providing a structured groundwork for understanding and addressing the potential large-scale negative societal impacts of general-purpose AI. The taxonomy can inform policymakers in risk prioritization and regulatory development. |
Date | 2024-11-22 |
Language | en |
Library Catalog | Social Science Research Network |
URL | https://papers.ssrn.com/abstract=5030173 |
Accessed | 12/4/2024, 5:20:28 PM |
Place | Rochester, NY |
Genre | SSRN Scholarly Paper |
Archive ID | 5030173 |
Date Added | 12/4/2024, 5:20:28 PM |
Modified | 12/4/2024, 5:20:28 PM |
Item Type | Journal Article |
---|---|
Author | Alex Tamkin |
Author | Miles McCain |
Author | Kunal Handa |
Author | Esin Durmus |
Author | Liane Lovitt |
Author | Ankur Rathi |
Author | Saffron Huang |
Author | Alfred Mountfield |
Author | Jerry Hong |
Author | Stuart Ritchie |
Author | Michael Stern |
Author | Brian Clarke |
Author | Landon Goldberg |
Author | Theodore R Sumers |
Author | Jared Mueller |
Author | William McEachen |
Author | Wes Mitchell |
Author | Shan Carter |
Author | Jack Clark |
Author | Jared Kaplan |
Author | Deep Ganguli |
Abstract | How are AI assistants being used in the real world? While model providers in theory have a window into this impact via their users’ data, both privacy concerns and practical challenges have made analyzing this data difficult. To address these issues, we present Clio (Claude insights and observations), a privacy-preserving platform that uses AI assistants themselves to analyze and surface aggregated usage patterns across millions of conversations, without the need for human reviewers to read raw conversations. We validate this can be done with a high degree of accuracy and privacy by conducting extensive evaluations. We demonstrate Clio’s usefulness in two broad ways. First, we share insights about how models are being used in the real world from one million Claude.ai Free and Pro conversations, ranging from providing advice on hairstyles to providing guidance on Git operations and concepts. We also identify the most common high-level use cases on Claude.ai (coding, writing, and research tasks) as well as patterns that differ across languages (e.g., conversations in Japanese discuss elder care and aging populations at higher-than-typical rates). Second, we use Clio to make our systems safer by identifying coordinated attempts to abuse our systems, monitoring for unknown unknowns during critical periods like launches of new capabilities or major world events, and improving our existing monitoring systems. We also discuss the limitations of our approach, as well as risks and ethical concerns. By enabling analysis of real-world AI usage, Clio provides a scalable platform for empirically grounded AI safety and governance. |
Language | en |
Library Catalog | Zotero |
Date Added | 12/20/2024, 11:00:37 AM |
Modified | 12/20/2024, 11:00:37 AM |
Item Type | Preprint |
---|---|
Author | Benedikt Stroebl |
Author | Sayash Kapoor |
Author | Arvind Narayanan |
Abstract | Recent research has generated hope that inference scaling could allow weaker language models to match or exceed the accuracy of stronger models, such as by repeatedly sampling solutions to a coding problem until it passes unit tests. The central thesis of this paper is that there is no free lunch for inference scaling: indefinite accuracy improvement through resampling can only be realized if the "verifier" (in this case, a set of unit tests) is perfect. When the verifier is imperfect, as it almost always is in domains such as reasoning or coding (for example, unit tests have imperfect coverage), there is a nonzero probability of false positives: incorrect solutions that pass the verifier. Resampling cannot decrease this probability, so it imposes an upper bound to the accuracy of resampling-based inference scaling even with an infinite compute budget. We find that there is a very strong correlation between the model's single-sample accuracy (i.e. accuracy without unit tests) and its false positive rate on coding benchmarks HumanEval and MBPP, whose unit tests have limited coverage. Therefore, no amount of inference scaling of weaker models can enable them to match the single-sample accuracy of a sufficiently strong model (Fig. 1a). When we consider that false positives have a negative utility compared to abstaining from producing a solution, it bends the inference scaling curve further downward. Empirically, we find that the optimal number of samples can be less than 10 under realistic assumptions (Fig. 1b). Finally, we show that beyond accuracy, false positives may have other undesirable qualities, such as poor adherence to coding style conventions. |
Date | 2024-11-26 |
Short Title | Inference Scaling fLaws |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2411.17501 |
Accessed | 12/1/2024, 9:00:55 PM |
Extra | arXiv:2411.17501 |
DOI | 10.48550/arXiv.2411.17501 |
Repository | arXiv |
Archive ID | arXiv:2411.17501 |
Date Added | 12/1/2024, 9:00:55 PM |
Modified | 12/1/2024, 9:00:58 PM |
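The paper's central limit shows up in a toy simulation: with an imperfect verifier, resampling accuracy plateaus at the probability that an accepted sample is correct rather than approaching 1. The numbers below are illustrative, not taken from the paper, and the sketch assumes correct solutions always pass the verifier.

```python
import random

def resample_accuracy(p_correct, fpr, n_samples, trials=20000):
    """Fraction of problems answered correctly when sampling until the verifier accepts.

    p_correct: single-sample probability of a correct solution (assumed always accepted).
    fpr: probability the verifier accepts an incorrect solution (false positive rate).
    """
    wins = 0
    for _ in range(trials):
        for _ in range(n_samples):
            if random.random() < p_correct:   # correct sample; verifier accepts it
                wins += 1
                break
            if random.random() < fpr:         # incorrect sample slips past the verifier
                break
    return wins / trials

for n in (1, 5, 20, 100):
    print(n, round(resample_accuracy(p_correct=0.2, fpr=0.1, n_samples=n), 3))
```

With these illustrative rates the curve plateaus near p_correct / (p_correct + (1 - p_correct) * fpr) = 0.2 / 0.28 ≈ 0.71, no matter how large the compute budget.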
Item Type | Preprint |
---|---|
Author | Tom Schaul |
Abstract | An agent trained within a closed system can master any desired capability, as long as the following three conditions hold: (a) it receives sufficiently informative and aligned feedback, (b) its coverage of experience/data is broad enough, and (c) it has sufficient capacity and resource. In this position paper, we justify these conditions, and consider what limitations arise from (a) and (b) in closed systems, when assuming that (c) is not a bottleneck. Considering the special case of agents with matching input and output spaces (namely, language), we argue that such pure recursive self-improvement, dubbed "Socratic learning", can boost performance vastly beyond what is present in its initial data or knowledge, and is only limited by time, as well as gradual misalignment concerns. Furthermore, we propose a constructive framework to implement it, based on the notion of language games. |
Date | 2024-11-25 |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2411.16905 |
Accessed | 12/1/2024, 9:04:53 PM |
Extra | arXiv:2411.16905 |
DOI | 10.48550/arXiv.2411.16905 |
Repository | arXiv |
Archive ID | arXiv:2411.16905 |
Date Added | 12/1/2024, 9:04:53 PM |
Modified | 12/1/2024, 9:04:53 PM |
Item Type | Preprint |
---|---|
Author | Laura Ruis |
Author | Maximilian Mozes |
Author | Juhan Bae |
Author | Siddhartha Rao Kamalakara |
Author | Dwarak Talupuru |
Author | Acyr Locatelli |
Author | Robert Kirk |
Author | Tim Rocktäschel |
Author | Edward Grefenstette |
Author | Max Bartolo |
Abstract | The capabilities and limitations of Large Language Models have been sketched out in great detail in recent years, providing an intriguing yet conflicting picture. On the one hand, LLMs demonstrate a general ability to solve problems. On the other hand, they show surprising reasoning gaps when compared to humans, casting doubt on the robustness of their generalisation strategies. The sheer volume of data used in the design of LLMs has precluded us from applying the method traditionally used to measure generalisation: train-test set separation. To overcome this, we study what kind of generalisation strategies LLMs employ when performing reasoning tasks by investigating the pretraining data they rely on. For two models of different sizes (7B and 35B) and 2.5B of their pretraining tokens, we identify what documents influence the model outputs for three simple mathematical reasoning tasks and contrast this to the data that are influential for answering factual questions. We find that, while the models rely on mostly distinct sets of data for each factual question, a document often has a similar influence across different reasoning questions within the same task, indicating the presence of procedural knowledge. We further find that the answers to factual questions often show up in the most influential data. However, for reasoning questions the answers usually do not show up as highly influential, nor do the answers to the intermediate reasoning steps. When we characterise the top ranked documents for the reasoning questions qualitatively, we confirm that the influential documents often contain procedural knowledge, like demonstrating how to obtain a solution using formulae or code. Our findings indicate that the approach to reasoning the models use is unlike retrieval, and more like a generalisable strategy that synthesises procedural knowledge from documents doing a similar form of reasoning. |
Date | 2024-11-19 |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2411.12580 |
Accessed | 12/1/2024, 8:37:01 PM |
Extra | arXiv:2411.12580 |
DOI | 10.48550/arXiv.2411.12580 |
Repository | arXiv |
Archive ID | arXiv:2411.12580 |
Date Added | 12/1/2024, 8:37:01 PM |
Modified | 12/1/2024, 8:37:03 PM |
Item Type | Preprint |
---|---|
Author | Reworr |
Author | Dmitrii Volkov |
Abstract | We introduce the LLM Honeypot, a system for monitoring autonomous AI hacking agents. We deployed a customized SSH honeypot and applied prompt injections with temporal analysis to identify LLM-based agents among attackers. Over a trial run of a few weeks in a public environment, we collected 800,000 hacking attempts and 6 potential AI agents, which we plan to analyze in depth in future work. Our objectives aim to improve awareness of AI hacking agents and enhance preparedness for their risks. |
Date | 2024-10-17 |
Short Title | LLM Agent Honeypot |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2410.13919 |
Accessed | 12/1/2024, 8:21:13 PM |
Extra | arXiv:2410.13919 |
DOI | 10.48550/arXiv.2410.13919 |
Repository | arXiv |
Archive ID | arXiv:2410.13919 |
Date Added | 12/1/2024, 8:21:13 PM |
Modified | 12/1/2024, 8:21:13 PM |
Item Type | Journal Article |
---|---|
Author | Ilan Price |
Author | Alvaro Sanchez-Gonzalez |
Author | Ferran Alet |
Author | Tom R. Andersson |
Author | Andrew El-Kadi |
Author | Dominic Masters |
Author | Timo Ewalds |
Author | Jacklynn Stott |
Author | Shakir Mohamed |
Author | Peter Battaglia |
Author | Remi Lam |
Author | Matthew Willson |
Abstract | Weather forecasts are fundamentally uncertain, so predicting the range of probable weather scenarios is crucial for important decisions, from warning the public about hazardous weather to planning renewable energy use. Traditionally, weather forecasts have been based on numerical weather prediction (NWP), which relies on physics-based simulations of the atmosphere. Recent advances in machine learning (ML)-based weather prediction (MLWP) have produced ML-based models with less forecast error than single NWP simulations. However, these advances have focused primarily on single, deterministic forecasts that fail to represent uncertainty and estimate risk. Overall, MLWP has remained less accurate and reliable than state-of-the-art NWP ensemble forecasts. Here we introduce GenCast, a probabilistic weather model with greater skill and speed than the top operational medium-range weather forecast in the world, ENS, the ensemble forecast of the European Centre for Medium-Range Weather Forecasts. GenCast is an ML weather prediction method, trained on decades of reanalysis data. GenCast generates an ensemble of stochastic 15-day global forecasts, at 12-h steps and 0.25° latitude–longitude resolution, for more than 80 surface and atmospheric variables, in 8 min. It has greater skill than ENS on 97.2% of 1,320 targets we evaluated and better predicts extreme weather, tropical cyclone tracks and wind power production. This work helps open the next chapter in operational weather forecasting, in which crucial weather-dependent decisions are made more accurately and efficiently. |
Date | 2024-12-04 |
Language | en |
Library Catalog | www.nature.com |
URL | https://www.nature.com/articles/s41586-024-08252-9 |
Accessed | 12/4/2024, 5:23:31 PM |
Rights | 2024 The Author(s) |
Extra | Publisher: Nature Publishing Group |
Pages | 1-7 |
Publication | Nature |
DOI | 10.1038/s41586-024-08252-9 |
ISSN | 1476-4687 |
Date Added | 12/4/2024, 5:23:31 PM |
Modified | 12/4/2024, 5:23:31 PM |
Item Type | Preprint |
---|---|
Author | Michał Pietruszka |
Author | Łukasz Borchmann |
Author | Aleksander Jędrosz |
Author | Paweł Morawiecki |
Abstract | We present a benchmark for large language models designed to tackle one of the most knowledge-intensive tasks in data science: writing feature engineering code, which requires domain knowledge in addition to a deep understanding of the underlying problem and data structure. The model is provided with a dataset description in a prompt and asked to generate code transforming it. The evaluation score is derived from the improvement achieved by an XGBoost model fit on the modified dataset compared to the original data. Through an extensive evaluation of state-of-the-art models and comparison to well-established benchmarks, we demonstrate that our proposed FeatEng benchmark can cheaply and efficiently assess the broad capabilities of LLMs, in contrast to the existing methods. |
Date | 2024-10-30 |
Short Title | Can Models Help Us Create Better Models? |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2410.23331 |
Accessed | 12/1/2024, 8:54:11 PM |
Extra | arXiv:2410.23331 |
DOI | 10.48550/arXiv.2410.23331 |
Repository | arXiv |
Archive ID | arXiv:2410.23331 |
Date Added | 12/1/2024, 8:54:11 PM |
Modified | 12/1/2024, 8:54:11 PM |
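The scoring rule described in the abstract, the improvement of an XGBoost model fit on the LLM-transformed dataset over one fit on the original data, can be sketched as follows; the benchmark's exact metric, model settings, and cross-validation protocol may differ.

```python
# Sketch of the scoring idea only; assumes a classification task and default settings.
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

def feateng_score(X_original, X_transformed, y, cv=5):
    """Relative improvement of XGBoost accuracy on LLM-engineered features."""
    base = cross_val_score(XGBClassifier(), X_original, y, cv=cv).mean()
    new = cross_val_score(XGBClassifier(), X_transformed, y, cv=cv).mean()
    return (new - base) / max(base, 1e-9)
```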
Item Type | Preprint |
---|---|
Author | Aidan Peppin |
Author | Anka Reuel |
Author | Stephen Casper |
Author | Elliot Jones |
Author | Andrew Strait |
Author | Usman Anwar |
Author | Anurag Agrawal |
Author | Sayash Kapoor |
Author | Sanmi Koyejo |
Author | Marie Pellat |
Author | Rishi Bommasani |
Author | Nick Frosst |
Author | Sara Hooker |
Abstract | To accurately and confidently answer the question 'could an AI model or system increase biorisk', it is necessary to have both a sound theoretical threat model for how AI models or systems could increase biorisk and a robust method for testing that threat model. This paper provides an analysis of existing available research surrounding two AI and biorisk threat models: 1) access to information and planning via large language models (LLMs), and 2) the use of AI-enabled biological tools (BTs) in synthesizing novel biological artifacts. We find that existing studies around AI-related biorisk are nascent, often speculative in nature, or limited in terms of their methodological maturity and transparency. The available literature suggests that current LLMs and BTs do not pose an immediate risk, and more work is needed to develop rigorous approaches to understanding how future models could increase biorisks. We end with recommendations about how empirical work can be expanded to more precisely target biorisk and ensure rigor and validity of findings. |
Date | 2024-12-02 |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2412.01946 |
Accessed | 12/4/2024, 5:18:58 PM |
Extra | arXiv:2412.01946 |
DOI | 10.48550/arXiv.2412.01946 |
Repository | arXiv |
Archive ID | arXiv:2412.01946 |
Date Added | 12/4/2024, 5:18:58 PM |
Modified | 12/4/2024, 5:18:58 PM |
Item Type | Preprint |
---|---|
Author | Yaniv Nikankin |
Author | Anja Reusch |
Author | Aaron Mueller |
Author | Yonatan Belinkov |
Abstract | Do large language models (LLMs) solve reasoning tasks by learning robust generalizable algorithms, or do they memorize training data? To investigate this question, we use arithmetic reasoning as a representative task. Using causal analysis, we identify a subset of the model (a circuit) that explains most of the model's behavior for basic arithmetic logic and examine its functionality. By zooming in on the level of individual circuit neurons, we discover a sparse set of important neurons that implement simple heuristics. Each heuristic identifies a numerical input pattern and outputs corresponding answers. We hypothesize that the combination of these heuristic neurons is the mechanism used to produce correct arithmetic answers. To test this, we categorize each neuron into several heuristic types, such as neurons that activate when an operand falls within a certain range, and find that the unordered combination of these heuristic types is the mechanism that explains most of the model's accuracy on arithmetic prompts. Finally, we demonstrate that this mechanism appears as the main source of arithmetic accuracy early in training. Overall, our experimental results across several LLMs show that LLMs perform arithmetic using neither robust algorithms nor memorization; rather, they rely on a "bag of heuristics". |
Date | 2024-10-28 |
Short Title | Arithmetic Without Algorithms |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2410.21272 |
Accessed | 12/1/2024, 8:50:11 PM |
Extra | arXiv:2410.21272 |
DOI | 10.48550/arXiv.2410.21272 |
Repository | arXiv |
Archive ID | arXiv:2410.21272 |
Date Added | 12/1/2024, 8:50:11 PM |
Modified | 12/1/2024, 8:50:13 PM |
Item Type | Preprint |
---|---|
Author | Dang Nguyen |
Author | Viet Dac Lai |
Author | Seunghyun Yoon |
Author | Ryan A. Rossi |
Author | Handong Zhao |
Author | Ruiyi Zhang |
Author | Puneet Mathur |
Author | Nedim Lipka |
Author | Yu Wang |
Author | Trung Bui |
Author | Franck Dernoncourt |
Author | Tianyi Zhou |
Abstract | Existing LLM agent systems typically select actions from a fixed and predefined set at every step. While this approach is effective in closed, narrowly-scoped environments, we argue that it presents two major challenges when deploying LLM agents in real-world scenarios: (1) selecting from a fixed set of actions significantly restricts the planning and acting capabilities of LLM agents, and (2) this approach requires substantial human effort to enumerate and implement all possible actions, which becomes impractical in complex environments with a vast number of potential actions. In this work, we propose an LLM agent framework that enables the dynamic creation and composition of actions in an online manner. In this framework, the agent interacts with the environment by generating and executing programs written in a general-purpose programming language at each step. Furthermore, generated actions are accumulated over time for future reuse. Our extensive experiments on the GAIA benchmark demonstrate that this framework offers significantly greater flexibility and outperforms previous methods. Notably, it allows an LLM agent to recover in scenarios where no relevant action exists in the predefined set or when existing actions fail due to unforeseen edge cases. At the time of writing, we hold the top position on the GAIA public leaderboard. Our code can be found at https://github.com/adobe-research/dynasaur. |
Date | 2024-11-04 |
Short Title | DynaSaur |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2411.01747 |
Accessed | 12/1/2024, 8:47:40 PM |
Extra | arXiv:2411.01747 |
DOI | 10.48550/arXiv.2411.01747 |
Repository | arXiv |
Archive ID | arXiv:2411.01747 |
Date Added | 12/1/2024, 8:47:40 PM |
Modified | 12/1/2024, 8:47:40 PM |
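A minimal sketch of the loop the abstract describes, in which the agent writes and executes a Python action at each step and keeps it for later reuse; `llm` and the accumulation policy here are placeholders rather than the released adobe-research/dynasaur code.

```python
def llm(prompt: str) -> str:
    """Assumed helper that returns Python source defining an `action()` function."""
    raise NotImplementedError

def run_agent(task: str, max_steps: int = 10):
    action_library: dict[str, str] = {}   # generated actions accumulated for reuse
    observations: list[str] = []
    for _ in range(max_steps):
        prompt = (
            f"Task: {task}\nKnown actions:\n" + "\n".join(action_library.values()) +
            "\nObservations so far:\n" + "\n".join(observations) +
            "\nWrite a Python function `action()` that makes progress on the task."
        )
        code = llm(prompt)
        namespace: dict = {}
        try:
            exec(code, namespace)                  # run the freshly generated action
            result = namespace["action"]()
            action_library[f"step_{len(action_library)}"] = code
            observations.append(repr(result))
            if result == "DONE":
                break
        except Exception as exc:                   # edge-case failures feed back as context
            observations.append(f"error: {exc}")
    return observations
```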
Item Type | Preprint |
---|---|
Author | Kevin Murphy |
Abstract | This manuscript gives a big-picture, up-to-date overview of the field of (deep) reinforcement learning and sequential decision making, covering value-based RL, policy-gradient methods, model-based methods, and various other topics (including a very brief discussion of RL+LLMs). |
Date | 2024-12-06 |
Short Title | Reinforcement Learning |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2412.05265 |
Accessed | 12/9/2024, 12:57:18 PM |
Extra | arXiv:2412.05265 [cs] |
DOI | 10.48550/arXiv.2412.05265 |
Repository | arXiv |
Archive ID | arXiv:2412.05265 |
Date Added | 12/9/2024, 12:57:18 PM |
Modified | 12/9/2024, 12:57:19 PM |
Item Type | Preprint |
---|---|
Author | Sumeet Ramesh Motwani |
Author | Mikhail Baranchuk |
Author | Martin Strohmeier |
Author | Vijay Bolina |
Author | Philip H. S. Torr |
Author | Lewis Hammond |
Author | Christian Schroeder de Witt |
Abstract | Recent capability increases in large language models (LLMs) open up applications in which groups of communicating generative AI agents solve joint tasks. This poses privacy and security challenges concerning the unauthorised sharing of information, or other unwanted forms of agent coordination. Modern steganographic techniques could render such dynamics hard to detect. In this paper, we comprehensively formalise the problem of secret collusion in systems of generative AI agents by drawing on relevant concepts from both AI and security literature. We study incentives for the use of steganography, and propose a variety of mitigation measures. Our investigations result in a model evaluation framework that systematically tests capabilities required for various forms of secret collusion. We provide extensive empirical results across a range of contemporary LLMs. While the steganographic capabilities of current models remain limited, GPT-4 displays a capability jump suggesting the need for continuous monitoring of steganographic frontier model capabilities. We conclude by laying out a comprehensive research program to mitigate future risks of collusion between generative AI models. |
Date | 2024-11-08 |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2402.07510 |
Accessed | 12/1/2024, 8:58:08 PM |
Extra | arXiv:2402.07510 |
DOI | 10.48550/arXiv.2402.07510 |
Repository | arXiv |
Archive ID | arXiv:2402.07510 |
Date Added | 12/1/2024, 8:58:08 PM |
Modified | 12/1/2024, 8:58:10 PM |
Item Type | Preprint |
---|---|
Author | Evan Miller |
Abstract | Evaluations are critical for understanding the capabilities of large language models (LLMs). Fundamentally, evaluations are experiments; but the literature on evaluations has largely ignored the literature from other sciences on experiment analysis and planning. This article shows researchers with some training in statistics how to think about and analyze data from language model evaluations. Conceptualizing evaluation questions as having been drawn from an unseen super-population, we present formulas for analyzing evaluation data, measuring differences between two models, and planning an evaluation experiment. We make a number of specific recommendations for running language model evaluations and reporting experiment results in a way that minimizes statistical noise and maximizes informativeness. |
Date | 2024-11-01 |
Short Title | Adding Error Bars to Evals |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2411.00640 |
Accessed | 12/9/2024, 7:36:47 PM |
Extra | arXiv:2411.00640 [stat] |
DOI | 10.48550/arXiv.2411.00640 |
Repository | arXiv |
Archive ID | arXiv:2411.00640 |
Date Added | 12/9/2024, 7:36:47 PM |
Modified | 12/9/2024, 7:36:50 PM |
Comment | 14 pages |
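Because the paper treats eval questions as a sample from an unseen super-population, its basic machinery is the standard error of a mean score and a paired comparison between two models. The helper below is a generic sketch of those textbook formulas, not a reproduction of the paper's exact recommendations.

```python
import math

def mean_and_sem(scores):
    """Mean per-question score and its standard error (questions treated as sampled
    from a larger population of possible eval questions)."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    return mean, math.sqrt(var / n)

def paired_difference(scores_a, scores_b):
    """Mean accuracy difference between two models with a 95% confidence interval,
    pairing by question so shared question difficulty cancels out."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean, sem = mean_and_sem(diffs)
    return mean, (mean - 1.96 * sem, mean + 1.96 * sem)
```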
Item Type | Preprint |
---|---|
Author | Michael Lan |
Author | Philip Torr |
Author | Austin Meek |
Author | Ashkan Khakzar |
Author | David Krueger |
Author | Fazl Barez |
Abstract | We investigate feature universality in large language models (LLMs), a research field that aims to understand how different models similarly represent concepts in the latent spaces of their intermediate layers. Demonstrating feature universality allows discoveries about latent representations to generalize across several models. However, comparing features across LLMs is challenging due to polysemanticity, in which individual neurons often correspond to multiple features rather than distinct ones. This makes it difficult to disentangle and match features across different models. To address this issue, we employ a method known as dictionary learning by using sparse autoencoders (SAEs) to transform LLM activations into more interpretable spaces spanned by neurons corresponding to individual features. After matching feature neurons across models via activation correlation, we apply representational space similarity metrics like Singular Value Canonical Correlation Analysis to analyze these SAE features across different LLMs. Our experiments reveal significant similarities in SAE feature spaces across various LLMs, providing new evidence for feature universality. |
Date | 2024-10-09 |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2410.06981 |
Accessed | 12/1/2024, 8:28:23 PM |
Extra | arXiv:2410.06981 |
DOI | 10.48550/arXiv.2410.06981 |
Repository | arXiv |
Archive ID | arXiv:2410.06981 |
Date Added | 12/1/2024, 8:28:23 PM |
Modified | 12/1/2024, 8:28:26 PM |
Item Type | Software |
---|---|
Programmer | Intelligent Unmanned Systems Laboratory |
Abstract | Monitoring recent cross-research on LLM & RL on arXiv for control. If there are good papers, PRs are welcome. |
Date | 2024-12-09T16:45:47Z |
Library Catalog | GitHub |
URL | https://github.com/WindyLab/LLM-RL-Papers |
Accessed | 12/9/2024, 12:59:35 PM |
Extra | original-date: 2024-03-18T08:31:23Z |
Date Added | 12/9/2024, 12:59:35 PM |
Modified | 12/9/2024, 12:59:35 PM |
Item Type | Preprint |
---|---|
Author | Siyuan Hu |
Author | Mingyu Ouyang |
Author | Difei Gao |
Author | Mike Zheng Shou |
Abstract | The recently released model, Claude 3.5 Computer Use, stands out as the first frontier AI model to offer computer use in public beta as a graphical user interface (GUI) agent. As an early beta, its capability in the real-world complex environment remains unknown. In this case study to explore Claude 3.5 Computer Use, we curate and organize a collection of carefully designed tasks spanning a variety of domains and software. Observations from these cases demonstrate Claude 3.5 Computer Use's unprecedented ability in end-to-end language to desktop actions. Along with this study, we provide an out-of-the-box agent framework for deploying API-based GUI automation models with easy implementation. Our case studies aim to showcase a groundwork of capabilities and limitations of Claude 3.5 Computer Use with detailed analyses and bring to the fore questions about planning, action, and critic, which must be considered for future improvement. We hope this preliminary exploration will inspire future research into the GUI agent community. All the test cases in the paper can be tried through the project: https://github.com/showlab/computer_use_ootb. |
Date | 2024-11-15 |
Short Title | The Dawn of GUI Agent |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2411.10323 |
Accessed | 12/1/2024, 8:33:30 PM |
Extra | arXiv:2411.10323 |
DOI | 10.48550/arXiv.2411.10323 |
Repository | arXiv |
Archive ID | arXiv:2411.10323 |
Date Added | 12/1/2024, 8:33:30 PM |
Modified | 12/1/2024, 8:33:30 PM |
Item Type | Preprint |
---|---|
Author | John Heibel |
Author | Daniel Lowd |
Abstract | LLM-based programming assistants offer the promise of programming faster but with the risk of introducing more security vulnerabilities. Prior work has studied how LLMs could be maliciously fine-tuned to suggest vulnerabilities more often. With the rise of agentic LLMs, which may use results from an untrusted third party, there is a growing risk of attacks on the model's prompt. We introduce the Malicious Programming Prompt (MaPP) attack, in which an attacker adds a small amount of text to a prompt for a programming task (under 500 bytes). We show that our prompt strategy can cause an LLM to add vulnerabilities while continuing to write otherwise correct code. We evaluate three prompts on seven common LLMs, from basic to state-of-the-art commercial models. Using the HumanEval benchmark, we find that our prompts are broadly effective, with no customization required for different LLMs. Furthermore, the LLMs that are best at HumanEval are also best at following our malicious instructions, suggesting that simply scaling language models will not prevent MaPP attacks. Using a dataset of eight CWEs in 16 scenarios, we find that MaPP attacks are also effective at implementing specific and targeted vulnerabilities across a range of models. Our work highlights the need to secure LLM prompts against manipulation as well as rigorously auditing code generated with the help of LLMs. |
Date | 2024-07-12 |
Short Title | MaPPing Your Model |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2407.11072 |
Accessed | 12/1/2024, 8:21:33 PM |
Extra | arXiv:2407.11072 |
DOI | 10.48550/arXiv.2407.11072 |
Repository | arXiv |
Archive ID | arXiv:2407.11072 |
Date Added | 12/1/2024, 8:21:33 PM |
Modified | 12/1/2024, 8:21:33 PM |
Item Type | Preprint |
---|---|
Author | Shibo Hao |
Author | Sainbayar Sukhbaatar |
Author | DiJia Su |
Author | Xian Li |
Author | Zhiting Hu |
Author | Jason Weston |
Author | Yuandong Tian |
Abstract | Large language models (LLMs) are restricted to reason in the "language space", where they typically express the reasoning process with a chain-of-thought (CoT) to solve a complex reasoning problem. However, we argue that language space may not always be optimal for reasoning. For example, most word tokens are primarily for textual coherence and not essential for reasoning, while some critical tokens require complex planning and pose huge challenges to LLMs. To explore the potential of LLM reasoning in an unrestricted latent space instead of using natural language, we introduce a new paradigm Coconut (Chain of Continuous Thought). We utilize the last hidden state of the LLM as a representation of the reasoning state (termed "continuous thought"). Rather than decoding this into a word token, we feed it back to the LLM as the subsequent input embedding directly in the continuous space. Experiments show that Coconut can effectively augment the LLM on several reasoning tasks. This novel latent reasoning paradigm leads to emergent advanced reasoning patterns: the continuous thought can encode multiple alternative next reasoning steps, allowing the model to perform a breadth-first search (BFS) to solve the problem, rather than prematurely committing to a single deterministic path like CoT. Coconut outperforms CoT in certain logical reasoning tasks that require substantial backtracking during planning, with fewer thinking tokens during inference. These findings demonstrate the promise of latent reasoning and offer valuable insights for future research. |
Date | 2024-12-09 |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2412.06769 |
Accessed | 12/11/2024, 10:19:43 AM |
Extra | arXiv:2412.06769 [cs] |
DOI | 10.48550/arXiv.2412.06769 |
Repository | arXiv |
Archive ID | arXiv:2412.06769 |
Date Added | 12/11/2024, 10:19:43 AM |
Modified | 12/11/2024, 10:19:43 AM |
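The mechanism in the abstract, feeding the final hidden state back as the next input embedding instead of decoding a token, looks roughly like the loop below for a Hugging Face style causal LM that accepts `inputs_embeds`; this is a schematic of the inference-time idea under that assumption, not the authors' training setup.

```python
# Schematic of the continuous-thought loop only; not the authors' training code.
import torch

def continuous_thoughts(model, input_embeds, n_thoughts=4):
    """Append n_thoughts last-layer hidden states as new input embeddings
    instead of decoding word tokens between reasoning steps."""
    embeds = input_embeds
    with torch.no_grad():
        for _ in range(n_thoughts):
            out = model(inputs_embeds=embeds, output_hidden_states=True)
            last_hidden = out.hidden_states[-1][:, -1:, :]    # state of the final position
            embeds = torch.cat([embeds, last_hidden], dim=1)  # feed it back as the next "token"
    return embeds  # afterwards, decode the answer in language space as usual
```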
Item Type | Preprint |
---|---|
Author | Yu Gu |
Author | Boyuan Zheng |
Author | Boyu Gou |
Author | Kai Zhang |
Author | Cheng Chang |
Author | Sanjari Srivastava |
Author | Yanan Xie |
Author | Peng Qi |
Author | Huan Sun |
Author | Yu Su |
Abstract | Language agents have demonstrated promising capabilities in automating web-based tasks, though their current reactive approaches still underperform largely compared to humans. While incorporating advanced planning algorithms, particularly tree search methods, could enhance these agents' performance, implementing tree search directly on live websites poses significant safety risks and practical constraints due to irreversible actions such as confirming a purchase. In this paper, we introduce a novel paradigm that augments language agents with model-based planning, pioneering the innovative use of large language models (LLMs) as world models in complex web environments. Our method, WebDreamer, builds on the key insight that LLMs inherently encode comprehensive knowledge about website structures and functionalities. Specifically, WebDreamer uses LLMs to simulate outcomes for each candidate action (e.g., "what would happen if I click this button?") using natural language descriptions, and then evaluates these imagined outcomes to determine the optimal action at each step. Empirical results on two representative web agent benchmarks with online interaction -- VisualWebArena and Mind2Web-live -- demonstrate that WebDreamer achieves substantial improvements over reactive baselines. By establishing the viability of LLMs as world models in web environments, this work lays the groundwork for a paradigm shift in automated web interaction. More broadly, our findings open exciting new avenues for future research into 1) optimizing LLMs specifically for world modeling in complex, dynamic environments, and 2) model-based speculative planning for language agents. |
Date | 2024-11-10 |
Short Title | Is Your LLM Secretly a World Model of the Internet? |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2411.06559 |
Accessed | 12/1/2024, 8:40:56 PM |
Extra | arXiv:2411.06559 |
DOI | 10.48550/arXiv.2411.06559 |
Repository | arXiv |
Archive ID | arXiv:2411.06559 |
Date Added | 12/1/2024, 8:40:56 PM |
Modified | 12/1/2024, 8:40:56 PM |
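The simulate-then-act idea fits in a few lines: for each candidate action, ask the LLM to imagine the resulting page, score progress toward the goal, and only execute the winner on the live site. `llm` is an assumed completion helper, not the paper's implementation, and the scoring prompt is invented for illustration.

```python
def llm(prompt: str) -> str:
    """Assumed text-completion helper."""
    raise NotImplementedError

def choose_action(goal: str, page_description: str, candidate_actions: list[str]) -> str:
    scored = []
    for action in candidate_actions:
        imagined = llm(                       # use the LLM as a world model
            f"Page: {page_description}\nWhat would happen if the agent performed: {action}? "
            "Describe the resulting page in two sentences."
        )
        score = float(llm(                    # evaluate the imagined outcome
            f"Goal: {goal}\nImagined outcome: {imagined}\n"
            "Rate progress toward the goal from 0 to 1. Answer with a number only."
        ))
        scored.append((score, action))
    return max(scored)[1]   # only the best-scoring action is executed on the live site
```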
Item Type | Preprint |
---|---|
Author | Kai Fronsdal |
Author | David Lindner |
Abstract | We propose a suite of tasks to evaluate the instrumental self-reasoning ability of large language model (LLM) agents. Instrumental self-reasoning ability could improve adaptability and enable self-modification, but it could also pose significant risks, such as enabling deceptive alignment. Prior work has only evaluated self-reasoning in non-agentic settings or in limited domains. In this paper, we propose evaluations for instrumental self-reasoning ability in agentic tasks in a wide range of scenarios, including self-modification, knowledge seeking, and opaque self-reasoning. We evaluate agents built using state-of-the-art LLMs, including commercial and open source systems. We find that instrumental self-reasoning ability emerges only in the most capable frontier models and that it is highly context-dependent. No model passes the most difficult versions of our evaluations, hence our evaluation can be used to measure increases in instrumental self-reasoning ability in future models. We open-source our evaluations at https://github.com/kaifronsdal/Self-Reasoning-Evals. |
Date | 2024-12-05 |
Short Title | MISR |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2412.03904 |
Accessed | 12/11/2024, 10:15:02 AM |
Extra | arXiv:2412.03904 [cs] |
DOI | 10.48550/arXiv.2412.03904 |
Repository | arXiv |
Archive ID | arXiv:2412.03904 |
Date Added | 12/11/2024, 10:15:02 AM |
Modified | 12/11/2024, 10:15:12 AM |
Comment | 10 pages, 65 page appendix, 5 figures |
Item Type | Preprint |
---|---|
Author | Javier Ferrando |
Author | Oscar Obeso |
Author | Senthooran Rajamanoharan |
Author | Neel Nanda |
Abstract | Hallucinations in large language models are a widespread problem, yet the mechanisms behind whether models will hallucinate are poorly understood, limiting our ability to solve this problem. Using sparse autoencoders as an interpretability tool, we discover that a key part of these mechanisms is entity recognition, where the model detects if an entity is one it can recall facts about. Sparse autoencoders uncover meaningful directions in the representation space; these detect whether the model recognizes an entity, e.g. detecting it doesn't know about an athlete or a movie. This suggests that models can have self-knowledge: internal representations about their own capabilities. These directions are causally relevant: capable of steering the model to refuse to answer questions about known entities, or to hallucinate attributes of unknown entities when it would otherwise refuse. We demonstrate that despite the sparse autoencoders being trained on the base model, these directions have a causal effect on the chat model's refusal behavior, suggesting that chat finetuning has repurposed this existing mechanism. Furthermore, we provide an initial exploration into the mechanistic role of these directions in the model, finding that they disrupt the attention of downstream heads that typically move entity attributes to the final token. |
Date | 2024-11-21 |
Short Title | Do I Know This Entity? |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2411.14257 |
Accessed | 12/4/2024, 5:18:40 PM |
Extra | arXiv:2411.14257 |
DOI | 10.48550/arXiv.2411.14257 |
Repository | arXiv |
Archive ID | arXiv:2411.14257 |
Date Added | 12/4/2024, 5:18:40 PM |
Modified | 12/4/2024, 5:18:40 PM |
Item Type | Blog Post |
---|---|
Author | Joe Edelman |
Author | Oliver Klingefjord |
Abstract | You may want a compliant assistant, but a co-founder with integrity. We propose ‘model integrity’ as an overlooked challenge in aligning LLM agents. |
Date | 2024-12-05 |
URL | https://meaningalignment.substack.com/p/model-integrity |
Accessed | 12/6/2024, 8:12:16 AM |
Blog Title | Meaning Alignment Institute |
Website Type | Substack newsletter |
Date Added | 12/6/2024, 8:12:16 AM |
Modified | 12/6/2024, 8:12:21 AM |
Item Type | Preprint |
---|---|
Author | Liming Dong |
Author | Qinghua Lu |
Author | Liming Zhu |
Abstract | Large language model (LLM) agents have demonstrated remarkable capabilities across various domains, gaining extensive attention from academia and industry. However, these agents raise significant concerns about AI safety due to their autonomous and non-deterministic behavior, as well as their continuously evolving nature. From a DevOps perspective, enabling observability in agents is necessary for ensuring AI safety, as stakeholders can gain insights into the agents' inner workings, allowing them to proactively understand the agents, detect anomalies, and prevent potential failures. Therefore, in this paper, we present a comprehensive taxonomy of AgentOps, identifying the artifacts and associated data that should be traced throughout the entire lifecycle of agents to achieve effective observability. The taxonomy is developed based on a systematic mapping study of existing AgentOps tools. Our taxonomy serves as a reference template for developers to design and implement AgentOps infrastructure that supports monitoring, logging, and analytics, thereby ensuring AI safety. |
Date | 2024-11-30 |
Short Title | AgentOps |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2411.05285 |
Accessed | 12/4/2024, 5:19:13 PM |
Extra | arXiv:2411.05285 version: 2 |
DOI | 10.48550/arXiv.2411.05285 |
Repository | arXiv |
Archive ID | arXiv:2411.05285 |
Date Added | 12/4/2024, 5:19:13 PM |
Modified | 12/4/2024, 5:19:13 PM |
Item Type | Journal Article |
---|---|
Author | Alex Beutel |
Author | Kai Xiao |
Author | Johannes Heidecke |
Author | Lilian Weng |
Abstract | Automated red teaming can discover rare model failures and generate challenging examples that can be used for training or evaluation. However, a core challenge in automated red teaming is ensuring that the attacks are both diverse and effective. Prior methods typically succeed in optimizing either for diversity or for effectiveness, but rarely both. In this paper, we provide methods that enable automated red teaming to generate a large number of diverse and successful attacks. |
Language | en |
Library Catalog | Zotero |
Date Added | 12/1/2024, 8:42:36 PM |
Modified | 12/1/2024, 8:42:36 PM |
Item Type | Preprint |
---|---|
Author | Glen Berman |
Author | Ned Cooper |
Author | Wesley Hanwen Deng |
Author | Ben Hutchinson |
Abstract | To evaluate the societal impacts of GenAI requires a model of how social harms emerge from interactions between GenAI, people, and societal structures. Yet a model is rarely explicitly defined in societal impact evaluations, or in the taxonomies of societal impacts that support them. In this provocation, we argue that societal impacts should be conceptualised as application- and context-specific, incommensurable, and shaped by questions of social power. Doing so leads us to conclude that societal impact evaluations using existing taxonomies are inherently limited, in terms of their potential to reveal how GenAI systems may interact with people when introduced into specific social contexts. We therefore propose a governance-first approach to managing societal harms attended by GenAI technologies. |
Date | 2024-10-30 |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2410.22985 |
Accessed | 12/11/2024, 10:10:33 PM |
Extra | arXiv:2410.22985 [cs] |
DOI | 10.48550/arXiv.2410.22985 |
Repository | arXiv |
Archive ID | arXiv:2410.22985 |
Date Added | 12/11/2024, 10:10:33 PM |
Modified | 12/11/2024, 10:10:36 PM |
Comment | 3 pages |
Item Type | Journal Article |
---|---|
Author | Gagan Bansal |
Author | Jennifer Wortman Vaughan |
Author | Saleema Amershi |
Author | Eric Horvitz |
Author | Adam Fourney |
Author | Hussein Mozannar |
Author | Victor Dibia |
Author | Daniel S. Weld |
Abstract | Explore key challenges in human-agent communication with generative AI and autonomous agents. Learn about transparency, control, and challenges for improving human-AI interaction. |
Date | 2024-12-01 |
Language | en-US |
Library Catalog | www.microsoft.com |
URL | https://www.microsoft.com/en-us/research/publication/human-agent-interaction-challenges/ |
Accessed | 12/4/2024, 5:19:46 PM |
Date Added | 12/4/2024, 5:19:45 PM |
Modified | 12/4/2024, 5:19:45 PM |
Item Type | Journal Article |
---|---|
Author | Lama Ahmad |
Author | Sandhini Agarwal |
Author | Michael Lampe |
Author | Pamela Mishkin |
Abstract | Red teaming has emerged as a critical practice in assessing the possible risks of AI models and systems. It aids in the discovery of novel risks, stress testing possible gaps in existing mitigations, enriching existing quantitative safety metrics, facilitating the creation of new safety measurements, and enhancing public trust and the legitimacy of AI risk assessments. This white paper describes OpenAI’s work to date in external red teaming and draws some more general conclusions from this work. We describe the design considerations underpinning external red teaming, which include: selecting composition of red team, deciding on access levels, and providing guidance required to conduct red teaming. Additionally, we show outcomes red teaming can enable such as input into risk assessment and automated evaluations. We also describe the limitations of external red teaming, and how it can fit into a broader range of AI model and system evaluations. Through these contributions, we hope that AI developers and deployers, evaluation creators, and policymakers will be able to better design red teaming campaigns and get a deeper look into how external red teaming can fit into model deployment and evaluation processes. These methods are evolving and the value of different methods continues to shift as the ecosystem around red teaming matures and models themselves improve as tools for red teaming. |
Language | en |
Library Catalog | Zotero |
Date Added | 12/1/2024, 8:42:16 PM |
Modified | 12/1/2024, 8:42:16 PM |
Item Type | Web Page |
---|---|
Abstract | Letting AI models communicate with each other in their internal mathematical language, rather than translating back and forth to English, could accelerate their task-solving abilities |
Language | en-US |
URL | https://www.newscientist.com/article/2455173-ai-models-work-together-faster-when-they-speak-their-own-language/ |
Accessed | 12/1/2024, 8:52:55 PM |
Website Title | New Scientist |
Date Added | 12/1/2024, 8:52:55 PM |
Modified | 12/1/2024, 8:53:00 PM |