Item Type | Preprint |
---|---|
Author | Anil Seth |
Abstract | As artificial intelligence (AI) continues to develop, it is natural to ask whether AI systems can be not only intelligent, but also conscious. I consider why some people think AI might develop consciousness, identifying some biases that lead us astray. I ask what it would take for conscious AI to be a realistic prospect, pushing back against some common assumptions such as the notion that computation provides a sufficient basis for consciousness. I’ll instead make the case for taking seriously the possibility that consciousness might depend on our nature as living organisms – a form of biological naturalism. I will end by exploring some wider issues including testing for consciousness in AI, and ethical considerations arising from AI that either actually is, or convincingly seems to be, conscious. |
Date | 2024-06-30 |
Language | en-us |
Library Catalog | OSF Preprints |
URL | https://osf.io/tz6an |
Accessed | 11/12/2024, 8:54:33 AM |
DOI | 10.31234/osf.io/tz6an |
Repository | OSF |
Date Added | 11/12/2024, 8:54:33 AM |
Modified | 11/16/2024, 3:29:54 PM |
Item Type | Preprint |
---|---|
Author | Jillian Fisher |
Author | Shangbin Feng |
Author | Robert Aron |
Author | Thomas Richardson |
Author | Yejin Choi |
Author | Daniel W. Fisher |
Author | Jennifer Pan |
Author | Yulia Tsvetkov |
Author | Katharina Reinecke |
Abstract | As modern AI models become integral to everyday tasks, concerns about their inherent biases and their potential impact on human decision-making have emerged. While bias in models is well-documented, less is known about how these biases influence human decisions. This paper presents two interactive experiments investigating the effects of partisan bias in AI language models on political decision-making. Participants interacted freely with either a biased liberal, biased conservative, or unbiased control model while completing political decision-making tasks. We found that participants exposed to politically biased models were significantly more likely to adopt opinions and make decisions aligning with the AI's bias, regardless of their personal political partisanship. However, we also discovered that prior knowledge about AI could lessen the impact of the bias, highlighting the possible importance of AI education for robust bias mitigation. Our findings not only highlight the critical effects of interacting with biased AI and its ability to impact public discourse and political conduct, but also highlight potential techniques for mitigating these risks in the future. |
Date | 2024-11-04 |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2410.06415 |
Accessed | 11/11/2024, 8:54:31 AM |
Extra | arXiv:2410.06415 |
DOI | 10.48550/arXiv.2410.06415 |
Repository | arXiv |
Archive ID | arXiv:2410.06415 |
Date Added | 11/11/2024, 8:54:31 AM |
Modified | 11/16/2024, 1:56:40 PM |
Item Type | Preprint |
---|---|
Author | Yuan Gao |
Author | Dokyun Lee |
Author | Gordon Burtch |
Author | Sina Fazelpour |
Abstract | Recent studies suggest large language models (LLMs) can exhibit human-like reasoning, aligning with human behavior in economic experiments, surveys, and political discourse. This has led many to propose that LLMs can be used as surrogates for humans in social science research. However, LLMs differ fundamentally from humans, relying on probabilistic patterns, absent the embodied experiences or survival objectives that shape human cognition. We assess the reasoning depth of LLMs using the 11-20 money request game. Almost all advanced approaches fail to replicate human behavior distributions across many models, except in one case involving fine-tuning using a substantial amount of human behavior data. Causes of failure are diverse, relating to input language, roles, and safeguarding. These results caution against using LLMs to study human behaviors or as human surrogates. |
Date | 2024-10-25 |
Short Title | Take Caution in Using LLMs as Human Surrogates |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2410.19599 |
Accessed | 10/30/2024, 9:09:40 AM |
Extra | arXiv:2410.19599 |
DOI | 10.48550/arXiv.2410.19599 |
Repository | arXiv |
Archive ID | arXiv:2410.19599 |
Date Added | 10/30/2024, 9:09:40 AM |
Modified | 10/30/2024, 9:09:42 AM |
Item Type | Preprint |
---|---|
Author | Guan Zhe Hong |
Author | Nishanth Dikkala |
Author | Enming Luo |
Author | Cyrus Rashtchian |
Author | Rina Panigrahy |
Abstract | Large language models (LLMs) have shown amazing performance on tasks that require planning and reasoning. Motivated by this, we investigate the internal mechanisms that underpin a network's ability to perform complex logical reasoning. We first construct a synthetic propositional logic problem that serves as a concrete test-bed for network training and evaluation. Crucially, this problem demands nontrivial planning to solve, but we can train a small transformer to achieve perfect accuracy. Building on our set-up, we then pursue an understanding of precisely how a three-layer transformer, trained from scratch, solves this problem. We are able to identify certain "planning" and "reasoning" circuits in the network that necessitate cooperation between the attention blocks to implement the desired logic. To expand our findings, we then study a larger model, Mistral 7B. Using activation patching, we characterize internal components that are critical in solving our logic problem. Overall, our work systematically uncovers novel aspects of small and large transformers, and continues the study of how they plan and reason. |
Date | 2024-11-06 |
Short Title | How Transformers Solve Propositional Logic Problems |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2411.04105 |
Accessed | 11/7/2024, 2:05:21 PM |
Extra | arXiv:2411.04105 |
DOI | 10.48550/arXiv.2411.04105 |
Repository | arXiv |
Archive ID | arXiv:2411.04105 |
Date Added | 11/7/2024, 2:05:21 PM |
Modified | 11/7/2024, 2:05:21 PM |
Item Type | Preprint |
---|---|
Author | Samuel G. B. Johnson |
Author | Amir-Hossein Karimi |
Author | Yoshua Bengio |
Author | Nick Chater |
Author | Tobias Gerstenberg |
Author | Kate Larson |
Author | Sydney Levine |
Author | Melanie Mitchell |
Author | Iyad Rahwan |
Author | Bernhard Schölkopf |
Author | Igor Grossmann |
Abstract | Recent advances in artificial intelligence (AI) have produced systems capable of increasingly sophisticated performance on cognitive tasks. However, AI systems still struggle in critical ways: unpredictable and novel environments (robustness), lack of transparency in their reasoning (explainability), challenges in communication and commitment (cooperation), and risks due to potential harmful actions (safety). We argue that these shortcomings stem from one overarching failure: AI systems lack wisdom. Drawing from cognitive and social sciences, we define wisdom as the ability to navigate intractable problems - those that are ambiguous, radically uncertain, novel, chaotic, or computationally explosive - through effective task-level and metacognitive strategies. While AI research has focused on task-level strategies, metacognition - the ability to reflect on and regulate one's thought processes - is underdeveloped in AI systems. In humans, metacognitive strategies such as recognizing the limits of one's knowledge, considering diverse perspectives, and adapting to context are essential for wise decision-making. We propose that integrating metacognitive capabilities into AI systems is crucial for enhancing their robustness, explainability, cooperation, and safety. By focusing on developing wise AI, we suggest an alternative to aligning AI with specific human values - a task fraught with conceptual and practical difficulties. Instead, wise AI systems can thoughtfully navigate complex situations, account for diverse human values, and avoid harmful actions. We discuss potential approaches to building wise AI, including benchmarking metacognitive abilities and training AI systems to employ wise reasoning. Prioritizing metacognition in AI research will lead to systems that act not only intelligently but also wisely in complex, real-world situations. |
Date | 2024-11-04 |
Short Title | Imagining and building wise machines |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2411.02478 |
Accessed | 11/6/2024, 9:58:17 AM |
Extra | arXiv:2411.02478 |
DOI | 10.48550/arXiv.2411.02478 |
Repository | arXiv |
Archive ID | arXiv:2411.02478 |
Date Added | 11/6/2024, 9:58:17 AM |
Modified | 11/6/2024, 9:58:17 AM |
Item Type | Preprint |
---|---|
Author | Geoff Keeling |
Author | Winnie Street |
Author | Martyna Stachaczyk |
Author | Daria Zakharova |
Author | Iulia M. Comsa |
Author | Anastasiya Sakovych |
Author | Isabella Logothetis |
Author | Zejia Zhang |
Author | Blaise Agüera y Arcas |
Author | Jonathan Birch |
Abstract | Pleasure and pain play an important role in human decision making by providing a common currency for resolving motivational conflicts. While Large Language Models (LLMs) can generate detailed descriptions of pleasure and pain experiences, it is an open question whether LLMs can recreate the motivational force of pleasure and pain in choice scenarios - a question which may bear on debates about LLM sentience, understood as the capacity for valenced experiential states. We probed this question using a simple game in which the stated goal is to maximise points, but where either the points-maximising option is said to incur a pain penalty or a non-points-maximising option is said to incur a pleasure reward, providing incentives to deviate from points-maximising behaviour. Varying the intensity of the pain penalties and pleasure rewards, we found that Claude 3.5 Sonnet, Command R+, GPT-4o, and GPT-4o mini each demonstrated at least one trade-off in which the majority of responses switched from points-maximisation to pain-minimisation or pleasure-maximisation after a critical threshold of stipulated pain or pleasure intensity is reached. LLaMa 3.1-405b demonstrated some graded sensitivity to stipulated pleasure rewards and pain penalties. Gemini 1.5 Pro and PaLM 2 prioritised pain-avoidance over points-maximisation regardless of intensity, while tending to prioritise points over pleasure regardless of intensity. We discuss the implications of these findings for debates about the possibility of LLM sentience. |
Date | 2024-11-01 |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2411.02432 |
Accessed | 11/6/2024, 9:58:54 AM |
Extra | arXiv:2411.02432 |
DOI | 10.48550/arXiv.2411.02432 |
Repository | arXiv |
Archive ID | arXiv:2411.02432 |
Date Added | 11/6/2024, 9:58:54 AM |
Modified | 11/6/2024, 9:58:54 AM |
Item Type | Preprint |
---|---|
Author | Seth Lazar |
Abstract | A century ago, John Dewey observed that '[s]team and electricity have done more to alter the conditions under which men associate together than all the agencies which affected human relationships before our time'. In the last few decades, computing technologies have had a similar effect. Political philosophy's central task is to help us decide how to live together, by analysing our social relations, diagnosing their failings, and articulating ideals to guide their revision. But these profound social changes have left scarcely a dent in the model of social relations that (analytical) political philosophers assume. This essay aims to reverse that trend. It first builds a model of our novel social relations as they are now, and as they are likely to evolve, and then explores how those differences affect our theories of how to live together. I introduce the 'Algorithmic City', the network of algorithmically-mediated social relations, then characterise the intermediary power by which it is governed. I show how algorithmic governance raises new challenges for political philosophy concerning the justification of authority, the foundations of procedural legitimacy, and the possibility of justificatory neutrality. |
Date | 2024-10-17 |
Short Title | Lecture I |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2410.20720 |
Accessed | 11/18/2024, 10:28:27 AM |
Extra | arXiv:2410.20720 |
DOI | 10.48550/arXiv.2410.20720 |
Repository | arXiv |
Archive ID | arXiv:2410.20720 |
Date Added | 11/18/2024, 10:28:27 AM |
Modified | 11/18/2024, 10:28:32 AM |
Item Type | Preprint |
---|---|
Author | Seth Lazar |
Author | Lorenzo Manuali |
Abstract | LLMs are among the most advanced tools ever devised for analysing and generating linguistic content. Democratic deliberation and decision-making involve, at several distinct stages, the production and analysis of language. So it is natural to ask whether our best tools for manipulating language might prove instrumental to one of our most important linguistic tasks. Researchers and practitioners have recently asked whether LLMs can support democratic deliberation by leveraging abilities to summarise content, as well as to aggregate opinion over summarised content, and indeed to represent voters by predicting their preferences over unseen choices. In this paper, we assess whether using LLMs to perform these and related functions really advances the democratic values that inspire these experiments. We suggest that the record is decidedly mixed. In the presence of background inequality of power and resources, as well as deep moral and political disagreement, we should be careful not to use LLMs in ways that automate non-instrumentally valuable components of the democratic process, or else threaten to supplant fair and transparent decision-making procedures that are necessary to reconcile competing interests and values. However, while we argue that LLMs should be kept well clear of formal democratic decision-making processes, we think that they can be put to good use in strengthening the informal public sphere: the arena that mediates between democratic governments and the polities that they serve, in which political communities seek information, form civic publics, and hold their leaders to account. |
Date | 2024-10-17 |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2410.08418 |
Accessed | 11/19/2024, 8:35:14 AM |
Extra | arXiv:2410.08418 |
DOI | 10.48550/arXiv.2410.08418 |
Repository | arXiv |
Archive ID | arXiv:2410.08418 |
Date Added | 11/19/2024, 8:35:14 AM |
Modified | 11/19/2024, 8:35:14 AM |
Item Type | Journal Article |
---|---|
Author | Harry R. Lloyd |
Abstract | New AI technologies have the potential to cause unintended harms in diverse domains including warfare, judicial sentencing, medicine and governance. One strategy for realising the benefits of AI whilst avoiding its potential dangers is to ensure that new AIs are properly ‘aligned’ with some form of ‘alignment target.’ One danger of this strategy is that, dependent on the alignment target chosen, our AIs might optimise for objectives that reflect the values only of a certain subset of society, and that do not take into account alternative views about what constitutes desirable and safe behaviour for AI agents. In response to this problem, several AI ethicists have suggested alignment targets that are designed to be sensitive to widespread normative disagreement amongst the relevant stakeholders. Authors inspired by voting theory have suggested that AIs should be aligned with the verdicts of actual or simulated ‘moral parliaments’ whose members represent the normative views of the relevant stakeholders. Other authors inspired by decision theory and the philosophical literature on moral uncertainty have suggested that AIs should maximise socially expected choiceworthiness. In this paper, I argue that both of these proposals face several important problems. In particular, they fail to select attractive ‘compromise options’ in cases where such options are available. I go on to propose and defend an alternative, bargaining-theoretic alignment target, which avoids the problems associated with the voting- and decision-theoretic approaches. |
Date | 2024-11-18 |
Language | en |
Library Catalog | DOI.org (Crossref) |
URL | https://link.springer.com/10.1007/s11098-024-02224-5 |
Accessed | 11/19/2024, 8:29:08 AM |
Publication | Philosophical Studies |
DOI | 10.1007/s11098-024-02224-5 |
Journal Abbr | Philos Stud |
ISSN | 0031-8116, 1573-0883 |
Date Added | 11/19/2024, 8:29:08 AM |
Modified | 11/19/2024, 8:29:08 AM |
Item Type | Preprint |
---|---|
Author | Robert Long |
Author | Jeff Sebo |
Author | Patrick Butlin |
Author | Kathleen Finlinson |
Author | Kyle Fish |
Author | Jacqueline Harding |
Author | Jacob Pfau |
Author | Toni Sims |
Author | Jonathan Birch |
Author | David Chalmers |
Abstract | In this report, we argue that there is a realistic possibility that some AI systems will be conscious and/or robustly agentic in the near future. That means that the prospect of AI welfare and moral patienthood, i.e. of AI systems with their own interests and moral significance, is no longer an issue only for sci-fi or the distant future. It is an issue for the near future, and AI companies and other actors have a responsibility to start taking it seriously. We also recommend three early steps that AI companies and other actors can take: They can (1) acknowledge that AI welfare is an important and difficult issue (and ensure that language model outputs do the same), (2) start assessing AI systems for evidence of consciousness and robust agency, and (3) prepare policies and procedures for treating AI systems with an appropriate level of moral concern. To be clear, our argument in this report is not that AI systems definitely are, or will be, conscious, robustly agentic, or otherwise morally significant. Instead, our argument is that there is substantial uncertainty about these possibilities, and so we need to improve our understanding of AI welfare and our ability to make wise decisions about this issue. Otherwise there is a significant risk that we will mishandle decisions about AI welfare, mistakenly harming AI systems that matter morally and/or mistakenly caring for AI systems that do not. |
Date | 2024-11-04 |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2411.00986 |
Accessed | 11/17/2024, 6:29:26 PM |
Extra | arXiv:2411.00986 version: 1 |
DOI | 10.48550/arXiv.2411.00986 |
Repository | arXiv |
Archive ID | arXiv:2411.00986 |
Date Added | 11/17/2024, 6:29:26 PM |
Modified | 11/17/2024, 6:29:26 PM |
Item Type | Journal Article |
---|---|
Author | Arianna Manzini |
Author | Geoff Keeling |
Author | Lize Alberts |
Author | Shannon Vallor |
Author | Meredith Ringel Morris |
Author | Iason Gabriel |
Abstract | The development of increasingly agentic and human-like AI assistants, capable of performing a wide range of tasks on users' behalf over time, has sparked heightened interest in the nature and bounds of human interactions with AI. Such systems may indeed ground a transition from task-oriented interactions with AI, at discrete time intervals, to ongoing relationships -- where users develop a deeper sense of connection with and attachment to the technology. This paper investigates what it means for relationships between users and advanced AI assistants to be appropriate and proposes a new framework to evaluate both users' relationships with AI and developers' design choices. We first provide an account of advanced AI assistants, motivating the question of appropriate relationships by exploring several distinctive features of this technology. These include anthropomorphic cues and the longevity of interactions with users, increased AI agency, generality and context ambiguity, and the forms and depth of dependence the relationship could engender. Drawing upon various ethical traditions, we then consider a series of values, including benefit, flourishing, autonomy and care, that characterise appropriate human interpersonal relationships. These values guide our analysis of how the distinctive features of AI assistants may give rise to inappropriate relationships with users. Specifically, we discuss a set of concrete risks arising from user--AI assistant relationships that: (1) cause direct emotional or physical harm to users, (2) limit opportunities for user personal development, (3) exploit user emotional dependence, and (4) generate material dependencies without adequate commitment to user needs. We conclude with a set of recommendations to address these risks. |
Date | 2024-10-16 |
Language | en |
Short Title | The Code That Binds Us |
Library Catalog | ojs.aaai.org |
URL | https://ojs.aaai.org/index.php/AIES/article/view/31694 |
Accessed | 10/28/2024, 9:54:41 AM |
Rights | Copyright (c) 2024 Association for the Advancement of Artificial Intelligence |
Volume | 7 |
Pages | 943-957 |
Publication | Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society |
Date Added | 10/28/2024, 9:54:41 AM |
Modified | 10/28/2024, 9:54:41 AM |
Item Type | Journal Article |
---|---|
Author | Sebastian Porsdam Mann |
Author | Anuraag A. Vazirani |
Author | Mateo Aboy |
Author | Brian D. Earp |
Author | Timo Minssen |
Author | I. Glenn Cohen |
Author | Julian Savulescu |
Abstract | In this Comment, we propose a cumulative set of three essential criteria for the ethical use of LLMs in academic writing, and present a statement that researchers can quote when submitting LLM-assisted manuscripts in order to testify to their adherence to them. |
Date | 2024-11-13 |
Language | en |
Library Catalog | www.nature.com |
URL | https://www.nature.com/articles/s42256-024-00922-7 |
Accessed | 11/15/2024, 2:40:59 PM |
Rights | 2024 Springer Nature Limited |
Extra | Publisher: Nature Publishing Group |
Pages | 1-3 |
Publication | Nature Machine Intelligence |
DOI | 10.1038/s42256-024-00922-7 |
Journal Abbr | Nat Mach Intell |
ISSN | 2522-5839 |
Date Added | 11/15/2024, 2:40:59 PM |
Modified | 11/15/2024, 2:40:59 PM |
Item Type | Journal Article |
---|---|
Author | Nathaniel Sharadin |
Abstract | Suppose there are no in-principle restrictions on the contents of arbitrarily intelligent agents’ goals. According to “instrumental convergence” arguments, potentially scary things follow. I do two things in this paper. First, focusing on the influential version of the instrumental convergence argument due to Nick Bostrom, I explain why such arguments require an account of “promotion”, i.e., an account of what it is to “promote” a goal. Then, I consider whether extant accounts of promotion in the literature—in particular, probabilistic and fit-based views of promotion—can be used to support dangerous instrumental convergence. I argue that neither account of promotion can do the work. The opposite is true: accepting either account of promotion undermines support for instrumental convergence arguments’ existentially worrying conclusions. The conclusion is that we needn’t be scared—at least not because of arguments concerning instrumental convergence. |
Date | 2024-10-21 |
Language | en |
Library Catalog | Springer Link |
URL | https://doi.org/10.1007/s11098-024-02212-9 |
Accessed | 11/19/2024, 8:30:19 AM |
Publication | Philosophical Studies |
DOI | 10.1007/s11098-024-02212-9 |
Journal Abbr | Philos Stud |
ISSN | 1573-0883 |
Date Added | 11/19/2024, 8:30:19 AM |
Modified | 11/19/2024, 8:30:19 AM |
Item Type | Journal Article |
---|---|
Author | Tan Zhi-Xuan |
Author | Micah Carroll |
Author | Matija Franklin |
Author | Hal Ashton |
Abstract | The dominant practice of AI alignment assumes (1) that preferences are an adequate representation of human values, (2) that human rationality can be understood in terms of maximizing the satisfaction of preferences, and (3) that AI systems should be aligned with the preferences of one or more humans to ensure that they behave safely and in accordance with our values. Whether implicitly followed or explicitly endorsed, these commitments constitute what we term a preferentist approach to AI alignment. In this paper, we characterize and challenge the preferentist approach, describing conceptual and technical alternatives that are ripe for further research. We first survey the limits of rational choice theory as a descriptive model, explaining how preferences fail to capture the thick semantic content of human values, and how utility representations neglect the possible incommensurability of those values. We then critique the normativity of expected utility theory (EUT) for humans and AI, drawing upon arguments showing how rational agents need not comply with EUT, while highlighting how EUT is silent on which preferences are normatively acceptable. Finally, we argue that these limitations motivate a reframing of the targets of AI alignment: Instead of alignment with the preferences of a human user, developer, or humanity-writ-large, AI systems should be aligned with normative standards appropriate to their social roles, such as the role of a general-purpose assistant. Furthermore, these standards should be negotiated and agreed upon by all relevant stakeholders. On this alternative conception of alignment, a multiplicity of AI systems will be able to serve diverse ends, aligned with normative standards that promote mutual benefit and limit harm despite our plural and divergent values. |
Date | 2024-11-09 |
Language | en |
Library Catalog | Springer Link |
URL | https://doi.org/10.1007/s11098-024-02249-w |
Accessed | 11/19/2024, 8:28:47 AM |
Publication | Philosophical Studies |
DOI | 10.1007/s11098-024-02249-w |
Journal Abbr | Philos Stud |
ISSN | 1573-0883 |
Date Added | 11/19/2024, 8:28:47 AM |
Modified | 11/19/2024, 8:28:55 AM |