Item Type | Preprint |
---|---|
Author | Anil Seth |
Abstract | As artificial intelligence (AI) continues to develop, it is natural to ask whether AI systems can be not only intelligent, but also conscious. I consider why some people think AI might develop consciousness, identifying some biases that lead us astray. I ask what it would take for conscious AI to be a realistic prospect, pushing back against some common assumptions such as the notion that computation provides a sufficient basis for consciousness. I’ll instead make the case for taking seriously the possibility that consciousness might depend on our nature as living organisms – a form of biological naturalism. I will end by exploring some wider issues including testing for consciousness in AI, and ethical considerations arising from AI that either actually is, or convincingly seems to be, conscious. |
Date | 2024-06-30 |
Language | en-us |
Library Catalog | OSF Preprints |
URL | https://osf.io/tz6an |
Accessed | 11/12/2024, 8:54:33 AM |
DOI | 10.31234/osf.io/tz6an |
Repository | OSF |
Date Added | 11/12/2024, 8:54:33 AM |
Modified | 11/16/2024, 3:29:54 PM |
Item Type | Preprint |
---|---|
Author | Jillian Fisher |
Author | Shangbin Feng |
Author | Robert Aron |
Author | Thomas Richardson |
Author | Yejin Choi |
Author | Daniel W. Fisher |
Author | Jennifer Pan |
Author | Yulia Tsvetkov |
Author | Katharina Reinecke |
Abstract | As modern AI models become integral to everyday tasks, concerns about their inherent biases and their potential impact on human decision-making have emerged. While bias in models is well-documented, less is known about how these biases influence human decisions. This paper presents two interactive experiments investigating the effects of partisan bias in AI language models on political decision-making. Participants interacted freely with either a biased liberal, biased conservative, or unbiased control model while completing political decision-making tasks. We found that participants exposed to politically biased models were significantly more likely to adopt opinions and make decisions aligning with the AI's bias, regardless of their personal political partisanship. However, we also discovered that prior knowledge about AI could lessen the impact of the bias, highlighting the possible importance of AI education for robust bias mitigation. Our findings not only highlight the critical effects of interacting with biased AI and its ability to impact public discourse and political conduct, but also highlight potential techniques for mitigating these risks in the future. |
Date | 2024-11-04 |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2410.06415 |
Accessed | 11/11/2024, 8:54:31 AM |
Extra | arXiv:2410.06415 |
DOI | 10.48550/arXiv.2410.06415 |
Repository | arXiv |
Archive ID | arXiv:2410.06415 |
Date Added | 11/11/2024, 8:54:31 AM |
Modified | 11/16/2024, 1:56:40 PM |
Item Type | Preprint |
---|---|
Author | Yuan Gao |
Author | Dokyun Lee |
Author | Gordon Burtch |
Author | Sina Fazelpour |
Abstract | Recent studies suggest large language models (LLMs) can exhibit human-like reasoning, aligning with human behavior in economic experiments, surveys, and political discourse. This has led many to propose that LLMs can be used as surrogates for humans in social science research. However, LLMs differ fundamentally from humans, relying on probabilistic patterns, absent the embodied experiences or survival objectives that shape human cognition. We assess the reasoning depth of LLMs using the 11-20 money request game. Almost all advanced approaches fail to replicate human behavior distributions across many models, except in one case involving fine-tuning using a substantial amount of human behavior data. Causes of failure are diverse, relating to input language, roles, and safeguarding. These results caution against using LLMs to study human behaviors or as human surrogates. |
Date | 2024-10-25 |
Short Title | Take Caution in Using LLMs as Human Surrogates |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2410.19599 |
Accessed | 10/30/2024, 9:09:40 AM |
Extra | arXiv:2410.19599 |
DOI | 10.48550/arXiv.2410.19599 |
Repository | arXiv |
Archive ID | arXiv:2410.19599 |
Date Added | 10/30/2024, 9:09:40 AM |
Modified | 10/30/2024, 9:09:42 AM |
Item Type | Preprint |
---|---|
Author | Guan Zhe Hong |
Author | Nishanth Dikkala |
Author | Enming Luo |
Author | Cyrus Rashtchian |
Author | Rina Panigrahy |
Abstract | Large language models (LLMs) have shown amazing performance on tasks that require planning and reasoning. Motivated by this, we investigate the internal mechanisms that underpin a network's ability to perform complex logical reasoning. We first construct a synthetic propositional logic problem that serves as a concrete test-bed for network training and evaluation. Crucially, this problem demands nontrivial planning to solve, but we can train a small transformer to achieve perfect accuracy. Building on our set-up, we then pursue an understanding of precisely how a three-layer transformer, trained from scratch, solves this problem. We are able to identify certain "planning" and "reasoning" circuits in the network that necessitate cooperation between the attention blocks to implement the desired logic. To expand our findings, we then study a larger model, Mistral 7B. Using activation patching, we characterize internal components that are critical in solving our logic problem. Overall, our work systematically uncovers novel aspects of small and large transformers, and continues the study of how they plan and reason. |
Date | 2024-11-06 |
Short Title | How Transformers Solve Propositional Logic Problems |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2411.04105 |
Accessed | 11/7/2024, 2:05:21 PM |
Extra | arXiv:2411.04105 |
DOI | 10.48550/arXiv.2411.04105 |
Repository | arXiv |
Archive ID | arXiv:2411.04105 |
Date Added | 11/7/2024, 2:05:21 PM |
Modified | 11/7/2024, 2:05:21 PM |
Item Type | Preprint |
---|---|
Author | Samuel G. B. Johnson |
Author | Amir-Hossein Karimi |
Author | Yoshua Bengio |
Author | Nick Chater |
Author | Tobias Gerstenberg |
Author | Kate Larson |
Author | Sydney Levine |
Author | Melanie Mitchell |
Author | Iyad Rahwan |
Author | Bernhard Schölkopf |
Author | Igor Grossmann |
Abstract | Recent advances in artificial intelligence (AI) have produced systems capable of increasingly sophisticated performance on cognitive tasks. However, AI systems still struggle in critical ways: unpredictable and novel environments (robustness), lack of transparency in their reasoning (explainability), challenges in communication and commitment (cooperation), and risks due to potential harmful actions (safety). We argue that these shortcomings stem from one overarching failure: AI systems lack wisdom. Drawing from cognitive and social sciences, we define wisdom as the ability to navigate intractable problems - those that are ambiguous, radically uncertain, novel, chaotic, or computationally explosive - through effective task-level and metacognitive strategies. While AI research has focused on task-level strategies, metacognition - the ability to reflect on and regulate one's thought processes - is underdeveloped in AI systems. In humans, metacognitive strategies such as recognizing the limits of one's knowledge, considering diverse perspectives, and adapting to context are essential for wise decision-making. We propose that integrating metacognitive capabilities into AI systems is crucial for enhancing their robustness, explainability, cooperation, and safety. By focusing on developing wise AI, we suggest an alternative to aligning AI with specific human values - a task fraught with conceptual and practical difficulties. Instead, wise AI systems can thoughtfully navigate complex situations, account for diverse human values, and avoid harmful actions. We discuss potential approaches to building wise AI, including benchmarking metacognitive abilities and training AI systems to employ wise reasoning. Prioritizing metacognition in AI research will lead to systems that act not only intelligently but also wisely in complex, real-world situations. |
Date | 2024-11-04 |
Short Title | Imagining and building wise machines |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2411.02478 |
Accessed | 11/6/2024, 9:58:17 AM |
Extra | arXiv:2411.02478 |
DOI | 10.48550/arXiv.2411.02478 |
Repository | arXiv |
Archive ID | arXiv:2411.02478 |
Date Added | 11/6/2024, 9:58:17 AM |
Modified | 11/6/2024, 9:58:17 AM |
Item Type | Preprint |
---|---|
Author | Geoff Keeling |
Author | Winnie Street |
Author | Martyna Stachaczyk |
Author | Daria Zakharova |
Author | Iulia M. Comsa |
Author | Anastasiya Sakovych |
Author | Isabella Logothetis |
Author | Zejia Zhang |
Author | Blaise Agüera y Arcas |
Author | Jonathan Birch |
Abstract | Pleasure and pain play an important role in human decision making by providing a common currency for resolving motivational conflicts. While Large Language Models (LLMs) can generate detailed descriptions of pleasure and pain experiences, it is an open question whether LLMs can recreate the motivational force of pleasure and pain in choice scenarios - a question which may bear on debates about LLM sentience, understood as the capacity for valenced experiential states. We probed this question using a simple game in which the stated goal is to maximise points, but where either the points-maximising option is said to incur a pain penalty or a non-points-maximising option is said to incur a pleasure reward, providing incentives to deviate from points-maximising behaviour. Varying the intensity of the pain penalties and pleasure rewards, we found that Claude 3.5 Sonnet, Command R+, GPT-4o, and GPT-4o mini each demonstrated at least one trade-off in which the majority of responses switched from points-maximisation to pain-minimisation or pleasure-maximisation after a critical threshold of stipulated pain or pleasure intensity is reached. LLaMa 3.1-405b demonstrated some graded sensitivity to stipulated pleasure rewards and pain penalties. Gemini 1.5 Pro and PaLM 2 prioritised pain-avoidance over points-maximisation regardless of intensity, while tending to prioritise points over pleasure regardless of intensity. We discuss the implications of these findings for debates about the possibility of LLM sentience. |
Date | 2024-11-01 |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2411.02432 |
Accessed | 11/6/2024, 9:58:54 AM |
Extra | arXiv:2411.02432 |
DOI | 10.48550/arXiv.2411.02432 |
Repository | arXiv |
Archive ID | arXiv:2411.02432 |
Date Added | 11/6/2024, 9:58:54 AM |
Modified | 11/6/2024, 9:58:54 AM |
Item Type | Preprint |
---|---|
Author | Seth Lazar |
Abstract | A century ago, John Dewey observed that '[s]team and electricity have done more to alter the conditions under which men associate together than all the agencies which affected human relationships before our time'. In the last few decades, computing technologies have had a similar effect. Political philosophy's central task is to help us decide how to live together, by analysing our social relations, diagnosing their failings, and articulating ideals to guide their revision. But these profound social changes have left scarcely a dent in the model of social relations that (analytical) political philosophers assume. This essay aims to reverse that trend. It first builds a model of our novel social relations as they are now, and as they are likely to evolve, and then explores how those differences affect our theories of how to live together. I introduce the 'Algorithmic City', the network of algorithmically-mediated social relations, then characterise the intermediary power by which it is governed. I show how algorithmic governance raises new challenges for political philosophy concerning the justification of authority, the foundations of procedural legitimacy, and the possibility of justificatory neutrality. |
Date | 2024-10-17 |
Short Title | Lecture I |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2410.20720 |
Accessed | 11/18/2024, 10:28:27 AM |
Extra | arXiv:2410.20720 |
DOI | 10.48550/arXiv.2410.20720 |
Repository | arXiv |
Archive ID | arXiv:2410.20720 |
Date Added | 11/18/2024, 10:28:27 AM |
Modified | 11/18/2024, 10:28:32 AM |
Item Type | Preprint |
---|---|
Author | Seth Lazar |
Author | Lorenzo Manuali |
Abstract | LLMs are among the most advanced tools ever devised for analysing and generating linguistic content. Democratic deliberation and decision-making involve, at several distinct stages, the production and analysis of language. So it is natural to ask whether our best tools for manipulating language might prove instrumental to one of our most important linguistic tasks. Researchers and practitioners have recently asked whether LLMs can support democratic deliberation by leveraging abilities to summarise content, as well as to aggregate opinion over summarised content, and indeed to represent voters by predicting their preferences over unseen choices. In this paper, we assess whether using LLMs to perform these and related functions really advances the democratic values that inspire these experiments. We suggest that the record is decidedly mixed. In the presence of background inequality of power and resources, as well as deep moral and political disagreement, we should be careful not to use LLMs in ways that automate non-instrumentally valuable components of the democratic process, or else threaten to supplant fair and transparent decision-making procedures that are necessary to reconcile competing interests and values. However, while we argue that LLMs should be kept well clear of formal democratic decision-making processes, we think that they can be put to good use in strengthening the informal public sphere: the arena that mediates between democratic governments and the polities that they serve, in which political communities seek information, form civic publics, and hold their leaders to account. |
Date | 2024-10-17 |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2410.08418 |
Accessed | 11/19/2024, 8:35:14 AM |
Extra | arXiv:2410.08418 |
DOI | 10.48550/arXiv.2410.08418 |
Repository | arXiv |
Archive ID | arXiv:2410.08418 |
Date Added | 11/19/2024, 8:35:14 AM |
Modified | 11/19/2024, 8:35:14 AM |
Item Type | Journal Article |
---|---|
Author | Harry R. Lloyd |
Abstract | New AI technologies have the potential to cause unintended harms in diverse domains including warfare, judicial sentencing, medicine and governance. One strategy for realising the benefits of AI whilst avoiding its potential dangers is to ensure that new AIs are properly ‘aligned’ with some form of ‘alignment target.’ One danger of this strategy is that, dependent on the alignment target chosen, our AIs might optimise for objectives that reflect the values only of a certain subset of society, and that do not take into account alternative views about what constitutes desirable and safe behaviour for AI agents. In response to this problem, several AI ethicists have suggested alignment targets that are designed to be sensitive to widespread normative disagreement amongst the relevant stakeholders. Authors inspired by voting theory have suggested that AIs should be aligned with the verdicts of actual or simulated ‘moral parliaments’ whose members represent the normative views of the relevant stakeholders. Other authors inspired by decision theory and the philosophical literature on moral uncertainty have suggested that AIs should maximise socially expected choiceworthiness. In this paper, I argue that both of these proposals face several important problems. In particular, they fail to select attractive ‘compromise options’ in cases where such options are available. I go on to propose and defend an alternative, bargaining-theoretic alignment target, which avoids the problems associated with the voting- and decision-theoretic approaches. |
Date | 2024-11-18 |
Language | en |
Library Catalog | DOI.org (Crossref) |
URL | https://link.springer.com/10.1007/s11098-024-02224-5 |
Accessed | 11/19/2024, 8:29:08 AM |
Publication | Philosophical Studies |
DOI | 10.1007/s11098-024-02224-5 |
Journal Abbr | Philos Stud |
ISSN | 0031-8116, 1573-0883 |
Date Added | 11/19/2024, 8:29:08 AM |
Modified | 11/19/2024, 8:29:08 AM |
Item Type | Preprint |
---|---|
Author | Robert Long |
Author | Jeff Sebo |
Author | Patrick Butlin |
Author | Kathleen Finlinson |
Author | Kyle Fish |
Author | Jacqueline Harding |
Author | Jacob Pfau |
Author | Toni Sims |
Author | Jonathan Birch |
Author | David Chalmers |
Abstract | In this report, we argue that there is a realistic possibility that some AI systems will be conscious and/or robustly agentic in the near future. That means that the prospect of AI welfare and moral patienthood, i.e. of AI systems with their own interests and moral significance, is no longer an issue only for sci-fi or the distant future. It is an issue for the near future, and AI companies and other actors have a responsibility to start taking it seriously. We also recommend three early steps that AI companies and other actors can take: They can (1) acknowledge that AI welfare is an important and difficult issue (and ensure that language model outputs do the same), (2) start assessing AI systems for evidence of consciousness and robust agency, and (3) prepare policies and procedures for treating AI systems with an appropriate level of moral concern. To be clear, our argument in this report is not that AI systems definitely are, or will be, conscious, robustly agentic, or otherwise morally significant. Instead, our argument is that there is substantial uncertainty about these possibilities, and so we need to improve our understanding of AI welfare and our ability to make wise decisions about this issue. Otherwise there is a significant risk that we will mishandle decisions about AI welfare, mistakenly harming AI systems that matter morally and/or mistakenly caring for AI systems that do not. |
Date | 2024-11-04 |
Library Catalog | arXiv.org |
URL | http://arxiv.org/abs/2411.00986 |
Accessed | 11/17/2024, 6:29:26 PM |
Extra | arXiv:2411.00986 version: 1 |
DOI | 10.48550/arXiv.2411.00986 |
Repository | arXiv |
Archive ID | arXiv:2411.00986 |
Date Added | 11/17/2024, 6:29:26 PM |
Modified | 11/17/2024, 6:29:26 PM |
Item Type | Journal Article |
---|---|
Author | Arianna Manzini |
Author | Geoff Keeling |
Author | Lize Alberts |
Author | Shannon Vallor |
Author | Meredith Ringel Morris |
Author | Iason Gabriel |
Abstract | The development of increasingly agentic and human-like AI assistants, capable of performing a wide range of tasks on users' behalf over time, has sparked heightened interest in the nature and bounds of human interactions with AI. Such systems may indeed ground a transition from task-oriented interactions with AI, at discrete time intervals, to ongoing relationships -- where users develop a deeper sense of connection with and attachment to the technology. This paper investigates what it means for relationships between users and advanced AI assistants to be appropriate and proposes a new framework to evaluate both users' relationships with AI and developers' design choices. We first provide an account of advanced AI assistants, motivating the question of appropriate relationships by exploring several distinctive features of this technology. These include anthropomorphic cues and the longevity of interactions with users, increased AI agency, generality and context ambiguity, and the forms and depth of dependence the relationship could engender. Drawing upon various ethical traditions, we then consider a series of values, including benefit, flourishing, autonomy and care, that characterise appropriate human interpersonal relationships. These values guide our analysis of how the distinctive features of AI assistants may give rise to inappropriate relationships with users. Specifically, we discuss a set of concrete risks arising from user--AI assistant relationships that: (1) cause direct emotional or physical harm to users, (2) limit opportunities for user personal development, (3) exploit user emotional dependence, and (4) generate material dependencies without adequate commitment to user needs. We conclude with a set of recommendations to address these risks. |
Date | 2024-10-16 |
Language | en |
Short Title | The Code That Binds Us |
Library Catalog | ojs.aaai.org |
URL | https://ojs.aaai.org/index.php/AIES/article/view/31694 |
Accessed | 10/28/2024, 9:54:41 AM |
Rights | Copyright (c) 2024 Association for the Advancement of Artificial Intelligence |
Volume | 7 |
Pages | 943-957 |
Publication | Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society |
Date Added | 10/28/2024, 9:54:41 AM |
Modified | 10/28/2024, 9:54:41 AM |
Item Type | Journal Article |
---|---|
Author | Sebastian Porsdam Mann |
Author | Anuraag A. Vazirani |
Author | Mateo Aboy |
Author | Brian D. Earp |
Author | Timo Minssen |
Author | I. Glenn Cohen |
Author | Julian Savulescu |
Abstract | In this Comment, we propose a cumulative set of three essential criteria for the ethical use of LLMs in academic writing, and present a statement that researchers can quote when submitting LLM-assisted manuscripts in order to testify to their adherence to them. |
Date | 2024-11-13 |
Language | en |
Library Catalog | www.nature.com |
URL | https://www.nature.com/articles/s42256-024-00922-7 |
Accessed | 11/15/2024, 2:40:59 PM |
Rights | 2024 Springer Nature Limited |
Extra | Publisher: Nature Publishing Group |
Pages | 1-3 |
Publication | Nature Machine Intelligence |
DOI | 10.1038/s42256-024-00922-7 |
Journal Abbr | Nat Mach Intell |
ISSN | 2522-5839 |
Date Added | 11/15/2024, 2:40:59 PM |
Modified | 11/15/2024, 2:40:59 PM |
Item Type | Journal Article |
---|---|
Author | Nathaniel Sharadin |
Abstract | Suppose there are no in-principle restrictions on the contents of arbitrarily intelligent agents’ goals. According to “instrumental convergence” arguments, potentially scary things follow. I do two things in this paper. First, focusing on the influential version of the instrumental convergence argument due to Nick Bostrom, I explain why such arguments require an account of “promotion”, i.e., an account of what it is to “promote” a goal. Then, I consider whether extant accounts of promotion in the literature—in particular, probabilistic and fit-based views of promotion—can be used to support dangerous instrumental convergence. I argue that neither account of promotion can do the work. The opposite is true: accepting either account of promotion undermines support for instrumental convergence arguments’ existentially worrying conclusions. The conclusion is that we needn’t be scared—at least not because of arguments concerning instrumental convergence. |
Date | 2024-10-21 |
Language | en |
Library Catalog | Springer Link |
URL | https://doi.org/10.1007/s11098-024-02212-9 |
Accessed | 11/19/2024, 8:30:19 AM |
Publication | Philosophical Studies |
DOI | 10.1007/s11098-024-02212-9 |
Journal Abbr | Philos Stud |
ISSN | 1573-0883 |
Date Added | 11/19/2024, 8:30:19 AM |
Modified | 11/19/2024, 8:30:19 AM |
Item Type | Journal Article |
---|---|
Author | Tan Zhi-Xuan |
Author | Micah Carroll |
Author | Matija Franklin |
Author | Hal Ashton |
Abstract | The dominant practice of AI alignment assumes (1) that preferences are an adequate representation of human values, (2) that human rationality can be understood in terms of maximizing the satisfaction of preferences, and (3) that AI systems should be aligned with the preferences of one or more humans to ensure that they behave safely and in accordance with our values. Whether implicitly followed or explicitly endorsed, these commitments constitute what we term a preferentist approach to AI alignment. In this paper, we characterize and challenge the preferentist approach, describing conceptual and technical alternatives that are ripe for further research. We first survey the limits of rational choice theory as a descriptive model, explaining how preferences fail to capture the thick semantic content of human values, and how utility representations neglect the possible incommensurability of those values. We then critique the normativity of expected utility theory (EUT) for humans and AI, drawing upon arguments showing how rational agents need not comply with EUT, while highlighting how EUT is silent on which preferences are normatively acceptable. Finally, we argue that these limitations motivate a reframing of the targets of AI alignment: Instead of alignment with the preferences of a human user, developer, or humanity-writ-large, AI systems should be aligned with normative standards appropriate to their social roles, such as the role of a general-purpose assistant. Furthermore, these standards should be negotiated and agreed upon by all relevant stakeholders. On this alternative conception of alignment, a multiplicity of AI systems will be able to serve diverse ends, aligned with normative standards that promote mutual benefit and limit harm despite our plural and divergent values. |
Date | 2024-11-09 |
Language | en |
Library Catalog | Springer Link |
URL | https://doi.org/10.1007/s11098-024-02249-w |
Accessed | 11/19/2024, 8:28:47 AM |
Publication | Philosophical Studies |
DOI | 10.1007/s11098-024-02249-w |
Journal Abbr | Philos Stud |
ISSN | 1573-0883 |
Date Added | 11/19/2024, 8:28:47 AM |
Modified | 11/19/2024, 8:28:55 AM |