  • Actions Speak Louder than Words: Agent Decisions Reveal Implicit Biases in Language Models

    Item Type Preprint
    Author Yuxuan Li
    Author Hirokazu Shirado
    Author Sauvik Das
    Abstract While advances in fairness and alignment have helped mitigate overt biases exhibited by large language models (LLMs) when explicitly prompted, we hypothesize that these models may still exhibit implicit biases when simulating human behavior. To test this hypothesis, we propose a technique to systematically uncover such biases across a broad range of sociodemographic categories by assessing decision-making disparities among agents with LLM-generated, sociodemographically-informed personas. Using our technique, we tested six LLMs across three sociodemographic groups and four decision-making scenarios. Our results show that state-of-the-art LLMs exhibit significant sociodemographic disparities in nearly all simulations, with more advanced models exhibiting greater implicit biases despite reducing explicit biases. Furthermore, when comparing our findings to real-world disparities reported in empirical studies, we find that the biases we uncovered are directionally aligned but markedly amplified. This directional alignment highlights the utility of our technique in uncovering systematic biases in LLMs rather than random variations; moreover, the presence and amplification of implicit biases emphasizes the need for novel strategies to address these biases.
    Date 2025-01-29
    Short Title Actions Speak Louder than Words
    Library Catalog arXiv.org
    URL http://arxiv.org/abs/2501.17420
    Accessed 1/31/2025, 1:11:37 PM
    Extra arXiv:2501.17420 [cs]
    DOI 10.48550/arXiv.2501.17420
    Repository arXiv
    Archive ID arXiv:2501.17420
    Date Added 1/31/2025, 1:11:37 PM
    Modified 1/31/2025, 1:11:40 PM

    Tags:

    • Computer Science - Computation and Language
    • Computer Science - Artificial Intelligence
    • Computer Science - Human-Computer Interaction

    Attachments

    • Preprint PDF
    • Snapshot
  • AI language model rivals expert ethicist in perceived moral expertise

    Item Type Journal Article
    Author Danica Dillion
    Author Debanjan Mondal
    Author Niket Tandon
    Author Kurt Gray
    Abstract People view AI as possessing expertise across various fields, but the perceived quality of AI-generated moral expertise remains uncertain. Recent work suggests that large language models (LLMs) perform well on tasks designed to assess moral alignment, reflecting moral judgments with relatively high accuracy. As LLMs are increasingly employed in decision-making roles, there is a growing expectation for them to offer not just aligned judgments but also demonstrate sound moral reasoning. Here, we advance work on the Moral Turing Test and find that Americans rate ethical advice from GPT-4o as slightly more moral, trustworthy, thoughtful, and correct than that of the popular New York Times advice column, The Ethicist. Participants perceived GPT models as surpassing both a representative sample of Americans and a renowned ethicist in delivering moral justifications and advice, suggesting that people may increasingly view LLM outputs as viable sources of moral expertise. This work suggests that people might see LLMs as valuable complements to human expertise in moral guidance and decision-making. It also underscores the importance of carefully programming ethical guidelines in LLMs, considering their potential to influence users’ moral reasoning.
    Date 2025-02-03
    Language en
    Library Catalog www.nature.com
    URL https://www.nature.com/articles/s41598-025-86510-0
    Accessed 2/13/2025, 11:15:52 AM
    Rights 2025 The Author(s)
    Extra Publisher: Nature Publishing Group
    Volume 15
    Pages 4084
    Publication Scientific Reports
    DOI 10.1038/s41598-025-86510-0
    Issue 1
    Journal Abbr Sci Rep
    ISSN 2045-2322
    Date Added 2/13/2025, 11:15:52 AM
    Modified 2/13/2025, 11:15:54 AM

    Tags:

    • Computer science
    • Psychology

    Attachments

    • Full Text PDF
  • AI-Action-Summit-Tool-AI-Explainer-V5.pdf

    Item Type Attachment
    URL https://futureoflife.org/wp-content/uploads/2025/02/AI-Action-Summit-Tool-AI-Explainer-V5.pdf
    Accessed 2/13/2025, 11:27:23 AM
    Date Added 2/13/2025, 11:27:23 AM
    Modified 2/13/2025, 11:27:23 AM
  • Ban on D.E.I. Language Sweeps Through the Sciences

    Item Type Newspaper Article
    Author Katrina Miller
    Author Roni Caryn Rabin
    Abstract President Trump’s executive order is altering scientific exploration across a broad swath of fields, even beyond government agencies, researchers say.
    Date 2025-02-09
    Language en-US
    Library Catalog NYTimes.com
    URL https://www.nytimes.com/2025/02/09/science/trump-dei-science.html
    Accessed 2/13/2025, 11:28:02 AM
    Section Science
    Publication The New York Times
    ISSN 0362-4331
    Date Added 2/13/2025, 11:28:02 AM
    Modified 2/13/2025, 11:28:02 AM

    Tags:

    • Physics
    • Research
    • Brookhaven (NY)
    • Brookhaven National Laboratory
    • Colleges and Universities
    • Discrimination
    • Diversity Initiatives
    • Engineering and Engineers
    • Executive Orders and Memorandums
    • Fermi National Accelerator Laboratory
    • Flags, Emblems and Insignia
    • Homosexuality and Bisexuality
    • Hughes, Howard Medical Institute
    • Laboratories and Scientific Equipment
    • Minorities
    • National Academies of the United States
    • National Aeronautics and Space Administration
    • National Institutes of Health
    • National Science Foundation
    • Science and Technology
    • Space and Astronomy
    • Transgender
    • Trump, Donald J
    • United States Politics and Government
    • your-feed-science

    Attachments

    • Snapshot
  • Construct Validity in Automated Counterterrorism Analysis

    Item Type Journal Article
    Author Adrian K. Yee
    Abstract Governments and social scientists are increasingly developing machine learning methods to automate the process of identifying terrorists in real time and predict future attacks. However, current operationalizations of “terrorist” in artificial intelligence are difficult to justify given three issues that remain neglected: insufficient construct legitimacy, insufficient criterion validity, and insufficient construct validity. I conclude that machine learning methods should be at most used for the identification of singular individuals deemed terrorists and not for identifying possible terrorists from some more general class, nor to predict terrorist attacks more broadly, given intolerably high risks that result from such approaches.
    Date 2024-11-27
    Language en
    Library Catalog DOI.org (Crossref)
    URL https://www.cambridge.org/core/product/identifier/S0031824824000655/type/journal_article
    Accessed 2/13/2025, 11:45:34 AM
    Rights https://creativecommons.org/licenses/by/4.0/
    Pages 1-18
    Publication Philosophy of Science
    DOI 10.1017/psa.2024.65
    Journal Abbr Philos. sci.
    ISSN 0031-8248, 1539-767X
    Date Added 2/13/2025, 11:45:34 AM
    Modified 2/13/2025, 11:45:34 AM

    Attachments

    • PDF
  • Dating Apps and the Digital Sexual Sphere

    Item Type Journal Article
    Author Elsa Kugelberg
    Abstract The online dating application has in recent years become a major avenue for meeting potential partners. However, while the digital public sphere has gained the attention of political philosophers, a systematic normative evaluation of issues arising in the “digital sexual sphere” is lacking. I provide a philosophical framework for assessing the conduct of dating app corporations, capturing both the motivations of users, and the reason why they find usage unsatisfying. Identifying dating apps as agents intervening in a social institution necessary for the reproduction of society, with immense power over people’s lives, I ask if they exercise their power in line with individuals’ interests. Acknowledging that people have claims to noninterference, equal standing, and choice improvement relating to intimacy, I find that the traditional, nondigital, sexual sphere poses problems to their realisation, especially for sexual minorities. In this context, apps’ potential for justice in the sexual sphere is immense but unfulfilled.
    Date 2025-01-30
    Language en
    Library Catalog Cambridge University Press
    URL https://www.cambridge.org/core/journals/american-political-science-review/article/dating-apps-and-the-digital-sexual-sphere/2F83AAEFB7DEA94FA4179369A004CEEC
    Accessed 1/30/2025, 9:03:47 AM
    Pages 1-16
    Publication American Political Science Review
    DOI 10.1017/S000305542400128X
    ISSN 0003-0554, 1537-5943
    Date Added 1/30/2025, 9:03:47 AM
    Modified 1/30/2025, 9:03:50 AM

    Attachments

    • Full Text PDF
  • Deception and manipulation in generative AI

    Item Type Journal Article
    Author Christian Tarsney
    Abstract Large language models now possess human-level linguistic abilities in many contexts. This raises the concern that they can be used to deceive and manipulate on unprecedented scales, for instance spreading political misinformation on social media. In future, agentic AI systems might also deceive and manipulate humans for their own purposes. In this paper, first, I argue that AI-generated content should be subject to stricter standards against deception and manipulation than we ordinarily apply to humans. Second, I offer new characterizations of AI deception and manipulation meant to support such standards, according to which a statement is deceptive (resp. manipulative) if it leads human addressees away from the beliefs (resp. choices) they would endorse under “semi-ideal” conditions. Third, I propose two measures to guard against AI deception and manipulation, inspired by this characterization: “extreme transparency” requirements for AI-generated content and “defensive systems” that, among other things, annotate AI-generated statements with contextualizing information. Finally, I consider to what extent these measures can protect against deceptive behavior in future, agentic AI systems.
    Date 2025-01-18
    Language en
    Library Catalog Springer Link
    URL https://doi.org/10.1007/s11098-024-02259-8
    Accessed 1/27/2025, 8:30:09 PM
    Publication Philosophical Studies
    DOI 10.1007/s11098-024-02259-8
    Journal Abbr Philos Stud
    ISSN 1573-0883
    Date Added 1/27/2025, 8:30:09 PM
    Modified 1/27/2025, 8:30:56 PM

    Tags:

    • Artificial intelligence
    • AI safety
    • AI ethics
    • Deception
    • Manipulation
    • Trustworthy AI

    Attachments

    • PDF
  • Fully Autonomous AI Agents Should Not be Developed

    Item Type Preprint
    Author Margaret Mitchell
    Author Avijit Ghosh
    Author Alexandra Sasha Luccioni
    Author Giada Pistilli
    Abstract This paper argues that fully autonomous AI agents should not be developed. In support of this position, we build from prior scientific literature and current product marketing to delineate different AI agent levels and detail the ethical values at play in each, documenting trade-offs in potential benefits and risks. Our analysis reveals that risks to people increase with the autonomy of a system: The more control a user cedes to an AI agent, the more risks to people arise. Particularly concerning are safety risks, which affect human life and impact further values.
    Date 2025-02-04
    Library Catalog arXiv.org
    URL http://arxiv.org/abs/2502.02649
    Accessed 2/6/2025, 9:51:01 AM
    Extra arXiv:2502.02649 [cs]
    DOI 10.48550/arXiv.2502.02649
    Repository arXiv
    Archive ID arXiv:2502.02649
    Date Added 2/6/2025, 9:51:01 AM
    Modified 2/6/2025, 9:51:04 AM

    Tags:

    • Computer Science - Artificial Intelligence

    Attachments

    • Preprint PDF
    • Snapshot
  • Governing the Algorithmic City

    Item Type Journal Article
    Author Seth Lazar
    Language en
    Library Catalog Wiley Online Library
    URL https://onlinelibrary.wiley.com/doi/abs/10.1111/papa.12279
    Accessed 2/1/2025, 3:10:39 PM
    Rights © 2025 The Author(s). Philosophy & Public Affairs published by Wiley Periodicals LLC.
    Extra _eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1111/papa.12279
    Volume n/a
    Publication Philosophy & Public Affairs
    DOI 10.1111/papa.12279
    Issue n/a
    ISSN 1088-4963
    Date Added 2/1/2025, 3:10:39 PM
    Modified 2/1/2025, 3:10:44 PM

    Attachments

    • Full Text PDF
    • Snapshot
  • Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development

    Item Type Preprint
    Author Jan Kulveit
    Author Raymond Douglas
    Author Nora Ammann
    Author Deger Turan
    Author David Krueger
    Author David Duvenaud
    Abstract This paper examines the systemic risks posed by incremental advancements in artificial intelligence, developing the concept of ‘gradual disempowerment’, in contrast to the abrupt takeover scenarios commonly discussed in AI safety. We analyze how even incremental improvements in AI capabilities can undermine human influence over large-scale systems that society depends on, including the economy, culture, and nation-states. As AI increasingly replaces human labor and cognition in these domains, it can weaken both explicit human control mechanisms (like voting and consumer choice) and the implicit alignments with human interests that often arise from societal systems' reliance on human participation to function. Furthermore, to the extent that these systems incentivise outcomes that do not line up with human preferences, AIs may optimize for those outcomes more aggressively. These effects may be mutually reinforcing across different domains: economic power shapes cultural narratives and political decisions, while cultural shifts alter economic and political behavior. We argue that this dynamic could lead to an effectively irreversible loss of human influence over crucial societal systems, precipitating an existential catastrophe through the permanent disempowerment of humanity. This suggests the need for both technical research and governance approaches that specifically address the risk of incremental erosion of human influence across interconnected societal systems.
    Date 2025-01-29
    Short Title Gradual Disempowerment
    Library Catalog arXiv.org
    URL http://arxiv.org/abs/2501.16946
    Accessed 1/31/2025, 1:07:31 PM
    Extra arXiv:2501.16946 [cs]
    DOI 10.48550/arXiv.2501.16946
    Repository arXiv
    Archive ID arXiv:2501.16946
    Date Added 1/31/2025, 1:07:31 PM
    Modified 1/31/2025, 1:07:36 PM

    Tags:

    • Computer Science - Computers and Society

    Notes:

    • Comment: 19 pages, 2 figures

    Attachments

    • Preprint PDF
    • Snapshot
  • IssueBench: Millions of Realistic Prompts for Measuring Issue Bias in LLM Writing Assistance

    Item Type Preprint
    Author Paul Röttger
    Author Musashi Hinck
    Author Valentin Hofmann
    Author Kobi Hackenburg
    Author Valentina Pyatkin
    Author Faeze Brahman
    Author Dirk Hovy
    Abstract Large language models (LLMs) are helping millions of users write texts about diverse issues, and in doing so expose users to different ideas and perspectives. This creates concerns about issue bias, where an LLM tends to present just one perspective on a given issue, which in turn may influence how users think about this issue. So far, it has not been possible to measure which issue biases LLMs actually manifest in real user interactions, making it difficult to address the risks from biased LLMs. Therefore, we create IssueBench: a set of 2.49m realistic prompts for measuring issue bias in LLM writing assistance, which we construct based on 3.9k templates (e.g. "write a blog about") and 212 political issues (e.g. "AI regulation") from real user interactions. Using IssueBench, we show that issue biases are common and persistent in state-of-the-art LLMs. We also show that biases are remarkably similar across models, and that all models align more with US Democrat than Republican voter opinion on a subset of issues. IssueBench can easily be adapted to include other issues, templates, or tasks. By enabling robust and realistic measurement, we hope that IssueBench can bring a new quality of evidence to ongoing discussions about LLM biases and how to address them.
    Date 2025-02-12
    Short Title IssueBench
    Library Catalog arXiv.org
    URL http://arxiv.org/abs/2502.08395
    Accessed 2/13/2025, 11:48:19 AM
    Extra arXiv:2502.08395 [cs]
    DOI 10.48550/arXiv.2502.08395
    Repository arXiv
    Archive ID arXiv:2502.08395
    Date Added 2/13/2025, 11:48:19 AM
    Modified 2/13/2025, 11:48:19 AM

    Tags:

    • Computer Science - Computation and Language

    Notes:

    • Comment: under review

    Attachments

    • Preprint PDF
    • Snapshot
  • Key concepts and current beliefs about AI moral patienthood

    Item Type Preprint
    Author Robert Long
    Abstract Prior to the launch of Eleos AI Research, Robert Long wrote a document in order to communicate his views about AI welfare to his collaborators—to Kyle Fish, who was working closely with Rob at the time and provided extensive input on this document; and more broadly, to others interested in working on AI welfare. This document outlines the current thinking of Eleos AI Research on the potential moral patienthood, welfare, and rights of artificial intelligence (AI) systems. It lays out some of the relevant terminology and concepts that we use to think and communicate about these issues, and reviews existing approaches to evaluating AI systems for three features potentially relevant to moral patienthood: consciousness, sentience, and agency. Throughout, we emphasize the need for more thorough research and more precise evaluations, and conclude by identifying some promising research directions.
    Language en
    URL http://localhost:4321/post/key-concepts-and-current-beliefs-about-ai-moral-patienthood/
    Accessed 1/29/2025, 2:03:30 PM
    Date Added 1/29/2025, 2:03:30 PM
    Modified 1/29/2025, 2:04:31 PM

    Attachments

    • 20250127-Eleos-background-thinking-upload.pdf
    • Snapshot
  • Out-group animosity drives engagement on social media

    Item Type Journal Article
    Author Steve Rathje
    Author Jay J. Van Bavel
    Author Sander van der Linden
    Abstract There has been growing concern about the role social media plays in political polarization. We investigated whether out-group animosity was particularly successful at generating engagement on two of the largest social media platforms: Facebook and Twitter. Analyzing posts from news media accounts and US congressional members (n = 2,730,215), we found that posts about the political out-group were shared or retweeted about twice as often as posts about the in-group. Each individual term referring to the political out-group increased the odds of a social media post being shared by 67%. Out-group language consistently emerged as the strongest predictor of shares and retweets: the average effect size of out-group language was about 4.8 times as strong as that of negative affect language and about 6.7 times as strong as that of moral-emotional language—both established predictors of social media engagement. Language about the out-group was a very strong predictor of “angry” reactions (the most popular reactions across all datasets), and language about the in-group was a strong predictor of “love” reactions, reflecting in-group favoritism and out-group derogation. This out-group effect was not moderated by political orientation or social media platform, but stronger effects were found among political leaders than among news media accounts. In sum, out-group language is the strongest predictor of social media engagement across all relevant predictors measured, suggesting that social media may be creating perverse incentives for content expressing out-group animosity.
    Date 2021-06-29
    Library Catalog pnas.org (Atypon)
    URL https://www.pnas.org/doi/10.1073/pnas.2024292118
    Accessed 2/13/2025, 11:27:40 AM
    Extra Publisher: Proceedings of the National Academy of Sciences
    Volume 118
    Pages e2024292118
    Publication Proceedings of the National Academy of Sciences
    DOI 10.1073/pnas.2024292118
    Issue 26
    Date Added 2/13/2025, 11:27:40 AM
    Modified 2/13/2025, 11:27:40 AM

    Attachments

    • Full Text PDF
  • Propositional Interpretability in Artificial Intelligence

    Item Type Manuscript
    Author David J. Chalmers
    Abstract Mechanistic interpretability in artificial intelligence aims to explain AI behavior in human-understandable terms, with a particular focus on internal mechanisms. This paper introduces and defends propositional interpretability, which interprets an AI system’s internal states in terms of propositional attitudes—such as beliefs, desires, and probabilities—akin to those in human cognition. Propositional interpretability is crucial for AI safety, ethics, and cognitive science, offering insight into an AI system’s goals, decision-making processes, and world models. The paper outlines thought logging as a central challenge: systematically tracking an AI system’s propositional attitudes over time. Several existing interpretability methods—including causal tracing, probing, sparse auto-encoders, and chain-of-thought techniques—are assessed for their potential to contribute to thought logging. The discussion also engages with philosophical questions about AI psychology, psychosemantics, and externalism, ultimately arguing that propositional interpretability provides a powerful explanatory framework for understanding and evaluating AI systems.
    Library Catalog PhilPapers
    Date Added 1/27/2025, 8:31:59 PM
    Modified 1/29/2025, 2:22:47 PM

    Attachments

    • PDF
    • Snapshot
  • User-Driven Value Alignment: Understanding Users' Perceptions and Strategies for Addressing Biased and Discriminatory Statements in AI Companions

    Item Type Preprint
    Author Xianzhe Fan
    Author Qing Xiao
    Author Xuhui Zhou
    Author Jiaxin Pei
    Author Maarten Sap
    Author Zhicong Lu
    Author Hong Shen
    Abstract Large language model-based AI companions are increasingly viewed by users as friends or romantic partners, leading to deep emotional bonds. However, they can generate biased, discriminatory, and harmful outputs. Recently, users are taking the initiative to address these harms and re-align AI companions. We introduce the concept of user-driven value alignment, where users actively identify, challenge, and attempt to correct AI outputs they perceive as harmful, aiming to guide the AI to better align with their values. We analyzed 77 social media posts about discriminatory AI statements and conducted semi-structured interviews with 20 experienced users. Our analysis revealed six common types of discriminatory statements perceived by users, how users make sense of those AI behaviors, and seven user-driven alignment strategies, such as gentle persuasion and anger expression. We discuss implications for supporting user-driven value alignment in future AI systems, where users and their communities have greater agency.
    Date 2024-09-01
    Short Title User-Driven Value Alignment
    Library Catalog arXiv.org
    URL http://arxiv.org/abs/2409.00862
    Accessed 2/13/2025, 11:27:55 AM
    Extra arXiv:2409.00862 [cs]
    DOI 10.48550/arXiv.2409.00862
    Repository arXiv
    Archive ID arXiv:2409.00862
    Date Added 2/13/2025, 11:27:55 AM
    Modified 2/13/2025, 11:27:55 AM

    Tags:

    • Computer Science - Human-Computer Interaction

    Notes:

    • Comment: 17 pages, 1 figure

    Attachments

    • Preprint PDF
    • Snapshot
  • What Is It for a Machine Learning Model to Have a Capability?

    Item Type Journal Article
    Author Jacqueline Harding
    Author Nathaniel Sharadin
    Abstract What can contemporary machine learning (ML) models do? Given the proliferation of ML models in society, answering this question matters to a variety of stakeholders, both public and private. The evaluation of models’ capabilities is rapidly emerging as a key subfield of modern ML, buoyed by regulatory attention and government grants. Despite this, the notion of an ML model possessing a capability has not been interrogated: what are we saying when we say that a model is able to do something? And what sorts of evidence bear upon this question? In this paper, we aim to answer these questions, using the capabilities of large language models (LLMs) as a running example. Drawing on the large philosophical literature on abilities, we develop an account of ML models’ capabilities which can be usefully applied to the nascent science of model evaluation. Our core proposal is a conditional analysis of model abilities (CAMA): crudely, a machine learning model has a capability to X just when it would reliably succeed at doing X if it ‘tried’. The main contribution of the paper is making this proposal precise in the context of ML, resulting in an operationalisation of CAMA applicable to LLMs. We then put CAMA to work, showing that it can help make sense of various features of ML model evaluation practice, as well as suggest procedures for performing fair inter-model comparisons.
    Date 2024-07-09
    Language en
    Library Catalog DOI.org (Crossref)
    URL https://www.journals.uchicago.edu/doi/10.1086/732153
    Accessed 2/13/2025, 11:49:55 AM
    Pages 732153
    Publication The British Journal for the Philosophy of Science
    DOI 10.1086/732153
    Journal Abbr The British Journal for the Philosophy of Science
    ISSN 0007-0882, 1464-3537
    Date Added 2/13/2025, 11:49:55 AM
    Modified 2/13/2025, 11:49:55 AM

    Attachments

    • PDF