A shrinking path to safety: how a narrowly technical approach to aligning AI with the public good could fail

Image: Anton Grabolle / Better Images of AI / Autonomous Driving / CC-BY 4.0

On September 4, the UK government announced plans to host the first global “AI Safety Summit” at historic Bletchley Park on the 1st and 2nd of November. The announcement made clear that the focus of the summit would be the grave risks posed by so-called “frontier AI” – that is, emerging and future AI systems not yet in deployment that may greatly surpass the capabilities of existing AI systems. The announcement states that the Summit will also ‘focus on how safe AI can be used for public good and to improve people’s lives – from lifesaving medical technology to safer transport.’

Just three days later, the UK’s Department for Science, Innovation and Technology published the first progress report of the “Frontier AI Taskforce,” in which it announced the creation of ‘an expert advisory board spanning AI Research and National Security’ and an ambitious new programme to recruit expert AI researchers into government and partner with ‘leading technical organisations’ to meet the challenge of AI safety posed by frontier models.

The roadmap for the Frontier AI Taskforce makes clear what kind of knowledge the UK is seeking to leverage to attain these worthy goals, and who it will seek that knowledge from: namely, technical experts.

The Taskforce progress report explains that to manage this risk, “technical evaluations are critical,” and thus “we are hiring technical AI experts into government at start-up speed.” The report charts a path to AI safety through “building the technical foundations for AI research inside government,” and to do that, it urges, “we need more technical experts and more leading technical organisations to come and support us.”

There is no question that the UK government must acquire a great deal of technical AI expertise in order to effectively govern the considerable risks of AI, while securing the potential benefits for the public good. But is investing in technical expertise alone a road to safety?

Learning from our mistakes: what history teaches us about safety

Consider other safety-critical endeavours. Can you design a safe automobile with expert engineers alone? Of course you can’t. You need experts in the biological fragilities and tolerances of different human bodies. You need experts in the psychological ‘human factors’ that shape driving behaviours. You need experts in the limits of human vision and cognition, in the social dynamics of driving, and in the moral, legal, political and economic tradeoffs that come with different kinds of safety design choices.

Do you know what happened when we tried to engineer safe automobiles by relying on ‘experts’ with deep but dangerously narrow technical knowledge? We got vehicles that kill women drivers and passengers at vastly higher rates than men, because those technical experts wrongly assumed that male and female bodies had the same vulnerabilities, and that seatbelts and crash test dummies designed for the average male frame would be equally protective of all.

How well did the technical experts who designed and calibrated pulse oximeters do at securing human safety when they failed to ensure the accuracy of these devices for darker-skinned patients, leading, during the COVID-19 pandemic, to routine failures to detect dangerously low oxygen levels among some of the most heavily impacted populations?

Can you design a safe biosecurity lab for dangerous pathogens without expertise in the kinds of mistakes that humans make when they are tired and distracted? Can you design a safe cybersecurity protocol without understanding that the biggest, most vulnerable and most attractive attack surface for a hacker is not technical, but psychological?

Then why do we think that we can build safe AI systems by building a foundation of narrowly technical expertise in AI research? If we do so we will repeat the same mistakes that we’ve made again and again in safety-critical systems – treating safety as a purely technical problem to be cleverly solved rather than a human one to be wisely managed.

Building safety together: Safe AI is Responsible AI

This is particularly dangerous for AI, given that the most effective AI safety tools we have – part of a remarkable body of knowledge and technique amassed over the past decade within the field of “Responsible AI” research – were developed almost entirely by interdisciplinary teams of experts who have learned to cut across technical, scientific, social, humanistic, creative and design knowledge silos.

Responsible AI tools for locating the vulnerabilities, limits and dangers of AI models, such as ‘red-teaming’ exercises adapted from cybersecurity practice, as well as algorithmic auditing tools that evaluate the potential of AI models to produce unjust, dangerous and erroneous outputs, were not developed by machine learning researchers working in technical bubbles. They were built by collaborative teams of computer and data scientists working with social scientists, philosophers and ethicists, designers and civil society partners who have broader experience analyzing and mitigating the potential harms of AI systems.

There is no safety-critical domain in human history in which we got ahead of a technology’s risks by throwing away all that we already knew about how to make it safe and beneficial. There is no such domain in which we learned more about a tool’s safety by narrowing our view of the expertise needed to secure it. AI is no different, whether we are talking about ‘frontier’ models or the tools that already challenge our safety today.

Prominent Responsible AI conferences attended by both academia and industry, like FAccT (Fairness, Accountability, and Transparency) and AIES (AI, Ethics and Society), draw precisely from this broader and deeper well of knowledge. So do initiatives like the Global Partnership on AI, and Stanford’s AI100, a longitudinal hundred-year study of AI. So do leading UK AI research organisations like The Alan Turing Institute and the Ada Lovelace Institute. So do many large UKRI-funded research programmes like BRAID (Bridging Responsible AI Divides), RAI UK (Responsible AI UK) and TAS (Trustworthy Autonomous Systems). None of these efforts frame their work as narrowly technical endeavours.

In fact, these initiatives support technical excellence not by recruiting technologists to work in isolated, blinkered silos, divorced from diverse perspectives and from deep knowledge of the human and social contexts they are trying to protect, but by placing them alongside experts in law, ethics, politics, economics, design, the arts, the history and philosophy of technology, and critical social, interpersonal and cultural dynamics. In these multidisciplinary efforts, we link complementary methodologies to go farther than before, or even create new ones, in order to develop the unique forms of knowledge we need to make AI safe and beneficial.

These programmes also create wider, more diverse partnerships by translating knowledge across the divides between academia, industry, policy, government and civil society. For example, our BRAID programme, funded by the Arts and Humanities Research Council, launched a policy collaboration earlier this year with the Department for Culture, Media and Sport and representatives of the Office for AI. The project translated cutting-edge academic research into practical, immediate insights for policymakers on the risks of generative AI models, AI’s impact on the creative sector, and lessons for AI from the history of technology governance. We also launched a new fellowship programme that enables academics to embed with non-academic stakeholders, showing how Responsible AI knowledge can flow between researchers, organisations and communities.

The path to AI safety is wide and inclusive, not narrow and exclusive. It reflects the fact that at the end of the day, safety is a human and social concern, not a technical one. Technical approaches are part of what we need to know to make AI safe and beneficial; but the history of failures in safety engineering tells us what happens when we mistake that part for the whole.

Why the hardest problems of AI safety aren’t technical

One of the Summit’s priorities is to minimize the risk that ‘bad actors’ misuse AI to ‘create significant harm’. Which technical experts can judge what makes a harm to you ‘significant’? A second Summit priority is to manage the risks of losing control of advanced AI systems, so that they would no longer be aligned with ‘our values and intentions.’ Which technical experts at the Summit, or on the Frontier AI Taskforce, are trained to handle this question: “whose values and intentions?”

For in focusing narrowly on the speculative long-term horizon of ‘frontier’ models, the Summit’s scope excludes so-called “low risk” systems in existence today – but “low risk” to whom? Who gets to decide what safety means, and what risks are ‘high’ enough for us to prioritize? Which ‘technical experts’ and ‘leading technical organisations’ get to decide who gets to be safe from AI? Many human beings aren’t safe from AI now.

Finally, let’s remember that ‘safety’ itself warrants critical examination as the ultimate goal of AI governance. Who gets to decide if safety is good enough, or even the most important value? Are we willing to pursue safety by any means necessary, or only by morally justifiable means? Are you willing to trade justice for safety? What about your human rights? What about democracy, or civil liberty? What else besides safety is required for shared human flourishing? These are not technical questions. And any AI Safety Summit should be prepared to ask them.

Let’s remember that it wasn’t an engineer or a computer scientist, but a philosopher and political theorist, Jean-Jacques Rousseau, who had the wisdom to warn us 261 years ago in The Social Contract that if we pursue safety without asking these critical questions, we may unwittingly trade our futures for it. For as he reminded us, “humans can live tranquilly also in dungeons.”

AI safety is a vital and urgent priority, and it is essential that the UK government promote it. But what safety with AI means, whose safety is prioritised, and how we can justly secure it without losing all that makes a future worth wanting are questions that no narrowly technocratic exercise can answer for us.

Shannon Vallor
Co-Director of BRAID (Bridging Responsible AI Divides)
Baillie Gifford Professor of the Ethics of Data and Artificial Intelligence
Director, Centre for Technomoral Futures at the Edinburgh Futures Institute
The University of Edinburgh

Ewa Luger
Co-Director of BRAID (Bridging Responsible AI Divides)
Personal Chair in Human-Data Interaction
Co-Director, Institute of Design Informatics
Edinburgh College of Art
