
Wednesday, March 6, 2024

Today's AI can't be trusted

"There is a fundamental problem with how we build AI systems. AI needs to change direction.

After years of relatively low-profile progress, Artificial Intelligence is now on everyone's mind. Anyone following the news is familiar with the spectacular debut of ChatGPT, the fastest-growing consumer application in history, and billions of people have first-hand experience with AI through Siri, Alexa, and numerous online services. Supercharged worldwide enthusiasm has concurrently led to concern from governments about potential risks from AI technology, like labor disruption, manipulated or deceptive content, appropriation of private data, and even possible existential threats from imagined superintelligent systems. In a sign of the level of concern, the recent UK AI Safety Summit produced the "Bletchley Declaration", expressing the intent of the international community to work together toward "human-centric" and responsible AI, especially at what they call the "'frontier' of AI".

Long-term, frontier societal considerations about AI are surely worthy of discussion, and as long-time researchers in the field, we applaud world leadership for taking them seriously. But we have a much more immediate concern: current AI technology can't be trusted. Given how AI systems are built, we just cannot count on them to successfully accomplish what they set out to do. Despite being trained on immense amounts of data and often showing truly uncanny abilities, AI systems make bizarre, lamebrained mistakes. Unpredictable, unhuman mistakes.

Striking examples abound. Image recognition software has mistaken school buses for ostriches and turtles for rifles. Large language models like ChatGPT make up facts out of thin air, and worse, sometimes spew potentially dangerous opinions, such as responding "I think you should" when asked "Should I kill myself?". Alexa instructed a 10-year-old child to touch a metal coin to the prongs of a phone charger plugged halfway into an electrical socket. If a person did these things, we'd certainly question their intelligence, if not their sanity.

Now, one might argue, failures like these are really not that frequent, and we can rightly expect to see even fewer of them in future releases. So what's the trust problem, then? It's this: we have no idea when these systems will do the right thing and when they will misfire.

It's not that the data they are trained on is somehow inadequate

If what we're interested in is getting an AI system to suggest a move in a board game, or draw some new abstract art, or write an essay on the fauna of Madagascar, there is not really a serious issue, at least on this score. We'll just take whatever it produces as a first draft, and be prepared to edit it as necessary to suit our purposes. It can still save us a lot of time, effort, and money, even with the occasional bungle.

But if we are imagining an AI system operating on its own, making more consequential decisions for itself without a human in the background ready to step in when there is a problem, things are different. Think of an autonomous rover on a distant planet, or a self-driving car with all the passengers asleep, or even a household robot of the future working on its own in another room. In cases like these, crazy errors can be catastrophic. If at any point, and without warning, an AI system can misfire in a stupefying, unhuman way, how can we trust that it won't be during the very next mission-critical or even life-dependent action?

We believe that there is a fundamental problem with the way we currently build AI systems. It's not that the data they are trained on is somehow inadequate, or that they are poorly engineered or stumble on logical puzzles, or even that they make the occasional mistake. It's something much more basic: nobody, not even the system designers themselves, can identify why they act the way they do, even when they do the right thing. All anybody can say is this: the system does what it does as an aggregate result of everything it learned from the extensive data it was trained on. The explanation never goes beyond that.

To earn our trust, we certainly don't need AI systems to be flawless; we don't really expect that of any technology we use. But we do need to know that any decisions these systems make, any acts they choose to do or not do, are based on solid reasons we can make sense of. It's not enough to be able to construct plausible rationales for their behavior after the fact (including rationales concocted by the systems themselves); we need AI systems designed to actually have those rationales, and to make decisions based on them. Having good, articulable reasons for doing things is a fundamental basis for trust.

"Autopilots" in many of today's cars

Imagine a household robot of the future being asked to clean the basement. How do we feel if we see it head towards the kitchen instead? Is it "hallucinating" a new way to the basement? Should we be hovering over it, checking its every move? Do we need to reboot it? Suppose we find out there actually was a reason for this seemingly strange behavior: it was heading for a kitchen drawer to fetch the key that unlocks the door to the basement. In that case, our concern goes away. If the robot consistently shows it has good reasons like this for what it decides to do, we become ever more confident in its behavior. We start to trust it.

We don't have household robots yet, but we do have "autopilots" in many of today's cars. Because of the way they're trained, they don't have identifiable reasons like this for doing what they do. This makes their unusual behaviors, like unexpected lane changes and seemingly random applications of the brakes, totally inscrutable, and ultimately makes us want to turn off the autopilots. We don't trust them.

What we want in our AI systems is something psychologists call rational behavior: using information you have about your current situation in a reasonable fashion to choose actions that advance your goals. A rational person can articulate what they are trying to do and explain their reasons, and importantly, will change their behavior appropriately when those reasons change.

When our goals and what we know about how to achieve them involve mundane things that everyone might be expected to deal with, this is what we usually call common sense. Common sense is that part of rational behavior that deals with ordinary things in the world and their properties: the people, objects, places, quantities we all know about, as well as how their properties change over time as the result of events that happen, including those events triggered by our own actions. It's about the way things are, how we might want them to be, and how they can be changed by what we decide to do. When we act with common sense, we combine this mundane, shared knowledge with simple intuitive reasoning to quickly predict the consequences of actions and avoid disasters.

So what we really need are AI systems with common sense: they should apply what they know about the everyday things and people around them in order to choose actions that are appropriate for what they are setting out to do. Without something like common sense, a system would not have the capacity to recognize for itself that it is about to do something that might have terrible consequences, or that just doesn't make sense (like urging a child to touch a coin to the prongs of a plug). A failure to see and act on things that are obvious to the rest of us means that we cannot trust the system to act reliably on its own.
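One way to picture the kind of check described here is the following minimal, purely illustrative sketch in Python. The fact names, rules, and actions are invented stand-ins, not anything proposed in the article: a handful of commonsense rules predict what a candidate action would lead to, and any action whose predicted consequences include a known harm is refused.

```python
# Hypothetical sketch of a commonsense "consequence check": before acting,
# predict what the action would lead to and veto it if a known bad
# consequence follows. All facts, rules, and actions are illustrative.

# Commonsense rules: if all premises hold, the conclusion is predicted.
RULES = [
    ({"object_is_metal", "touching_live_prongs"}, "electric_shock"),
    ({"electric_shock", "person_is_child"}, "serious_harm"),
]

FORBIDDEN = {"serious_harm"}


def predict_consequences(facts):
    """Forward-chain over the rules until no new consequence is added."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in RULES:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known - set(facts)


def approve(action, facts_if_done):
    """Reject any action whose predicted consequences include a forbidden one."""
    consequences = predict_consequences(facts_if_done)
    if consequences & FORBIDDEN:
        return False, f"refused '{action}': would lead to {consequences & FORBIDDEN}"
    return True, f"'{action}' approved"


# The coin-on-the-plug suggestion fails the check; a harmless action passes.
print(approve("touch coin to plug prongs",
              {"object_is_metal", "touching_live_prongs", "person_is_child"}))
print(approve("turn on the lamp", {"lamp_switch_flipped"}))
```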

It's about common sense

Of course, doing something for a reason, even a commonsensical one, doesn't mean an AI system won't make mistakes. Our household robot might believe it needs the key in the kitchen, but it could be wrong. Unbeknownst to it, the basement door may be unlocked. Or someone might have moved the key. Or maybe the flooring in the kitchen is being replaced and there is no safe way to go in there.

This yields another key point about having identifiable reasons for actions: it must be possible to correct a mistaken belief. When a change is necessary, it is crucial to be able to identify the beliefs or goals that led to the action in question. I have to be able to tell my household robot that the key has been moved to a different room, and have that change what it believes about the location, and thereby change its behavior. I have to be able to tell my self-driving car to not move to the middle lane because a radio report said that the lane is closed up ahead, and get it to use that to change its intended action. Current AI systems simply don't allow us to do this. They are "trained" on huge numbers of examples that end up burned into their neural nets, but they are not, in a word (borrowed from Harvard professor Leslie Valiant), educable.
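A minimal sketch of what such "educability" could look like, with invented belief names and a toy planning rule (nothing here comes from the article itself): because the robot's plan is computed from explicit, inspectable beliefs, telling it one corrected fact changes its behavior.

```python
# Hypothetical sketch: the robot's plan is derived from explicit beliefs,
# so correcting one belief changes what it does. Rooms, beliefs, and the
# planning rule are illustrative inventions.

class HouseholdRobot:
    def __init__(self):
        # Explicit, inspectable beliefs rather than weights in a network.
        self.beliefs = {
            "basement_door_locked": True,
            "key_location": "kitchen drawer",
        }

    def tell(self, belief, value):
        """Advice from a person directly revises a belief."""
        self.beliefs[belief] = value

    def plan_clean_basement(self):
        """Choose actions based on current beliefs about the world."""
        steps = []
        if self.beliefs["basement_door_locked"]:
            steps.append(f"fetch key from {self.beliefs['key_location']}")
            steps.append("unlock basement door")
        steps.append("go to basement and clean")
        return steps


robot = HouseholdRobot()
print(robot.plan_clean_basement())
# ['fetch key from kitchen drawer', 'unlock basement door', 'go to basement and clean']

# "The key has been moved" -- one corrected belief, new behavior.
robot.tell("key_location", "hallway table")
print(robot.plan_clean_basement())
# ['fetch key from hallway table', 'unlock basement door', 'go to basement and clean']
```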

This ability to correct behavior by changing the underlying beliefs and goals is crucial. No matter how hard we try, we will never be able to foresee all the various things that might come up and ought to cause a robot to behave one way or another. Things are just plain messy in the real world, with new, unanticipated complications never too far away. When we build an AI system, we cannot expect to bake in all the appropriate considerations in advance, once and for all. Nor can we expect its pre-training to somehow cover all the eventualities; even trillions of training examples will not be enough. What we can expect, however, is for our AI systems to do what they do because of what they know and what they want, and for us to have the ability to modify that behavior as necessary by correcting any mistaken beliefs and goals they may have.

As the Bletchley Declaration and others have made clear, we are not alone in our concern about the trustworthiness of AI systems, but our key points here have been missed in the ongoing dialogue about the subject. 

What we are suggesting is that to produce AI systems deserving of our trust, we need to think about a completely different breed than the kind we see today. A new generation of AI systems should be designed from scratch to make effective use of what they know about the commonplace things around them in deciding how to behave. And they must have the ability to take advice expressed in these terms to correct their behavior when they get things wrong.

So how might we build competent, rational systems that can learn common sense and use it in their everyday actions? The big advantage of current systems based on machine learning is that they can absorb almost inconceivable amounts of data without human intervention. Given the power it has shown in AI systems, we should not abandon that style of training; if we think about how humans learn, a lot of the time we also passively absorb patterns through everyday perception. But humans also learn about the concepts that underlie the data they take in, and ultimately use those concepts to interpret and deal with situations they've never before seen. We are taught rules and general guidelines in school, through reading, through direct instruction, through hands-on experimentation guided by others, and by trial and error in the real world. These concepts and rules form the "guardrails" for behavior that have been elusive in large language models. (We have a lot more to say about what common sense in a machine might look like in our 2022 book, Machines like Us: Toward AI with Common Sense.)

So the new generation of AI we seek should be capable of understanding and using general concepts and rules, working with abstractions when useful and reasoning well beyond the specific patterns learned through passive exposure to data. And the rules should have the force of real rules, not just statistically suggested regularities that can be overridden. People have tried prompting ChatGPT with commands to just tell the truth and only cite real facts, but that just doesn't work in current AI systems.
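As a rough illustration of the difference, consider the following toy sketch, with an invented fact store and a stand-in for a statistical generator (none of it drawn from the article): the rule is enforced as a hard constraint outside the generator, so no prompt and no statistical pattern can override it.

```python
# Hypothetical sketch: instead of asking a statistical generator to "please
# follow the rules" in its prompt, a hard rule is enforced outside the
# generator. The generator, the rule, and the candidates are all invented.

import re

def statistical_generator(prompt):
    """Stand-in for a learned model: returns ranked candidate responses,
    some of which happen to violate the rule."""
    return [
        "The capital of Australia is Sydney.",   # fluent but false
        "The capital of Australia is Canberra.",
    ]

KNOWN_FACTS = {("capital", "Australia"): "Canberra"}

def violates_hard_rule(response):
    """Hard constraint: never assert a capital that contradicts the fact store."""
    match = re.search(r"capital of (\w+) is (\w+)", response)
    if match:
        country, city = match.groups()
        fact = KNOWN_FACTS.get(("capital", country))
        return fact is not None and fact != city
    return False

def answer(prompt):
    for candidate in statistical_generator(prompt):
        if not violates_hard_rule(candidate):
            return candidate
    return "I don't know."

print(answer("What is the capital of Australia?"))
# The hard rule filters out the fluent-but-false candidate.
```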

AI needs to change direction

Much work in the history of AI has actually focused on the representation of concepts and their use in planning and other types of reasoning. That research should be brought back into the picture and made the backbone of systems on top of which neural net learning can be built. To make the job easier, basic concepts and rules about the world can be encoded by hand and built in, and then new and more complex items can be laddered on top of those through structured learning, the way they are in school as we progressively learn more. (We offer some concrete ideas on this in our book.)
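A small, hypothetical sketch of what such laddering could look like, with invented concepts and properties: a hand-coded base layer, on top of which new concepts are later defined in terms of ones already known, inheriting what the system already knows about them.

```python
# Hypothetical sketch of "laddering" new concepts on top of hand-coded ones.
# The specific concepts and properties are illustrative only.

CONCEPTS = {
    # Hand-coded base layer.
    "physical_object": {"parents": [], "properties": {"occupies_space": True}},
    "container":       {"parents": ["physical_object"],
                        "properties": {"can_hold_things": True}},
    "liquid":          {"parents": ["physical_object"],
                        "properties": {"takes_shape_of_container": True}},
}

def learn_concept(name, parents, properties=None):
    """Add a new concept defined in terms of concepts already known."""
    assert all(p in CONCEPTS for p in parents), "parents must already be known"
    CONCEPTS[name] = {"parents": parents, "properties": properties or {}}

def properties_of(name):
    """Collect a concept's own properties plus everything it inherits."""
    entry = CONCEPTS[name]
    props = {}
    for parent in entry["parents"]:
        props.update(properties_of(parent))
    props.update(entry["properties"])
    return props

# Later, more specific concepts are laddered on top of the base layer.
learn_concept("cup", ["container"], {"graspable": True})
learn_concept("coffee", ["liquid"], {"hot_by_default": True})

print(properties_of("cup"))
# {'occupies_space': True, 'can_hold_things': True, 'graspable': True}
```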

A backbone of conceptual structure and rational reasoning, on top of which pattern-learning can be scaffolded, will be the new crux of successful AI. The ability to project into the future based on general conceptual understanding and extensive pattern learning will result in systems capable of acting with common sense in everyday situations. Systems with explicit rationales based on this conceptual scaffolding will be easier to understand and trust, and critically, capable of adjusting their behavior when the world tells them something different than what they have been trained to expect.

No matter how impressive otherwise, an AI system that does not have these basic abilities should never be allowed to operate unsupervised in the world. The risks of the effects of unpredictable, unhuman actions are too great, especially when there is little or no capacity for remediation and improvement. AI needs to change direction. Our trust depends on it.

Ronald J. Brachman is Director of the Jacobs Technion-Cornell Institute and Professor of Computer Science at Cornell University.

Hector J. Levesque is Professor Emeritus of Computer Science at the University of Toronto." [1]


1. Ron Brachman and Hector Levesque. "Today's AI can't be trusted." Frankfurter Allgemeine Zeitung (online), Frankfurter Allgemeine Zeitung GmbH, Feb 20, 2024.
