
Wednesday, October 8, 2025

How Human Oversight of Artificial Intelligence Succeeds


"In the circus of AI, the question remains who holds the reins: humans or machines?

 

While the AI Regulation requires humans to control the risks of AI, experts argue over whether humans are even capable of doing so. What can policymakers and scientists do now?

 

Effective human oversight of AI systems is beneficial in three ways: First, it is an essential building block for the compliance of AI systems with the AI Regulation and ethical principles. Second, human oversight improves the quality of AI systems on the market in Europe. Third, human oversight creates value because its effectiveness requires investments in new technologies, services, and human skills. It is now up to science and politics to jointly leverage this innovation potential.

 

But first things first. Behind a fatal accident lies a central challenge for the safe and trustworthy use of AI in our society: how can human oversight of AI succeed?

 

In 2018, a tragedy in technological history occurred in the southwestern United States: an avoidable accident led to the first pedestrian fatality caused by a self-driving car. The Uber platform was testing a Volvo powered by artificial intelligence (AI) in traffic in a suburb of Phoenix, Arizona, when a 49-year-old woman pushing her bicycle across the street was struck by the car.

 

According to U.S. accident investigators and a court in Arizona, neither Uber nor the AI was responsible for her death, but a human. The safety driver in the self-driving car should have intervened and, given the moderate speed at which the car was traveling, probably could have.

 

But instead of looking at the road, she streamed an episode of the television show "The Voice" on her smartphone. Her boredom while monitoring the AI cost a human life.

 

This challenge becomes all the more urgent as AI systems are increasingly deployed in sensitive areas such as medicine, traffic, or border control. In these areas, policymakers rely on human oversight to mitigate the risks associated with the technology. For example, Article 14 of the AI Regulation (Regulation (EU) 2024/1689) requires that humans be deployed in a targeted way when high-risk AI systems are used, in order to prevent or at least minimize "risks to health, safety, and fundamental rights." What cannot be completely eliminated technically should be mitigated by human oversight.

 

But some scientists doubt that humans are even capable of doing this. Even if they aren't looking at their phones, people in most cases have too little time and information to avert risks during the ongoing operation of AI. Instead of effectively monitoring AI, they risk becoming scapegoats for the risk-taking of technology developers.

 

The view that humans can hardly monitor AI systems effectively is, however, an oversimplification in many application areas. Under the right conditions, humans are perfectly capable of monitoring AI and intervening in ongoing processes. The real core of the challenge therefore lies in understanding and ensuring these demanding conditions.

The Dagstuhl Definition of Human Oversight

 

A seminar held at the beginning of July at Schloss Dagstuhl, a Leibniz Association facility in Saarland, offered an insight into the current state of research. International experts from computer science, psychology, law, ethics, cognitive science, and technology design addressed the question of how effective human oversight of AI systems can be designed.

 

Even agreeing on a sufficiently broad definition of "human oversight" that is clearly differentiated from other dimensions of human involvement in AI processes (such as system maintenance or regulatory oversight) was a challenge. The narrow definition of the term in the AI Regulation contrasts with the interdisciplinary nature of the research field. However, it is precisely this interdisciplinarity that has proven key to identifying the specific function of human oversight: human oversight exists when a person (or several people) is systematically prepared to consciously monitor the operation of AI systems and, if necessary, intervene to substantially mitigate the risks posed by AI.

 

Human oversight is therefore not a mere "checkbox" task or bureaucratic exercise, but responsible work. At the same time, the definition also implies that no one can fall into the role of a supervisor of an AI system spontaneously or, as it were, by accident. As required by Article 26 of the AI Regulation, a supervisor must be explicitly appointed and systematically prepared.

 

In particular, it is not sufficient to assign people a merely nominal role as "button pushers" in an AI-driven decision-making process, without authority, insight, time, or training. That would merely make them part of a technology-supported error. If people are to avert risks, correct undesirable developments, or prevent harm, their role must be designed deliberately and effectively.

 

How to Effectively Design Human Oversight of AI

 

AI systems that support physicians in diagnostics by making suggestions can increase efficiency. But if these suggestions are adopted without reflection, there is a risk that any AI judgment is accepted uncritically. Errors or biases can thus be incorporated into practice unnoticed. For example, distorted training data can lead to certain symptoms or patient groups being systematically overlooked or misjudged, resulting in structural disadvantages. There is also a risk that inconspicuous findings will rarely be independently reviewed and that physicians' attention to individual patients will decrease.

 

Such dynamics can also be explained psychologically: the phenomenon of automation bias leads people to place more trust in AI suggestions than is appropriate. Confirmation bias compounds this, as findings are interpreted as confirming the AI's suggestion rather than being critically questioned. The causes are manifold, and pure convenience is only one of them.

 

Design measures that force users to actively reflect on their decisions before confirming them can reduce such biases. For example, AI systems could be designed so that a physician not only receives a diagnosis suggested by the AI, but is also required to document a brief written explanation for accepting or rejecting the suggestion. Such a design slows down work with the AI, but it promotes critical thinking. This naturally raises the further question of whether human oversight should even be provided by the same people who work directly with the AI.
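
Before turning to that question, the following minimal sketch (in Python, with hypothetical names and an assumed length threshold) illustrates how such a forced-reflection step could be wired into a decision workflow: the system refuses to record an acceptance or rejection of an AI suggestion unless a brief written rationale is documented.

    from dataclasses import dataclass

    # Assumed threshold for what counts as a documented rationale; a real system
    # would tune and audit this empirically.
    MIN_RATIONALE_CHARS = 30

    @dataclass
    class ReviewedDecision:
        ai_suggestion: str
        accepted: bool
        rationale: str

    def confirm_suggestion(ai_suggestion: str, accepted: bool, rationale: str) -> ReviewedDecision:
        """Record the physician's decision only if a substantive rationale is documented."""
        if len(rationale.strip()) < MIN_RATIONALE_CHARS:
            raise ValueError("Document a brief explanation before confirming or rejecting the AI suggestion.")
        return ReviewedDecision(ai_suggestion, accepted, rationale.strip())

    # Example: rejecting a suggestion with a documented reason.
    decision = confirm_suggestion(
        ai_suggestion="Suspected pneumonia, right lower lobe",
        accepted=False,
        rationale="Opacity more consistent with atelectasis; no fever, inflammatory markers normal.",
    )

A minimum-length check is of course only a crude proxy for genuine reflection; in practice, the documented rationales would additionally be sampled and audited.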

 

In medical practice, the role of the medical professional may coincide with that of the AI supervisor. In other high-risk contexts, participation in AI decision-making and its supervision are more clearly separated.

 

Self-driving cars now operate without a human safety driver on board. In Austin and San Francisco, fully autonomous robotaxis transport their passengers through the city. The supervisors sit in a central control center, from which multiple vehicles are monitored simultaneously via interfaces.

 

Whether humans are directly embedded in the AI-supported decision-making process, as physicians are, or remotely oversee a fleet of robotaxis, three areas are central to effective supervision: technical factors such as system design, explainability methods, and user interfaces; human factors such as the supervisor's expertise, motivation, and psychological characteristics; and environmental factors such as workplace design and the organizational framework.

 

When these factors are considered holistically, human supervision can be effective. This finding underscores the importance of interdisciplinary research into the success factors of effective human oversight, both for the implementation of the AI Regulation and for the responsible use of AI in our society.

Remaining challenges must be addressed jointly with policymakers.

 

All this shows that science has already gathered insights into how human oversight can be successful. Together with policymakers, we should now discuss problem areas related to the implementation of the AI Regulation in order to develop sensible solutions.

 

First, there is the problem of accountability. How can we prevent the oversight officer from becoming a mere symbolic figure who creates false confidence in the safety of AI and ultimately only secures economic interests, degenerating into a placebo? Experimental testing of oversight systems can make a significant contribution here. Whether human oversight is effective should be empirically tested before the AI system is put into operation. Standardized templates, guidelines, or checklists that specify what such testing procedures should look like, and what findings must be available before human-supervised AI is actually deployed, can support providers and operators in this testing.
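
One way such a pre-deployment test could look in practice is sketched below (in Python, with all names, rates, and thresholds hypothetical): a share of known-correct AI outputs from a pilot workload is deliberately falsified, and the rate at which the supervisors flag the falsified outputs is compared against a preregistered threshold before the system goes live.

    import random

    def corrupt(output: str) -> str:
        """Stand-in for a domain-specific way of producing a plausible but wrong output."""
        return output + " [deliberately falsified for the trial]"

    def oversight_trial(cases, supervisor_review, error_rate=0.5, threshold=0.9, seed=0):
        """Inject known errors into a share of correct AI outputs and measure detection.

        cases: correct AI outputs from a pilot workload.
        supervisor_review: callable(output) -> True if the supervisor flags the output.
        """
        rng = random.Random(seed)
        injected = detected = 0
        for output in cases:
            falsified = rng.random() < error_rate
            shown = corrupt(output) if falsified else output
            if falsified:
                injected += 1
                detected += int(supervisor_review(shown))
            else:
                supervisor_review(shown)  # reviews of unmodified outputs are still collected
        rate = detected / injected if injected else float("nan")
        return rate, rate >= threshold  # oversight counts as effective only above the preregistered threshold

    # Example with a simulated supervisor who spots every falsified output.
    rate, effective = oversight_trial(
        cases=["finding A", "finding B", "finding C", "finding D"],
        supervisor_review=lambda shown: "falsified" in shown,
    )

In a real study, supervisor_review would of course be the recorded judgments of the actual supervisors in a controlled trial rather than a function; the sketch only shows how the detection rate would be computed from them.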

 

This brings us to the next problem: measuring success. What standards apply to the effectiveness of human supervision? What's needed are quantitative and qualitative benchmarks that can be translated into technical standards. Leading AI researchers and practitioners have long been calling for the establishment of a German AI Safety Institute (DAISI). DAISI could develop scientifically sound safety guidelines and promote dialogue between science, politics, and society. This doesn't require the creation of a new bureaucratic monster; rather, an agile agency should be created.

 

Finally, the problem of technical and organizational support must be addressed. How can supervisors be supported in recognizing the right time to intervene, and how can it be ensured that their interventions do not create more risks than they mitigate? While a completely error-proof solution seems unrealistic, policymakers can nevertheless rely on the dynamics of scientific knowledge. AI providers and users should therefore be required to demonstrably orient themselves toward the current state of research on human-AI interaction, which is constantly evolving as AI systems are used more widely in our society.
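
One simple technical building block for recognizing the right time to intervene, sketched below with purely hypothetical thresholds, is to escalate exactly those AI outputs for which the system itself signals low confidence or an unfamiliar input, so that the supervisor's attention is requested at the moments when intervention is most likely to be needed.

    # Assumed thresholds; in practice both would be calibrated on validation data.
    CONFIDENCE_THRESHOLD = 0.85
    NOVELTY_THRESHOLD = 3.0

    def needs_supervisor_attention(confidence: float, novelty_score: float) -> bool:
        """Escalate an AI decision for human review if the model is unsure or the input is unfamiliar."""
        return confidence < CONFIDENCE_THRESHOLD or novelty_score > NOVELTY_THRESHOLD

    # A confident prediction on a familiar input runs without escalation,
    # while an unusual input is flagged even if the model reports high confidence.
    assert not needs_supervisor_attention(confidence=0.97, novelty_score=0.4)
    assert needs_supervisor_attention(confidence=0.97, novelty_score=5.2)

Such escalation rules are only as good as their calibration; setting the thresholds is itself an empirical task of the kind described above.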

 

The list of unanswered questions could be extended. One thing is clear: human oversight as a safety net in the risk management of the AI Regulation means that AI systems will enter the European market with considerable residual risks. Managing these risks depends on the technical capabilities, the individual skills and motivation of the supervisors, and the specific working conditions.

 

Human oversight can be an economic factor

 

The development of solutions to manage technological risks is, not least, an economic factor. In competition with the AI superpowers, the USA and China, the EU's AI Regulation is often seen as a brake on innovation. "Regulation or innovation" is a false dilemma, however, since the third option, promoting responsible innovation, is itself a form of economic policy. Significant investments from both public and private sources are required to successfully implement regulatory provisions such as the requirement for human oversight of AI. Developing benchmarks, testing oversight systems, and equipping people with AI skills – all of this requires capital and know-how, which can be accumulated in Europe through the implementation of the AI Regulation.

 

While implementing human oversight may reduce a small portion of AI's efficiency potential, the effective integration of human and machine skills creates real value through improved outputs and reduced risks. This creates significant opportunities for new business models for technical products and services.

 

Furthermore, it is expected that AI will only become accepted in areas such as medicine once effective human oversight is ensured.

 

Effective human oversight of AI systems is beneficial in three ways: First, it is an essential building block for the compliance of AI systems with the AI Regulation and ethical principles. Second, human oversight improves the quality of AI systems on the market in Europe. Third, human oversight creates value because its effectiveness requires investment in new technologies, services, and human skills. It is now up to science and politics to jointly leverage this innovation potential.

Johann Laux

Johann Laux works as a Departmental Research Lecturer in AI, Government & Policy at the Oxford Internet Institute at the University of Oxford and is a Fellow at the GovTech Campus Germany.

Markus Langer

Markus Langer is a Professor of Work and Organizational Psychology at the University of Freiburg and heads the Psy:Tech Lab for research into good AI-supported work.

Dr. Kevin Baum

Dr. Kevin Baum works at the German Research Center for Artificial Intelligence (DFKI) as a research group leader for Responsible AI & Machine Ethics and as a Senior Researcher at the Center for European Research in Trusted AI (CERTAIN).

 

