"In the circus of AI, the question remains who holds
the reins: humans or machines?
While the AI Regulation requires humans to control the
risks of AI, experts argue whether humans are even capable of doing so. What
policymakers and scientists can do now?
But first things first: Behind a fatal accident lies a
central challenge for the safe and trustworthy use of AI in our society: How
can human oversight of AI succeed?
In 2018, a tragedy in technological history occurred in the southwestern United States: an avoidable accident led to the first fatality caused by a self-driving car. Uber was testing a Volvo controlled by artificial intelligence (AI) in traffic in a suburb of Phoenix, Arizona, when the car struck a 49-year-old woman who was pushing her bicycle across the street.
According to the U.S. Highway Patrol and a court in Arizona, responsibility for her death lay not with Uber or the AI, but with a human: the safety driver in the self-driving car should have intervened and, given the moderate speed at which the car was traveling, probably could have.
But instead of looking at the road, she streamed an episode
of the television show "The Voice" on her smartphone. Her boredom
while monitoring the AI cost a human life.
This challenge becomes all the more urgent as AI systems are
increasingly deployed in sensitive areas such as medicine, traffic, or border
control. In these areas, policymakers rely on human oversight to mitigate the
risks associated with the technology. For example, Article 14 of the AI Regulation (Regulation (EU) 2024/1689) requires that high-risk AI systems can be effectively overseen by humans in order to prevent or at least minimize "risks to health, safety, and fundamental rights." What cannot be completely eliminated technically should be mitigated by human oversight.
But some scientists doubt that humans are even capable of
doing this. Even if they aren't looking at their phones, people in most cases
have too little time and information to avert risks during the ongoing
operation of AI. Instead of effectively monitoring AI, they risk becoming
scapegoats for the risk-taking of technology developers.
The view that humans can hardly monitor AI systems
effectively is, however, an oversimplification in many application areas. Under the
right conditions, humans are perfectly capable of monitoring AI and intervening
in ongoing processes. The real core of the challenge therefore lies in
understanding and ensuring these demanding conditions.
The Dagstuhl Definition of Human Oversight
A seminar held at the beginning of July at Schloss Dagstuhl,
a Leibniz Association facility in Saarland, offered an insight into the current
state of research. International experts from computer science, psychology,
law, ethics, cognitive science, and technology design addressed the question of
how effective human oversight of AI systems can be designed.
Even agreeing on a sufficiently broad definition of
"human oversight" that is clearly differentiated from other
dimensions of human involvement in AI processes—such as system maintenance or
regulatory oversight—was a challenge. The narrow definition of the term in the
AI Regulation contrasts with the interdisciplinary nature of the research
field. However, it is precisely this interdisciplinarity that has proven key to
identifying the specific function of human oversight: Human oversight exists
when a person (or several people) is systematically prepared to consciously
monitor the operation of AI systems and, if necessary, intervene to
substantially mitigate the risks posed by AI.
Human oversight is therefore not a mere "checkbox" task or bureaucratic exercise, but rather responsible work. At the same time, the definition implies that no one can slip into the role of supervisor of an AI system spontaneously or by accident. As required by Article 26 of the AI Regulation, a supervisor must be explicitly appointed and systematically prepared.
In particular, it is not sufficient to assign people a merely nominal role as "button pushers" in an AI-driven decision-making process without authority, insight, time, or training. That would risk making them part of a technologically produced error rather than a safeguard against it. To enable people to avert risks, correct undesirable developments, or prevent harm, their role must be designed specifically and effectively.
How to Effectively Design Human Oversight of AI
AI systems that support physicians in diagnostics by making suggestions can increase efficiency. But if these suggestions are adopted without reflection, there is a risk that any AI judgment will be accepted uncritically, and errors or biases can be incorporated into practice unnoticed. For
example, distorted training data can lead to certain symptoms or patient groups
being systematically overlooked or misjudged, resulting in structural
disadvantages. There is also a risk that inconspicuous findings will rarely be
independently reviewed, and that physicians' attention to individual patients
will decrease.
Such dynamics can also be explained psychologically: the phenomenon of automation bias often leads people to place more trust in AI suggestions than is appropriate. Confirmation bias compounds this, as findings tend to be interpreted as confirming the AI's suggestions rather than being critically questioned. The causes are manifold, and pure convenience is just one of them.
Design measures that force users to actively reflect on their decisions before confirming them can reduce such biases. For example, AI systems could be designed so that a physician not only receives a diagnosis suggested by the AI, but is also required to document a brief written explanation for accepting or rejecting the suggestion. While such a design slows down work with the AI, it promotes critical thinking.
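What such a forced-reflection step could look like is sketched below in Python. The workflow, the names (AISuggestion, record_decision), and the ten-word threshold are illustrative assumptions, not features of any existing system or requirements of the AI Regulation.

```python
# A minimal sketch of a "forced reflection" confirmation step. The workflow and
# names below (AISuggestion, record_decision, the ten-word minimum) are
# illustrative assumptions, not taken from a real clinical system.
from dataclasses import dataclass
from datetime import datetime, timezone

MIN_RATIONALE_WORDS = 10  # assumed threshold; would need to be tuned empirically


@dataclass
class AISuggestion:
    case_id: str
    diagnosis: str
    confidence: float  # model confidence in [0, 1]


@dataclass
class HumanDecision:
    case_id: str
    accepted: bool
    rationale: str
    decided_at: str


def record_decision(suggestion: AISuggestion, accepted: bool, rationale: str) -> HumanDecision:
    """Accept or reject an AI suggestion, but only with a documented rationale."""
    if len(rationale.split()) < MIN_RATIONALE_WORDS:
        # Refuse to log the decision: the physician must articulate why they
        # agree or disagree, which works against simply rubber-stamping the AI.
        raise ValueError(
            f"Please provide a rationale of at least {MIN_RATIONALE_WORDS} words "
            "before the decision can be recorded."
        )
    return HumanDecision(
        case_id=suggestion.case_id,
        accepted=accepted,
        rationale=rationale,
        decided_at=datetime.now(timezone.utc).isoformat(),
    )


if __name__ == "__main__":
    suggestion = AISuggestion(case_id="case-0421", diagnosis="pneumonia", confidence=0.87)
    decision = record_decision(
        suggestion,
        accepted=False,
        rationale="Opacity pattern is atypical for pneumonia; prior imaging and "
                  "lab values point to pulmonary edema, so I reject the suggestion.",
    )
    print(decision)
```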
This naturally raises a further question: should human oversight be provided by the same people who work directly with the AI?
In medical practice, the role of the medical professional
may coincide with that of the AI supervisor. In other high-risk contexts,
participation and supervision in AI decision-making are more clearly separated.
Self-driving cars now operate without human safety drivers. In Austin and San Francisco, fully autonomous robotaxis transport their passengers through the city. The supervisors sit in a central control center and monitor multiple vehicles simultaneously via interfaces.
Whether humans are directly embedded in the AI-supported decision-making process, as physicians are, or remotely oversee a fleet of robotaxis, three areas are central to effective supervision: technical factors such as system design, explainability methods, and user interfaces; human factors such as the supervisor's expertise, motivation, and psychological characteristics; and environmental factors such as workplace design and organizational frameworks.
When these factors are considered holistically, human
supervision can be effective. This finding underscores the importance of
interdisciplinary research into the success factors of effective human
oversight, both for the implementation of the AI Regulation and for the
responsible use of AI in our society.
Remaining challenges must be addressed jointly with policymakers
All this shows that science has already gathered insights
into how human oversight can be successful. Together with policymakers, we
should now discuss problem areas related to the implementation of the AI
Regulation in order to develop sensible solutions.
First, there
is the problem of accountability. How can we prevent the oversight officer from
becoming a mere symbolic figure, creating false confidence in the safety of AI
and ultimately only securing economic interests—thus degenerating into a
placebo? Experimental testing of oversight systems can make a significant
contribution here. Whether human oversight is effective should be empirically
tested before the AI system is put into operation. Standardized templates,
guidelines, or checklists that specify what such testing procedures should look
like, or what findings must be available before the actual deployment of
human-supervised AI, can support providers and operators in testing.
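As an illustration, such a pre-deployment test could compare the decisions an AI system would make on its own with the decisions that result under human oversight, measured against known outcomes. The sketch below is a minimal example; the case format and the 20 percent acceptance threshold are assumptions made for illustration, not requirements of the AI Regulation.

```python
# A minimal sketch of a pre-deployment oversight test, assuming a labeled
# evaluation set is available; the data structures and the acceptance
# threshold are illustrative assumptions, not prescribed by the AI Regulation.
from dataclasses import dataclass


@dataclass
class EvaluationCase:
    ground_truth: str       # the correct outcome for this case
    ai_decision: str        # what the AI system alone would decide
    overseen_decision: str  # the final decision after human oversight


def harmful_error_rate(cases: list[EvaluationCase], overseen: bool) -> float:
    """Fraction of cases in which the (overseen or unassisted) decision is wrong."""
    errors = sum(
        1 for c in cases
        if (c.overseen_decision if overseen else c.ai_decision) != c.ground_truth
    )
    return errors / len(cases)


def oversight_is_effective(cases: list[EvaluationCase], min_relative_reduction: float = 0.2) -> bool:
    """Assumed acceptance criterion: oversight must cut the error rate by, e.g., 20%."""
    baseline = harmful_error_rate(cases, overseen=False)
    with_oversight = harmful_error_rate(cases, overseen=True)
    if baseline == 0.0:
        return with_oversight == 0.0  # oversight must at least not introduce new errors
    return (baseline - with_oversight) / baseline >= min_relative_reduction


if __name__ == "__main__":
    cases = [
        EvaluationCase("benign", "malignant", "benign"),
        EvaluationCase("malignant", "malignant", "malignant"),
        EvaluationCase("benign", "benign", "benign"),
        EvaluationCase("malignant", "benign", "benign"),  # oversight missed this one
    ]
    print("Effective:", oversight_is_effective(cases))
```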
This brings
us to the next problem: measuring success. What standards apply to the
effectiveness of human supervision? What's needed are quantitative and
qualitative benchmarks that can be translated into technical standards. Leading
AI researchers and practitioners have long been calling for the establishment
of a German AI Safety Institute (DAISI). DAISI could develop scientifically
sound safety guidelines and promote dialogue between science, politics, and
society. This doesn't require a new bureaucratic monster; an agile agency would suffice.
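On the quantitative side, such benchmarks could be derived from logged oversight events. The sketch below computes three candidate metrics: how often genuine risks triggered an intervention, how often interventions addressed a genuine risk, and how quickly supervisors reacted. The event format and the metrics are illustrative assumptions, not established technical standards.

```python
# A minimal sketch of quantitative oversight benchmarks computed from logged
# oversight events; the event format and metric names are illustrative
# assumptions, not established technical standards.
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class OversightEvent:
    risk_present: bool                     # did the situation actually require intervention?
    intervened: bool                       # did the human supervisor intervene?
    seconds_to_intervene: Optional[float]  # latency, if an intervention occurred


def oversight_benchmarks(events: list[OversightEvent]) -> dict[str, float]:
    risky = [e for e in events if e.risk_present]
    interventions = [e for e in events if e.intervened]
    latencies = [e.seconds_to_intervene for e in interventions
                 if e.seconds_to_intervene is not None]
    return {
        # Share of genuinely risky situations in which the supervisor stepped in.
        "intervention_recall": sum(e.intervened for e in risky) / len(risky) if risky else 1.0,
        # Share of interventions that addressed a real risk (guards against overcorrection).
        "intervention_precision": (sum(e.risk_present for e in interventions) / len(interventions)
                                   if interventions else 1.0),
        # Average reaction time when an intervention did occur.
        "mean_seconds_to_intervene": mean(latencies) if latencies else 0.0,
    }


if __name__ == "__main__":
    log = [
        OversightEvent(risk_present=True, intervened=True, seconds_to_intervene=2.4),
        OversightEvent(risk_present=True, intervened=False, seconds_to_intervene=None),
        OversightEvent(risk_present=False, intervened=True, seconds_to_intervene=1.1),
        OversightEvent(risk_present=False, intervened=False, seconds_to_intervene=None),
    ]
    print(oversight_benchmarks(log))
```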
Finally, the problem of technical and organizational support
must be addressed. How can supervisors be supported in recognizing the right time to intervene, and how can we prevent their interventions from creating more risks than they mitigate? While a completely error-proof solution seems
unrealistic, policymakers can nevertheless rely on the dynamics of scientific
knowledge. AI providers and users should therefore be demonstrably oriented
toward the current state of research on human-AI interaction, which is
constantly evolving with the progressive use of AI systems in our society.
The list of unanswered questions could be extended. One
thing is clear: human oversight as a safety net in the risk management of the
AI Regulation means that AI systems will enter the European market with
considerable residual risks. Managing these risks depends on the technical
capabilities, the individual skills and motivation of the supervisors, and the
specific working conditions.
Human oversight can be an economic factor
The development of solutions to manage technological risks
is, not least, an economic factor. In competition with the AI superpowers,
the USA and China, the EU's AI Regulation is often seen as a brake on
innovation. But regulation versus innovation is a false dilemma: there is a third option, promoting responsible innovation, which is itself a form of economic policy.
Significant investments from both public and private sources are required to
successfully implement regulatory provisions such as the requirement for human
oversight of AI. Developing benchmarks, testing oversight systems, and
equipping people with AI skills – all of this requires capital and know-how,
which can be accumulated in Europe through the implementation of the AI
Regulation.
While implementing human oversight may reduce a small
portion of AI's efficiency potential, the effective integration of human and
machine skills creates real value through improved outputs and reduced risks.
This creates significant opportunities for new business models for technical
products and services.
Furthermore,
it is expected that AI will only become accepted in areas such as medicine once
effective human oversight is ensured.
Effective
human oversight of AI systems is beneficial in three ways: First, it is an
essential building block for the compliance of AI systems with the AI
Regulation and ethical principles. Second, human oversight improves the
quality of AI systems on the market in Europe. Third, human oversight creates
value because its effectiveness requires investment in new technologies,
services, and human skills. It is now up to science and politics to jointly
leverage this innovation potential.
Johann Laux
Johann Laux works as a Departmental Research Lecturer in AI,
Government & Policy at the Oxford Internet Institute at the University of
Oxford and is a Fellow at the GovTech Campus Germany.
Markus Langer
Markus Langer is a Professor of Work and Organizational
Psychology at the University of Freiburg and heads the Psy:Tech Lab for
research into good AI-supported work.
Dr. Kevin Baum
Dr. Kevin Baum works at the German Research Center for
Artificial Intelligence (DFKI) as a research group leader for Responsible AI
& Machine Ethics and as a Senior Researcher at the Center for European
Research in Trusted AI (CERTAIN).