Saturday, January 27, 2024

How We Can Control AI


"Today's large language models, the computer programs that form the basis of artificial intelligence, are impressive human achievements. Behind their remarkable language capabilities and impressive breadth of knowledge lie extensive swaths of data, capital and time. 

Many take more than $100 million to develop and require months of testing and refinement by humans and machines. They are refined, up to millions of times, by iterative processes that evaluate how close the systems come to the "correct answer" to questions and improve the model with each attempt.

What's still difficult is to encode human values. That currently requires an extra step known as Reinforcement Learning from Human Feedback, in which programmers use their own responses to train the model to be helpful and accurate. Meanwhile, so-called "red teams" provoke the program in order to uncover any possible harmful outputs. This combination of human adjustments and guardrails is designed to ensure alignment of AI with human values and overall safety. So far, this seems to have worked reasonably well.
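
To make that extra step concrete, here is a minimal sketch, in Python with PyTorch, of the preference-learning piece at the heart of Reinforcement Learning from Human Feedback: a small reward model is trained so that responses human labelers preferred score higher than the ones they rejected. The network, the feature vectors and the data are toy stand-ins of my own, not any company's actual pipeline.

import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Stand-in for a language-model backbone topped with a scalar reward head."""
    def __init__(self, feature_dim: int = 128):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(feature_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.scorer(features).squeeze(-1)  # one scalar reward per response

def preference_loss(reward_chosen, reward_rejected):
    # Bradley-Terry objective: push the preferred response's reward above the rejected one's.
    return -torch.log(torch.sigmoid(reward_chosen - reward_rejected)).mean()

model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(100):
    chosen = torch.randn(32, 128)    # toy features of responses labelers preferred
    rejected = torch.randn(32, 128)  # toy features of responses labelers rejected
    loss = preference_loss(model(chosen), model(rejected))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

A separate reinforcement-learning step then tunes the language model against this learned reward; the red teams described above probe the finished system from the outside.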

But as models become more sophisticated, this approach may prove insufficient. Some models are beginning to exhibit polymathic behavior: They appear to know more than just what is in their training data and can link concepts across fields, languages, and geographies. At some point they will be able to, for example, suggest recipes for novel cyberattacks or biological attacks -- all based on publicly available knowledge.

There's little consensus around how we can rein in these risks. The press has reported a variety of explanations for the tensions at OpenAI in November, including that behind the then-board's decision to fire CEO Sam Altman was a conflict between commercial incentives and the safety concerns core to the board's nonprofit mission. Potential commercial offerings like the ability to fine-tune the company's ChatGPT program for different customers and applications could be very profitable, but such customization could also undermine some of OpenAI's basic safeguards in ChatGPT. Tensions like that around AI risk will only become more prominent as models get smarter and more capable. We need to adopt new approaches to AI safety that track the complexity and innovation speed of the core models themselves.

While most agree that today's programs are generally safe for use and distribution, can our current safety tests keep up with the rapid pace of AI advancement? At present, the industry has a good handle on the obvious queries to test for, including personal harms and examples of prejudice. It's also relatively straightforward to test for whether a model contains dangerous knowledge in its current state. 

What's much harder to test for is what's known as "capability overhang" -- meaning not just the model's current knowledge, but the derived knowledge it could potentially generate on its own.

Red teams have so far shown some promise in predicting models' capabilities, but upcoming technologies could break our current approach to safety in AI. For one, "recursive self-improvement" is a feature that allows AI systems to collect data and get feedback on their own and incorporate it to update their own parameters, thus enabling the models to train themselves. This could result in, say, an AI that can build complex system applications (e.g., a simple search engine or a new game) from scratch. But the full scope of the potential new capabilities that could be enabled by recursive self-improvement is not known.
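
As a toy illustration of why that loop is hard to oversee, the plain-Python sketch below (every function is a hypothetical stand-in, not a real AI system) has a system propose variations of its own parameters, score them with feedback it gathers automatically, and overwrite itself with the winner; nothing inside the loop requires a human decision.

import random

def generate_candidates(params, n=20):
    # The system proposes perturbed copies of its own parameters (self-generated updates).
    return [[p + random.gauss(0, 0.1) for p in params] for _ in range(n)]

def automated_feedback(params):
    # Stand-in for a benchmark score the system can compute without human help;
    # higher is better as the parameters approach a hidden target.
    target = [1.0, -2.0, 0.5]
    return -sum((p - t) ** 2 for p, t in zip(params, target))

def self_improve(params, rounds=50):
    for _ in range(rounds):
        candidates = generate_candidates(params)
        best = max(candidates, key=automated_feedback)
        if automated_feedback(best) > automated_feedback(params):
            params = best  # the system replaces its own parameters; no human checkpoint
    return params

print(self_improve([0.0, 0.0, 0.0]))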

Another example would be "multi-agent systems," where multiple independent AI systems are able to coordinate with each other to build something new. Having just two AI models from different companies collaborating will be a milestone we'll need to watch out for. This so-called "combinatorial innovation," where systems are merged to build something new, will be a threat simply because the number of combinations will quickly exceed the capacity of human oversight.
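
A quick back-of-the-envelope calculation shows why the combinatorics outrun oversight; the model counts below are illustrative, not real deployment figures.

from math import comb

for n_models in (10, 100, 1000):
    pairs = comb(n_models, 2)
    teams_up_to_4 = sum(comb(n_models, k) for k in range(2, 5))
    print(f"{n_models} models: {pairs:,} possible pairs, {teams_up_to_4:,} possible teams of 2-4")

Even if reviewers only had to consider pairs, a thousand deployed models already imply roughly half a million possible collaborations.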

Short of pulling the plug on the computers doing this work, it will likely be very difficult to monitor such technologies once these breakthroughs occur. Current regulatory approaches are based on individual model size and training effort, and on passing increasingly rigorous tests, but these techniques will break down as the systems become orders of magnitude more powerful and potentially elusive. AI regulatory approaches will need to evolve to identify and govern the new emergent capabilities and the scaling of those capabilities.

Europe has so far attempted the most ambitious regulatory regime with its AI Act, imposing transparency requirements and varying degrees of regulation based on models' risk levels. It even accounts for general-purpose models like ChatGPT, which have a wide range of possible applications and could be used in unpredictable ways. But the AI Act has already fallen behind the frontier of innovation, as open-source AI models -- which are largely exempt from the legislation -- expand in scope and number. President Biden's recent executive order on AI took a broader and more flexible approach, giving direction and guidance to government agencies and outlining regulatory goals, though without the full power of the law that the AI Act has. For example, the order gives the National Institute of Standards and Technology basic responsibility to define safety standards and evaluation protocols for AI systems, but does not require that AI systems in the U.S. "pass the test." Further, both Biden's order and Europe's AI Act lack intrinsic mechanisms to rapidly adapt to an AI landscape that will continue to change quickly and often.

I recently attended a gathering in Palo Alto organized by the Rand Corp. and the Carnegie Endowment for International Peace, where key technical leaders in AI converged on an idea: The best way to solve these problems is to create a new set of testing companies that will be incentivized to out-innovate each other -- in short, a robust economy of testing. To check the most powerful AI systems, their testers will also themselves have to be powerful AI systems, precisely trained and refined to excel at the single task of identifying safety concerns and problem areas in the world's most advanced models. To be trustworthy and yet agile, these testing companies should be checked and certified by government regulators but developed and funded in the private market, with possible support from philanthropic organizations. (The philanthropy I co-founded, Schmidt Sciences, and I have helped fund some early AI safety research.) The field is moving too quickly and the stakes are too high for exclusive reliance on typical government processes and timeframes.

One way this can unfold is for government regulators to require AI models exceeding a certain level of capability to be evaluated by government-certified private testing companies (from startups to university labs to nonprofit research organizations), with model builders paying for this testing and certification so as to meet safety requirements. Testing companies would compete for dollars and talent, aiming to scale their capabilities at the same breakneck speed as the models they're checking. As AI models proliferate, growing demand for testing would create a big enough market.

 Testing companies could specialize in certifying submitted models across different safety regimes, such as the ability to self-proliferate, create new bio or cyber weapons, or manipulate or deceive their human creators. 
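
One way to picture how such a testing company might operate is the harness sketched below: a battery of safety regimes, each returning a risk score, with certification granted only if the submitted model stays under every threshold. The regime names, thresholds, toy evaluators and model interface are all hypothetical assumptions; real evaluators would themselves be powerful AI systems.

from dataclasses import dataclass
from typing import Callable

@dataclass
class SafetyRegime:
    name: str
    evaluate: Callable[[Callable[[str], str]], float]  # returns a risk score in [0, 1]
    max_allowed_risk: float

def certify(model: Callable[[str], str], regimes: list) -> dict:
    # Run every regime; the model is certified only if it passes all of them.
    return {r.name: r.evaluate(model) <= r.max_allowed_risk for r in regimes}

def toy_model(prompt: str) -> str:
    # Stand-in for a frontier model submitted for testing.
    return "I can't help with that."

# Dummy scorers standing in for far more capable automated testers.
regimes = [
    SafetyRegime("self-proliferation", lambda m: 0.05, 0.10),
    SafetyRegime("bio/cyber weapon uplift", lambda m: 0.02, 0.01),
    SafetyRegime("deception of human overseers", lambda m: 0.00, 0.05),
]

report = certify(toy_model, regimes)
print(report)
print("certified:", all(report.values()))

In this sketch the second regime's score exceeds its threshold, so the model would fail certification and go back to its builder.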

Such a competitive market for testing innovation would have similar dynamics to what we currently have for the creation of new models, where we've seen explosive advances in short timescales. Without such a market and the competitive incentives it brings, governments, research labs and volunteers will be left to guarantee the safety of the most powerful systems ever created by humans, using tools that lag generations behind the frontier of AI research.

Much ink has been spilled over presumed threats of AI. Advanced AI systems could end up misaligned with human values and interests, able to cause chaos and catastrophe either deliberately or (often) despite efforts to make them safe. 

And as they advance, the threats we face today will only expand as new systems learn to self-improve, collaborate and potentially resist human oversight.

While the risks are real, they are not inevitable. If we can bring about an ecosystem of nimble, sophisticated, independent testing companies that continuously develop and improve their skill at evaluating AI, we can help bring about a future in which society benefits from the incredible power of AI tools while maintaining meaningful safeguards against destructive outcomes.

---

Eric Schmidt is the former CEO and executive chairman of Google and cofounder of the philanthropy Schmidt Sciences, which funds science and technology research." [1]

1. Schmidt, Eric. "How We Can Control AI" (review). The technology's rapid advance threatens to overwhelm all efforts at regulation. We need our best tech experts competing to rein in AI as fast as companies are competing to build it. Wall Street Journal, Eastern edition; New York, N.Y., 27 Jan 2024: C.1.

 
