"Open AI promises big things for the planned language model GPT-5. But how does the company plan to achieve these goals? A look at the components of AI progress.
The release of ChatGPT in November 2022 changed a lot. Generative AI developed from an academic field into an industry of its own - and made OpenAI the market leader. So it is no wonder that when the successor model GPT-4 was announced, the marketing language consisted primarily of superlatives.
OpenAI is now making even bigger promises for GPT-5 - the company has already announced that the language model will be better at logical reasoning, will be able to work with video, and will have "doctorate-level intelligence".
But how these advances could be measured, let alone achieved, remains open - such details were not published for previous models either, even in retrospect.
The development of advanced AI models essentially rests on three components - data, algorithms and computing time. Data forms the knowledge base on which a model is trained by applying algorithms. The algorithms define the architecture of the neural network and optimize the model during training. Computing time determines how much computer power is used for training and thus how refined the finished model becomes. Taken together, these three components - data, algorithms and computing time - determine the performance of an AI model.
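How strongly these components interact is something the article does not quantify, but published scaling-law research (for example the "Chinchilla" study by Hoffmann et al., 2022) offers a rough, widely used rule of thumb. The following minimal Python sketch is based on that work, not on anything OpenAI has disclosed; the constants are illustrative approximations of the published fit, and the real numbers for GPT models are not public.

    def training_flops(n_params: float, n_tokens: float) -> float:
        # Common rule of thumb for transformer training:
        # roughly 6 floating-point operations per parameter per training token.
        return 6.0 * n_params * n_tokens

    def predicted_loss(n_params: float, n_tokens: float,
                       e=1.7, a=400.0, alpha=0.34, b=410.0, beta=0.28) -> float:
        # Chinchilla-style loss curve: an irreducible error term plus two terms
        # that shrink as the model (n_params) and the data set (n_tokens) grow.
        return e + a / n_params ** alpha + b / n_tokens ** beta

    # Illustrative example: a 70-billion-parameter model on 1.4 trillion tokens.
    print(f"{training_flops(70e9, 1.4e12):.1e} FLOPs, "
          f"predicted loss {predicted_loss(70e9, 1.4e12):.2f}")

The formula captures exactly the article's argument: more computing time - via more parameters or more training tokens - lowers the loss, but with diminishing returns once one of the terms dominates.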
OpenAI can in principle adjust all three of these parameters. Realistically, however, changes to the three components require very different amounts of effort. Current research suggests that improving even individual components has a clearly positive effect on the performance of AI models: if, for example, more computing time is invested with the same data and algorithms, performance improves measurably.
Simply increasing the computing time is not enough
An increase in computing time therefore seems a safe bet - OpenAI has consistently increased it from one model generation to the next. But computing time alone will probably not be enough for a groundbreaking advance. Many experts believe that current models are approaching a saturation point for improvements driven by computing power alone: how much value additional computing time adds depends on the efficiency of the algorithms and the nature of the data. Ideally, OpenAI will therefore address all three components.
OpenAI will probably make the biggest changes to the algorithms and architecture of GPT-5. One frequently mentioned approach is the use of expert networks that specialize in certain tasks or topics within the model. GPT-4 is said to already use such networks, but expanding them further could increase reliability and precision. Competitor Google recently demonstrated how effective this kind of increasing specialization can be: two of its expert models took part in the International Mathematical Olympiad, in which the best students from different nations compete on demanding proof problems.
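The article does not describe how such expert networks are built, and OpenAI has not confirmed any details about GPT-4's internals, but the idea is commonly associated with mixture-of-experts layers: a small gating network decides which few specialists process a given input. A toy Python sketch of that routing idea follows; the dimensions and random weights are made up purely for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    def softmax(x):
        z = np.exp(x - x.max())
        return z / z.sum()

    d_model, n_experts, top_k = 8, 4, 2
    # Each "expert" is just a small weight matrix in this toy; in a real model
    # each would be a full feed-forward block specialized on certain inputs.
    experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
    gate_w = rng.normal(size=(n_experts, d_model))

    def moe_layer(x):
        scores = softmax(gate_w @ x)          # the gate scores every expert ...
        chosen = np.argsort(scores)[-top_k:]  # ... but only the top-k are run
        weights = scores[chosen] / scores[chosen].sum()
        return sum(w * (experts[i] @ x) for w, i in zip(weights, chosen))

    print(moe_layer(rng.normal(size=d_model)).shape)  # (8,)

Only part of the model is active for any single input, which is why specialization can improve quality without increasing the cost of each answer proportionally.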
The innovation: Google's models can check their own intermediate steps. They first interpret the exam problem with the Gemini language model and use it to develop candidate solutions. Generative language models are good at this, but they also tend to produce so-called hallucinations - answers that are formulated logically and convincingly but are factually incorrect. To identify and correct these, one of the models translates the proposed solution into a programming language that specializes in mathematical proofs. The proposal can then be checked step by step, reliably showing whether the proposed chain of reasoning is logically consistent or contains errors. If an error is found in a step, the language model is automatically asked for a new proposal for that step until the entire chain withstands the check.
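In outline, the loop the article describes looks like the following toy Python sketch. The proposer and the checker here are deliberately trivial stand-ins - the real system pairs a large language model with a formal proof assistant (reportedly Lean, in Google DeepMind's case) - but the propose, verify and retry structure is the same.

    import random

    random.seed(1)

    def propose_step(current, target):
        # Unreliable proposer (stand-in for the language model): usually suggests
        # a legal step of size 1-3 toward the target, but sometimes "hallucinates".
        correct = min(3, target - current)
        return correct if random.random() > 0.3 else correct + 1

    def check_step(current, step, target):
        # Strict checker (stand-in for the proof assistant): only steps of size
        # 1-3 that do not overshoot the target are accepted.
        return step in (1, 2, 3) and current + step <= target

    def prove(target, max_retries=10):
        current, steps = 0, []
        while current < target:
            for _ in range(max_retries):
                step = propose_step(current, target)
                if check_step(current, step, target):  # only verified steps are kept
                    current += step
                    steps.append(step)
                    break
            else:
                return None  # retry budget exhausted, no verified chain found
        return steps

    print(prove(10))  # every element of the returned chain has passed the check

The crucial property is that a wrong step can never silently enter the final chain - exactly the guarantee that generative models on their own cannot give.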
The expert networks can keep up with the students
This breakdown of the task into logical components, combined with automated checking of individual steps, is a promising way of working. In terms of solution quality, the models placed among the best 30 percent of participants. However, their response times sometimes exceeded the 90-minute limit, which would have put the AI in last place. The results are nevertheless exciting because they suggest that AI models are getting better at mathematics - an area where they have struggled so far because of the high demands on logical consistency and numerical understanding. If OpenAI follows the path of its competitor Google, a combination of expert models and formal verification methods could set a new standard. With the right training and rules for how the model works through a task, GPT-5 could also open up previously inaccessible areas such as accounting or tax returns.
Altman wants to integrate videos into ChatGPT
There is also plenty of room for improvement on the data side of GPT-5. CEO Sam Altman has announced a particularly ambitious project in this regard: he wants to integrate the analysis and creation of videos directly into the main model. Anyone who sees this as just another tool for users is missing the fundamental changes it requires. For such a system to succeed, the model must understand and link different types of data. It must be able to handle a concept in image data just as well as in text, audio or video. Concretely, it should be able to associate the word "chair" both with a concrete example and with the general form and function of all chairs.
What is self-evident for people is a complex challenge for machines. The fact that the current GPT version can already create (static) images is impressive, but every additional input and output medium presents a completely new challenge. There is a world of difference for AI models between a picture of a chair and a video of a chair tipping over. Videos are a new type of data for these systems.
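A common way to make such cross-modal linking concrete - not something the article attributes to OpenAI - is a shared embedding space in the style of contrastive image-text models: every modality is mapped into the same vector space, and vectors that refer to the same concept end up close together. A toy Python sketch with made-up encoders:

    import numpy as np

    rng = np.random.default_rng(0)

    d_embed = 16
    concepts = ["chair", "dog", "tree"]
    anchors = {c: rng.normal(size=d_embed) for c in concepts}

    def encode_text(word):
        # Pretend text encoder: in a real system this is a trained neural network.
        return anchors[word] + 0.1 * rng.normal(size=d_embed)

    def encode_image(depicted_concept):
        # Pretend image encoder: a photo of a chair maps near the "chair" vector.
        return anchors[depicted_concept] + 0.1 * rng.normal(size=d_embed)

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    text_vec = encode_text("chair")
    # The word "chair" should line up with a picture of a chair, not of a dog.
    print(cosine(text_vec, encode_image("chair")), cosine(text_vec, encode_image("dog")))

A video adds a time axis on top of this, which is why the step from a picture of a chair to a chair tipping over is so much harder: the shared space then has to capture not only objects but also how they change.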
If models can be given such an understanding of objects in space, they could also gain logical understanding, suspects Geoffrey Hinton, one of the world's best-known AI researchers. That would actually represent an evolutionary leap for AI models.
The hope of researchers like Hinton is that logical understanding will translate into fewer hallucinations and a reliable handling of new concepts.
It is conceivable that GPT-5 will at least make some progress here. Being able to generate videos and better images would be interesting for creative people. The reliable handling of new concepts, on the other hand, could make GPT-5 more useful in rapidly developing fields such as research or programming.
The potential of developments in generative AI therefore seems far from exhausted. And although OpenAI will probably publish only sparse information about GPT-5's data, algorithms and computing time, any changes to them will be reflected directly in the quality of the model." [1]
1. Martin Wendiggensen: "Wie Open AI das nächste ChatGPT auf ein neues Level heben kann." Frankfurter Allgemeine Zeitung (online), Frankfurter Allgemeine Zeitung GmbH, Aug 14, 2024.