
Friday, October 20, 2023

How AI learns to read images.

 

 "ChatGPT is currently receiving an update that allows the analysis of uploaded images. Artificial intelligence (AI) goes beyond mere image recognition - and opens up new connections.

 

     ChatGPT-4V is the name of the latest iteration of the popular AI from OpenAI (chat.openai.com). The V stands for vision. The application becomes multimodal: not only written instructions ("prompts") can now be fed in, but also images and audio. The function is gradually being rolled out worldwide to users of the paid version of ChatGPT (i.e. not of the free version 3.5).
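In programmatic terms, a multimodal prompt is simply a chat message that combines text and image content. The sketch below builds such a request payload as a plain dictionary; the model name and the message schema follow OpenAI's vision API at the time of writing and should both be treated as assumptions, not a definitive client implementation:

```python
import base64

def build_image_prompt(image_bytes, question, model="gpt-4-vision-preview"):
    """Sketch of a multimodal chat payload: a text question plus an
    inline base64-encoded image. Model name and schema are assumptions
    based on OpenAI's vision API at the time of writing."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    }

# Dummy bytes stand in for a real JPG file here.
payload = build_image_prompt(b"abc", "What can be seen in this image?")
```

The payload would then be sent to the chat completions endpoint with an authorized HTTP POST; nothing above performs a network call.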

 

     OpenAI - or its main financial backer Microsoft - had already activated such an image-upload function in Microsoft Bing Chat weeks ago. However, the results of many queries were - and still are - unusable: Bing Chat often hallucinates about images. We spot-checked the queries described below for the new ChatGPT-4V on Bing Chat as well - with results that lag far behind the new GPT-4V.

 

     The image AI usually recognizes sights such as Marksburg Castle from a distance. Night shots like those of Ehrenbreitstein Fortress in Koblenz are also often no problem.

 

     One of the simpler applications: You upload an image and ask what you can see. For example, the AI GPT-4V recognizes sights such as Marksburg Castle on the Middle Rhine or the Ehrenbreitstein Fortress in Koblenz. It can also provide a clue to the location where the photo was taken based on the cars depicted and their license plates.

 

     Beyond the moderately entertaining guessing of places, serious applications are possible. We courageously fed a photo of gathered mushrooms into GPT-4V and asked whether there were any poisonous mushrooms among them. The machine analyzed, after an urgent warning about its own fallibility: the orange-red mushrooms in the picture above could be chanterelles, but the white and gray mushrooms in the middle were difficult to identify; the mushrooms with the darker caps could be porcini. (The cook, an expert in the field, survived the meal.)

 

     The AI provides less life-threatening answers when two diagrams of the so-called Sinus-Milieus from 2021 and 2010 are uploaded. In them, target groups are positioned according to social, economic and cultural criteria and named in drawn ovals. The machine correctly recognizes a post-material and a neo-ecological milieu in the more recent version of the graphic - the sociology student is pleased. The machine does not address the disappearance of the bourgeois middle; that evidently still requires trained sociologists.

 

     The AI can also work through more complex diagrams - but not always without errors. During the corona pandemic, there were complicated quarantine rules if a colleague in the office became infected. A March 2021 [exhibit] explained these rules.

 

     The prompt after the image was uploaded to GPT-4V as a JPG file was: How long do I have to self-quarantine if I

 

     a) had close contact with the colleague on Monday,

 

     b) today is Wednesday and

 

     c) he has called in sick,

 

     d) I have just taken a positive PCR test and

 

     e) show symptoms myself?

 

     The machine responded based on the diagram:

 

     "You must isolate yourself immediately. [...] Since your last contact was on Monday and today is Wednesday, you would be in quarantine until the following Wednesday (10 days after your contact). If you isolate yourself on Thursday (11th day) "If you get tested and the result is negative, you could end the quarantine. If you decide not to take a test, you would have to remain in quarantine until the following Sunday (14 days after your exposure)."

 

     It is important to check these statements against the diagram in detail. In follow-up questions, the machine sometimes got tangled up with the days of the week - probably also because of unclear wording in the source.
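Such day-of-week claims are easy to verify with a few lines of Python. A minimal sketch, using an arbitrary Monday (March 1, 2021) as the contact day - chosen here purely for illustration, not taken from the article:

```python
from datetime import date, timedelta

# Arbitrary Monday standing in for the day of close contact.
contact = date(2021, 3, 1)
assert contact.strftime("%A") == "Monday"

# 10 days after contact (the quarantine end the AI named) ...
end_10 = contact + timedelta(days=10)
# ... and 14 days after contact (the no-test quarantine end).
end_14 = contact + timedelta(days=14)

print(end_10.strftime("%A"))  # Thursday
print(end_14.strftime("%A"))  # Monday
```

Counting this way, "10 days after a Monday" lands on a Thursday and "14 days after" on a Monday - a quick check of exactly the kind of weekday arithmetic the machine stumbled over.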

 

     The same applies to a technical analysis of the DAX. A chart published in a recent F.A.Z. article was uploaded to GPT-4V:

 

     Prompt: This is a DAX price curve. Explain it to me. Should I get in now or not yet?

 

     AI: The diagram shows the price development of the DAX over a certain period. Uptrend: the red trend line shows a general uptrend in the DAX. This means that the market as a whole tended to rise during the period under consideration. Resistance zone: the area between 16,300 and 16,500 appears to be a resistance zone, as the price is struggling to rise further in this area. [...] Whether you should get in now or not depends on various factors: risk tolerance, investment horizon, market conditions, and other fundamental data.

 

     We have shortened the answer here. The machine ultimately gives no clear recommendation, but it does provide a useful introductory discussion. It helps that a more detailed analysis by an expert from the editorial team can then be read in the article linked above.
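The "trend line" the AI describes is, at its core, a least-squares fit through the closing prices. A minimal sketch - with entirely hypothetical prices, not the actual DAX data from the article:

```python
def fit_trend(prices):
    """Least-squares line through (day index, price).
    Returns (slope, intercept); slope > 0 indicates an uptrend."""
    n = len(prices)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(prices) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, prices))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

# Hypothetical closing prices (NOT real DAX values).
closes = [15900, 16050, 16010, 16180, 16240, 16320]
slope, intercept = fit_trend(closes)
print("uptrend" if slope > 0 else "downtrend")  # prints "uptrend"
```

A positive slope is all the red line in the chart encodes; whether to "get in" remains, as the AI rightly notes, a question of risk tolerance and horizon.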

 

     A brainstorming classic is the board on which several participants pin colorful notes with handwritten terms on a topic; afterwards, everyone sticks colored dots on their favorites. Instead of painstakingly examining the photographed board, typing it out as a table and then counting the dots, you pass the image to the image AI. The prompt: The picture shows many notes with colored dots. Make a table of them and count the dots on each note.
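For comparison, the counting step itself is classic image processing. The sketch below counts connected clusters of colored cells in a toy character grid - a stand-in for real pixel data, using flood fill with 4-connectivity:

```python
from collections import deque

def count_dots(grid, background="."):
    """Count connected same-colored clusters ("dots") in a grid of
    single-character color codes, using breadth-first flood fill."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    dots = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != background and not seen[r][c]:
                dots += 1
                color = grid[r][c]
                queue = deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == color
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return dots

# Toy "photo": two red cells forming one dot, one green, one blue dot.
board = ["RR..G",
         "RR...",
         "...BB"]
print(count_dots(board))  # prints 3
```

On a real photo one would first quantize pixel colors and filter by cluster size, but the counting logic stays the same.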

 

     In their paper, the makers of ChatGPT-4V provide detailed examples of further uses of image-reading AI. Picture puzzles are solved, ID data is read, apples are counted and mathematical drawings are evaluated. The machine could recognize people, especially celebrities, in images, but OpenAI has now put a policy in place to prevent this function. Likewise, the machine offers assessments of feelings only for cartoon characters, not for depicted people. But it would be possible.

 

     The machine provides a recipe for photos of meals and, for medical X-ray images of human teeth, an initial assessment of an abnormal wisdom tooth. McDonald's and Nike brand logos are recognized in images. The AI can decipher handwriting and explain hand-circled fields in scanned Excel spreadsheets.

 

     From a hand-drawn sketch of an apartment's floor plan, the machine determines the total area from a single stated wall length (albeit with grumbling about the necessary, error-prone estimation).
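The underlying arithmetic is simple scaling: one known wall length fixes the meters-per-pixel ratio, and areas scale with the square of that ratio. A sketch with made-up measurements, not values from the article:

```python
def area_from_reference(area_px, ref_len_px, ref_len_m):
    """Convert a measured pixel area to square meters, given one wall
    whose length is known both in pixels and in meters."""
    scale = ref_len_m / ref_len_px  # meters per pixel
    return area_px * scale ** 2

# Made-up numbers: a wall drawn 200 px long is stated to be 5 m,
# and the whole floor plan covers 200 x 160 px in the sketch.
total_m2 = area_from_reference(200 * 160, ref_len_px=200, ref_len_m=5.0)
print(total_m2)  # prints 20.0
```

The AI's "grumbling" is justified: any error in reading the wall length off the sketch enters the area quadratically.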

 

     The machine can also generate programming code from a photographed diagram so that it can be displayed as a clean graphic on a website. And the AI reads the brand from photos of crashed cars and, on request, gives an initial assessment of the damage.

 

     The bottom line: the AI gains eyes and powerful recognition functions. This does not happen without errors, but the AI keeps learning from the worldwide activity of its users. 

 

If you don't want your uploaded images and entered prompts to be used for further training, you can switch this off in the GPT-4V settings under "Data controls". And as always: be careful about uploading sensitive business data or personal data such as job applications to third-party servers." [1]

 

1. Marcus Schwarze: Wie die KI lernt, Bilder zu lesen. Frankfurter Allgemeine Zeitung (online), Frankfurter Allgemeine Zeitung GmbH, Oct 17, 2023.
