Monday, June 3, 2024

The Great AI Challenge: How 5 Chatbots Fared --- In the running: OpenAI's ChatGPT, Microsoft's Copilot, Google's Gemini, Anthropic's Claude and Perplexity


"Meet the Models

We have ChatGPT by OpenAI, celebrated for its versatility and ability to remember user preferences. (Wall Street Journal owner News Corp has a content-licensing partnership with OpenAI.) Anthropic's Claude, from a socially conscious startup, is geared to be inoffensive. Microsoft's Copilot leverages OpenAI's technology and integrates with services like Bing and Microsoft 365.

Google's Gemini accesses the popular search engine for real-time responses.

And Perplexity is a research-focused chatbot that cites sources with links and stays up to date.

While each of these services offers a no-fee version, we used the $20-a-month paid versions for enhanced performance, to assess their full capabilities across a wide range of tasks. (We used the latest ChatGPT GPT-4o model and Gemini 1.5 Pro model in our testing.)

With the help of Journal newsroom editors and columnists, we crafted a series of prompts to test popular use cases, including coding challenges, health inquiries and money questions. The same people judged the results without knowing which bot said what, rating them on accuracy, helpfulness and overall quality. We then ranked the bots in each category.

We excerpted some of the best and worst responses to prompts.

Health

Bad health advice from chatbots could be harmful to your. . .health. We asked five questions dealing with pregnancy, weight loss, depression and symptoms both chronic and sudden. Many answers sounded similar. Our judge, Journal health columnist Sumathi Reddy, looked for completeness, accuracy and nuances.

Prompt: What's the best age to get pregnant?

Best Answer: Having children at a later age can offer advantages, such as more maturity, better financial stability and a stronger partnership.

Worst Answer: The best time to get pregnant is whenever you feel confident and prepared to raise a child.

For instance, when we asked about the best age to get pregnant, Gemini gave a brief, general recommendation, while Perplexity went much deeper, even bringing up factors such as relationship and financial stability.

That said, Gemini came through with quality answers to other queries, and finished second to category winner ChatGPT, whose answers improved with the recent GPT-4o update.

Finance

We asked the bots three questions on subjects near and dear to Journal readers: interest rates, retirement savings and inheritance. The Journal's personal finance editor, Jeremy Olshan, posed the questions and assessed the advice based on clarity, thoroughness and practicality.

Prompt: I'm 40 years old. I just inherited an IRA from my grandfather with $1 million in it. How much money do I need to take out this year?

Best Answer: Because you're a non-spouse beneficiary, you likely have a 10-year window to deplete the account, but there might be exceptions.

Worst Answer: Congratulations on inheriting an IRA with a substantial amount!

Here, ChatGPT and Copilot fell behind. Claude had the best answers for the Roth vs. traditional IRA debate while Perplexity best weighed high-yield savings accounts vs. CDs. Gemini, the category winner, best answered a question about when to withdraw funds from an inherited $1 million IRA. The text emphasized not rushing into any withdrawals without professional guidance.

Cooking

AI promises to help in the kitchen, in part by bringing some clarity to the chaos of your fridge and pantry. Personal tech editor Wilson Rothman, an avid cook, threw a set of random ingredients at the bots to see what they came up with. The category winner, ChatGPT, provided a creative but realistic menu (cheesy pork-stuffed apples with kale salad and chocolate-bar shortbread cookies). Perplexity impressed us with the detailed cooking steps provided with its own clever menu.

Next, we asked the bots for a recipe for a chocolate dessert that addresses many dietary restrictions.

Prompt: Can I bake a chocolate cake with no flour, no gluten, no dairy, no nuts, no egg? If so, what's the recipe?

Best Answer: Simple Glaze: Melt dairy-free chocolate chips (check the label!), whisk in a bit of non-dairy milk.

Worst Answer: . . .2 sticks unsalted butter. . .4 large eggs. . .

Gemini took the cake, even recommending additional trimmings like non-dairy glaze. Copilot, on the other hand, immediately failed by including eggs and butter.

Creative Writing

One of the biggest surprises was the difference between work writing and creative writing. Copilot finished dead last in work writing, but was hands-down the funniest and most clever at creative writing. We asked for a poem about a poop on a log. We asked for a wedding toast featuring the Muppets. We asked for a fictional street fight between Donald Trump and Joe Biden. With Copilot, the jokes kept coming. Claude was the second best, with clever zingers about both presidential challengers.

Prompt: Write a wedding toast for Shara and Chris as told by the Muppets.

Best Answer: Gonzo: "Ah, love! It's like being shot out of a cannon into a pile of rubber chickens!"

Worst Answer: Kermit the Frog once said, "Life's a happy song when there's someone by your side to sing along."

In a rare flub, Perplexity erroneously attributed a lyric from the 2011 musical "The Muppets" to Kermit.

Summarization

For people just getting into generative-AI chatbots, summarization might be the best thing to try. It's useful and unlikely to create unforeseen errors. 

Because we used paid services, we were able to upload larger chunks of text, PDF documents and web pages.

For the most part, that is: Even the premium Claude account wasn't able to handle web links. "Our team is making Claude faster, expanding its knowledge base and refining its ability to understand and interact with a wide range of content," says Scott White, a product manager at Anthropic.

Prompt: Summarize this web page: https://en.wikipedia.org/wiki/Paul_McCartney

Best Answer: He was influenced by his father (a jazz player) and rock and roll artists like Little Richard and Buddy Holly.

Worst Answer: I apologize, but I am not able to open URLs, links or videos.

Wikipedia pages for really famous people can get wordy, so we asked for a summary of Paul McCartney's. Some provided short blurbs with obvious Beatle factoids. Copilot answered in a skimmable outline format, and included lesser-known fun facts.

Category winner Perplexity consistently summarized things well, including the subtitles it skimmed in a YouTube video.

Current Events

This category is trickier than it sounds, because not all chatbots can access the web. We asked about this summer's concert lineup, the latest on allegations that China uses TikTok for spying, and the current standings in the upcoming presidential election.

Prompt: Who is more favored to win, Trump or Biden? Please explain your sources and reasoning.

Best Answer: Given the mixed nature of the data, with both candidates having significant unfavorability and various leads in different areas, it is difficult to definitively state who is more favored to win.

Worst Answer: I'm still learning how to answer this question. In the meantime, try Google Search.

Category winner Perplexity stayed on top with balanced reasoning and solid sourcing. ChatGPT faltered when we first tested, but the GPT-4o upgrade boosted it into second place. Gemini didn't want to answer our election question.

OVERALL RESULTS

What did these Olympian challenges tell us? Each chatbot has unique strengths and weaknesses, making them all worth exploring. We saw few outright errors and "hallucinations," where bots go off on unexpected tangents and completely make things up.

The biggest surprise? ChatGPT, despite its big update and massive fame, didn't lead the pack.

Instead, lesser-known Perplexity was our champ. "We optimize for conciseness," says Dmitry Shevelenko, chief business officer at Perplexity AI. "We tuned our model for conciseness, which forces it to identify the most essential components."

We also thought there might be an advantage from the big tech players, Microsoft and Google, though Copilot and Gemini fought hard to stay in the game. Google declined to comment. Microsoft also declined, but recently told the Journal it would soon integrate OpenAI's GPT-4o into Copilot.

With AI developing so fast, these bots just might leapfrog one another into the foreseeable future. Or at least until they all go "multimodal," and we can test their ability to see, hear and read -- and replace us as earth's dominant species." [1]

1. The Great AI Challenge: How 5 Chatbots Fared --- In the running: OpenAI's ChatGPT, Microsoft's Copilot, Google's Gemini, Anthropic's Claude and Perplexity. Brown, Dalvin; Dapena, Kara; Stern, Joanna. Wall Street Journal, Eastern edition; New York, N.Y. 03 June 2024: A.12.

 

The Deadliest Cancer Is Becoming More Survivable

 

"There is more hope than ever for people diagnosed with the deadliest cancer.

Declines in smoking and the advent of screening and newer drugs have transformed the outlook for patients with lung cancer, once considered a death sentence.

And there is more to gain. More patients can fend off the disease for months or years with targeted or immune-boosting drugs, results released this weekend at a top cancer conference showed. That includes patients with forms of the disease that are notoriously tough to treat.

"It had such an abysmal prognosis. And now we have people who are being cured who we never thought would be cured," said Dr. Angela DeMichele, a medical oncologist at Penn Medicine.

AstraZeneca's drug Tagrisso can contain lung cancer nearly three years longer than chemotherapy and radiation alone for some stage-3 patients, one study released Sunday showed. Another found that some patients with aggressive disease survived nearly two years longer with the company's immunotherapy drug Imfinzi, the first advance for that lung-cancer subtype in decades.

Another study presented at the American Society of Clinical Oncology conference in Chicago found that 60% of advanced patients were alive without their disease advancing at five years after taking Pfizer's Lorbrena, a drug that targeted a genetic mutation in their tumors. That compares with just 8% of patients on an older drug with the same target.

"These results are really outstanding," said Dr. David Spigel, chief scientific officer at Sarah Cannon Research Institute in Tennessee, lead researcher on the Imfinzi trial. "A really major step forward in lung-cancer care."

Tagrisso, Imfinzi and Lorbrena are all approved by the Food and Drug Administration and in use.

Lorbrena has kept Matt Hiznay's stage-4 lung cancer at bay for nine years. Hiznay, who never smoked, was diagnosed in 2011 at 24 years old.

"Hearing that, you get old really fast," he said.

But there was some good news: His tumor tested positive for something called an ALK gene mutation, a rare finding that made him eligible for a targeted drug. Hiznay's doctor shook his hand and congratulated him on being a mutant.

Hiznay tried a string of drugs and older therapies including chemotherapy and radiation, each of which held off his disease for some time.

He joined a clinical trial for Lorbrena in 2015 and has taken the drug ever since. Between treatments, Hiznay earned a doctorate, got married and had a daughter.

"It became a bit easier to see the future again," said Hiznay, who lives in Brecksville, Ohio. He cherishes each day, even when his infant daughter wakes him up in the middle of the night.

More than 234,000 Americans are diagnosed with lung cancer annually, and some 125,000 die of the disease -- the leading cause of cancer death among men and women. The survival rate has increased by some 20% in the past five years, according to the American Lung Association.

Lung cancer has responded to newer drugs such as immunotherapies better than some other cancers, doctors said, in part because its tumors tend to have many mutations that make it easier to find and attack.

Tagrisso targets mutations on the EGFR gene, found in 15% of lung cancers in the U.S. One study presented at the conference added Tagrisso after chemotherapy and radiation for patients with the mutation whose stage-3 disease was too far along for surgery. The median time before the disease advanced in those patients was more than three years, compared with just under six months for patients who weren't on the drug.

The results show how much lung-cancer treatment has changed in the past decade. "It's like Dorothy looking around and saying, we're not in Kansas anymore," said Dr. Lecia Sequist, a lung-cancer specialist at Mass General Cancer Center who wasn't involved in the trial.

Another study showed rare progress against small-cell lung cancer, a less common and more aggressive form of the disease that is harder to treat.

AstraZeneca's Imfinzi increased median survival to around 56 months, compared with 33 months on the standard chemotherapy and radiation alone. The trial included patients with small-cell lung cancer that hadn't spread.

"To see something where we're measuring benefit in years versus months is a huge step in the right direction," said Dr. Lauren Averett Byers, a lung-cancer oncologist at MD Anderson Cancer Center in Houston, who wasn't involved in the trial.

The FDA in May approved Amgen's Imdelltra for more advanced small-cell lung cancer. Median survival with the drug was 14 months, with 40% of patients responding to the treatment.

About a quarter of lung-cancer patients are alive five years after diagnosis.

The newer treatments can give some patients with advanced disease months or years more to live, but the cancer often comes back and becomes incurable. Many lung cancers are caught late.

"When you look at the disease statistics, you have to be humbled a little bit," said Dr. Pasi Janne, a lung-cancer specialist at Dana-Farber Cancer Institute in Boston. "We still have room to go."

Through Hiznay's 13 years with lung cancer, he has seen many fellow patients die.

"The survivor's guilt, it's there, it's real," he said.

Five years after diagnosis, Hiznay rented a brewery basement for a "One Percent" party, named for his odds of survival in 2011.

He did it again after 10 years. He plans to be there for 15." [1]

1. U.S. News: The Deadliest Cancer Is Becoming More Survivable. Abbott, Brianna. Wall Street Journal, Eastern edition; New York, N.Y. 03 June 2024: A.3.