
Saturday, April 4, 2026

Good AI Prompts for Writing and the 'AI Detector' as Defamation Machine

 

“If it weren't for Pangram, I never would have heard of Mia Ballard. The young novelist "loves all things horror and is passionate about writing stories focused on feminine rage," according to her Goodreads bio. Horror isn't my cup of tea, and I do my best to steer clear of feminine rage. But last month Ms. Ballard was thrust from obscurity into notoriety when Hachette, her publisher, announced that it was canceling her book "Shy Girl" over accusations that it was generated by artificial intelligence.

 

Horror-novel fans had speculated online for months about the book's authorship, and in January Pangram CEO Max Spero stepped in to validate the chatter. But his evidence was itself authored by AI: Pangram's namesake product is an AI tool that purports to "detect AI-generated content with 99.98% accuracy." It classified 78.3% of Ms. Ballard's book as AI-generated, although if you dig into its analysis you will find that for many passages it indicates only "medium confidence" in that assessment.

 

Ms. Ballard told a Journal reporter that she "did not personally use AI" while writing "Shy Girl," but an acquaintance she hired to edit the original, self-published edition did. "All I'm going to say," Ms. Ballard told the reporter, "is please do your research on editors before trusting them with your work."

 

Last week I learned that I too have been accused by Pangram, albeit indirectly, of publishing AI-generated content. In November the University of Maryland issued a preprint (an academic paper not yet peer-reviewed) alleging that three freelance op-ed pieces I accepted for these pages in 2025 were AI-generated. I looked into those charges and concluded that they are unsupported, that Pangram isn't reliable enough to serve as the basis for such accusations, and that there is a strong possibility Mia Ballard was railroaded.

 

Pangram pitches its tool to publishing and media organizations as a way to "protect editorial integrity." The company's website boasts that its product has been "reviewed as the proven, most reliable and most accurate AI detection tool in the market by third parties including University of Maryland." But the Maryland study is more self-promotion than neutral review. Four of its seven authors are Pangram employees, including Mr. Spero and fellow co-founder Bradley Emi.

 

The Maryland researchers used Pangram to scan 251,442 newspaper articles, including 44,803 opinion pieces from the Journal, the New York Times and the Washington Post. They report that Pangram found opinion articles (which include editorials and letters to the editor as well as op-eds and columns) "are 6.4 times more likely to contain AI use than contemporaneous news articles from the same three newspapers."

 

The study includes a table that highlights 18 op-eds. Among them are three from the Journal that were supposedly AI-generated, written by military analyst John Spencer, civil-rights activist Edward Blum and physician Nicole Saphier. "There's way more than 3 individuals whose work was detected as mixed/AI in WSJ," lead author Jenna Russell, a doctoral student in computer science, told me in an email. Since these are the ones she and her co-authors chose to spotlight, they are the ones I investigated.

 

My first step was to run the articles through Pangram, whose website allows four free scans a day. To my surprise, the same tool applied to the same articles produced different results. Mr. Blum's piece came up 100% human; Mr. Spencer's, 44% AI and 56% human, which would make it "mixed," not "AI-generated." Only Dr. Saphier's article was still labeled 100% AI-generated. Complicating matters further, when I checked the researchers' database, it labeled the Blum and Saphier articles "mixed" and only the Spencer one "AI." To my mind, these wildly inconsistent results are enough to discredit any accusation based on a Pangram analysis.

 

Vauhini Vara reported in the Atlantic last week that a Washington Post editor conducted the same exercise with similar results. Pangram has updated its software since the study was conducted, according to Ms. Vara. Mr. Spero told her, in her paraphrase, that "the current iteration of Pangram . . . was designed to be more conservative . . . in flagging material as AI-generated, partly for fear of spreading false accusations." Even so, "when he and Russell reran their data set of opinion articles through the current version, the underlying assessments were similar to those in the earlier iteration."

 

This raises more questions than it answers. What exactly is meant by "the underlying assessments"? How can they be "similar" when the results for the articles the study spotlighted are materially different? If various "iterations" of Pangram produce such divergent results, why should anyone have confidence in the tool's reliability? I posed these questions in an email to Ms. Russell and Mr. Spero. Both declined to answer. They didn't reply when I asked for an explanation of the discrepancies between the paper and the online database.

 

Next, I emailed the accused op-ed authors to ask whether and how they use AI in drafting their articles. Mr. Spencer said he doesn't use it at all, although he did circulate his draft among "a group of trusted friends, who I consider better writers than me."

 

Mr. Blum said he uses ChatGPT and Claude in a limited way: He pastes in drafts with the prompt "edit for grammar, usage and clarity."

 

"Mostly, they hyphenate 'civil rights,' which I don't; use dashes instead of semicolons (I prefer semicolons); and shorten my longer sentences." He said he doesn't paste text from the chatbot back into his draft but decides for himself which of its suggestions to incorporate.

 

Dr. Saphier said that "I always free write [articles] in a Word document, then sometimes I have copy/pasted them into an AI with the prompt 'check for grammar, spelling, accuracy of statements and provide any flow suggestions.'"

 

Unlike Mr. Blum, she has also sometimes pasted AI output back into a draft. She added that "I have stayed away from AI altogether in writing lately to avoid this exact scenario."

 

It is inaccurate and unfair to characterize work produced in the manner described by any of these authors as "AI-generated." The Pangram tool has no ability to discern a writer's work process. It merely looks for linguistic patterns that tend to correlate with AI output. Pangram boasts that its tool is far better than other AI detectors at distinguishing human from AI text. That may be true, but the bar is low. ZeroGPT, another popular detector, flags 96.4% of the Gettysburg Address as AI-written. Lincoln gets credit only for "We are met on a great battle-field of that war."

 

Pangram claims a false-positive rate of 1 in 10,000 for all forms of writing and 1 in 100,000 for news articles, based on a test of content published before ChatGPT's 2022 debut. But neither the company's website nor the Maryland paper cites a false-positive rate for op-eds. My long experience with op-eds leads me to think that they may be more prone to false positives than other forms of writing. A significant subset of them are formulaic in style -- expository, structured, smooth, coherent and impersonal, traits that overlap with AI-generated prose.

 

I asked Mr. Spero and Ms. Russell if their research tested for this possibility. He said "no op-eds from before the release of ChatGPT were flagged as AI-generated." She added that the sample size for the pre-AI op-ed test was 4,344 articles. As a matter of simple arithmetic, that is far too small a sample to rule out my hypothesis if we take Pangram's other claims at face value.
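The "simple arithmetic" is easy to check. A minimal back-of-the-envelope sketch (my own illustration, not from the article or the study) using the standard "rule of three": when zero events are observed in n trials, the 95% upper confidence bound on the true event rate is approximately 3/n. The figures below are the ones reported in the article (4,344 pre-ChatGPT op-eds, a claimed false-positive rate of 1 in 10,000).

```python
# Rule of three: observing 0 false positives in n trials bounds the
# true false-positive rate at roughly 3/n with 95% confidence.
n = 4344                   # pre-AI op-eds tested, per Ms. Russell
upper_bound = 3 / n        # 95% upper bound on the op-ed FP rate
claimed_rate = 1 / 10_000  # Pangram's claimed overall FP rate

# Expected false positives at the claimed rate -- well under one,
# so observing zero tells us almost nothing.
expected = n * claimed_rate

print(round(upper_bound, 5))  # ~0.00069, i.e. about 6.9 per 10,000
print(round(expected, 2))     # ~0.43 expected flags at the claimed rate
```

In other words, a clean result on 4,344 articles is consistent with an op-ed false-positive rate nearly seven times Pangram's claimed 1 in 10,000, which is why the test cannot rule out the hypothesis that op-eds are more prone to false positives.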

 

Those of us who practice writing as a professional craft have an ethical duty, and sometimes a contractual obligation, to ensure that our work is original. Many amateurs also adopt that ethos. To accuse them falsely of passing AI-generated work as their own is potentially defamatory. If the sole factual basis for such a claim is the output of a fallible and inconsistent AI-based pattern-detection tool that sometimes reports only "medium confidence" in its own findings, that looks like reckless disregard for the truth -- especially when the accuser is the software's designer, who presumably knows its limits.

 

Which leads to another question about the Maryland study: Did its authors contact Mr. Spencer, Mr. Blum and Dr. Saphier and offer them an opportunity to respond before publishing the paper accusing them? This is basic journalistic due diligence. All three writers said they received no such communication. Both Mr. Spero and Ms. Russell declined to answer the question. He said only that "Jenna contacted ten reporters to ask about AI use. She received no response from any of them." She disputed the question's premise. "We do not accuse anyone of using AI, rather we report trends at an aggregate level," she wrote. "We do not in any way say that using AI is inherently good or bad!"

 

A look at Ms. Russell's and Mr. Spero's Twitter feeds belies that nonjudgmental pose even more glaringly than their study does. On Feb. 16, Mr. Spero taunted a staffer for a British newspaper, whose name I will withhold: "We fetched 871 articles published in the Guardian by [the journalist] over the last six years. It's clear that he is increasingly relying on AI. In two weeks in February he churned out nine articles classified by Pangram as fully AI-generated. Receipts below."

 

On Feb. 18, Ms. Russell replied: "Reminder that you can search over 250k news articles for AI slop at . . ." followed by the URL of the Maryland study's database. (An editor of Semafor answered Mr. Spero's tweet by quoting a Guardian spokesman, who described the accused man as "an exemplary journalist" and the Pangram CEO's accusation as "preposterous.")

 

I am in a position to defend my writers and myself, but Pangram has shredded the reputation of Mia Ballard, whose book was supposed to be published next Tuesday. The Times reported last month that she is "pursuing legal action," and her prospects in a contract dispute depend on facts that haven't been publicly disclosed. We don't know the terms of her agreement with Hachette or how her acquaintance used AI to edit her manuscript, and the company's AI policy permits authors to use AI for "editing, correcting or otherwise refining text or other content."

 

If the accusations against her are false, she could also have a claim against her accusers for defamation and tortious interference.

 

And for copyright violation. The Pangram analysis of "Shy Girl," which Mr. Spero has made publicly available on his company's website, features the book's full text -- one of the few places you can still read it after Hachette's cancellation. Where did Mr. Spero get the text? Audrey Henson, who writes the Drey Dossier newsletter on Substack, noticed that "a URL was embedded directly in the document over and over: OceanofPDF.com." The Authors Guild describes OceanofPDF.com as "one of the most notorious digital ebook piracy sites."

 

Potential clients might ask: Exactly what does Pangram mean when it promises to "protect editorial integrity"?

 

---

 

Mr. Taranto is the Journal's editorial features editor.” [1]

 

1. The 'AI Detector' as Defamation Machine. Taranto, James.  Wall Street Journal, Eastern edition; New York, N.Y.. 04 Apr 2026: A11.  
