
Sunday, April 28th, 2024

How generative artificial-intelligence systems are produced


 

"Artificial-intelligence systems (AIs) are trained on vast quantities of human-made work, from novels to photos and songs.

 

These training data are broken down into “tokens”—numerical representations of bits of text, image or sound—and the model learns by trial and error how tokens are normally combined.
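The idea of turning text into numerical tokens can be sketched in a few lines. This toy example simply assigns an integer ID to each distinct word; real systems use subword schemes such as byte-pair encoding, so the function names and vocabulary here are illustrative assumptions, not any particular model's implementation.

```python
# Toy tokeniser: maps text to integer "tokens", the numerical
# representations a model learns statistical patterns over.
# Real tokenisers split text into subword units, not whole words.

def build_vocab(corpus):
    """Assign a unique integer ID to every distinct word in the corpus."""
    vocab = {}
    for text in corpus:
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab

def tokenize(text, vocab):
    """Convert text into the token IDs the model actually sees."""
    return [vocab[w] for w in text.lower().split() if w in vocab]

corpus = ["the cat sat", "the dog sat"]
vocab = build_vocab(corpus)
print(tokenize("the cat sat", vocab))  # prints [0, 1, 2]
```

During training, the model sees only these ID sequences and learns, by trial and error, which tokens tend to follow which.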

 

Following a prompt from a user, a trained model can then make creations of its own.

 

More and better training data means better outputs.

 

In America the model-makers are relying on the legal concept of fair use, which provides broad exemptions from the country’s otherwise ferocious copyright laws. An encouraging precedent comes courtesy of a ruling on Google Books in 2015. The Authors Guild sued the search giant for scanning copyrighted books without permission. But a court found that Google’s use of the material—making books searchable, but showing only small extracts—was sufficiently “transformative” to be deemed fair use. Generative-AI firms argue that their use of copyrighted material is similarly transformative. Rights-holders, meanwhile, are pinning their hopes on a Supreme Court judgment last year that a series of artworks by Andy Warhol, which had altered a copyrighted photograph of Prince, a pop star, were insufficiently transformative to constitute fair use.

Not all media types enjoy equal protection.

Copyright law covers creative expression, not ideas or information.

Computer code, for example, is only thinly protected, since it is mostly functional rather than expressive, says Matthew Sag of Emory University in Atlanta.

(A group of programmers aims to test this in court, claiming that Microsoft’s GitHub Copilot and OpenAI’s Codex infringed their copyright by training on their work.) News can likewise be tricky to protect: the information within a scoop cannot itself be copyrighted. Newspapers in America were not covered by copyright at all until 1909, notes Jeff Jarvis, a journalist and author. Before then, many employed a “scissors editor” to literally cut and paste from rival titles.

Image-rights holders are better protected. AI models struggle to avoid learning how to draw copyrightable characters—the “Snoopy problem”, as Mr Sag calls it, referring to the cartoon beagle. Model-makers can try to stop their AIs drawing infringing images by blocking certain prompts, but they often fail. At The Economist’s prompting, Microsoft’s image creator, based on OpenAI’s Dall-E, happily drew images of “Captain America smoking a Marlboro” and “The Little Mermaid drinking Guinness”, despite lacking express permission from the brands in question. (Artists and organisations can report any concerns via an online form, says a Microsoft spokesman.) Musicians are also on relatively strong ground: music copyright in America is strictly enforced, with artists requiring licences even for short samples. Perhaps for this reason, many AI companies have been cautious in releasing their music-making models.
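The blocking of "certain prompts" the article describes can be pictured as a filter that runs before the image model. The sketch below assumes a simple substring blocklist (the term list and function names are hypothetical); it also shows why such filters often fail, since a paraphrase slips straight through.

```python
# Hypothetical prompt filter: rejects prompts containing blocked
# character names. Real services use more sophisticated classifiers,
# but the evasion problem shown below applies to those as well.

BLOCKED_TERMS = ["captain america", "snoopy"]  # illustrative entries only

def is_blocked(prompt: str) -> bool:
    """Return True if the prompt mentions a blocklisted character."""
    p = prompt.lower()
    return any(term in p for term in BLOCKED_TERMS)

print(is_blocked("Captain America smoking a Marlboro"))  # True, literal match caught
print(is_blocked("a patriotic super-soldier with a shield"))  # False, paraphrase evades the filter
```

Because users can always describe a character without naming it, and the model has already learned what the character looks like, prompt-level blocking is a leaky defence.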

Outside America, the legal climate is mostly harsher for tech firms. The European Union, home to Mistral, a hot French AI company, has a limited copyright exception for data-mining, but no broad fair-use defence. Much the same is true in Britain, where Getty has brought its case against Stability AI, which is based in London (and had hoped to fight the lawsuit in America). Some jurisdictions offer safer havens. Israel and Japan, for instance, have copyright laws that are friendly for AI training." [1]


1. "The imitation game", The Economist (London), Vol. 451, Iss. 9393 (Apr 20th, 2024), pp. 55-57.
