Thursday, March 14, 2024

OpenAI Made Video Clips Good Enough to Freak Us Out --- Company's chief tech officer explains new Sora AI tool and how it will be rolled out


"You wake up one morning with an inescapable urge to see a bull wandering around a china shop. Your options:

A) Contact a local livestock trainer and a nearby Crate & Barrel.

B) Hire a Hollywood animator.

C) Type six words into this magical AI tool, and out pops a video of a bull, carefully walking around the bowls and plates.

Welcome to the next "holy cow" moment in AI, where your words transform into smooth, highly realistic, detailed video. So long, reality! Thanks for all the good times.

OpenAI won't publicly release Sora, its new text-to-video tool, until later this year. Still, it's already showing us how easy it could be to replace many people involved in video productions with some well-written prompts and a lot of processing power. I sent the company a few prompts of my own, because who doesn't want to see a mermaid reviewing a smartphone with her crab assistant? Or a bull strolling daintily through a china shop?

Then I sat down for a video interview with Mira Murati, the company's chief technology officer, to dissect them and discuss my concerns around this technology.

When OpenAI began previewing videos made with the generative-AI tool last month, the internet understandably lost its mind. Other AI video technology has produced choppy, low-resolution clips. These looked like something out of a nature documentary or big-budget film.

Sora brings new intensity to the now-familiar AI Feelings Loop -- amazement about the capability followed by fear for society. Murati assured me OpenAI is taking a measured approach to releasing this powerful tool. That doesn't mean everything's gonna be all right.

I'd already been wowed by Sora-generated videos: drone shots of the Amalfi Coast, a corgi with a selfie stick and an animated otter on a surfboard. I asked OpenAI for something more familiar to my life: "Two professional women, both with brown hair and in their 30s, sitting down for a news interview in a well-lit studio."

The mouth and hair movements, the details on the leather jacket -- it all looks so real. Murati said the 20-second 720p-resolution clip took a few minutes to generate. There's also no sound; Murati said OpenAI plans to add audio eventually.

When I put the same prompt into Runway, another AI video generator, out came two blurry, ghostlike women who haunt my dreams.

How does it all work? It'd be easier to explain the evolution of mermaids than the inner workings of "diffusion models," but here's the gist: The AI model analyzed lots of videos and learned to identify objects and actions. Then, when you give it a text prompt, it sketches out the whole scene and then refines each frame, step by step, from noise into detail.
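Sora's actual architecture isn't public, so as a rough illustration only: the diffusion idea described above -- start from pure noise and repeatedly refine it toward a coherent picture -- can be sketched as a toy loop. Here the "denoiser" is faked (it's assumed to know the clean target), where in a real model it would be a large neural network conditioned on your text prompt:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for one tiny "frame." In a real diffusion model there is no
# known target; a trained network predicts it from the noisy input.
TARGET = np.ones((4, 4))

def denoise_step(x, t, steps):
    """Move the noisy frame a fraction of the way toward the prediction."""
    predicted_clean = TARGET            # network output (assumed perfect here)
    alpha = 1.0 / (steps - t)           # blend more aggressively near the end
    return x + alpha * (predicted_clean - x)

def generate(steps=10):
    x = rng.standard_normal((4, 4))     # start from pure random noise
    for t in range(steps):
        x = denoise_step(x, t, steps)   # each pass removes some of the noise
    return x

frame = generate()
print(np.abs(frame - TARGET).max())     # prints a value near 0
```

After the final, full-strength step the frame lands essentially on the target; the real process works the same way in spirit, except the "target" is whatever scene the model infers from your six words about a bull in a china shop.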

Industry observers and competitors -- including Runway's chief executive -- attribute some of these superior results to OpenAI's vast computing power and training data. OpenAI has recently faced copyright infringement lawsuits alleging the AI company has scraped content without permission to train ChatGPT.

I asked Murati what training data OpenAI used for Sora. "We used publicly available data and licensed data," Murati told me. When I asked if that included videos from YouTube, Instagram and Facebook, she said she didn't know. Murati later confirmed that licensed material includes content from Shutterstock.

AI models are a black box for users -- we know prompts go in and content comes out, but we don't know the steps in between. So we'll never quite know why things look the way they do. For instance, the mermaid's crustacean companion has a mustache like SpongeBob's friend Mr. Krabs. Coincidence? Maybe!

Right now, it's far more expensive to produce Sora's video clips than images from Dall-E, the company's image generator, Murati told me. When released to the public, however, it will be optimized to demand less computing power.

In this early stage, you can spot noticeable AI tells.

At one point in the Sora-created interview scene, the lighter-haired woman appears to have 10 fingers growing out of her hand. "It is really difficult to create an accurate representation of hand motion," Murati explained.

For another video, I asked to see a robot yanking a camera from a film producer. Sora's interpretation: a human film producer morphing into a moviemaking robot. The body-snatcher move is jarring. Also, in the background, a yellow taxi turns into a silver sedan. The model is "quite good at continuity, it's not perfect," Murati explained.

So, when the glitches go away, how will we tell real video from AI video?

A watermark appears at the bottom of the clips. The videos will eventually contain metadata to denote their origins, Murati said. OpenAI is also focused on red-teaming Sora, where safety testers try to throw prompts at it to draw out vulnerabilities, biases and other harmful results.

"This is the reason why we're actually not deploying the systems yet," she said. "We need to figure out these issues before we can confidently deploy them broadly."

Murati said Sora's prompt policies will likely follow those of Dall-E. For instance, you can't generate images of public figures. When I asked for "TV news footage of an incumbent American president," an OpenAI spokesman said Sora rejected the prompt.

I also asked for a "soldier walking in an Eastern European town." The company passed, opting instead for my more innocuous prompts. Regarding nudity, Murati told me the company is working with artists to figure out where it could create "guardrails and limitations without hindering creativity."

Tools like Sora will get better fast. And in a world where a text prompt could replace your drone operator or character illustrator, Hollywood is worried -- and excited. Just depends on whom you ask.

After seeing Sora, Tyler Perry said he would pause his $800 million studio expansion, saying this tech could save money on sets and location shoots, but was also cause for concern. Jeanette Moreno King, president of the Animation Guild, which represents animation artists in Hollywood and around the country, told me humans will still be needed for artistic decisions but "the future is foggy." Edward Saatchi and his AI-video studio, Fable, are dreaming up the Netflix of AI: input a prompt and out comes a full series you want to watch.

When I asked Murati about Sora's impact on video-production jobs, she again mentioned the slow, careful rollout, and said OpenAI has given these workers early access for testing. "We want people in the film industry and creators everywhere to be a part of informing how we develop it further," she said.

If OpenAI is that bull in the china shop, it might be treading lightly now. Inevitably, though, it's going to start smashing plates." [1]

1. Stern, Joanna. "OpenAI Made Video Clips Good Enough to Freak Us Out --- Company's chief tech officer explains new Sora AI tool and how it will be rolled out." Wall Street Journal, Eastern edition; New York, N.Y., 14 Mar 2024: A.10.

 
