Space Cats: Test Driving Sora, OpenAI’s Text-to-Video Model


OpenAI’s Sora is a text-to-video AI model that enables users to create videos from textual descriptions, similar to how DALL·E generates images from text. Sora allows users to generate videos up to 1080p resolution, up to 20 seconds long, and in various aspect ratios, including widescreen, vertical, and square. Users can also bring their own assets to extend, remix, and blend, or generate entirely new content from text.

OpenAI’s Sora - still not available in the EU.

As part of the 12 Days of OpenAI, Sora was released to Plus and Pro users of ChatGPT. Although it isn’t available in the UK or Europe yet, I used my VPN to create a Sora account and take a quick look at how well the technology is progressing, now that it’s out of research preview.

The prompt I’m starting with was given to me by DALL-E itself, when I asked how it would describe itself to a layperson. It resulted in this image, which I’ll use as a baseline for Sora to work with. The prompt, which I’ll refer to as the ‘Basic prompt’, is: ‘A cat wearing a space helmet, floating in zero gravity’.

‘A cat wearing a space helmet, floating in zero gravity’ - Prompt suggested and illustrated by DALL-E.


Test 1: Basic prompt with no image input

In the first video, Sora has produced a nice side-on view of a galaxy in the background, with a starfield moving towards the camera. An orange cat spins around an invisible central point, like a pinwheel. The head of the cat is trapped in a glass ball, which occasionally duplicates itself to the opposite end of the cat’s body.

In the second iteration, the galaxy background is gone. The cat flies in from outside of the shot with the fishbowl on its head, morphing back into itself at the end.

Results

I’m awarding my point to the model’s inference of a space background, given the ‘space helmet’ and ‘zero gravity’ context in the prompt. The rest is reminiscent of something that would have been cobbled together on b3ta or eBaum’s World, 20-30 years ago.

Grade: 🙀


Test 2: Basic prompt with image input

I then tried using DALL-E’s base image in addition to the basic prompt to see if that could nudge Sora toward a better idea of the results I was looking for.

Results

Almost no points are going to the first image, which treats the space helmet as a window in the spacecraft. Consistent cat anatomy has also gone out said (and metaphorical) window.

The second image at least attempts ‘floating in zero gravity’, but splices that with an entirely new scene of the cat on what looks like the bridge of the spacecraft.

Grade: 🙀🙀


Asking ChatGPT for help

At this point I thought it would be worth asking ChatGPT if it could create a ‘good Sora prompt’ based on the cat example it provided for me earlier.

Here’s what it came up with:

“A cute orange tabby cat wearing a shiny silver space helmet floats weightlessly inside a futuristic spaceship. The cat looks curious, pawing gently at floating objects like a wrench and a glowing blue orb. The spaceship interior is sleek, with glowing control panels and a large window revealing a stunning view of Earth in the background. The scene is softly lit, with occasional flashes of light reflecting off the cat’s helmet as it spins slightly in zero gravity.”

From here, we’ll refer to this as the ‘Complex prompt’. Here are the results of that prompt, without giving Sora an image to work from:

Test 3: Complex prompt with no image input

Results

Here we have at least two Earth-like planets - assuming one is the suggested ‘glowing blue orb’, and a spaceship-interior with no control panels. No wrench is to be seen anywhere either. The first cat ends up with two tails, and the second one looks like it’s trying to be curious, while pawing at a version of itself that has morphed into an Earth-ball.

Grade: 🙀🙀🙀


Test 4: Complex prompt with base image input

Here are the results of the Complex Prompt again, using the original DALL-E image as a base.

Results

Here we have at least two Earth-like planets - assuming one is the suggested ‘glowing blue orb’, and a spaceship-interior with no control panels. No wrench is to be seen anywhere either. The first cat ends up with two tails, and the second one looks like it’s trying to be curious, while pawing at a version of itself that has morphed into an Earth-ball.

Grade: 🙀🙀🙀


Prompt with image input from MidJourney

Clearly I’m not getting anywhere with the image supplied from DALL-E. So what if I choose a different image? I decided to use MidJourney, another (and my favourite) text-to-image platform that’s been in the game a bit longer. MidJourney is known for creating more artistic, illustrative images.

I gave it the same prompt as DALL-E, to get a base image:

‘A cat wearing a space helmet, floating in zero gravity’ - MidJourney results

Test 5: Basic prompt with MidJourney image input

Results

The best results so far. I’m left wondering if that is because of the more highly illustrated input from the MidJourney image.

Grade: 🙀🙀🙀🙀


Test 6: Complex prompt with MidJourney image input

Results

Video 1 has:

  • a cute orange tabby, wearing a shiny silver space helmet ✅

  • Floating weightlessly inside a futuristic spaceship ✅

  • The cat looks curious, pawing at 🚫 floating objects like a wrench ✅ and a glowing blue orb ✅

  • The spaceship interior is sleek, with glowing control panels and a large window revealing a stunning view of Earth in the background ✅

  • The scene is softly lit 🚫 with occasional flashes of light reflecting off the cat’s helmet 🚫 as it spins slightly in zero gravity ✅
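
One way to put a number on the checklist above is to tally the marks. This is only a minimal sketch, not how the grades in this post were assigned: the short labels are paraphrases of the prompt elements, the `adherence` helper is hypothetical, and the multi-part spaceship interior bullet is counted as a single check because it carries a single mark.

```python
# Tally of the Video 1 checklist. The labels are paraphrases chosen for this
# sketch; each True/False mirrors a ✅/🚫 mark in the list above.
checks = {
    "orange tabby in silver space helmet": True,
    "floating inside a futuristic spaceship": True,
    "curious, pawing at objects": False,
    "wrench": True,
    "glowing blue orb": True,
    "sleek interior, control panels, Earth window": True,
    "softly lit scene": False,
    "flashes of light on the helmet": False,
    "spins slightly in zero gravity": True,
}


def adherence(checks: dict[str, bool]) -> float:
    """Return the fraction of prompt elements the video satisfied."""
    return sum(checks.values()) / len(checks)


passed = sum(checks.values())
print(f"{passed}/{len(checks)} elements matched ({adherence(checks):.0%})")
```

With the marks as listed, that comes to 6 of 9 elements matched. Treating every element as equally important is an assumption; a missing wrench and a cat with two tails are arguably not the same size of problem.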

These results are objectively better than the first MidJourney image prompt, presumably because there’s a combination of high illustration in the base image and lots of detail in the prompt.

Video 2, by comparison, is a miss and full points are awarded to Video 1. I could see myself perhaps giving it half a point for an attempt at realism, having transplanted the cat’s head to its rear. In fairness, most cats do that on a regular basis.


Conclusions

To be frank, I’m not impressed with the quality or usability of Sora.

Based on the way Sora has been touted, I reasonably expected better interoperability with OpenAI’s own products, such as:

  • If I give Sora the prompt that ChatGPT recommends I use for DALL-E, I expect the same level of quality or better.

  • If I give Sora the same text prompt and image output from ChatGPT and DALL-E respectively, I expect the same level of quality or better.

  • I expect Sora video output from a DALL-E image to yield better results than that of a competing product, in this case, MidJourney.

OpenAI must expect this kind of use case, where Sora sits within ChatGPT, alongside DALL-E to give the user one unified ‘generative AI bucket’. Moving DALL-E into this space was a clear intent on their part.

As my ten-year-old more succinctly put it while watching space cats tumbling and morphing over my shoulder this morning: “This AI is funny, but not particularly helpful.”

I couldn’t have said it better myself.
