Google Gemini Omni: A First Look at Google's New Video Model

Omnigen Editorial, 14 hours ago

What is Gemini Omni?

On May 11, 2026, eagle-eyed users spotted a new model card labeled "Omni" inside the Gemini app, with the description: "Create with Gemini Omni: meet our new video model, remix your videos, edit directly in chat, try templates, and more."

The leak landed roughly one week before Google I/O 2026 (May 19–20), which suggests Omni is being positioned as Google's next flagship video model: a successor or complement to the Veo line, oriented around conversational editing rather than pure text-to-video generation.

Google has not made an official announcement yet. Everything below is based on the in-app leak and the early demos that surfaced alongside it. Expect specifics to shift on stage at I/O.

Why "Omni"?

The naming is a strong hint about positioning. Where Veo focuses on cinematic generation and Imagen focuses on stills, Omni appears to be a single model that handles the full video workflow:

  • Text-to-video generation
  • Image-to-video and video-to-video remixing
  • In-chat editing of existing clips (object swaps, scene rewrites, watermark removal)
  • Native audio generation (dialogue, SFX, ambient)
  • Template-driven creation for common formats

In other words, Omni reads less like a raw model release and more like an end-to-end creative agent built on top of one.

What the leak actually shows

Three capabilities stood out in the demos that leaked alongside the model card:

1. Editing inside chat

Users were able to upload a clip and ask Omni — in plain language — to swap an object, rewrite a scene, or remove a watermark, and get back a coherent edit without leaving the chat. This is the headline workflow improvement: no NLE, no masking tools, no separate inpainting pass.

2. Templates and remixing

A "Templates" entry point implies Google is leaning into format-first creation — short-form social videos, ads, explainers — where the user picks a template and Omni fills in the content. Combined with remix, this is clearly aimed at creators who don't want to start from a blank prompt every time.

3. Improved native audio

Veo 3.1 already generates synced dialogue, sound effects, and ambient audio, but audio quality has been the weak spot versus the competition. Early Omni demos suggest a meaningful step up here, which would close one of the most visible gaps with rivals.

How does it compare?

Based on early demo footage, the picture looks roughly like this:

Model | Strength | Weakness
Gemini Omni | Conversational editing, workflow integration, templates, audio | Per-frame fidelity reportedly trails the leader
Seedance 2 (ByteDance) | Best-in-class cinematic quality and motion | Less native editing/agentic workflow
Sora 2 (OpenAI) | Strong physics, long-form coherence | Limited editing surface inside chat
Veo 3.1 (Google) | Native audio, solid generation | Being overtaken by Omni internally

The honest read: Omni's edge is the workflow, not the pixels. If you want the single most photorealistic shot from a prompt, current evidence points to Seedance 2. If you want to iterate on a video the way you iterate on a document — by talking to it — Omni is positioned to win that lane.

Tiers, limits, and distribution

Several signals from the leaked UI:

  • Tiered variants are likely (Flash and Pro), mirroring the rest of the Gemini family.
  • Tight rolling usage limits: testers reported hitting 86% of daily quota after just two generations on the Google AI Pro plan. Video remains expensive to serve.
  • Three distribution channels are expected at launch: the Gemini app (consumer), AI Studio (developers), and Vertex AI (enterprise) — the same playbook Google used for Veo.
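
To put the reported quota figure in perspective, here is a rough back-of-the-envelope sketch. It assumes every generation costs the same share of quota, which the leak does not confirm; the function name and the equal-cost model are illustrative only and do not reflect how Google actually meters usage.

```python
import math


def generations_per_day(quota_used_fraction: float, generations: int) -> int:
    """Estimate how many whole generations fit in one daily quota.

    Assumption (not from the leak): each generation consumes an equal
    share of the quota.
    """
    per_generation = quota_used_fraction / generations
    return math.floor(1.0 / per_generation)


# Reported figure: 86% of daily quota used after two generations.
per_generation = 0.86 / 2
print(f"Implied cost per generation: {per_generation:.0%} of daily quota")
print(f"Whole generations per day: {generations_per_day(0.86, 2)}")
```

Under that assumption, each generation would consume about 43% of the daily allowance, leaving room for only two full generations per day on the plan the testers used.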

What to watch for at Google I/O 2026

A few specific things worth watching when Google takes the stage:

  1. Pricing and quota — does Pro get meaningful video headroom, or is this gated behind a new tier?
  2. Clip length and resolution — current leaks don't pin these down.
  3. API surface: is the editing-in-chat behavior exposed through a real API, or is it Gemini-app-only at first?
  4. Watermarking and provenance — how does Google handle SynthID and C2PA on generated and edited clips?
  5. Veo's roadmap — does Veo continue as a separate cinematic-focused model, or does Omni quietly absorb it?

Bottom line

Gemini Omni isn't trying to win a frame-by-frame fidelity contest. It's trying to make video feel like the rest of Gemini: something you talk to, edit in chat, and ship without leaving the conversation. If the keynote confirms the leaked behavior — especially the in-chat editing and the audio jump — Omni becomes one of the most interesting video releases of 2026, even if it isn't the prettiest.

We'll update this post as Google shares official details.