On May 19, at Google I/O 2026, Google unveiled the Gemini Omni model series. Positioned as a generative media model capable of ‘creating anything from any input,’ it currently focuses on video generation. Gemini Omni combines Gemini’s reasoning capabilities with multimodal content creation: users can supply images, audio, video, and text as references at the same time and steer the output with natural language prompts. In the resulting high-quality videos, each editing instruction builds on the previous one, character appearances stay consistent throughout, and physical laws continue to hold across multiple interactions. The model features enhanced intuitive physics modeling covering gravity, kinetic energy, and fluid dynamics, and it draws on Gemini’s knowledge of history, science, and culture to ground visual storytelling in accurate context. Last year, Google introduced Nano Banana to bring Gemini’s intelligence to image generation and editing; Gemini Omni extends that approach to video.
The initial model, Gemini Omni Flash, is now available globally to Google AI Plus, Pro, and Ultra subscribers via the Gemini app and the Google Flow video creation tool. It is also offered free of charge in YouTube Shorts and the YouTube Create app. In the coming weeks, it will be made available to developers and enterprise users through APIs. Support for image and audio output will be added gradually; audio editing remains unavailable pending further safety assessments, and the company is evaluating broader audio editing capabilities with caution. On the safety side, all videos generated by Omni automatically carry invisible SynthID digital watermarks, which can be verified using the Gemini app, Chrome, or Google Search. Google has also launched an ‘Avatar’ feature that lets users create videos featuring their own likeness and voice.