ByteDance’s Intelligent Creation Lab has released and open-sourced Lance, a unified multimodal model. With 3B active parameters (6B total) and a training budget of at most 128 GPUs, Lance supports six distinct tasks within a single native framework, including image and video understanding, generation, and editing. It also supports subject-driven image and video generation. Within a day of its release, Lance climbed into the top three on HuggingFace’s trending list. On major benchmarks, Lance scored 85.11 on VBench (the highest among unified models for video generation), 62.0 on MVBench (the best among unified models for video understanding, roughly an 11.3% improvement over the second-place Show-o2 7B), 0.90 on GenEval (tied for the highest among unified models for image generation), and 7.30 on GEdit-Bench (the best result among unified models for image editing).
Architecturally, Lance uses a dual-stream Mixture-of-Experts design: the understanding pathway processes semantic visual tokens, while the generation pathway handles VAE latent tokens. Both pathways share a unified interleaved multimodal context but keep their capabilities decoupled. Lance also introduces MaPE, a modality-aware rotary position encoding that explicitly distinguishes heterogeneous visual tokens serving different functions within the same sequence, which mitigates positional interference during joint multi-task optimization. Training follows a four-stage paradigm: pretraining, continual training, supervised fine-tuning, and reinforcement learning. The research finds that continually adding data from diverse tasks such as editing and subject-driven generation further improves basic generative ability, which supports the view that ‘completeness of task coverage’ drives emergent generalization in unified models. In other words, multi-task collaboration acts as a catalyst for capability evolution rather than a merely additive mechanism. Model weights and source code are publicly available on GitHub and HuggingFace, and the accompanying paper has been published on arXiv (arXiv:2605.18678).
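To make the architectural description above more concrete, here is a minimal NumPy sketch of one dual-stream layer: attention runs over the shared interleaved sequence, and each token is then routed to an understanding or a generation feed-forward expert according to its type. This is an illustration under stated assumptions, not Lance's actual implementation. The per-modality position offset used to stand in for MaPE, the routing of text tokens to the understanding expert, the unmasked single-head attention, and every name in the code (`dual_stream_layer`, `modality_aware_positions`, `make_params`, the token-type constants) are hypothetical choices made for this example.

```python
import numpy as np

# Hypothetical token-type ids; illustrative only, not taken from the Lance code.
TEXT, SEMANTIC, VAE_LATENT = 0, 1, 2


def rope(x, positions, base=10000.0):
    """Standard rotary position embedding over the last axis (even width)."""
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)        # (half,)
    angles = positions[:, None] * freqs[None, :]     # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1_i, x2_i) pair by its position-dependent angle.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)


def modality_aware_positions(token_types, offsets):
    """ASSUMPTION: a stand-in for MaPE, not the published formulation.

    Each token keeps its sequence index, shifted by an offset chosen per
    token type, so tokens with different roles get distinct position ranges.
    """
    idx = np.arange(len(token_types), dtype=np.float64)
    return idx + offsets[token_types]


def make_params(d, rng):
    """Random weights for one toy layer (hypothetical helper)."""
    shapes = {
        "wq": (d, d), "wk": (d, d), "wv": (d, d),
        "und_w1": (d, 2 * d), "und_w2": (2 * d, d),  # understanding expert
        "gen_w1": (d, 2 * d), "gen_w2": (2 * d, d),  # generation expert
    }
    return {k: rng.normal(scale=0.1, size=s) for k, s in shapes.items()}


def dual_stream_layer(x, token_types, params, offsets):
    """Shared attention over the interleaved sequence, then per-type experts."""
    pos = modality_aware_positions(token_types, offsets)
    q = rope(x @ params["wq"], pos)
    k = rope(x @ params["wk"], pos)
    v = x @ params["wv"]

    # Single-head attention without masking (a simplification): every token
    # attends to the whole interleaved multimodal context.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    h = x + attn @ v

    # Decoupled experts: VAE latent tokens use the generation expert;
    # text and semantic visual tokens use the understanding expert
    # (routing text this way is an assumption of this sketch).
    is_gen = token_types == VAE_LATENT
    out = np.empty_like(h)
    out[~is_gen] = h[~is_gen] + np.tanh(h[~is_gen] @ params["und_w1"]) @ params["und_w2"]
    out[is_gen] = h[is_gen] + np.tanh(h[is_gen] @ params["gen_w1"]) @ params["gen_w2"]
    return out
```

In this toy setup, changing only the generation expert's weights leaves the outputs for text and semantic tokens unchanged within the layer, which is one simple way to picture a shared context combined with decoupled capabilities.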