Developing AI systems that can seamlessly understand and generate content across modalities such as text, images, and video, with reasoning capabilities approaching human cognition, has been a central goal of the field.
While proprietary models have long showcased this kind of integrated intelligence, their underlying mechanisms remain private. BAGEL (Scalable Generative Cognitive Model; https://bagel-ai.org), an open-source foundation model released in late May 2025, steps into this gap. As it scales, the model exhibits emergent multimodal abilities. With 7 billion active parameters (14 billion in total), BAGEL outperforms existing open-source unified models on both multimodal generation and understanding benchmarks. Training on trillions of tokens drawn from large-scale interleaved text, image, video, and web data has given it advanced multimodal reasoning capabilities, including free-form image manipulation, future-frame prediction, 3D manipulation, and world navigation.
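To unpack what "7 billion active parameters (14 billion total)" means in practice, here is a toy PyTorch sketch of a feed-forward layer with two experts, where each token passes through only one of them. The two-expert split and the routing scheme are illustrative assumptions for the sake of the example, not BAGEL's actual architecture code.

```python
import torch
import torch.nn as nn


class TwoExpertFFN(nn.Module):
    """Toy two-expert FFN: each forward pass runs only one expert MLP,
    so the layer's "active" parameter count is roughly half its total.
    (Hypothetical illustration, not BAGEL's implementation.)"""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        # Two parallel expert MLPs with identical shapes.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(2)
        )

    def forward(self, x: torch.Tensor, expert_idx: int) -> torch.Tensor:
        # All tokens in this batch are routed to a single expert, so only
        # that expert's weights are "active" for this forward pass.
        return self.experts[expert_idx](x)


ffn = TwoExpertFFN(d_model=64, d_hidden=256)
total = sum(p.numel() for p in ffn.parameters())
active = sum(p.numel() for p in ffn.experts[0].parameters())
print(f"total={total:,} active={active:,}")  # active is ~half of total

x = torch.randn(4, 10, 64)       # (batch, sequence, d_model)
y = ffn(x, expert_idx=0)         # route this batch through expert 0
```

Running the script shows the active count at about half the total, which is the same accounting behind BAGEL's 7B-active/14B-total figure: more capacity is stored in the weights than any single token actually touches.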
So of course, we had to take it for a spin.