
VLM-Feedback Control and Human-in-the-Loop Interaction

Explore how Vision Language Model feedback enables self-correction in progressive image generation. Understand how human-in-the-loop interaction allows real-time control over object attributes, placement, and composition. Discover practical approaches to combining AI critiques and user input for more robust, controllable multimodal LLM agents.

In our last lesson, we saw how MuLan’s planner and progressive generator work together to build a complex image step-by-step. But what happens if the diffusion model makes a mistake in an early stage? Without a mechanism to catch and correct errors, these mistakes would cascade, ruining the final image.

A painter doesn’t just paint without looking; they constantly step back, critique their own work, and make corrections. To make its process robust, the MuLan system needs an internal “critic” that can do the same. This lesson explores the critic and how its step-by-step process unlocks powerful human-AI collaboration.

VLM-feedback for self-correction

This is the third and final pillar of MuLan’s architecture: a VLM-feedback control loop. After each object is generated, a Vision ...

Its job is to perform a number of ...
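The generate–critique–regenerate loop described above can be sketched in a few lines. The sketch below is illustrative only: `generate_object` and `vlm_critique` are hypothetical stand-ins for MuLan's diffusion model and its VLM critic, and the retry logic is a simplifying assumption rather than the system's actual control flow.

```python
# A minimal, hypothetical sketch of a VLM-feedback control loop.
# The generator and critic are stubs standing in for a diffusion
# model and a Vision Language Model, respectively.

def generate_object(prompt: str, attempt: int) -> str:
    """Stub generator: returns a tagged 'image' for each attempt."""
    return f"{prompt} (draft {attempt})"

def vlm_critique(image: str) -> bool:
    """Stub critic: rejects the first draft, approves later ones."""
    return "draft 1" not in image

def generate_with_feedback(prompt: str, max_attempts: int = 3) -> str:
    """Generate, critique, and regenerate until the critic approves
    or the attempt budget is exhausted."""
    image = ""
    for attempt in range(1, max_attempts + 1):
        image = generate_object(prompt, attempt)
        if vlm_critique(image):  # critic approves: stop early
            break
    return image
```

Bounding the number of attempts matters in practice: without it, a critic that never approves would loop forever, so the system falls back to its best available output.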