VLM-Feedback Control and Human-in-the-Loop Interaction
Explore MuLan’s self-correction mechanism powered by a VLM-based feedback loop, and understand how its step-by-step process enables powerful human-AI collaboration.
In our last lesson, we saw how MuLan’s planner and progressive generator work together to build a complex image step-by-step. But what happens if the diffusion model makes a mistake in an early stage? Without a mechanism to catch and correct errors, these mistakes would cascade, ruining the final image.
A painter doesn’t just paint without looking; they constantly step back, critique their own work, and make corrections. To make its process robust, the MuLan system needs an internal “critic” that can do the same. This lesson explores the critic and how its step-by-step process unlocks powerful human-AI collaboration.
VLM-feedback for self-correction
This is the third and final pillar of MuLan’s architecture: a VLM-feedback control loop. After each object is generated, a Vision Language Model (VLM), such as LLaVA-1.5, is used as a critic.
Its job consists of three functions, described below (a code sketch of the full loop follows the list).
Inspect the image: The diffusion model generates the object for the current stage. The VLM then looks at the resulting image.
Compare to the prompt: After each stage, the VLM scores the newly added object against the full original prompt, checking specifically for object presence, correct attributes (color, size), and proper spatial relations.
Provide feedback: If the VLM detects a mismatch between the image and the prompt, the current stage is regenerated before the pipeline moves on. Because each stage is verified as soon as it is produced, an early mistake is corrected at its source instead of cascading into the final image.
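To make this loop concrete, here is a minimal, self-contained Python sketch of per-stage feedback control. Everything in it (`Critique`, `generate_stage`, `vlm_check`, `run_stage_with_feedback`, and the retry budget) is a hypothetical stand-in rather than MuLan's actual API: the diffusion model and the VLM critic are replaced by toy stubs so the control flow is runnable on its own.

```python
import random
from dataclasses import dataclass


@dataclass
class Critique:
    """Hypothetical container for the critic's per-stage checks."""
    object_present: bool
    attributes_ok: bool   # correct attributes, e.g., color and size
    relations_ok: bool    # proper spatial relations

    @property
    def passed(self) -> bool:
        return self.object_present and self.attributes_ok and self.relations_ok


def generate_stage(canvas: list, sub_prompt: str, seed: int) -> list:
    """Stand-in for the diffusion model: add one object to the image so far."""
    return canvas + [f"{sub_prompt} (seed={seed})"]


def vlm_check(image: list, full_prompt: str) -> Critique:
    """Stand-in for the VLM critic (e.g., LLaVA-1.5); here just a coin flip."""
    ok = random.random() > 0.3
    return Critique(object_present=ok, attributes_ok=ok, relations_ok=ok)


def run_stage_with_feedback(canvas, full_prompt, sub_prompt, max_retries=3):
    """Generate one object, let the critic inspect it, regenerate on failure."""
    candidate = canvas
    for seed in range(max_retries):
        candidate = generate_stage(canvas, sub_prompt, seed)  # inspect the image
        if vlm_check(candidate, full_prompt).passed:          # compare to the prompt
            return candidate  # stage accepted: move on to the next object
    return candidate          # retries exhausted: keep the last attempt


full_prompt = "a red cube with a blue sphere to its right"
canvas = []
for sub_prompt in ["a red cube", "a blue sphere to its right"]:
    canvas = run_stage_with_feedback(canvas, full_prompt, sub_prompt)
print(canvas)
```

The point of the sketch is the design choice it encodes: a failed critique re-runs only the current stage (here, simply by changing the seed) rather than restarting the whole image, which is what keeps an early mistake from cascading into later stages.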