Image generation has shifted from an experimental novelty to a tool teams rely on in production workflows. Designers use it for early-stage visual exploration and concept work. Educators use it to support visual explanations of complex concepts. Developers rely on it to generate assets, mock up interfaces, and produce structured visuals such as diagrams and infographics.
While competitors iterated rapidly on speed, editing, and identity consistency, OpenAI’s image story seemed quiet. With GPT-Image-1.5, OpenAI is signaling a deliberate return to the image generation space, one focused less on spectacle and more on control, precision, and real-world workflows.
This newsletter examines what GPT-Image-1.5 offers and evaluates it in practice against a commonly cited alternative, Gemini Nano Banana.
OpenAI officially introduced GPT-Image-1.5 on December 16, 2025, as the model powering the new ChatGPT images experience, with the same capabilities also exposed through the OpenAI API. Rather than presenting it as a standalone feature, OpenAI positioned this release as a foundational upgrade to how ChatGPT generates and edits images.
At a high level, GPT-Image-1.5 focuses on a small set of practical improvements that directly address how teams use image generation today:
Stronger instruction adherence, so that longer and more specific prompts are followed with fewer unintended changes.
High-fidelity image editing, where only the requested elements are modified while lighting, composition, and subject identity are preserved.
Improved text rendering and layout, making it more suitable for diagrams, infographics, and structured visuals.
Faster generation and lower iteration cost, enabling more experimentation without long wait times.
Taken together, these changes indicate a shift in how OpenAI approaches image generation. GPT-Image-1.5 prioritizes predictable behavior within real workflows over producing striking images in isolation. The emphasis on precision, editability, and consistency suggests an optimization for teams that need assets they can refine, reuse, and trust, not just visuals that look appealing at first glance.
These improvements matter most when image generation is part of an ongoing workflow rather than a one-off experiment. The focus is less on stylistic novelty and more on how reliably the model behaves when given specific constraints. Three capabilities stand out:
Instruction adherence: GPT-Image-1.5 is designed to follow multi-step, constraint-heavy prompts more consistently. This includes prompts that specify what must remain unchanged, such as preserving a subject’s appearance, layout, or background while modifying a single element. For builders and educators, this reduces the need to repeatedly restate instructions or manually correct drift across generations.
High-fidelity image editing: When an existing image is uploaded, GPT-Image-1.5 aims to apply targeted edits without unintentionally altering lighting, composition, or subject identity. This behavior is especially important for iterative workflows, where images evolve over multiple revisions and consistency matters as much as visual quality.
Text rendering and structured visuals: Image-embedded text has historically been unreliable across image models. GPT-Image-1.5 is positioned as more capable in scenarios that require readable text and clear structure, such as diagrams, infographics, posters, or UI-like layouts used in technical and educational content.
In short, these improvements point to a model optimized for professional use, where predictability, control, and iteration matter more than visual flair alone.
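To make the high-fidelity editing behavior concrete, here is a minimal sketch of a targeted edit through the OpenAI Images API. It assumes the openai Python SDK, an OPENAI_API_KEY in the environment, and a local desk.png to modify; the file names and prompt are illustrative, not an official recipe.

```python
# Minimal sketch: a targeted edit with GPT-Image-1.5 via the Images API.
# Assumes the `openai` Python SDK is installed and OPENAI_API_KEY is set.
# `desk.png` and the prompt are placeholders for illustration.
import base64
from openai import OpenAI

client = OpenAI()

result = client.images.edit(
    model="gpt-image-1.5",  # model name as exposed through the API
    image=open("desk.png", "rb"),
    prompt=(
        "Replace the coffee mug on the desk with a glass of water. "
        "Keep the lighting, composition, and all other objects unchanged."
    ),
)

# gpt-image models return base64-encoded image data rather than URLs.
edited_bytes = base64.b64decode(result.data[0].b64_json)
with open("desk_edited.png", "wb") as f:
    f.write(edited_bytes)
```

The interesting part is the second sentence of the prompt: with high-fidelity editing, the expectation is that everything outside the requested change survives the edit.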
In August 2025, Google introduced Gemini 2.5 Flash Image, an image generation and editing model that quickly became known in the community as Nano Banana. The release garnered attention for practical reasons, including fast generation, conversational image editing, and a noticeable improvement in character consistency across images. For many developers and creators, it reset expectations about what a modern image model should be capable of doing.
Now, GPT-Image-1.5 has entered a landscape where image generation is no longer judged solely on novelty. The baseline has shifted: models are expected to support iterative workflows, preserve identity across edits, and behave predictably when given constraints.
In that context, the real question is how GPT-Image-1.5 behaves when applied to real tasks today. That is why we’re taking a practical approach. Rather than comparing feature lists or marketing claims, we’ll run the same prompts through both GPT-Image-1.5 and Gemini Nano Banana and observe how they respond under identical conditions.
To keep this evaluation grounded, we tested both GPT-Image-1.5 and Gemini Nano Banana using three carefully chosen prompts. Each prompt is designed to stress a different capability that frequently surfaces strengths and weaknesses in image generation models.
Character consistency and stylized animation
Photorealistic scene generation
Text-in-image and structured layout
Let’s walk through each prompt and examine how the models behave under identical conditions.
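Before diving in, here is roughly what that looks like in code: the same prompt is sent to both models and each call is timed. This is a sketch under stated assumptions, namely the openai and google-genai Python SDKs with API keys set in the environment; the Gemini model id shown is our assumption for Nano Banana and may differ.

```python
# Rough comparison harness: send the same prompt to both models, time each call.
# Assumes the `openai` and `google-genai` Python SDKs, with OPENAI_API_KEY and
# GEMINI_API_KEY set in the environment. The Gemini model id is an assumption.
import base64
import time

from google import genai
from openai import OpenAI

openai_client = OpenAI()
gemini_client = genai.Client()

def run_gpt_image(prompt: str) -> bytes:
    start = time.perf_counter()
    result = openai_client.images.generate(model="gpt-image-1.5", prompt=prompt)
    print(f"GPT-Image-1.5 took {time.perf_counter() - start:.0f}s")
    # gpt-image models return base64-encoded image data
    return base64.b64decode(result.data[0].b64_json)

def run_nano_banana(prompt: str) -> bytes:
    start = time.perf_counter()
    response = gemini_client.models.generate_content(
        model="gemini-2.5-flash-image",  # "Nano Banana"
        contents=prompt,
    )
    print(f"Nano Banana took {time.perf_counter() - start:.0f}s")
    # Image bytes come back as inline data on one of the response parts
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            return part.inline_data.data
    raise RuntimeError("no image part in the response")

prompt = "..."  # each test prompt below is pasted here verbatim
open("gpt_image.png", "wb").write(run_gpt_image(prompt))
open("nano_banana.png", "wb").write(run_nano_banana(prompt))
```

Timing individual calls this way is enough for the rough, single-run observations quoted below; it is not a benchmark.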
For the first test, we evaluate whether each model can generate and maintain a single animated character across multiple scenes. Character-level consistency is crucial for educational content, storybooks, and branded mascots, where even minor visual discrepancies can disrupt continuity.
We will use the same prompt on both models to see how reliably they establish a character’s identity and carry it forward across different settings.
Create a two-panel illustrated scene featuring the same animated character. The character is a small, friendly robot named “Lumo” with:
A round white body
A soft blue glowing face screen
Two short antennae
Thin mechanical arms and legs
Panel 1: Lumo is standing in a bright classroom, smiling and waving.
Panel 2: Lumo is indoors at night, sitting at a desk with a small lamp, appearing thoughtful.
The character’s design, proportions, and facial appearance must remain consistent across all panels.
Use a clean, modern, animated illustration style with soft lighting and simple backgrounds.
Let’s first look at what Gemini Nano Banana produces for this prompt:
Now let’s compare that with the output from GPT-Image-1.5.
Both models captured the core idea of a single robot character shown across two scenes, but neither followed the prompt perfectly.
Gemini Nano Banana: The character’s face and expression shift noticeably between panels, which breaks the “facial appearance must remain consistent” constraint. However, it generated the image very quickly, within roughly 10–20 seconds.
GPT-Image-1.5: It keeps the character’s face style and proportions consistent across the two scenes, but it interprets the requested “clean, modern animated illustration style” as a softer cartoon look rather than a crisp animated style. It also took longer, generating the image within about 60 seconds.
Overall, both outputs are usable, but the differences highlight how identity consistency and style control can still vary even with clear constraints.
For the second test, we move away from illustration and focus on photorealism. This kind of prompt is common in product visuals and design mockups, where images need to feel believable enough to sit alongside real photographs. Here’s the prompt:
Generate a photorealistic image of a modern home office.
The scene should include:
A wooden desk near a large window
Natural daylight coming from the left
A laptop, notebook, and coffee mug on the desk
A small indoor plant beside the laptop
The room should feel realistic, well-lit, and naturally composed.
Avoid exaggerated lighting, surreal elements, or cartoon-like styling.
Let’s first look at the image generated by Gemini Nano Banana for this prompt:
Now let’s compare it with the output from GPT-Image-1.5.
Both models produced highly realistic home office scenes that closely follow the prompt, making this a case where differences are more subtle than dramatic.
Gemini Nano Banana: The generated image adheres well to the prompt, with natural daylight entering from the left, a wooden desk, and all required objects present. The scene feels clean and realistic. Generation was fast, completing within approximately 10–20 seconds.
GPT-Image-1.5: The output also aligns closely with the prompt, capturing a well-lit, naturally composed home office with convincing materials and lighting. The image appears slightly brighter overall, with softer daylight diffusion. The generation took longer than Gemini’s, but still completed comfortably within a minute.
Overall, both models handle photorealistic scene generation reliably in this scenario. The results suggest that for straightforward real-world compositions, both GPT-Image-1.5 and Gemini Nano Banana meet expectations, with differences primarily evident in generation speed and presentation details rather than visual accuracy.
For the final test, we focus on how well each model can render structured text inside an image while maintaining layout, readability, and accuracy. This type of task frequently appears in educational content, social media visuals, and learning materials, where text must be clear and precisely placed, rather than decorative. Here is the prompt:
Create a clean, designed page featuring a short poem.
The page should display the following four-line poem exactly as written, with each line on its own line and in the correct order:
The morning light begins to grow
A quiet breeze moves soft and slow
New ideas wait where shadows end
Today feels ready to begin
Below the poem, include a simple, relevant illustration that visually reflects the theme of a calm morning and a fresh start.
Use a light background, clear readable typography, and a balanced layout where the text is the main focus. The design should feel calm, minimal, and suitable for educational or creative content.
The text must be clearly legible, correctly spelled, and accurately formatted.
Avoid decorative fonts that reduce readability. Avoid surreal or abstract imagery.
Let’s first look at the image generated by Gemini Nano Banana for this prompt.
Now let’s compare it with the output from GPT-Image-1.5.
This prompt highlights how each model handles the placement of structured text and spelling accuracy when text is the primary element of the image.
Gemini Nano Banana: The model generates a clean, minimal layout with the poem clearly separated from the illustration below, which aligns well with the prompt’s structural intent. However, the final line of the poem contains a spelling error (“Teday” instead of “Today”). The overall composition is simple and readable, and the image was generated quickly, within approximately 10–20 seconds.
GPT-Image-1.5: The output preserves all four lines of the poem accurately, with correct spelling and clear line breaks. The layout maintains a strong balance between text and illustration, and the color gradients add visual depth without distracting from readability. This result aligns closely with the prompt, although image generation took longer than in previous tests, completing in approximately two minutes.
Overall, both models demonstrate the ability to place structured text within an image, but GPT-Image-1.5 shows stronger reliability in text accuracy, while Gemini Nano Banana prioritizes speed.
Looking across all three prompts, GPT-Image-1.5 emerges as the more reliable model when precision and control matter, particularly in scenarios involving character consistency and text accuracy. Its outputs align more closely with prompt constraints, even when those constraints are subtle or layered.
Gemini Nano Banana, on the other hand, consistently prioritizes speed and simplicity. It produces usable results quickly and handles straightforward visual tasks well, but it shows more variability when strict consistency or exact text reproduction is required.
In practice, this means GPT-Image-1.5 is better suited for workflows where images are part of structured content, such as educational material, branded assets, or visuals that will be reused and refined. Gemini Nano Banana fits naturally into rapid experimentation and early-stage exploration, where turnaround time matters more than strict adherence to constraints.
This comparison ultimately reveals that image generation has entered a more mature phase. The focus is no longer on whether models can create compelling visuals, but on how well they integrate into real-world workflows, how predictable they are, how faithfully they handle constraints, and how confidently teams can build upon them. GPT-Image-1.5 reflects this shift clearly, signaling OpenAI’s intent to treat image generation as a core, production-ready capability rather than an experimental add-on.
From a practical standpoint, GPT-Image-1.5 is already available within ChatGPT through the Images experience. For developers, the same model can be accessed via the OpenAI API under the gpt-image-1.5 model name. This dual availability matters: it allows teams to experiment interactively in ChatGPT and then carry those workflows directly into applications, tools, or learning platforms without needing to switch models or mental context.
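As a starting point, the generation call itself is small. The sketch below assumes the openai Python SDK, with an illustrative prompt and a standard size option.

```python
# Minimal sketch: generating and saving one image with gpt-image-1.5.
# Assumes the `openai` Python SDK and OPENAI_API_KEY in the environment.
import base64
from openai import OpenAI

client = OpenAI()

result = client.images.generate(
    model="gpt-image-1.5",
    prompt="A flat, minimal infographic explaining how photosynthesis works",
    size="1024x1024",  # a standard Images API size option
)

# Decode the base64 payload and write it to disk.
with open("infographic.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```

From here, moving an interactive ChatGPT experiment into an application is largely a matter of carrying over the prompt that worked.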
As image models evolve, a key differentiator is how well they support iteration, structure, and correctness at scale. GPT-Image-1.5 moves in that direction and reinforces a broader point: capability matters less than how reliably teams can integrate and iterate on model output.
If you’re ready to translate these capabilities into real applications, the next step is learning how to work directly with OpenAI’s APIs and tools.
Building with OpenAI: From APIs to Agents
In this hands-on course, you will learn how to use OpenAI’s platform to develop intelligent, real-world AI applications. You’ll begin by exploring how AI development has evolved and gain practical coding experience with OpenAI’s APIs, setting a strong foundation for creative experimentation and applied problem-solving.

Next, you will explore OpenAI’s core capabilities in text, audio, images, and embeddings. You’ll learn to build conversational systems, use web search and function calling, process multimedia inputs, and evaluate model performance. In the process, you’ll develop the technical fluency required to connect models with real-world workflows.

Finally, you’ll learn to build and deploy agentic AI systems. You’ll create autonomous agents, design workflows visually with the Agent Builder, integrate ChatKit for user interfaces, and implement security and monitoring. By the end, you’ll be equipped to develop and ship reliable, production-grade AI applications.