Audio Capabilities
Learn how to generate audio files with the Chat Completions API.
In our previous lessons, we explored text, images, and files. Now we turn to audio, teaching AI to listen, understand, and speak. This lesson will show you how to build applications that can process spoken input and generate natural-sounding speech responses.
By the end of this lesson, you’ll be able to create voice-enabled applications, transcribe audio, generate speech, and build complete voice interaction systems.
Audio capabilities unlock entirely new categories of applications. We can build conversational AI that speaks and listens, convert text to speech for visually impaired users, and transcribe meetings, interviews, and calls to text. Instead of requiring users to type or read, applications can now engage in natural voice conversations, making technology more accessible and intuitive.

OpenAI offers several approaches to working with audio, but we'll focus on the most powerful and flexible option that aligns with our course approach.

Why will the Responses API not work now?

While the course primarily uses the newer Responses API, this lesson requires a fallback to the Chat Completions API. You might wonder why we're switching. Here's the situation:
Responses API: OpenAI's latest offering and our preferred API (used in all previous lessons). It does not yet support audio.

Chat Completions API: The original OpenAI API, now considered a legacy offering. It is currently the only way to work with audio, via the gpt-4o-audio-preview model.