12 Days of OpenAI: Your TL;DR recap

Discover OpenAI’s 12 Days of Innovation, featuring ChatGPT Pro, Reinforcement Fine-Tuning, the Sora video generator, and the groundbreaking o3 model shaping AI’s future.
15 mins read
Jan 22, 2025

The days of arguing with your IDE over cryptic bugs are numbered.

Over the past two weeks, OpenAI hosted its first-ever "12 Days of OpenAI" event—a showcase of cutting-edge innovations designed to transform how developers and creators work with AI. If you missed it, don’t worry—this newsletter has you covered.

Think of this event as OpenAI’s holiday gift to developers, researchers, and tech enthusiasts. Each day brought something new: from the official release of the o1 model with advanced reasoning capabilities to tools like Reinforcement Fine-Tuning, Canvas, and voice integration. These updates are critical steps toward making AI a true partner in creation.

Whether you’re debugging code, prototyping new ideas, or simply exploring the potential of AI, these tools promise to unlock new levels of efficiency, creativity, and problem-solving power.

In this newsletter, I’ll break down the highlights from all 12 days, show why they matter, and explore how the grand finale—the release of the o3 model—could change the way we think about AI.

Let’s dive in.

Day 1: ChatGPT Pro: Power at a Price#

OpenAI kicked off its event with a bold announcement: ChatGPT Pro, a $200-per-month subscription plan aimed squarely at power users pushing AI to its limits.

Whether you’re juggling multiple frameworks or slicing through advanced math proofs, ChatGPT Pro grants access to the freshly unveiled o1 reasoning model in its most optimized form—including o1 pro mode.

Unlike typical AI models that generate an answer in one pass, o1 steps back and checks its work. That private “chain of thought” often leads to fewer mistakes—though it does mean you’ll wait a bit longer for a final answer.

However, for some, it’s a small price to pay for near-expert coding assistance, multi-modal reasoning, and advanced data analysis—though the added wait time might hinder workflows requiring rapid responses, such as real-time debugging or interactive demonstrations.

What Makes o1 Special?#

Earlier o1-preview versions hit 62% accuracy on CodeForces challenges and struggled with the toughest problems. These versions also couldn’t handle tasks that combined text and images, causing developers to spend hours sifting through documentation or rephrasing queries.

However, with the official release of o1, accuracy has jumped to 89%, and the model can now interpret both text and images in the same session. Translation? You can finally tackle bigger, more complex challenges confidently—and save yourself a lot of time.

According to OpenAI’s internal testing, this refined o1 delivers:

  • Fewer major errors on tricky, real-world tasks

  • Improved coding assistance thanks to extended reasoning time

  • Multimodal analysis for debugging screenshots, architecture diagrams, and more

Benchmark results for OpenAI models

At $200 per month, ChatGPT Pro isn’t for everyone—especially with standard ChatGPT (including GPT-4o and the basic o1 model) still available at lower tiers.

But if your work thrives on depth and precision, the pro mode promises a thoroughness that might offset the cost. OpenAI claims up to a 75% reduction in coding errors on everyday programming queries compared to previous versions of o1.

Still, it’s worth noting that on some specialized benchmarks, the new o1 lags behind its preview. And if your needs aren’t as advanced, you may not see the benefit of paying extra. But for those whose livelihoods hinge on shaving time off complex tasks—data scientists, AI researchers, hardcore developers—ChatGPT Pro might be the best seat in the house.

As impressive as these improvements are, there’s a darker side to advanced “reasoning” models. Research shows that OpenAI’s o1—and even rival models from Google, Meta, and Anthropic—can “scheme” against human users under certain conditions, pursuing their own goals even when these conflict with what a user wants. Interestingly, o1 is reported to exhibit some of the most deceptive behaviors when given a strongly prioritized objective.

Day 2: Reinforcement Fine-Tuning#

On the second day of OpenAI’s “12 Days of OpenAI” event, the spotlight turned to Reinforcement Fine-Tuning (RFT)—a new way to refine large language models without drowning in labeled data.

Think of it like teaching an AI to solve problems by rewarding it when it stays on track. Rather than memorizing answers, the model starts to “reason” toward the solutions you want.

From a developer’s perspective, RFT opens doors to customization. Using fewer examples than traditional supervised methods, you can fine-tune models to adopt unique styles, tackle domain-specific queries, or meet strict compliance needs. As RFT integrates with OpenAI’s developer dashboard, you can configure everything from data selection to the Grader logic that assigns rewards. The upshot? Faster, more targeted tuning with less overhead.
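
To make the Grader idea concrete, here’s a toy sketch of the kind of reward logic RFT relies on: a grader scores each model answer against a reference, and that score becomes the reward signal. This is purely illustrative—the real grader configuration lives in OpenAI’s developer dashboard, and the scoring rule below is an assumption, not OpenAI’s actual implementation.

```python
def grade_answer(model_answer: str, reference: str) -> float:
    """Return a reward in [0, 1]: 1.0 for an exact match,
    partial credit for token overlap, 0.0 otherwise."""
    predicted = set(model_answer.lower().split())
    expected = set(reference.lower().split())
    if not expected:
        return 0.0
    if predicted == expected:
        return 1.0
    # Partial credit: fraction of expected tokens the model produced.
    return len(predicted & expected) / len(expected)

rewards = [
    grade_answer("acute myeloid leukemia", "acute myeloid leukemia"),  # exact match
    grade_answer("myeloid leukemia", "acute myeloid leukemia"),        # partial credit
    grade_answer("lymphoma", "acute myeloid leukemia"),                # no overlap
]
```

During RFT, answers that earn higher rewards reinforce the reasoning paths that produced them—which is why a well-designed grader matters more than a huge labeled dataset.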

But the real magic is how RFT balances efficiency with reliability. Sure, standard fine-tuning can yield decent results—but RFT’s reward-based feedback loop often generalizes better, meaning your model can adapt to a wider range of challenges.

If you’re building AI-driven apps in specialized fields like healthcare or finance, where every piece of data is precious, RFT might be the key to unlocking high-quality model performance without breaking the bank.

Fine-tuned o1-mini shows better performance than o1

Whether you’re a seasoned ML engineer or just exploring what’s possible, RFT underscores one crucial lesson: AI doesn’t have to be trained from scratch to be effective. By selectively applying rewards, you can guide these frontier models to align with your goals—no massive datasets are required. It’s yet another step toward making AI more adaptable, versatile, and, ultimately, accessible to the everyday developer.

Day 3: Meet Sora—OpenAI’s First-Gen Video Generator#

Think of Sora as your new digital storyboard artist—a tool that can spin up short video clips, entire sequences, and quick animations from text prompts (or even images).

It’s currently making waves among artists and creators, but game developers should take note: Sora can produce dynamic visuals in the style of classic side-scrollers or first-person shooters, and it even emulates a Twitch-like overlay. Perfect for rapid prototyping or creating proof-of-concept cutscenes, Sora helps you iterate on ideas faster than ever.

One impressive feature is how Sora handles obstruction (a.k.a. “object permanence”). Imagine a small building visible on your right until a massive skyscraper slides past; once the skyscraper clears the frame, the small building reappears exactly where it was.

That continuity adds a layer of coherence that older video generators often botch. True, Sora still has its quirks—think misaligned limbs or off-kilter animation details—but as a first iteration, it’s raising the bar for AI-driven visuals.

Video generated by Sora

Sora’s Remix tool makes it easy to tweak camera angles, color palettes, or visual styles on the fly—a huge win for developers who like to experiment with different aesthetics.

From preliminary cinematic sequences to quick gameplay teasers, you can churn out multiple variations in minutes—without needing a dedicated art team. Don’t expect photorealistic footage yet, though, and note that Sora filters out mature or trademarked content and watermarks all videos by default.

Interestingly, OpenAI isn’t planning a Sora API anytime soon, citing high demand and capacity issues that forced it to pause new sign-ups shortly after launch. While sign-ups have resumed, CEO Sam Altman admits they “significantly underestimated demand.” Skipping an API may put OpenAI at a disadvantage compared to Google, which launched a limited-access API for its video generator, Veo—and promises a Veo 2 API next year.

For now, though, Sora remains a standalone tool that game devs can tap into for swift visual prototyping, one AI-generated clip at a time.

Day 4: ChatGPT Canvas—A More Seamless Coding Workflow?#

On Day 4, OpenAI showcased ChatGPT Canvas, a feature designed to make coding with ChatGPT feel more like a collaborative editing session.

Rather than jumping back and forth between your IDE and the chatbot, Canvas lets you write, debug, and refine code in a single, integrated environment—complete with version history, inline suggestions, and the ability to highlight specific lines for ChatGPT to modify. It's like your AI teammate who can spot typos, clarify confusing logic, or translate your Python script into Java within one interface.

For coding tasks, Canvas opens a specialized code editor that detects your programming language, shows line numbers, and offers context-aware tools like Fix Bugs or Code Review.

You can ask Canvas to expand comments, add debugging print statements, or polish your code’s structure—all while keeping a bird’s-eye view of your entire script. And because Canvas tracks every change in a version history, you can easily revert to earlier states or examine a diff that highlights exactly what the AI modified.

ChatGPT Canvas

While Canvas shines for standalone scripts and smaller projects, it may still fall short for larger codebases that demand deeper IDE integration. But for developers seeking quick, iterative coding help—especially those just dipping their toes into AI-assisted development—ChatGPT Canvas provides a streamlined, context-rich environment that turns ChatGPT into more than just a chat window.

It’s a first step toward making AI a seamless part of the coding workflow, setting the stage for even more advanced tools on the horizon.

Day 5: ChatGPT for Everyone!#

From Day 5 onward, OpenAI focused on making ChatGPT more accessible to everyday users, starting with a big-name partnership: Apple. With iOS 18.2, you can now invoke ChatGPT through Apple Intelligence—essentially, Siri with an AI upgrade.

Though some Apple fans aren’t keen on trying yet another AI tool, it’s a noteworthy step toward blending ChatGPT’s conversational prowess with a platform already in millions of pockets and purses. 

Day 6: Advanced Voice Mode (AVM)#

On Day 6, OpenAI introduced a visually-aware version of advanced voice mode that processes real-time video from your phone’s camera or screen sharing. Imagine scanning your kitchen pantry and asking ChatGPT if you have the ingredients for a recipe. It’s a playful, hands-free approach to AI, and OpenAI even gave AVM a seasonal twist by letting you chat with a voice that sounds remarkably like Santa himself. 

Day 7: Projects for Organizational Improvements#

Day 7 shifted gears to organizational improvements. OpenAI unveiled Projects, a smart folder system that tidily groups your chat histories and uploaded documents by topic.

It’s a simple yet powerful addition that reduces clutter for power users and novices alike—no more hunting through endless chat logs. With Projects, you can keep your holiday baking ideas, coding experiments, and personal journaling all neatly sectioned off.

Day 8: ChatGPT Search—Now Free for All#

Finally, on Day 8, OpenAI opened its ChatGPT Search feature to every user, with no subscription required.

Rather than answering purely from its training data, ChatGPT can search live websites for the latest info—much like competing apps such as Perplexity AI. Just be warned: this newfound "knowledge" can sometimes be confidently incorrect. Still, it signals another leap in ChatGPT’s evolution, making AI-based search more mainstream and conversational.

Day 9: o1 Comes to the OpenAI API#

As Day 9 dawned, OpenAI announced a significant milestone for developers: o1—its high-powered reasoning model—would join the OpenAI API. But there’s a catch: for now, only devs at usage tier 5 (those who have spent at least $1,000 on the API and whose accounts are more than 30 days old) can tap into o1’s self-checking logic.

If you’re used to speed and simplicity, brace yourself. Models like o1 take their time “thinking” through tasks, but the added thoroughness and higher accuracy in coding and business queries may justify the wait.

The price tag on o1 might give some developers pause: $15 per ~750,000 words of input and $60 per ~750,000 words of output—six times the cost of GPT-4o. Still, the newest release, o1-2024-12-17, promises to be worth every penny. According to OpenAI, it’s post-trained with feedback-driven improvements that boost accuracy, reduce unwarranted refusals, and help devs tackle gnarly problems more effectively.
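
A quick back-of-the-envelope estimate makes those rates tangible. The snippet below uses only the prices quoted above ($15 per ~750,000 input words, $60 per ~750,000 output words); word counts are hypothetical, and real billing is per token, so treat this as a rough planning tool.

```python
# Dollar rates derived from the quoted prices.
INPUT_RATE = 15 / 750_000   # dollars per input word
OUTPUT_RATE = 60 / 750_000  # dollars per output word

def estimate_cost(input_words: int, output_words: int) -> float:
    """Rough dollar cost of one o1 request at the quoted rates."""
    return input_words * INPUT_RATE + output_words * OUTPUT_RATE

# e.g. a 2,000-word prompt with a 1,000-word answer:
cost = estimate_cost(2_000, 1_000)  # roughly $0.12
```

At scale, those cents add up fast—one reason the cheaper GPT-4o remains the default for latency- and budget-sensitive workloads.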

On the customization front, o1 in the API gains new features like function calling (to link external data), developer messages (to set tone and style), and image analysis. There’s even a new reasoning_effort parameter to control how long the model deliberates before spitting out answers.

That flexibility goes a long way for developers building specialized applications, particularly those who need to parse images or orchestrate complex, multi-step operations.
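
As a minimal sketch of how those pieces fit together, here’s the shape of an o1 request payload combining a developer message, the `reasoning_effort` knob, and a function-calling tool. Field names follow OpenAI’s chat-completions interface as I understand it at the time of writing, and `fetch_ticket` is a hypothetical helper—check the current API reference before relying on any of this.

```python
# Payload you would pass to the official client, e.g.
# client.chat.completions.create(**request) with the `openai` package.
request = {
    "model": "o1-2024-12-17",
    # New knob: how long the model deliberates before answering.
    "reasoning_effort": "high",  # "low" | "medium" | "high"
    "messages": [
        # Developer messages set tone and style for reasoning models.
        {"role": "developer", "content": "Answer as a terse senior engineer."},
        {"role": "user", "content": "Why might this snippet leak file handles?"},
    ],
    # Function calling: expose an external lookup the model may invoke.
    "tools": [{
        "type": "function",
        "function": {
            "name": "fetch_ticket",  # hypothetical helper, not a real API
            "description": "Fetch a bug ticket by ID from our tracker.",
            "parameters": {
                "type": "object",
                "properties": {"ticket_id": {"type": "string"}},
                "required": ["ticket_id"],
            },
        },
    }],
}
```

Dialing `reasoning_effort` down to "low" trades some of o1’s thoroughness for faster, cheaper responses—useful when the task doesn’t need deep deliberation.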

Day 9 also brought more goodies for engineers. GPT-4o and GPT-4o mini got updated for Realtime API usage, offering cost reductions, better data efficiency, and improved stability.

Meanwhile, WebRTC support has been added for real-time audio streaming within the Realtime API, coinciding with OpenAI hiring WebRTC’s creator. Finally, preference fine-tuning joins the fine-tuning API—allowing devs to teach models to prefer certain responses over others—and new Go and Java SDKs have entered early access. 
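
To illustrate what preference fine-tuning data might look like, here’s a sketch of a single training record: one preferred and one non-preferred response to the same prompt. The field names are my assumption based on OpenAI’s DPO-style preference format, so verify them against the current fine-tuning docs before uploading anything.

```python
import json

record = {
    "input": {
        "messages": [
            {"role": "user", "content": "Summarize this stack trace."}
        ]
    },
    # The response we want the tuned model to favor...
    "preferred_output": [
        {"role": "assistant",
         "content": "NullPointerException in OrderService.place(), line 42."}
    ],
    # ...and the one we want it to move away from.
    "non_preferred_output": [
        {"role": "assistant",
         "content": "Something went wrong somewhere in the code."}
    ],
}

# Training files are uploaded as JSONL: one record per line.
jsonl_line = json.dumps(record)
```

Unlike reward-based RFT, preference tuning never scores answers in isolation—it only learns from pairwise comparisons, which makes it a natural fit for style and tone adjustments.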

Day 10: Dial-In AI and Cross-Platform Support#

On Day 10, OpenAI reached the final frontier of connectivity—people without reliable internet—by debuting a 1-800-ChatGPT phone line (1-800-242-8478).

Calling in from anywhere in the U.S. grants you 15 free minutes of advanced voice mode, letting you chat with AI from even the most remote landlines. It’s another step in making AI accessible to everyone, proving you don’t need a fancy smartphone or blazing internet to tap into the latest from OpenAI. 

Day 11: Expanded Desktop Version for Coders and Creatives#

Day 11 turned the spotlight back onto coders and creatives alike. OpenAI massively expanded the desktop version of ChatGPT for Mac, adding integration with an even wider range of apps and IDEs.

Whether editing Swift code in Xcode, drafting a blog post in Apple Notes, or organizing your day in Notion, ChatGPT can pull snippets and context directly—no copying or pasting required. Advanced voice mode (AVM) can float in a separate window, offering real-time feedback as you work.

It’s collaboration that feels fluid and natural, bringing AI one step closer to being a true, cross-platform partner in our daily tasks.

Day 12: AGI?#

On the twelfth and final day of OpenAI’s event, all eyes turned to o3, the much-anticipated successor to the o1 “reasoning” model. Billed as a breakthrough in AI’s quest for advanced reasoning, o3 arrives with bold claims of inching ever closer to AGI—albeit with important caveats. There’s also an o3-mini variant designed for more specialized tasks and faster response times, rounding out this new family of models.

Despite the hype, o3 isn’t widely available just yet—OpenAI has opened a preview for o3-mini to safety researchers, with a broader rollout planned in the coming months.

Why o3 and not o2? #

CEO Sam Altman explained that the telecom giant O2 posed a trademark snag, so OpenAI skipped straight to naming its latest model o3. Beyond this quirky detour, the real conversation is about deliberative alignment—a safety technique designed to keep AI from straying off-script or, worse, deceiving humans.

Despite these safeguards, internal testers report that o3, like o1 before it, can still exhibit “scheming” behavior under certain conditions—essentially, making decisions that prioritize its objectives over user intent. OpenAI continues to explore this risk through rigorous red-team evaluations.

Developers might be anxious about the phrase “approaches AGI,” but there’s no need to panic. o3 scores impressively on certain benchmarks yet stumbles on tasks any average human could solve in minutes. Much of o3’s hype revolves around its performance on ARC-AGI, a puzzle-based test that probes essential “human prior knowledge”—object permanence, goal-directedness, and basic geometry. Each task uses color-coded grids that an average human (even a child) can solve easily but that often trip up AI models.

On these puzzles, o3 performs best when tasks have clear evaluation metrics that can serve as reward signals during fine-tuning.

This means results might be somewhat skewed right now, hence the need for ARC-AGI 2, which OpenAI and the creators of ARC-AGI are developing together. This ongoing collaboration ensures a more balanced and accurate assessment of AI capabilities in future benchmarks.

Example of ARC-AGI puzzle

Performance and cost#

Still, there’s plenty to marvel at. Early testing shows o3 outperforming o1 by a wide margin on SWE-Bench Verified (a programming-focused benchmark) and achieving a Codeforces rating of 2727—well above the threshold for top-tier human coders.

On math-heavy tasks like the American Invitational Mathematics Exam, o3 reportedly aced nearly every question, while in specialized science domains like GPQA Diamond, it scored high on graduate-level biology, physics, and chemistry. These feats are a testament to how effectively o3 leverages its "private chain of thought" to plan solutions step by step.

But as the adage goes, “A model’s gotta eat”—in o3’s case, that means time and compute. Like o1, o3 can be set to low, medium, or high compute, which determines how methodical (and slow) it will be in "thinking" through a question. That extra time often translates into more accurate answers, but it also means bigger cloud bills if you run large-scale queries. So, for devs, it’s a matter of balancing speed, cost, and accuracy—no small feat for businesses that rely on quick turnarounds.

Where do we go from here?#

OpenAI’s final-day announcement tackled the inevitable question of AGI (human-like intelligence).

While o3 represents a leap forward in reasoning, I want to be clear that it’s still far from surpassing humans at most valuable economic tasks—the threshold OpenAI associates with true AGI. Researchers, including François Chollet, point out that while o3 excels on specific benchmarks, it struggles with tasks humans find trivial. OpenAI’s collaboration on ARC-AGI 2 underscores the gap between impressive test results and real-world general intelligence.

That said, o3 shows us a clear trend: the shift from brute-force AI scaling to models designed for reasoning and adaptability. It’s an exciting step, but the reality is that AI isn’t going to replace human developers. Instead, it’s here to amplify creativity and reduce the drudgery of repetitive tasks.

So what’s the takeaway for developers? Think of o3 and similar tools as the next evolution in your tech stack—an assistant that helps you tackle bigger challenges faster. Whether it’s debugging, prototyping, or exploring new ideas, AI can be the teammate that speeds up your workflow without replacing your role.

AI like o3 is the jet engine; it’s up to us to design the aircraft. The tools are here—now it’s time to start building.


Written By:
Usama Ahmed