OpenAI's o3-mini: Is it worth trying as a developer?

Is o3-mini a worthwhile alternative to DeepSeek-R1's accuracy and performance? We break down its strengths and compare it with R1.
7 mins read
Feb 24, 2025

As with any developer tool, each LLM has its strengths and weaknesses, and we have to make tradeoffs when deciding which tool to use.

But what if you didn't have to make many compromises?

Imagine an LLM that can:

  • Help solve complex math problems

  • Write flawless code

  • Reason through technical challenges

...all at lightning speed and at a fraction of the cost.

OpenAI o3-mini is attempting to check all these boxes. o3-mini is the latest reasoning model in OpenAI's lineup. Released on January 31, 2025, it has already proven its strength for coding, STEM, and beyond, and it stands as a competitive response to DeepSeek's R1.

So, is o3-mini worth trying, or is it just hype?

I'll break it down today with:

  • What o3-mini offers over o1

  • How o3-mini handles developer tasks

  • o3-mini vs. DeepSeek comparison

  • What o3-mini means for the future of development

Let's dive in.

o3-mini: The start of what's to come#

o3-mini is OpenAI's first step toward o3, the successor to o1. It is a compact, cost-efficient version of the full o3 model, allowing us to preview its reasoning capabilities until o3 is ready for market.

o3-mini is built on the transformer architecture and fine-tuned specifically for reasoning tasks. This means o3-mini doesn't just generate responses. Instead, it employs enhanced chain-of-thought reasoning to analyze problems from multiple perspectives before generating responses, delivering deeper, more accurate, and more reliable solutions.

So what does o3-mini already do better than its predecessor, o1? Let's compare the two.

What makes o3-mini better than o1?#

How does o3-mini stand up to o1?

o3-mini maintains the low cost and reduced latency of its predecessor, o1-mini, while getting a little faster, too.

For instance, in A/B testing against o1-mini:

  • o3-mini generated responses 24% faster than o1-mini

  • This reduced average response time from 10.16 seconds to 7.7 seconds

o3-mini is also more affordable than the o1 model.

Unlike o1, the o3-mini integrates search capabilities, delivering up-to-date answers with relevant web links. While still in its early stages, this feature marks a significant step toward enhancing AI-driven research and analysis.

That said, if you require vision-based reasoning, o1 is still in the lead, as o3-mini lacks vision-based reasoning capabilities.

o3-mini standout features#

o3-mini introduces highly requested features, making it production-ready from day one.

Some of these features include:

  • Function calling

  • Structured outputs

  • Developer messages

  • Customizable reasoning effort

Function calling#

Imagine asking an AI for the latest stock prices, booking a dinner reservation, or updating a dashboard—all seamlessly executed through function calls.

With o3-mini's function calling feature, the model can fetch real-time data and take meaningful actions within applications. This capability enhances AI's ability to interact with external systems, making it more practical and efficient for real-world tasks.

Python

from openai import OpenAI

# Custom function to handle restaurant booking
def hotel_reservation(date, time, guests):
    confirmation = {
        "status": "Confirmed",
        "restaurant": "Grand Royal Hotel's Restaurant",
        "date": date,
        "time": time,
        "guests": guests,
        "message": f"Table for {guests} confirmed at Grand Royal Hotel's Restaurant on {date} at {time}."
    }
    return confirmation

# OpenAI function calling
client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "hotel_reservation",
        "description": "Make a reservation at the Grand Royal Hotel's restaurant.",
        "parameters": {
            "type": "object",
            "properties": {
                "date": {"type": "string", "description": "Reservation date (YYYY-MM-DD)"},
                "time": {"type": "string", "description": "Reservation time (HH:MM, 24-hour format)"},
                "guests": {"type": "integer", "description": "Number of guests"},
            },
            "required": ["date", "time", "guests"],
            "additionalProperties": False,
        },
        "strict": True
    }
}]

messages = [{"role": "user", "content": "Can you book a table for 2 at a restaurant this Saturday at 8 PM?"}]

completion = client.chat.completions.create(
    model="o3-mini-2025-01-31",
    messages=messages,
    tools=tools,
)

print(completion.choices[0].message.tool_calls)
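Once the model returns a tool call, your application still has to execute the matching local function and (typically) feed the result back to the model. A minimal dispatch sketch follows; the tool-call object here is simulated to mirror the API's shape rather than coming from a live response:

```python
import json

# Local handler, matching the tool definition above
def hotel_reservation(date, time, guests):
    return {
        "status": "Confirmed",
        "restaurant": "Grand Royal Hotel's Restaurant",
        "date": date,
        "time": time,
        "guests": guests,
    }

# Map tool names to local functions
HANDLERS = {"hotel_reservation": hotel_reservation}

def dispatch(tool_call):
    """Run the local function named in a tool call, decoding its JSON arguments."""
    fn = HANDLERS[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])
    return fn(**args)

# Simulated tool call, shaped like an entry of message.tool_calls
call = {"function": {"name": "hotel_reservation",
                     "arguments": '{"date": "2025-02-08", "time": "20:00", "guests": 2}'}}
result = dispatch(call)
print(result["status"])  # Confirmed
```

In a real loop, you would append the result as a `tool` role message and call the API again so the model can phrase the confirmation for the user.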

Structured outputs#

Consistency is key when working with AI-generated data. Structured output ensures that every response strictly follows a defined JSON schema, eliminating the risk of missing fields, incorrect formats, or unpredictable values.

This means no more dealing with malformed responses—every output is precisely structured and ready for use, making o3-mini a reliable choice for applications that require well-formatted data.

Python

from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class TravelItinerary(BaseModel):
    destination: str
    departure_date: str
    return_date: str
    travelers: list[str]

completion = client.beta.chat.completions.parse(
    model="o3-mini-2025-01-31",
    messages=[
        {"role": "system", "content": "Extract the travel itinerary details from the user input."},
        {"role": "user", "content": "John and I are flying to Paris next Monday and coming back on the 15th."},
    ],
    response_format=TravelItinerary,
)

itinerary = completion.choices[0].message.parsed
print(itinerary)

Developer messages#

Developer messages in o3-mini allow us to define overarching instructions that guide the model’s behavior across interactions, ensuring consistency in responses.

Unlike user messages, which prompt the model for a specific output, developer messages act as system-level directives that the model follows before processing user inputs. This makes them ideal for setting response tone, personality, or formatting rules.

Python

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o3-mini-2025-01-31",
    messages=[
        {
            "role": "developer",
            "content": "You are an AI assistant that responds in a highly formal and professional manner. Use precise language and structured explanations. Avoid contractions and informal phrasing."
        },
        {
            "role": "user",
            "content": "What is the importance of data structures in programming?"
        }
    ],
    store=True,
)

print(response.choices[0].message.content)

Customizable reasoning effort#

The o3-mini allows us to choose between three reasoning effort levels—low, medium, and high—enabling developers to balance speed and accuracy based on their project needs.

  • Higher reasoning effort produces slower responses but improves accuracy, making it ideal for complex problem-solving.

  • Lower reasoning effort delivers faster responses with reduced computational load, which is useful for real-time applications where speed is a priority.

Trade-offs between various reasoning levels of o3-mini

In ChatGPT, o3-mini defaults to medium reasoning effort, striking the right balance between performance and efficiency.
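In the API, the effort level is selected with the `reasoning_effort` parameter on chat completions. Here is a minimal sketch that builds the request parameters for a chosen level; actually sending the request requires an API key, so the network call is left as a comment:

```python
# Build chat-completion parameters for a chosen reasoning effort level.
def build_request(prompt, effort="medium"):
    if effort not in {"low", "medium", "high"}:
        raise ValueError(f"unknown reasoning effort: {effort}")
    return {
        "model": "o3-mini-2025-01-31",
        "reasoning_effort": effort,  # low = faster, high = more thorough
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("Prove that the square root of 2 is irrational.", effort="high")
print(req["reasoning_effort"])  # high

# With a client configured, the parameters pass straight through:
# completion = client.chat.completions.create(**req)
```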

Does o3-mini really beat DeepSeek-R1?#

Is OpenAI’s o3-mini a comeback in the AI race against DeepSeek-R1?

Let’s compare their core architectures and performance.

Transformer vs. MoE architectures#

o3-mini is built on the transformer architecture, which uses the full model’s parameters to process every token. This architecture ensures robust and consistent performance across various reasoning tasks, making it a reliable choice for developers. However, it can become resource-intensive, especially when handling large-scale workloads.

By contrast, DeepSeek-R1 adopts a mixture-of-experts (MoE) architecture: instead of engaging all model parameters at once, it routes each token to a small subset of experts. Of its 671 billion parameters, only 37 billion are active for any given token. This makes DeepSeek-R1 highly scalable and computationally efficient, giving it an edge over o3-mini in scenarios that require extensive reasoning without consuming excessive resources.
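To make the contrast concrete, here is a toy sketch of top-k expert routing. This illustrates the general MoE idea only; it is not DeepSeek's actual implementation:

```python
def route_top_k(gate_scores, k=2):
    """Return the indices of the k experts with the highest gate scores for one token."""
    ranked = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)
    return sorted(ranked[:k])

# A dense transformer runs every token through all of its parameters;
# an MoE layer consults a learned gate and activates only a few experts.
gate_scores = [0.1, 2.3, -0.5, 1.7, 0.0, -1.2, 0.4, 0.9]  # one score per expert
active = route_top_k(gate_scores, k=2)
print(active)  # [1, 3] -- only 2 of 8 experts process this token
```

Because only the selected experts run, compute per token scales with k rather than with the total number of experts, which is how a 671B-parameter model can activate just 37B parameters at a time.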

Benchmark performance#

Regarding benchmark performance, OpenAI o3-mini (high) and DeepSeek-R1 compete head-to-head across various STEM reasoning tasks, with each model excelling in different areas.

Codeforces score comparison of OpenAI o1, o3-mini, and DeepSeek-R1

The competitive programming domain is where o3-mini (high) shines, achieving an Elo score of 2130, outperforming DeepSeek-R1’s 2029 and OpenAI o1’s 1891. This solidifies the o3-mini as the go-to model for developers and coders aiming for peak performance.

5 benchmark comparisons#

Let's compare OpenAI o1 and o3-mini with DeepSeek-R1 in detail.

Benchmark performance comparison of OpenAI o1, o3-mini, and DeepSeek-R1

Here's a breakdown of their performance across 5 major benchmarks:

1. Competition Math (AIME 2024)#

  • OpenAI o3-mini (high) takes the lead with a score of 87.3, surpassing OpenAI o1 (83.3) and even DeepSeek-R1 (79.8).

  • This demonstrates o3-mini’s expertise in handling advanced mathematical problems accurately and efficiently.

2. Software Engineering (SWE-bench Verified)#

The competition gets tighter in tasks requiring software engineering expertise.

  • o3-mini (high) edges slightly ahead at 49.3, followed closely by DeepSeek-R1 at 49.2 and OpenAI o1 at 48.9.

  • While the difference is marginal, this highlights o3-mini’s capability to deliver robust solutions in coding and algorithmic tasks.

3. Ph.D. Level Science Questions (GPQA Diamond)#

When evaluated on Ph.D. level biology, chemistry, and physics questions:

  • o3-mini (high) again leads with a score of 79.7, narrowly beating o1 at 78.

  • DeepSeek-R1 trails at 71.5, making o3-mini the more reliable option for scientific reasoning.

4. MMLU (Pass@1)#

  • OpenAI o1 excels in this category with 91.8.

  • o1 is followed by DeepSeek-R1 at 90.8 and o3-mini (high) at 86.9.

While the o3-mini may not dominate here, it still delivers a solid performance that meets high expectations.

5. Math 500#

  • For advanced math tasks, o3-mini (high) and DeepSeek-R1 are neck and neck with scores of 97.9 and 97.3, respectively.

  • OpenAI o1 lags slightly at 96.4.

Across these benchmarks, o3-mini competes rather well with DeepSeek-R1.

Let’s break down how these models fare in terms of cost-efficiency.

Cost comparison with DeepSeek-R1#

Beyond performance, cost is a big decision-maker when choosing an AI model.

When it comes to API pricing, DeepSeek-R1 is more budget-friendly than o3-mini.

While o3-mini is 63% cheaper than OpenAI o1, it remains more expensive than DeepSeek-R1, making the latter a better option for cost-sensitive users.
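A quick way to compare the models on your own workload is to plug token counts into each provider's per-million-token rates. The rates below are the launch-time prices I'm aware of ($1.10/$4.40 for o3-mini, $0.55/$2.19 for DeepSeek-R1); treat them as assumptions and verify against the current OpenAI and DeepSeek pricing pages:

```python
def estimate_cost(input_tokens, output_tokens, price_in_per_m, price_out_per_m):
    """API cost in USD, given per-million-token input and output rates."""
    return (input_tokens / 1e6) * price_in_per_m + (output_tokens / 1e6) * price_out_per_m

# Example workload: 5M input tokens, 1M output tokens per month.
# Rates (USD per 1M tokens) are assumptions -- check the live pricing pages.
o3_mini_cost = estimate_cost(5_000_000, 1_000_000, price_in_per_m=1.10, price_out_per_m=4.40)
r1_cost = estimate_cost(5_000_000, 1_000_000, price_in_per_m=0.55, price_out_per_m=2.19)
print(f"o3-mini: ${o3_mini_cost:.2f}, DeepSeek-R1: ${r1_cost:.2f}")
```

On this hypothetical workload, the gap roughly halves the monthly bill, which is the kind of difference that matters at scale even when benchmark scores are close.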

Final verdict: o3-mini vs. DeepSeek-R1#

Benchmarks are close, but o3-mini is ahead in math, science, and software engineering, while DeepSeek-R1 wins in efficiency and affordability.

  • Go with o3-mini if you need accuracy and performance for coding/math-heavy tasks

  • Go with DeepSeek-R1 if your priorities are scalability and cost efficiency

If you want to try o3-mini for free, you can do so without a subscription by selecting the "Reason" option in ChatGPT's message composer. This offer is for a limited time, but it will give you a trial run to determine whether you want to subscribe to the model.

o3 and the future of AI-assisted development#

Alongside DeepSeek-R1, o3-mini is yet another powerful example of how quickly LLMs are improving for developer use cases.

By enabling models to fetch real-time data, take actions, and generate reliable, schema-compliant responses, models like o3-mini will help streamline development, reduce errors, and enhance automation.

If you haven't started using LLMs as a developer, start today—the future of AI-driven development won't wait for you.

Written By:
Fahim ul Haq