OpenAI launches new reasoning model o3 and o3 mini

1024×643.jpeg” alt=”OpenAI launches new reasoning model o3 and o3 mini” style=”width:100%;height:auto” loading=”eager” />

OpenAI has officially released its latest reasoning models, o3 and o3 mini, marking a significant step forward in the company’s push to make AI systems more thoughtful and accurate. The new models are designed to spend extra time processing complex queries before generating an answer, a method that has already shown strong results in mathematics, coding, and scientific reasoning tasks.

📖

Benchmark performance and access

According to OpenAI, the o3 model achieved a score of 87.5 percent on the ARC AGI benchmark, a test designed to measure an AI’s ability to adapt to new tasks. The o3 mini, a smaller and more efficient variant, scored 82.6 percent on the same test. Both scores represent a significant jump over the previous o1 model, which scored 61.2 percent. On the AIME 2024 math competition, o3 solved 96.7 percent of problems, while o3 mini solved 93.3 percent. The o1 model, by comparison, solved 91.3 percent.

OpenAI is making o3 mini available to ChatGPT Plus and Team users starting today, with Enterprise and Education access following next week. The standard o3 model is available through the API and to OpenAI’s tiered subscribers, including those on the Pro plan. Pricing for o3 is set at $60 per million input tokens and $240 per million output tokens. The o3 mini costs significantly less, at $4 per million input tokens and $16 per million output tokens.

Safety and transparency measures

OpenAI emphasized that safety testing was a priority before launch. The company conducted red teaming exercises with external researchers to identify potential risks, including the model’s ability to generate misleading or harmful content. OpenAI also implemented new alignment techniques that help the model refuse requests it cannot answer reliably, rather than guessing. The o3 models also include built-in transparency features that allow users to see the model’s internal reasoning steps. This is intended to make the decision making process more interpretable and reduce the chance of hidden errors.

The release of o3 comes at a time when competitors like Google DeepMind and Anthropic are also pushing toward more deliberate, reasoning focused AI systems. DeepMind recently published research on a method called Think-and-Execute, which splits reasoning into separate stages, while Anthropic has been refining its own chain of thought techniques. OpenAI’s move signals that the industry is shifting away from simply making models bigger and faster, and instead investing in architectures that think more carefully.

For developers and power users, the o3 API offers adjustable reasoning effort. You can set the model to low, medium, or high thinking effort depending on your task. A low setting might be fine for simple classification work, while high effort is better for multi step logic problems. This flexibility should help teams optimize for both cost and accuracy without having to switch between entirely different models.

Early adopters have reported that o3 mini handles coding tasks with noticeably fewer hallucinations compared to earlier models. In internal tests, it performed better at finding subtle bugs in Python and JavaScript code, and it showed improved ability to generate valid SQL queries. Researchers in fields like computational biology and theoretical physics have also started testing the model for data analysis and model simulation work.

OpenAI has not shared a timeline for a potential o3 successor, but the company has signaled that reasoning based models will be a central part of its product roadmap for the foreseeable future. The release of the o3 family gives both casual users and enterprise teams a clear path toward more reliable AI driven problem solving. As the technology matures, expect these types of models to become the default choice for any task that demands accuracy over speed. For more analysis on how reasoning models are shaping the AI landscape, check out our latest report on {$link_text}.