OpenAI released its o3 and o4-mini models on April 16th, 2025. They are the latest iteration of the o-series, models trained to think for longer before generating a response. According to OpenAI, these are the most intelligent models it has released to date. They mark a major advancement in ChatGPT's capabilities, relevant to both casual users and professional researchers. The company has revealed that o3 and o4-mini have "multimodal understanding," which lets the models process images directly within their chain of thought. o3 is the most sophisticated model to date, while o4-mini offers strong value for its cost and size.


Because these models can think with images, they can understand a user's diagrams and sketches even when the image quality is low. With o3, users can upload images such as sketches and whiteboard photos, and the model can not only interpret them but also offer opinions and insights. It can even manipulate the images as part of its reasoning, zooming, rotating, and editing them. Let us now go further and examine the key capabilities of these new models.


What’s New in o3 and o4-mini AI Models?


o3 is a highly advanced reasoning model whose capabilities extend across application areas such as science, coding, math, and visual perception. You can use it to resolve complex queries that lack obvious answers and require deeper analysis. Most notably, it performs strongly on visual tasks involving charts, images, and graphics. Reportedly, it makes 20% fewer major errors than o1 on difficult real-world tasks, and it is particularly effective in areas like business and consulting, programming, and creative ideation.


Early testers found the model exceptionally good as a thought partner and highlighted its ability to generate and critically evaluate novel hypotheses, making it an asset in core domains like math, biology, and engineering. o4-mini, though a comparatively smaller model, is fast and cost-effective; thanks to its high efficiency, its usage limits are higher. o4-mini also outperforms o3-mini on non-STEM tasks and in domains such as data science.



What Do the Experts Say?


External experts have rated both models highly, highlighting their improved ability to understand and follow instructions. Their responses are more verifiable thanks to enhanced intelligence and the inclusion of web sources. Compared with preceding iterations, the models also feel more conversational and natural. Most notably, they reference previous conversations to make responses more relevant and personalized.


Relying on Reinforcement Learning


The same scaling trend observed in GPT-series pretraining also appeared in the reinforcement learning used to develop o3. The pattern of "more computation = more performance" essentially implies that the longer these models are allowed to think, the better they become. Seeing this trend, OpenAI scaled up both training compute and inference-time reasoning. This improved performance, validating the value of reinforcement learning: at the same latency and cost as o1, o3 delivers better performance and stronger reasoning.


Moreover, the company also trained these models to use tools through reinforcement learning. The models became capable not only of operating the tools but also of figuring out when to use them. This ability to deploy tools in pursuit of an expected outcome makes the models effective in open-ended situations, particularly in multi-step workflows and visual reasoning. The improvement shows up both on academic benchmarks and in real-world tasks.


Toward Agentic Tool Use


What is remarkable about these models is that they have full access to all the tools within ChatGPT, and they can also reach custom tools through function calling in the API. The models are trained to think deeply about questions that lack obvious answers, and to decide when and how to use tools to produce correct, in-depth answers quickly.
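As a rough sketch of what function calling looks like in practice, the snippet below defines a hypothetical `get_crime_stats` tool and passes it to the Chat Completions endpoint. The tool name, its schema, and the `"o4-mini"` model identifier are illustrative assumptions, not details from OpenAI's announcement.

```python
# Sketch: exposing a custom tool to an o-series model via function calling.
# Requires `pip install openai` and an OPENAI_API_KEY environment variable.
# The tool name, schema, and model identifier are illustrative assumptions.

# JSON Schema describing a hypothetical tool the model may choose to call.
crime_stats_tool = {
    "type": "function",
    "function": {
        "name": "get_crime_stats",
        "description": "Return yearly crime statistics for a US city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "year": {"type": "integer"},
            },
            "required": ["city", "year"],
        },
    },
}

def ask_with_tools(question: str):
    """Send the question along with the tool definition; the model decides
    whether to answer directly or to emit a call to get_crime_stats."""
    from openai import OpenAI
    client = OpenAI()
    return client.chat.completions.create(
        model="o4-mini",  # assumed model identifier
        messages=[{"role": "user", "content": question}],
        tools=[crime_stats_tool],
    )
```

The model does not run the tool itself; when it emits a tool call, your code executes the function and sends the result back in a follow-up message.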


For example, suppose a user asks, "Will the crime rate in New York increase or decrease compared to last year?" The models can not only search the internet for relevant data but also create a chart or image to help the user understand the statistics. They can even write Python code to generate a forecast, combining several tools in one workflow. With this strategic, flexible approach, the models become far more capable of handling tasks that require up-to-date information.


Thus, o3 and o4-mini go far beyond built-in knowledge, combining web search, synthesis, extended reasoning, and output generation across distinct modalities.


Availability of o3 and o4-mini


OpenAI is planning to completely phase out o1, o3-mini, and o3-mini-high, replacing them with o3 and o4-mini. ChatGPT Plus, Pro, and Team users will see o3, o4-mini, and o4-mini-high in the model selector. Users on Edu and Enterprise plans will gain access soon. Free users can try the capabilities of o4-mini through the "Reason" option in ChatGPT. Developers can also access the models today via the Responses API and the Chat Completions API.
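For developers, the two entry points mentioned above can be sketched as follows; the `"o4-mini"` model identifier is an assumption, and both calls require the `openai` package plus an `OPENAI_API_KEY` environment variable.

```python
# Minimal sketch of the two developer APIs named above; the "o4-mini"
# model identifier is an assumption.
# Requires `pip install openai` and an OPENAI_API_KEY environment variable.

def ask_via_responses_api(prompt: str) -> str:
    """Responses API: takes a single `input`, exposes `output_text`."""
    from openai import OpenAI
    client = OpenAI()
    resp = client.responses.create(model="o4-mini", input=prompt)
    return resp.output_text

def ask_via_chat_completions(prompt: str) -> str:
    """Chat Completions API: takes a list of role-tagged messages."""
    from openai import OpenAI
    client = OpenAI()
    resp = client.chat.completions.create(
        model="o4-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

Both functions return plain text; the Responses API is the newer interface, while Chat Completions remains the widely supported option.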


Conclusion


Ever since the release of ChatGPT in 2022, OpenAI has taken the industry by storm. With consistent upgrades, the company has expanded the potential of its AI models across text, voice, and video. Now, with the launch of o3 and o4-mini, OpenAI is expected to regain its edge in a market where it faces tough competition from the likes of Anthropic, Google, and xAI. This blog has explored these models in detail, discussing their key capabilities and "thinking" prowess.