The next stage of LLMs is here: AI that “thinks,” or breaks down problems step-by-step like a human would.
These are the recently released reasoning models – OpenAI’s o1 family, DeepSeek R1, Gemini 2.0 Flash Thinking, Claude 3.7 Sonnet, and Grok 3. They work by applying a step-by-step reasoning process to problems, reconsidering and revising it before responding.
There are a LOT of AI models on the market right now, so I wouldn’t blame you for thinking, “Do I really need to start using a reasoning model too?” But these models are legitimately a huge step forward for how we understand and interact with AI – and a big improvement if you’ve struggled with getting reliable and thoughtful responses to date.
Here’s how we got here, and what it means for how you use AI.
The race to the smartest model
The reasoning-model race kicked off late last year, when OpenAI launched its o1 family of models. Since then:
- DeepSeek, a relatively unknown player out of China, released its R1 model in January 2025, which matched (and arguably exceeded) o1’s capabilities while using significantly fewer resources than other AI labs – and made it open source
- Google released Gemini 2.0 Flash Thinking
- xAI released Grok 3 with a thinking mode
- Anthropic announced Claude 3.7 Sonnet this week, the first “hybrid” model, which can produce either instant or reasoning responses (meaning you don’t have to switch models to access thinking mode).
Okay, but how are they ‘thinking’?
These reasoning models are trained to break down problems step-by-step, using techniques like "chain of thought" reasoning.
So instead of just spitting out answers the way early LLMs did, these reasoning models let you watch them break problems down into logical steps, weigh different options, examine their own assumptions, self-correct (this one’s big), and explain their logic along the way.
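If you want to see those “thoughts” programmatically, here’s a minimal sketch that pulls back both the reasoning trace and the final answer from DeepSeek R1. It assumes DeepSeek’s OpenAI-compatible API, its documented deepseek-reasoner model name, and a DEEPSEEK_API_KEY in your environment:

```python
# Minimal sketch: retrieve a reasoning model's "thinking" alongside its answer.
# Assumes DeepSeek's OpenAI-compatible API and its documented "deepseek-reasoner"
# model name; set DEEPSEEK_API_KEY in your environment before running.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Which is larger, 9.11 or 9.9?"}],
)

message = response.choices[0].message
print("--- Reasoning ---")
print(message.reasoning_content)  # the step-by-step "thinking" trace
print("--- Answer ---")
print(message.content)            # the final response
```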
They’re usually trained on datasets where a human answers a question and writes out the thought process they used to arrive at that answer.
Some models, like DeepSeek R1, even learn from their own mistakes through a process called reinforcement learning: they’re rewarded for producing reasoning that leads to correct answers, so they get better at it over time.
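To make that reward idea concrete, here’s a toy sketch of a rule-based reward in the spirit of what’s been described for R1: the model earns credit for a correct final answer, plus a small bonus for showing its work in a designated reasoning block. This is purely illustrative (the tag names and scoring are my own assumptions), not anyone’s actual training code:

```python
import re

def toy_reward(model_output: str, gold_answer: str) -> float:
    """Toy reward: credit for a correct final answer, plus a small bonus
    for showing work inside <think>...</think> tags. Illustrative only."""
    reward = 0.0

    # Format bonus: did the model produce an explicit reasoning block?
    if re.search(r"<think>.*?</think>", model_output, flags=re.DOTALL):
        reward += 0.1

    # Accuracy reward: does the text after the reasoning contain the gold answer?
    final_part = re.sub(r"<think>.*?</think>", "", model_output, flags=re.DOTALL)
    if gold_answer.strip().lower() in final_part.lower():
        reward += 1.0

    return reward

# During training, outputs that earn higher rewards are reinforced, so
# step-by-step reasoning that lands on correct answers becomes more likely.
print(toy_reward("<think>2 + 2 = 4</think> The answer is 4.", "4"))  # 1.1
```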
Here’s an example comparing GPT-4o and o3-mini on the same question: “Give me examples of countries with the letter A in the third position of the name.”
4o gives an obviously wrong answer, and we have no idea what went wrong in the process:
[Screenshot: GPT-4o’s incorrect answer]
Here’s how o3-mini approached the same question:
[Screenshot: o3-mini’s step-by-step reasoning]
And the response:
[Screenshot: o3-mini’s final answer]
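If you want to double-check the model yourself, this particular question is easy to verify with a few lines of Python (using an illustrative handful of country names):

```python
# Quick sanity check for the example question above.
countries = ["Japan", "Chad", "France", "Brazil", "Canada", "Spain", "Italy"]

matches = [name for name in countries if len(name) >= 3 and name[2].lower() == "a"]
print(matches)  # ['Chad', 'France', 'Brazil', 'Spain', 'Italy'] -- not Japan or Canada
```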
Now, you’re probably not spending a lot of time at work asking AI about the spelling of various countries. But this kind of transparency is extremely useful in similar cases – such as when you’re asking AI to parse data.
How thinking models change how we work with AI
LLMs have historically been powerful at predicting the next likely word, but they haven’t explained how they arrived at an answer.
AI power users (like me) learned to work around this by adding phrases like “Please think step-by-step” or “Let’s solve this systematically” to our prompts. Reasoning models make these workarounds unnecessary.
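For the API-inclined, here’s a rough sketch of the difference, assuming the OpenAI Python SDK and the model names used in this post (the exact models available to you may vary):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

question = (
    "Give me examples of countries with the letter A "
    "in the third position of the name."
)

# The old workaround: nudge a standard model into step-by-step reasoning yourself.
classic = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": question + " Please think step-by-step."}],
)

# With a reasoning model, the step-by-step deliberation happens by default.
reasoning = client.chat.completions.create(
    model="o3-mini",
    messages=[{"role": "user", "content": question}],
)

print(classic.choices[0].message.content)
print(reasoning.choices[0].message.content)
```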
And the "thinking out loud" approach provides invaluable transparency when things go wrong. When AI makes a wrong turn, I can spot where its thinking was flawed (maybe it misunderstood the question or made an incorrect assumption halfway through its reasoning).
It’s also really useful to have a feedback loop on my prompting. Like it or not, we’re still dependent on good prompting for good answers. As I see how the AI interprets my prompt and works through answers, I can learn to avoid prompts that confuse it.
What this all means for you
How you work with AI will change as you adapt to a model that works differently and needs to be prompted differently. Here are the biggest impacts you’ll see in your day-to-day work with AI:
- You’ll need to think more critically about the process. Just like we learned to interpret search engine results, we’ll now need to learn to read an AI’s chain-of-thought. It’s not just about seeing the steps the model takes — it's about understanding and critiquing them. Knowing what the AI “thinks” will be just as important as the final answer.
You also won’t need this level of thinking on every single AI task you do – so you’ll need to learn the strengths of the different models and choose the one that best suits your needs ad hoc.
- You’ll need new styles of prompting. With transparent thinking, the role of the human evolves from “prompt engineer” to “conversation curator.” Rather than trying to coax a black-box model with cleverly worded prompts, we’ll be guiding a more open “thought process,” shaping it in real time.
- You’ll get a new kind of thought partner. When you watch an AI self-correct in real time, you’ll become more aware of your own logical blind spots.
And as for the reasoning race, this is what I expect to happen within the next 12 months:
- Reasoning goes international. Beyond DeepSeek, other Chinese companies (e.g., ByteDance or Alibaba) will continue closing the technological gap with American counterparts, intensifying global competition.
- Reasoning goes multi-modal. We’ll see models that seamlessly process and reason across multiple data types — text, images, and audio — unlocking applications in fields like autonomous driving or advanced medical diagnostics.
- Reasoning gets granular. More domain-specific reasoning models will emerge. This will likely come from a concerted effort by AI labs to build smaller, more efficient reasoning models that are fine-tuned for specific industries like neuroscience, medicine, law, or finance.