March 14, 2025

We tested two Deep Research tools. One was unusable.


As Chief of Staff to our COO, part of my job is helping prepare for our quarterly board meetings, which often means time-consuming research and analysis.

So when Gemini and OpenAI released Deep Research features, I was excited to put them to the test – specifically to conduct competitive analysis on our newest product, an AI-powered coach called ProfAI.

One did a pretty good job – the other gave me research that was borderline unusable. So here’s my honest review about how they did, what it’s like to use them, and what I’ll be using going forward.

Some quick context

Google launched Deep Research in December 2024 as a feature of Gemini 1.5 Pro. OpenAI followed in February 2025, launching Deep Research as an AI agent integrated into ChatGPT and powered by a version of its o3 model.

They both conduct multi-step research to generate detailed reports – but ChatGPT's Deep Research feature does multimodal analysis (including text, images, and PDFs) while Gemini's Deep Research feature only does text-based research and synthesis.

The other big difference is in how they operate: OpenAI's Deep Research feature adjusts its research path in real time, while Google's Gemini Deep Research feature follows a structured research plan that users can review and modify before execution. This means Gemini offers more control, but allows for less nuance.

How they performed

To grade how each tool performed on a competitive analysis research task, I gave them the same prompt. I included:

  • The persona I wanted the AI to play (a Section board member), since this affects the lens of the research
  • The specific information I wanted it to bring forth, based on what a board member would care about (description of offering, pricing, traction etc.)
  • The format I wanted the output in, knowing that it would need to be easily digestible for a time-crunched board member

My hope was that these boundaries would keep the output relevant, usable, and akin to what an experienced researcher would provide in the same scenario.
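
For illustration, a prompt built on those boundaries looks something like this (a simplified sketch, not the verbatim prompt):

“Act as a Section board member. Conduct a competitive analysis of ProfAI, our AI-powered coach. For each competitor, cover what a board member would care about: a description of the offering, pricing, and traction. Present the findings in a concise, scannable format – a table with one row per competitor – that a time-crunched board member can digest quickly.”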

How OpenAI’s Deep Research performed

  • Tool usability: 4/5
  • Output quality: 4.5/5

High level assessment:

ChatGPT’s research went beyond the big edtech players and included less obvious companies like Skillsoft and Datacamp – more accurate competitors to our AI-powered coach. The research covered not only training about AI, but also training powered by AI – a distinction at the crux of Section’s offering.

Overall, it understood what I was asking for and provided more valuable information than Gemini on the first prompt.

Research quality:

✅ It gave me exactly what I asked for – competitors that I may not have otherwise known about or thought of – and shared high value information about how their services compare to ours.

✅ Because the initial output was high quality, I was able to have a more meaningful conversation with the AI about its findings rather than spend time course-correcting.

❌ ChatGPT’s report cited a lower volume of sources than Gemini’s.

User experience:

✅ ChatGPT asked clarifying questions before it dove into research to make sure it understood what I was looking for.

❌ Unlike Gemini’s feature, ChatGPT’s didn't present a full plan for me to approve before it began conducting its research.

❌ Every time I responded to its output, I had to re-select the “Deep Research” mode. It’s a minor detail, but it was confusing at first.

Presentation:

✅ ChatGPT adjusted its responses to feedback well and organized information into requested formats, like tables and bulleted lists.

❌ It requires pretty clear prompting around the desired output. Otherwise you get a very long wall of text that’s hard to parse quickly for value.

How Gemini’s Deep Research performed

  • Tool usability: 5/5
  • Output quality: 2.5/5

High level assessment:

Gemini’s output focused on edtech at a very high level, citing companies like Udemy, Coursera, and LinkedIn Learning – which didn’t help me in my search for competitors I wasn’t aware of. The research also focused heavily on courses and how these catalogs are changing over time, and included course-based training on AI and ML rather than AI competency programs.

Overall, its research lacked significant depth – which could come down to the sources it can access or its ability to think critically about the prompt.

Research quality:

❌ The content quality of Gemini’s output was much weaker than ChatGPT’s because it was too high level to be immediately usable. It provided the kind of information I could pull from a quick Google search.

❌ A lot of what it said lacked deeper meaning, such as “Edtech is being transformed by AI and here are the players…”. It also disregarded things I specifically asked for in the prompt, like significant partnerships, market differentiation, and value propositions.

✅ Gemini’s report cited a higher volume of sources than ChatGPT’s.

User experience:

✅ Gemini created a more collaborative experience, presenting a research plan and asking for edits before it dove into its research.

✅ Once you started working in the Deep Research mode, it stayed in that mode, which made it easier to work with.

Presentation:

✅ The formatting of the output was easier to parse than ChatGPT’s response.

✅ Perhaps the most valuable part of researching with Gemini: its output can quickly be exported to a Google Doc, with all formatting intact, for sharing and editing.

The verdict

Even though the user experience in Gemini was better and more organized, the win goes to ChatGPT.

Gemini’s lower quality research made the output unusable. And even though I preferred Gemini’s pre-research planning process and easily exportable outputs, ChatGPT’s output was good enough to overlook some of its annoying quirks.

Ultimately, even though Gemini claims to reference more sources, ChatGPT’s multimodal analysis seems to yield better results. Ironic for an LLM built by a search company!

So unless ease of use is more important to you than high-quality information, I don’t see an instance in which you would choose Gemini’s Deep Research feature over ChatGPT’s.

How to get the most out of Deep Research

Don’t mistake an LLM’s power for an opportunity to slack off. You can’t give it a lukewarm prompt and expect it to intuit what you want. Here are my tips for getting maximum leverage from Deep Research:

  1. Specify the format you want the output in. Something like “provide this in a scannable table with XYZ columns, ABC bullet points, and X paragraphs”. This helps you avoid a dense 10,000-word chunk of text (which is what you’ll get otherwise).

  2. Don’t skimp on the clarifying questions. The more thoughtful your answers, the better the output. You don’t need a wall-of-text answer per question, but address each one to help guide the research.

  3. Make sure your goal is clear in the initial prompt. This will help ensure that you and the tool are aligned on the purpose of the research. So instead of saying “run a competitive analysis for an Edtech product”, specify intent: “Run a competitive analysis for an Edtech product in order to understand what offerings currently exist at what price point and what the value proposition of each offering is”.
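
Combining tips 1 and 3, that upgraded Edtech prompt might read something like this (a hypothetical example, not a prompt I tested):

“Run a competitive analysis for an Edtech product in order to understand what offerings currently exist, at what price points, and what the value proposition of each offering is. Present the results in a scannable table with columns for company, offering, pricing, and value proposition, followed by five bullet points of key takeaways.”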
Greg Shove
Ana Silva