On Wednesday, Google announced its long (long, long)-awaited AI product, Gemini. We dug in to understand whether Gemini lives up to the hype, what it signals for OpenAI and Microsoft, and what you can use it for right now.
TL;DR: So far, Google Gemini is more of a catch-up to OpenAI and Microsoft with some strong marketing behind it, not a breakthrough innovation. Gemini’s biggest advantage will be distribution through Gmail, Google Docs, etc., but our tests of that integration showed middling results. There’s a ton of promise here, but be wary of buying into the hype and make sure to double-check results.
P.S. Want to learn more from me on AI? Join AI for Personal Productivity next week for 20% off using discount code AILEARNER, or become a member for 25% off using code AIMEMBER.
What exactly was released this week?
Google announced a new multimodal model that comes in three different sizes: Gemini Ultra, Gemini Pro, and Gemini Nano.
That doesn’t mean you can use it all right now, though. Availability is expected to roll out later this year and into 2024, so the first glimpse we have into the model’s capabilities is through Google Bard (a chatbot roughly equivalent to ChatGPT, Claude, or Poe).
As of December 6, Bard has been updated to use the Gemini Pro model and seems to be available to most personal users.
What will be released later this year / in 2024?
For developers, Gemini Pro will be available through the Gemini API starting from December 13, 2023.
Gemini Ultra, the most capable model, is set to be released in early 2024 after its current phase of testing. This model was designed for complex tasks and is meant to take on OpenAI’s GPT-4. Gemini Nano, the model designed for specific tasks and mobile devices, will be integrated into the Google Pixel 8.
Our highly opinionated take on Gemini so far
Gemini is a big step forward for Google (and a big deal for its competitors), but not a game changer for AI.
Google is clearly playing catch-up in an AI race dominated by giants like OpenAI and Microsoft.
This is a defensive move, not an innovative breakthrough – though that could change if Gemini Ultra exceeds GPT-4 capabilities in the real world.
Google has two massive moats: access to data and its ecosystem of Gmail, Sheets, Docs, Calendar, and Meet. Google’s integrations with these products could pose a threat to Microsoft’s Copilot integration with Windows and the larger partnership between OpenAI and Microsoft. We’ll go into detail on the state of the integrations below.
Google’s early demos are impressive – but are they actual technology or great marketing?
Google released a pretty impressive (and much-lauded) Gemini demo on YouTube and its developer blog.
The blog gives a glimpse at Gemini’s full multimodal capabilities and advanced reasoning across different inputs like text, image, video, and audio. Check out this video at 4:28, where a user presents three pictures of the sun, Saturn, and the Earth, and asks, “Is this the right order?”
Gemini responds, “No – the correct order is Sun, Earth, Saturn.”
Pretty impressive – but this is a marketing demo, and it’s been edited slightly to make Gemini appear more powerful than it is. In the accompanying blog post, the prompt is actually: “Is this the right order? Consider the distance from the sun and explain your reasoning.”
There’s nothing wrong with great prompting (every LLM needs it), but it’s important to parse hype from reality. When we tested Gemini’s integration with Google Suite below, it often failed without highly specific prompts.
Google’s massive distribution advantage (e.g., Google Docs, Gmail, etc.) could outshine the actual tech ... except the integration is spotty right now.
Google’s integrations with Docs, Sheets, Gmail, and search will be how it competes with Microsoft. But does it actually work? We tested the integration with Gmail and Google Drive, and found the actual results are middling.
We saw lots of hallucinations, errors, and inability to access certain files. The use cases in Google Suite are endless, and we assume they’ll figure it out soon – but for now, be aware of limitations and fact-check the results.
5 tests on Gemini-backed Bard’s abilities
Can Bard tell me …
- How much I’ve spent on online shopping over the past 30 days?
- How much I’ve spent on daycare over the last three years?
- The top five people who send me emails and vice versa?
- What was on the agenda of a meeting earlier this week?
- How many people I invited to my wedding?
Test: Can Bard tell me how much I’ve spent on online shopping over the past 30 days?
I asked Bard to review all my online shopping order confirmations from the last 30 days and tell me how much I’ve spent.
Grade: A. The data was accurate, and Bard provided it really quickly. The most valuable part: It was able to provide a list of every order confirmation and add up the totals, which would have taken me a while and involved lots of different Boolean searches.
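If you want to double-check a total like this yourself, a few lines of Python can do the adding. This is only a minimal sketch: the sample confirmation text, the `Total: $...` pattern, and the function names are all hypothetical, and real order emails vary enough that you would likely need to adjust the pattern.

```python
import re
from decimal import Decimal

# Hypothetical order-confirmation snippets; swap in text copied from your own emails.
confirmations = [
    "Order #1001 confirmed. Order total: $42.50",
    "Thanks for your purchase! Total: $1,203.99",
    "Your order has shipped. Order Total: $18.00",
]

# Assumed format: "Total: $1,234.56" (case-insensitive). Real emails differ by retailer.
TOTAL_PATTERN = re.compile(r"total:\s*\$([\d,]+\.\d{2})", re.IGNORECASE)


def extract_total(text):
    """Return the first dollar total found in the text, or None if there is none."""
    match = TOTAL_PATTERN.search(text)
    if match is None:
        return None
    return Decimal(match.group(1).replace(",", ""))


def sum_totals(texts):
    """Add up every total that could be extracted, skipping emails with no match."""
    totals = [extract_total(t) for t in texts]
    return sum((t for t in totals if t is not None), Decimal("0"))


print(sum_totals(confirmations))
```

`Decimal` is used instead of floats so that cents add up exactly. Emails that do not match the pattern are skipped silently, so it is worth comparing the number of matches against the number of orders you expect.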
Test: Can Bard tell me how much I’ve paid to daycare in the last three years?
I asked Bard to calculate how much I’ve paid to my daughter’s daycare, via invoices from [daycare name], over the past three years.
Grade: C. While it was able to find the right emails, it only calculated invoices from July 2023 onwards, even though the same invoices are available in email from 2021 onwards.
Test: Can Bard tell me the top five people who send me emails and vice versa?
I asked Bard to tell me the top five people who send me emails and vice versa.
Grade: C. Bard’s list included Lisa, who has only emailed me once, so that’s certainly a hallucination. The others seem directionally correct, but the error with Lisa makes me uncertain of the results.
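A claim like "top senders" is easy to fact-check if you can get the "From" lines out of your inbox. Here is a rough sketch, assuming you already have those header lines as plain text (the sample headers, addresses, and the `top_senders` name are made up for illustration; exporting them from Gmail is a separate step not shown here).

```python
from collections import Counter
from email.utils import parseaddr

# Hypothetical "From" headers; in practice these would come from a mailbox export.
from_headers = [
    "Alex Example <alex@example.com>",
    "alex@example.com",
    "Sam Sample <sam@example.com>",
    "Alex Example <ALEX@example.com>",
    "Lisa Example <lisa@example.com>",
    "Sam Sample <sam@example.com>",
]


def top_senders(headers, n=5):
    """Count messages per sender address (case-insensitive) and return the top n."""
    addresses = [parseaddr(header)[1].lower() for header in headers]
    # Skip headers where no address could be parsed.
    return Counter(address for address in addresses if address).most_common(n)


for address, count in top_senders(from_headers):
    print(address, count)
```

With real data, a sender like Lisa with a single message would land at the bottom of the count, which makes a hallucinated "top sender" easy to spot.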
Test: Can Bard tell me what was on the agenda of a meeting earlier this week?
I asked Bard to look at the agenda for my neighborhood association meeting and summarize the content.
Grade: F. It wasn’t able to do so (possibly because the agenda was attached as a Word doc / PDF). Bard also seemed to hallucinate, reporting information that was not in the email.
Test: Can Bard add up the number of people invited to my wedding by looking at a spreadsheet?
I asked Bard to find a known spreadsheet – my wedding invite list – in Google Drive and tell me how many people were on it.
Grade: F. It couldn’t find the spreadsheet, presumably because Bard doesn’t yet integrate with Google Sheets.
Conclusions
Our conclusion: There’s a ton of promise here. Imagine a world where Gemini functions as your personal assistant/therapist, with a perfect memory of your email and Google Drive over the last 10 years, and the ability to parse sentiment and insights from all that rich data. I’d love the ability to have that “mind” at my fingertips.
So don’t bounce off this tech just because it’s not perfect now. Keep trying it for different use cases, and keep an eye out for updates.
But be aware that in terms of Google’s business strategy, this is less about “the AI tech that will change the world” and more about the power of Google’s data and distribution. Right now, this tech seems only marginally better than OpenAI’s – but that may not matter if Gemini exists in every tool you use for professional and personal organization.