As Chief of Staff to Section’s COO, I use multiple LLMs as thought partners every week.
In particular, every quarter I use AI as a thought partner to help Greg and Taylor (our CEO and COO) prepare for the board meeting. I upload a draft of the board deck and ask 4 LLMs for feedback: ChatGPT, Claude, Google Gemini, and Microsoft Copilot.
We’ve been doing this since August 2023, and use AI’s feedback to anticipate board member questions, pressure test our point of view, and improve the quality of our conversation with board members. I also take notes at the board meetings and grade each LLM on its ability to help us prepare, based on the overlap between what the LLM highlighted and the feedback our board members actually gave us.
After 12+ months of using AI in this way, here’s how each LLM stacks up, including our favorites, and which ones are most improved.
How I use AI as a board-level thought partner
First, I always use the paid versions of ChatGPT-4o, Claude 3.5 Sonnet, Gemini Advanced, and Microsoft Copilot. Not only do these models give you more advanced and nuanced responses – you also sometimes need the paid versions of these tools to upload reference materials, which is critical.
Then I upload the board deck and give each LLM three prompts:
- Prompt 1: Solely based on the board deck provided, what feedback and questions would you have if you were a conservative, risk averse board member looking at the deck? Be specific and point to examples where possible. Do not use information from outside of the deck provided.
- Prompt 2: Solely based on the deck provided, what feedback and questions would you have if you were an aggressive, growth-oriented board member looking at the deck? Be specific and point to examples where possible. Do not use information from outside of the deck provided.
- Prompt 3: After reviewing the deck, what are your top 5 concerns as a board member? Be specific and point to examples where possible. Do not use information from outside of the deck provided.
A few things to call out:
- I frequently remind AI not to reference anything but the board deck uploaded – this is my attempt to reduce hallucinations by the LLM
- I give AI a few board member personas to play – aggressive/growth-oriented and conservative/risk averse; AI is great at channeling different personas, which helps us prepare for different scenarios and board members
- I continually ask AI to be specific – this helps reduce vague/generic responses that could apply to any company
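If you want to reuse these prompts every quarter, the pattern is easy to script. Here's a minimal Python sketch, assuming you still paste the resulting text into each LLM by hand. The helper names (`persona_prompt`, `top_concerns_prompt`, `build_prompts`) are made up for illustration and aren't part of any tool mentioned here. The wording follows the three prompts above, with the shared reminders factored out.

```python
# Hypothetical helpers for illustration; not part of any LLM product.

# The reminders repeated at the end of every prompt.
GROUNDING = (
    "Be specific and point to examples where possible. "
    "Do not use information from outside of the deck provided."
)

# The two board member personas described above.
PERSONAS = {
    "conservative": "a conservative, risk averse board member",
    "aggressive": "an aggressive, growth-oriented board member",
}


def persona_prompt(persona: str) -> str:
    """Build the persona feedback prompt (Prompts 1 and 2)."""
    return (
        "Solely based on the board deck provided, what feedback and questions "
        f"would you have if you were {PERSONAS[persona]} looking at the deck? "
        f"{GROUNDING}"
    )


def top_concerns_prompt(n: int = 5) -> str:
    """Build the top-concerns prompt (Prompt 3)."""
    return (
        f"After reviewing the deck, what are your top {n} concerns "
        f"as a board member? {GROUNDING}"
    )


def build_prompts() -> list[str]:
    """Return the three prompts in the order they are asked."""
    return [
        persona_prompt("conservative"),
        persona_prompt("aggressive"),
        top_concerns_prompt(),
    ]


for prompt in build_prompts():
    print(prompt, end="\n\n")
```

Keeping the reminders in one constant means every prompt carries the same "stick to the deck" and "be specific" instructions, and adding a new persona is a one-line change.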
How the top LLMs rank as thought partners
I rate each LLM based on two criteria: its overlap with how our human board responds in the board meeting, and the novelty and depth of the insights we get from AI (a more subjective assessment). Here are the results.
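A note on the overlap percentage: it comes down to simple arithmetic. As a rough sketch, assuming you tag each piece of feedback with a short topic label by hand (an assumption for this example, not a formal scoring method), overlap can be computed as the share of board topics that the LLM also raised. The function name and the topic labels below are hypothetical.

```python
def overlap_score(llm_topics: set[str], board_topics: set[str]) -> float:
    """Return the percentage of board topics that the LLM also raised."""
    if not board_topics:
        raise ValueError("need at least one board topic to score against")
    matched = llm_topics & board_topics  # topics raised by both
    return 100 * len(matched) / len(board_topics)


# Made-up topic labels for illustration only, not from a real meeting.
board = {"renewal rate", "hiring plan", "cash runway", "pricing"}
llm = {"renewal rate", "cash runway", "pricing", "brand presence"}

print(f"{overlap_score(llm, board):.0f}% overlap")  # prints "75% overlap"
```

Scoring against the board's topics, rather than the LLM's, means an LLM isn't rewarded for raising a long list of points the board never touched.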
1st place: Claude 3.5 Sonnet
My score: A- in value of feedback. 85% overlap with human board responses.
What it did well:
- Third-order thinking – Claude plays the aggressive vs. conservative business personas well and demonstrates great business sense when thinking about the cascading downstream effects of a decision.
- Specificity – Out of all the LLMs, Claude referenced the most stats from the board deck and compared them against specific industry standards (e.g. cited strong “renewal rates (85%) and NDR (98%)”)
Where it fell short:
- Claude’s outputs were relevant, but it grasped the prompt largely at a surface level. It didn’t try to understand underlying assumptions or take the conversation one step further – which is something our human board will always do. Having Claude prompt me back on the content I shared would have been a better approximation of the conversation I was preparing for.
2nd place: Microsoft Copilot
My score: B+ in value of feedback. 83% overlap with human board responses.
What it did well:
- Third-order thinking – Like Claude, Copilot excelled in considering the ripple effects of decisions; it also asked the most thought-provoking questions of any LLM.
- Collaboration – Because Copilot has an inquisitive rather than prescriptive nature, it was much better at asking me questions to understand the underlying assumptions of action items in the deck. It was the only LLM to do so.
- Expanded on the discussion – Copilot dug into specific topic areas and asked follow-up questions that sparked further discussion, such as:
- “How can we leverage the strong enterprise renewals and upsells to drive more aggressive growth in new consumer sales? Are there opportunities to cross-sell or upsell to existing enterprise customers?”
- “How can we accelerate the transition to AI avatar videos for monthly updates and customization to stay ahead of competitors? Are there additional resources or partnerships that can help speed up this process?”
Where it fell short:
- Copilot leaned almost too heavily on questioning. It would’ve been helpful to get some advice to react to upfront as well.
Note: Microsoft Copilot is also the most improved LLM – when we used it back in April 2024, it had much lower overlap (72%) with our human board than other LLMs.
3rd place: ChatGPT-4o
My score: B in value of feedback. 81% overlap with human board responses.
What it did well:
- Professional tone – ChatGPT-4o was the most eloquent tool in its responses. It had a good grasp of key terminology.
- Thorough – It asked thought-provoking questions, provided pushback on some of the content in the deck, and offered some tailored suggestions based on our data.
Where it fell short:
- Some of the advice that ChatGPT gave was fairly generic and unhelpful. In trying to sound professional, it sounded almost like an assistant that was trying too hard to impress. For example:
- “Product Development: Fast-track the development of ProfAI 2.0 and avatar-based videos. Allocate additional resources to speed up innovation, ensuring that the core AI curriculum is fully transitioned by mid-2025.”
- Like Claude, ChatGPT-4o grasped the prompt at a basic level, but unlike Claude, it was less focused on the quantitative stats or performance, and more focused on concepts and products. It didn’t dig deeper into any underlying assumptions of deck content.
Last place: Gemini Advanced
My score: C in value of feedback. 47% overlap with human board responses.
What it did well:
- Conciseness – Gemini summarized the key points of the board deck in a human-readable way.
Where it fell short:
- Its responses were by far the most rudimentary. It mostly restated facts without providing recommendations.
- There was little nuance in its responses. It typically just restated information in the same format, such as:
- “The company has a strong foundation in AI education.”
- “The company has a growing brand presence in the AI space.”
- “The company has demonstrated the potential for significant revenue generation through B2B collaborations.”
Getting the most out of using AI as a thought partner
Claude has been our go-to for thought partnership over the last year or so, and it’s solidified its place at the top for another quarter. Gemini, on the other hand, hasn’t provided much ROI on the paid plan that’s required for uploading files, so I likely won’t use it again unless Google releases significant updates.
The jury is still out on ChatGPT o1 – our team has been impressed in testing o1 vs. Claude in other scenarios, but o1’s main limitation right now is the inability to upload files. Until this changes, we can’t upload our board deck to o1, so the feedback we can get is limited.
AI doesn’t replace our human board – but preparing for board meetings this way improves the final deck we share and the quality of the conversation with our stakeholders. AI has already prepped us for the more obvious questions or pushback we might get, so we can move on to second- and third-order implications in the board meeting.
If you’re looking to use AI similarly to prepare for high-stakes moments, here’s my advice:
- Define the personas you want an LLM to take. Reflect on the qualities, tendencies, and habits of the people you’re presenting to and ask the LLM for feedback through their lens. For example, instead of saying “I want feedback from the perspective of a savvy businessman,” say “My boss has a short attention span, is very data-oriented, and tends to poke holes in my ideas.” This will prepare you for more meaningful conversations when you do get face time with stakeholders.
- Track the prompts you use. Once you’ve crafted a prompt that gets you good results, document it so you can quickly go back and forth between LLMs and have it handy for every time you need to roleplay a scenario.
- Have AI run a self-assessment. Feed an LLM the outputs of other LLMs based on the same prompt and ask it to identify the strengths and differences of the various approaches. This can help you refine your prompt and help the LLM create a memory to reference when you ask it to roleplay in the future.
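The last two tips lend themselves to a small script. Here's a minimal Python sketch, assuming you keep your prompts in a local JSON file and copy LLM responses in by hand. The function names and the file layout are hypothetical, and nothing here calls an LLM API. It only stores prompts and assembles the text you would paste into an LLM for the self-assessment.

```python
import json
from pathlib import Path


def save_prompt(library: Path, name: str, text: str) -> None:
    """Add or update a named prompt in a JSON file."""
    prompts = json.loads(library.read_text()) if library.exists() else {}
    prompts[name] = text
    library.write_text(json.dumps(prompts, indent=2))


def load_prompt(library: Path, name: str) -> str:
    """Fetch a saved prompt by name."""
    return json.loads(library.read_text())[name]


def comparison_prompt(original_prompt: str, outputs: dict[str, str]) -> str:
    """Assemble a self-assessment prompt from several LLMs' responses."""
    sections = [
        f"Response from {model}:\n{text}" for model, text in outputs.items()
    ]
    return (
        "Several LLMs were given the same prompt:\n"
        f"{original_prompt}\n\n"
        + "\n\n".join(sections)
        + "\n\nIdentify the strengths and differences of the various approaches."
    )
```

For example, you could call `save_prompt(Path("prompts.json"), "top_concerns", "...")` once, then reuse `load_prompt` each quarter and pass the collected responses to `comparison_prompt`.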