Jan 1, 2026
Our Favorite LLMs of 2025 (and Why Feel Still Matters)
2025 was an extraordinary year for generative AI. Large language models didn’t just get faster or bigger - they became more distinct, more opinionated, and more revealing in their strengths and limitations.
At the same time, this was the year we also became more cautious. Enterprise AI adoption accelerated rapidly, often faster than organizations were prepared for. We saw firsthand how naïve implementations can introduce risk, erode trust, or create confident answers without accountability. Done right, AI is transformative. Done carelessly, it is dangerous.
This tension shaped everything we built this year.
We learned that models are not interchangeable. Each one has a personality, quirks, and ideal use cases. Understanding those nuances became critical to delivering high-quality, accurate, and trustworthy systems. It also drove heavy R&D across our stack as we tested models in real enterprise conditions - not just benchmarks.
While our core focus remains generative AI for the enterprise, we also expanded into highly tailored personas and AI personalities for social media managers, individuals, corporations, defense analysis, and political candidates. That work reinforced a key insight: performance alone does not determine success. How a model communicates matters just as much.
Below are the models that defined 2025 for us.
OpenAI GPT-4o: The Spark That Started It All
We loved the writing style of GPT-4o. Its conversational, natural tone felt different from anything that came before it. This was the model that truly inspired me to start building an enterprise-focused generative AI company around deeply tailored personas.
GPT-4o showed what was possible. It had charm. It felt human. But it was also hard-wired to a specific personality.
That limitation became the genesis of our AI Persona Architect product line. We wanted to capture that warmth and natural voice, but make it configurable, multi-layered, and precise enough to represent people, companies, brands, and worldviews. GPT-4o didn’t just perform well—it changed how we thought about what AI should feel like.
No model since has quite replaced that original charm.
GPT-4.1 and GPT-4.1-mini: The RAG Workhorses
GPT-4.1 and especially GPT-4.1-mini quietly powered much of our production work in 2025.
GPT-4.1-mini was, without question, our most used model of the year. It delivered fast performance with a surprisingly strong level of reasoning for Retrieval-Augmented Generation. We almost always led demos and rapid pilots with Mini. It punched far above its weight class.
For production, we typically graduated clients to GPT-4.1 for improved accuracy and reduced hallucinations. Still, if I had to pick a personal favorite for 2025, Mini would probably be it. It wasn’t perfect, but it was dependable, efficient, and incredibly practical.
GPT-5.x: Powerful, Complicated, and Still Evolving
GPT-5.x produced the most mixed reactions across our team.
Developers largely avoided early versions, while business users leaned heavily into them. GPT-5.0 was simply not viable for RAG in our environment, so we stayed on 4.1. GPT-5.1 and 5.2 showed real improvement, and 5.2 is now slowly taking over some roles previously held by 4.1.
We are still in transition.
I do miss the simplicity of temperature control from the 4.x line. The new verbosity and reasoning controls in 5.x are powerful, but they require more experimentation and discipline. That said, GPT-5.2 has become the primary engine for our Persona Architect products, and GPT-5.2 Chat Latest now anchors our Social Media Assist workflows.
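As a rough sketch of that control-surface shift, the two request shapes can be compared side by side. This builds payloads only (no network call); the parameter names follow OpenAI's documented `temperature` knob on Chat Completions and the `reasoning`/`text` verbosity controls on the newer Responses API, while the model strings are simply placeholders echoing the versions mentioned in this post:

```python
# Sketch of the control shift between the 4.x and 5.x API styles.
# Assumptions: "temperature" as on Chat Completions; "reasoning.effort"
# and "text.verbosity" as on the Responses API. Model names are
# illustrative placeholders, not exact identifiers.

def legacy_payload(prompt: str, temperature: float = 0.3) -> dict:
    """4.x-era request: one scalar knob (temperature) shapes the output."""
    return {
        "model": "gpt-4.1",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def current_payload(prompt: str, effort: str = "medium",
                    verbosity: str = "low") -> dict:
    """5.x-era request: separate knobs for reasoning depth and answer length."""
    return {
        "model": "gpt-5.2",  # hypothetical name, per the post
        "input": prompt,
        "reasoning": {"effort": effort},    # e.g. "low" | "medium" | "high"
        "text": {"verbosity": verbosity},   # e.g. "low" | "medium" | "high"
    }

old = legacy_payload("Summarize our Q3 policy changes.")
new = current_payload("Summarize our Q3 policy changes.", effort="low")
print(old["temperature"])          # 0.3
print(new["reasoning"]["effort"])  # low
```

The practical difference is that one scalar knob becomes two orthogonal ones, which is exactly why the newer style demands more experimentation to dial in a given voice.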
Even so, the newer models feel more robotic and sterile. The charm is gone. Our personas are what put the humanity back in.
Google Gemini 2.5 and 3: A Developer’s Game Changer
Gemini 2.5, released early in 2025, was a genuine game changer for me personally. It dramatically accelerated rapid prototyping, coding, and experimentation. With Gemini 3, that momentum only increased.
Today, Gemini is my preferred developer assistant. For RAG, GPT still wins. But in some recent Defense Analyst personas we built for intelligence analysis, Gemini 3 absolutely knocked it out of the park. We fully expect Gemini to take on an expanded role in our advanced persona work going forward.
Anthropic Claude, Sora, and the Reality of Mixed Stacks
My development team overwhelmingly prefers Claude, though they routinely mix models depending on the task.
One quote from our founding engineer sums it up perfectly:
“I like Claude Opus for coding assistance and deep research, ChatGPT 5.2 Pro for ideation and creative tasks, and Gemini 3 Pro Image (NanoBananaPro) for detailed infographics and realistic photos. When I want more animated or sci-fi media, I use Sora.”
That stack mentality defines 2025. No single model wins everywhere.
The Bigger Lesson from 2025
2025 was a breakthrough year for LLM adoption. But the biggest lesson was not about benchmarks or leaderboards.
Different models excel in different roles. Favorites emerge for emotional reasons as much as technical ones. In generative AI, how a model makes you feel still matters. Sometimes it matters more than raw performance.
That is why personas, context, and governance are not optional in the enterprise. Models will continue to evolve. Charm will come and go. But trust, explainability, and alignment are what make AI sustainable.
If you’re looking to adopt generative AI in a safe, compliant, and persona-driven way, I’m always happy to share how we approach this at CompanyInsights.AI. You can connect with me directly (David Norris) for a free consultation—or even book a same-day demo.
What were your favorite models of 2025? And which ones do you see yourself relying on in 2026?
See CompanyInsights.AI on your data
Schedule a live demo and we’ll show you how Agentic RAG + Personas work with your policies, contracts, and internal docs.