Beyond Chat: How Voice AI Changes User Expectations

Matt Gvazdinskas
November 11, 2025

The buzz around Agentforce Voice at Dreamforce 2025 made one thing clear: the enterprise AI conversation is moving beyond text. To date, chatbots and text-based agents have redefined how businesses scale service and engagement. But voice introduces something fundamentally different. It’s not another interface. It’s another expectation.

Voice interactions tap into something deeply human. They demand fluidity, empathy, and instant responsiveness in ways chat never had to. And that means enterprises stepping into voice AI are entering a new era of design and performance considerations.

The Human Expectation Gap

Voice is harder because people expect conversation, not computation. With chat, a short delay feels like thoughtfulness; in voice, that same delay feels like disconnection. There’s no spinner, no “typing” indicator – only silence. And in voice, silence sounds like failure.

Real conversations also aren’t linear. People pause mid-thought, interrupt themselves, or change direction. They expect to be heard and to be understood, without having to repeat themselves. For AI, this introduces a complexity that goes beyond words: managing timing, emotion, and flow.

Tone adds another layer. The same sentence can sound helpful or dismissive depending on inflection. Accuracy alone doesn’t build trust; delivery does. Enterprises that move into voice need to think less like system designers and more like conversation architects.

What We’ve Learned from Early Experiments

To understand the gap between chat and voice, we ran controlled tests using generally available platforms like Lindy.ai to see how these systems behave in realistic conversations. What we found wasn’t surprising – but it was revealing.

Voice agents that seemed accurate on paper still stumbled over natural pacing. Even a two-second processing delay felt awkward. A simple “let me check that for you…” changed how users perceived intelligence and patience. Micro-details – pauses, fillers, and confirmations – shaped the entire experience.

These lessons are now guiding our next phase: a new Agentforce Voice proof of concept focused on inbound support, launching before Thanksgiving. It’s not a product announcement; it’s a proving ground. Because understanding where voice succeeds, and where it doesn’t, takes experimentation, not theory.

Designing for Patience, Empathy, and Trust

“The shift to voice isn’t just technical – it’s human. Designing for empathy, tone, and rhythm will require cross-functional collaboration between technologists and people leaders. That’s where the real transformation happens.”

–Kristin Langlois, Chief People Officer at 10K

Success in voice AI isn’t about perfect accuracy. It’s about believable imperfection: how the system handles uncertainty, how it recovers, and how it makes users feel heard.

That means designing moments of patience into every exchange. Short verbal cues, natural fillers, and confirmation loops help bridge the latency gap. A voice agent that says, “Let me take a second to find that,” feels more human than one that goes silent. The goal isn’t to eliminate delay, but to fill it with empathy.

For example, during our Lindy.ai POC, users reported feeling more comfortable when the system acknowledged the pause verbally, even when the response took longer. It wasn’t faster – just more human.
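The "acknowledge the pause" pattern described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Lindy.ai or Agentforce Voice implement it: `respond_with_filler`, the `speak` callback, and the 0.7-second silence budget are all hypothetical names and values chosen for the example.

```python
import asyncio
from typing import Awaitable, Callable

# Filler line taken from the article; the threshold below is an assumption.
FILLER = "Let me take a second to find that."


async def respond_with_filler(
    lookup: Awaitable[str],
    speak: Callable[[str], None],
    silence_budget_s: float = 0.7,
) -> str:
    """Speak a filler if the lookup outlasts the silence budget, then answer."""
    task = asyncio.ensure_future(lookup)
    # asyncio.wait returns after the timeout without cancelling the task.
    done, _pending = await asyncio.wait({task}, timeout=silence_budget_s)
    if not done:
        # The backend is still working: acknowledge the pause out loud.
        speak(FILLER)
    answer = await task
    speak(answer)
    return answer


async def fake_lookup(delay_s: float, answer: str) -> str:
    """Stand-in for a slow backend call (hypothetical)."""
    await asyncio.sleep(delay_s)
    return answer


def run_demo(delay_s: float, silence_budget_s: float) -> list[str]:
    """Run one exchange and return everything the agent 'said', in order."""
    spoken: list[str] = []
    asyncio.run(
        respond_with_filler(
            fake_lookup(delay_s, "Your order ships Friday."),
            spoken.append,
            silence_budget_s,
        )
    )
    return spoken


if __name__ == "__main__":
    print(run_demo(delay_s=0.2, silence_budget_s=0.05))
```

The filler does not make the answer arrive any sooner; it only replaces silence with an acknowledgment, which is the effect the test users described. A fast lookup skips the filler entirely, so quick answers are not padded.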

Voice is also where brand personality becomes tangible. Every pause, tone, and transition contributes to how a customer experiences the organization itself. For enterprises, that’s both opportunity and responsibility.

From Scripts to Skills

Most early voice agents will start simple, with scripted, repeatable flows like password resets, order updates, or appointment confirmations. But the real power emerges when voice agents evolve from handling basic tasks to collaborating with humans and other agents.

In the future, a single AI agent may act as the quarterback of a conversation, orchestrating actions across multiple sub-agents and systems. Instead of “hold, transfer; hold, transfer,” a single intelligent voice will coordinate the entire resolution process behind the scenes while maintaining one seamless human interaction.

That shift, from conversation to capability, is what will define enterprise success in the next phase of voice AI. But it begins with understanding the constraints today and experimenting early.

The Road Ahead

“Voice AI represents a turning point. It’s where technology stops being a utility and starts becoming a collaborator. The companies that experiment early will define what trust and intelligence sound like in the enterprise.”

–Matt Gvazdinskas, Chief Strategy Officer at 10K

Agentforce Voice is only just becoming available, and it will take at least a year of innovation and roadmap progress before enterprises can truly shape the voice experience to reflect their brand. The priority right now isn’t crafting the perfect emotional tone – it’s building understanding through experimentation.

Over the next year, the technology is expected to mature, with latency shrinking, control over voice characteristics expanding, and orchestration between chat and voice becoming more seamless. Businesses should start testing early, learn where voice fits, and stay close to each evolution of the technology.

Every iteration brings the technology closer to feeling humanlike, but the early advantage goes to organizations that are already experimenting. Those experiments will inform the standards of what great voice experiences look and sound like when the technology fully catches up.
