Latency is the delay between when you stop speaking and when the AI starts responding. In an AI voice call, this pause determines whether the conversation feels natural or painfully awkward. Under 500ms feels instant. Over 1200ms? People hang up.
Why Latency Actually Matters
Here's the thing about human conversation: we're wired for specific timing.
When someone finishes talking, we expect a response within 200-500 milliseconds. That's not a preference. It's biology. Longer pauses trigger frustration, confusion, even distrust.
In AI voice call systems, the numbers break down like this:
- Under 500ms: Feels instant and natural
- 500-800ms: Noticeable but tolerable
- 800-1200ms: Clearly delayed and frustrating
- Over 1200ms: Conversation falls apart
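Those bands are easy to turn into a check you can run against measured response times. Here's a minimal sketch in Python. The function name is made up for illustration, and because the list above shares its boundary values (800ms appears in two bands), how exact boundaries are assigned is an assumption.

```python
def classify_latency(ms: float) -> str:
    """Map a response delay in milliseconds to the bands listed above.

    Boundary handling is an assumption: exactly 500ms and 800ms fall into
    the slower band, and exactly 1200ms still counts as "clearly delayed".
    """
    if ms < 500:
        return "instant and natural"
    if ms < 800:
        return "noticeable but tolerable"
    if ms <= 1200:
        return "clearly delayed and frustrating"
    return "conversation falls apart"


for sample in (300, 650, 1000, 1500):
    print(sample, "->", classify_latency(sample))
```

Logging this label next to each call's measured delay gives a quick read on how many conversations land in each band.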
Miss the timing window, and you've lost the caller. They don't know why the conversation feels "off." They just know it does.
What Actually Creates Latency
Total delay in an AI voice call isn't one thing. It's a chain of processes, each adding milliseconds.
Network Travel Time (10-200ms)
Audio has to get from the caller to the server.
- Australian caller to Australian server: 10-30ms
- Australian caller to US server: 150-200ms
Geography matters. A lot.
Voice Activity Detection (30-100ms)
The system needs to figure out you've stopped talking. Too fast, and it cuts you off. Too slow, and the pause stretches.
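To make that trade-off concrete, here's a rough sketch of one common approach: wait for a fixed stretch of silence before deciding the caller is done. This is an illustration, not how any particular VAD product works. The function name, the 10ms frame size, and the `hangover_ms` parameter are all assumptions, and the input is a pre-computed list of speech/silence flags rather than real audio.

```python
def detect_end_of_speech(frame_is_speech, frame_ms=10, hangover_ms=60):
    """Return the time in ms at which end of speech is declared, or None.

    frame_is_speech: one boolean per audio frame (True means speech detected).
    hangover_ms: how much continuous silence is required before we decide
    the caller has finished. Hypothetical parameter for illustration.
    """
    frames_needed = hangover_ms // frame_ms
    silent_run = 0
    heard_speech = False
    for i, is_speech in enumerate(frame_is_speech):
        if is_speech:
            heard_speech = True
            silent_run = 0  # any speech resets the silence counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= frames_needed:
                return (i + 1) * frame_ms
    return None


# 200ms of speech, a 30ms mid-sentence pause, 100ms more speech, then silence.
frames = [True] * 20 + [False] * 3 + [True] * 10 + [False] * 20

print(detect_end_of_speech(frames, hangover_ms=30))   # 230: fires during the pause
print(detect_end_of_speech(frames, hangover_ms=100))  # 430: 100ms after speech really ends
```

With the short setting the system jumps in at 230ms, in the middle of the sentence. With the long setting it waits out the pause, but the caller's speech actually ended at 330ms, so the full 100ms of waiting is added to the response delay.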
Speech-to-Text (100-400ms)
Your voice becomes text the AI can understand.
AI Processing (200-1500ms)
The language model generates a response. This is the wildcard. Simple answers come fast. Complex ones take time.
Text-to-Speech (100-300ms)
The AI's text becomes audio you can hear.
Network Return (10-200ms)
Audio travels back to your phone.
Add it all up: the total ranges from 450ms to 2700ms.
That's the difference between "seamless" and "painful."
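If you want to check the arithmetic, or plug in your own measurements, the budget is just a sum of per-stage ranges. The sketch below uses the figures from this article. The dictionary keys and function name are illustrative, and summing the extremes assumes the stages run strictly one after another with no overlap.

```python
# Per-stage latency ranges in milliseconds (best case, worst case),
# taken from the figures quoted in this article.
STAGES_MS = {
    "network (caller to server)": (10, 200),
    "voice activity detection": (30, 100),
    "speech-to-text": (100, 400),
    "AI processing": (200, 1500),
    "text-to-speech": (100, 300),
    "network (server to caller)": (10, 200),
}


def total_range(stages):
    """Sum the best-case and worst-case delays across all stages."""
    best = sum(low for low, _ in stages.values())
    worst = sum(high for _, high in stages.values())
    return best, worst


best, worst = total_range(STAGES_MS)
print(f"{best}-{worst}ms")  # 450-2700ms
```

Swap in measured numbers for your own stack and the same sum shows which stage is eating the budget.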
The Australian Problem
If you're running an AI voice call system for Australian customers from US servers, you're starting with a handicap.
Using US-hosted AI:
