Handling Barge-In and Interruption: Solving AI Call Problems on Australian Networks
Voxworks Team
In natural person-to-person conversation, interruptions happen: an interjection, a clarification, a change of direction. People handle these gracefully, yet AI systems struggle.
For AI calls in Australia, this problem is compounded by network latency. We've developed solutions specifically for the Australian context. Here's how we solved the barge-in problem.
What is Barge-In?
Barge-in is the ability for the caller or the agent to interrupt the other party mid-speech and have the system react immediately. On an AI call, it typically occurs when a caller starts speaking while the AI is still talking. Common scenarios:
Clarification:
AI: "I can book you for Tuesday at—"
Caller: "Wait, not Tuesday, I said Thursday"
Agreement:
AI: "So you're looking for a general consultation—"
Caller: "Yes, exactly"
Redirection:
AI: "Let me tell you about our services—"
Caller: "Actually, I just need to cancel my appointment"
Why It's Hard for AI
Detection challenge:
Must distinguish speech from noise
Must identify when speech starts
Must do this in milliseconds
Response challenge:
Must stop current audio immediately
Must not play queued audio
Must process the interruption
Must generate appropriate response
Latency challenge:
Audio travels over network
Detection → response has delay
AI may keep talking during this delay
The Australian Latency Factor
Australian phone networks introduce inherent delay:
Mobile networks: 50-150ms typical
Regional connections: Higher
International routing: 150-200ms additional
This delay compounds the barge-in problem:
Caller starts speaking
Audio travels to server (50-100ms)
Server detects speech (30-50ms)
Stop command sent (10ms)
Stop propagates to audio (50-100ms)
Total: 140-260ms of overlap
That's up to roughly a quarter-second during which the AI is still talking after the caller has started speaking. It sounds terrible.
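The arithmetic behind that total can be sketched as a simple latency budget. The stage names and the `overlap_budget` helper are illustrative labels only; the millisecond ranges are the ones listed in the steps above.

```python
# Latency budget for one barge-in, in milliseconds.
# Ranges come from the steps listed above; stage names are illustrative.
STAGES_MS = {
    "audio_to_server": (50, 100),
    "speech_detection": (30, 50),
    "stop_command": (10, 10),
    "stop_propagation": (50, 100),
}

def overlap_budget(stages):
    """Return (best_case, worst_case) overlap by summing each stage's range."""
    best = sum(low for low, _ in stages.values())
    worst = sum(high for _, high in stages.values())
    return best, worst

best_ms, worst_ms = overlap_budget(STAGES_MS)
print(f"Overlap: {best_ms}-{worst_ms}ms")  # prints "Overlap: 140-260ms"
```

Laying the stages out this way makes it easy to see which one dominates: the two network legs account for most of the overlap, which is why shaving detection time alone only goes so far.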
Why US Solutions Fail Here
Platforms built for US networks optimise for US latency, not for the longer delays described above.
Our engineers have spent thousands of hours carefully calibrating our proprietary voice engine to handle barge-in gracefully, specifically tuning our systems to work within the Australian telecom environment.
Adaptive Threshold Calibration
We adjust voice activity detection based on conversation context:
When AI is asking a question:
Expect response soon
Lower threshold for detection
Faster response to any speech
When AI is providing information:
Interruption less expected
Slightly higher threshold
Avoid false positives from background noise
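As a rough illustration of the idea, the sketch below switches an energy threshold depending on what the AI is currently doing. It is not the production engine: the `AgentState` enum, the simple mean-square energy measure, and the threshold values are all assumptions chosen for the example.

```python
from enum import Enum

class AgentState(Enum):
    ASKING_QUESTION = "asking_question"
    GIVING_INFORMATION = "giving_information"

# Hypothetical energy thresholds for samples normalised to [-1.0, 1.0].
# Real values would need tuning against recorded calls.
THRESHOLDS = {
    AgentState.ASKING_QUESTION: 0.02,     # lower: a reply is expected soon
    AgentState.GIVING_INFORMATION: 0.05,  # higher: fewer noise-triggered stops
}

def frame_energy(samples):
    """Mean squared amplitude of one audio frame."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)

def is_speech(samples, state):
    """Decide whether a frame counts as speech in the given agent state."""
    return frame_energy(samples) >= THRESHOLDS[state]

# A moderately quiet frame triggers detection only while a question is pending.
quiet_frame = [0.2] * 160
print(is_speech(quiet_frame, AgentState.ASKING_QUESTION))
print(is_speech(quiet_frame, AgentState.GIVING_INFORMATION))
```

The same frame is treated differently depending on context, which is the trade-off described above: faster pickup when a reply is expected, more caution when it is not.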
Audio Queue Management
The problem: Audio is generated and queued before playing. Even after detecting interruption, queued audio might play.
The solution: Intelligent queue management. When the interrupt signal arrives, stop current playback and flush the queue immediately.
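A minimal sketch of that flush behaviour might look like the following. The `PlaybackQueue` class and its method names are hypothetical, and the example is single-threaded; a real audio pipeline would need locking or an event loop around the same logic.

```python
from collections import deque

class PlaybackQueue:
    """Holds synthesised audio chunks waiting to be played (illustrative only)."""

    def __init__(self):
        self._chunks = deque()
        self.interrupted = False

    def enqueue(self, chunk: bytes) -> bool:
        """Queue a chunk. Returns False if it was dropped due to an interruption."""
        if self.interrupted:
            return False  # late-arriving audio must not sneak back in
        self._chunks.append(chunk)
        return True

    def next_chunk(self):
        """Return the next chunk to play, or None if nothing should play."""
        if self.interrupted or not self._chunks:
            return None
        return self._chunks.popleft()

    def interrupt(self) -> int:
        """Flush all queued audio and block playback. Returns chunks discarded."""
        dropped = len(self._chunks)
        self._chunks.clear()
        self.interrupted = True
        return dropped

    def resume(self) -> None:
        """Allow new audio (the AI's next response) to be queued again."""
        self.interrupted = False
```

Rejecting chunks in `enqueue` while interrupted matters as much as the flush itself: synthesis that was already in flight when the caller spoke would otherwise land in the empty queue and play anyway.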
Graceful Recovery
After detecting interruption, AI must recover gracefully:
Acknowledgment:
AI was saying: "I can book you for Tuesday at 2pm, Wednesday at—"
Caller: "Tuesday works"
AI: "Tuesday at 2pm. Perfect. Let me confirm..."
Context preservation: AI remembers what it was saying when interrupted, in case caller asks to continue.
No repetition: AI doesn't restart from the beginning.
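One way to picture context preservation is to track how much of a planned utterance was actually played before the interruption. The `UtteranceTracker` class below is a hypothetical sketch, and the appointment options are made up for the example.

```python
from dataclasses import dataclass

@dataclass
class UtteranceTracker:
    """Tracks how much of a planned utterance was actually played."""
    segments: list[str]
    spoken: int = 0  # number of segments fully played so far

    def mark_spoken(self) -> None:
        """Record that the next segment finished playing."""
        if self.spoken < len(self.segments):
            self.spoken += 1

    def heard(self) -> list[str]:
        """Segments the caller has heard and may be responding to."""
        return self.segments[:self.spoken]

    def remaining(self) -> list[str]:
        """Segments not yet played, kept in case the caller asks to continue."""
        return self.segments[self.spoken:]

# Hypothetical appointment options, invented for this example.
tracker = UtteranceTracker(["Tuesday at 2pm", "Wednesday at 11am", "Thursday at 9am"])
tracker.mark_spoken()       # the caller interrupts after the first option
print(tracker.heard())
print(tracker.remaining())
```

With this split, the reply can be built from what the caller actually heard, and the AI can pick up from `remaining()` rather than restarting from the beginning.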
Sub-Frame Processing
Instead of processing audio in large chunks:
Traditional: Process 100ms chunks
Voxworks: Process 20ms chunks with overlap
Smaller chunks = faster detection = less overlap on interruption.
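The difference can be sketched by slicing the same audio both ways. The 8 kHz sample rate and the 10ms hop (which gives 50% overlap between neighbouring 20ms frames) are assumptions for illustration; the article only specifies the chunk sizes.

```python
SAMPLE_RATE_HZ = 8000  # narrowband telephony rate, assumed for this example

def ms_to_samples(ms, rate=SAMPLE_RATE_HZ):
    """Convert a duration in milliseconds to a sample count."""
    return rate * ms // 1000

def overlapping_frames(samples, frame_ms=20, hop_ms=10, rate=SAMPLE_RATE_HZ):
    """Yield fixed-length frames, advancing by hop_ms each time."""
    frame_len = ms_to_samples(frame_ms, rate)
    hop = ms_to_samples(hop_ms, rate)
    for start in range(0, len(samples) - frame_len + 1, hop):
        yield samples[start:start + frame_len]

audio = [0.0] * ms_to_samples(100)  # 100ms of placeholder audio

# One 100ms chunk: nothing can be analysed until all 100ms has arrived.
large = list(overlapping_frames(audio, frame_ms=100, hop_ms=100))

# 20ms frames with a 10ms hop: the first decision is possible after 20ms.
small = list(overlapping_frames(audio, frame_ms=20, hop_ms=10))

print(len(large), len(small))  # prints "1 9"
```

The detector gets nine chances to spot speech in the time the large-chunk approach gets one, and the overlap means speech that starts near a frame boundary is still caught by the next frame.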
The Results
Industry standard (US platform in AU):
Average overlap on interruption: 450ms
Caller repeats themselves: 35% of interruptions
Caller frustration: Common
Voxworks (Australian-optimised):
Average overlap on interruption: 180ms
Caller repeats themselves: 8% of interruptions
Natural conversation flow maintained
What Callers Experience
Before (poor barge-in handling):
Caller: "Actually—"
AI: [continues talking for 400ms]
Caller: "ACTUALLY, I said—"
AI: [finally stops] "I'm sorry, could you repeat that?"
Caller: [frustrated] "I SAID Thursday not Tuesday"
After (Voxworks barge-in):
Caller: "Actually—"
AI: [stops within 200ms]
AI: "Yes?"
Caller: "I meant Thursday, not Tuesday"
AI: "Thursday. Let me check availability..."
Edge Cases We Handle
The Agreeable Interrupter
Some callers interject agreement while AI is talking: "yes", "right", "uh huh", "exactly"
Challenge: These aren't requests to stop; they're ordinary affirmations and a natural part of human conversation.
Solution: Context-aware handling
If short affirmation during AI information delivery: Continue
If affirmation at natural pause: Acknowledge and continue
If extended speech: Treat as genuine interruption
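The three rules above could be expressed as a small decision function. This is a sketch only: the `handle_overlap` name, the 800ms cutoff, and the assumption that a transcript and duration are available for the overlapping speech are all hypothetical. It also treats anything that is not a recognised short affirmation as a genuine interruption, which is one reading of the rules.

```python
# Affirmations taken from the examples above.
BACKCHANNELS = {"yes", "right", "uh huh", "exactly"}
MAX_BACKCHANNEL_MS = 800  # hypothetical cutoff for a "short" affirmation

def handle_overlap(transcript, duration_ms, at_natural_pause):
    """Decide what the AI should do when the caller speaks over it."""
    text = transcript.strip().lower().rstrip(".!,")
    short_affirmation = text in BACKCHANNELS and duration_ms <= MAX_BACKCHANNEL_MS
    if not short_affirmation:
        return "interrupt"                 # extended or substantive speech
    if at_natural_pause:
        return "acknowledge_and_continue"  # affirmation at a natural pause
    return "continue"                      # affirmation during information delivery

print(handle_overlap("Yes", 300, at_natural_pause=False))
print(handle_overlap("exactly.", 400, at_natural_pause=True))
print(handle_overlap("Actually, I just need to cancel my appointment", 2500,
                     at_natural_pause=False))
```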
Background Noise
Australian environments have their own noise profiles:
Outdoor conversations (wind, traffic)
Busy cafes
Office environments
Solution: Noise-robust VAD calibrated for Australian conditions.
Multi-Speaker Confusion
Sometimes multiple people near the phone speak:
Partner commenting in background
Kids talking
Radio or TV
Solution: Speaker diarisation (identifying different speakers). Focus on primary speaker. Don't respond to background voices.
The Bottom Line
Barge-in handling is essential for natural AI calls in Australia. Our network latency makes it harder than in other markets.