
Overview

NextEVI’s turn detection system enables natural conversation flow by intelligently detecting when users start and stop speaking. This allows for seamless interruption handling, natural pauses, and smooth conversation transitions.
Turn detection is crucial for creating natural-feeling voice conversations, preventing awkward overlaps and ensuring responsive AI interactions.

How Turn Detection Works

Voice Activity Detection (VAD)

Turn detection is built on voice activity detection: the incoming audio stream is analyzed continuously to separate speech from silence and background noise, and the resulting activity signal drives the components below.

Key Components

  1. Voice Activity Detection: Identifies when the user starts speaking
  2. Silence Detection: Monitors for natural pause points
  3. Speech Confidence: Validates the quality of detected speech
  4. Turn Timing: Manages conversation flow timing
  5. Interruption Handling: Manages what happens when the user interrupts an AI response
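
The components above can be sketched as a small state machine driven by per-frame voice activity. This is an illustrative model only, not NextEVI's internal implementation; the option names mirror the configuration fields documented below.

```javascript
// Illustrative turn-detection state machine driven by VAD frames.
// Not NextEVI's internal implementation -- a sketch of the concepts above.
function createTurnDetector({
  speechThreshold = 0.3,   // activity level that counts as speech
  silenceThreshold = 0.5,  // seconds of silence that end a turn
  minSpeakingTime = 0.3    // seconds of speech that start a turn
} = {}) {
  let state = 'idle';      // 'idle' | 'speaking'
  let speakingTime = 0;    // seconds of accumulated speech
  let silenceTime = 0;     // seconds of silence since last speech frame

  // Feed one VAD frame: activity in [0, 1], frameDuration in seconds.
  // Returns 'turn_start', 'turn_end', or null.
  return function processFrame(activity, frameDuration) {
    if (activity >= speechThreshold) {
      silenceTime = 0;
      speakingTime += frameDuration;
      if (state === 'idle' && speakingTime >= minSpeakingTime) {
        state = 'speaking';
        return 'turn_start';
      }
    } else if (state === 'speaking') {
      silenceTime += frameDuration;
      if (silenceTime >= silenceThreshold) {
        state = 'idle';
        speakingTime = 0;
        silenceTime = 0;
        return 'turn_end';
      }
    } else {
      speakingTime = 0; // brief noise blip below min_speaking_time
    }
    return null;
  };
}
```

Note how min_speaking_time filters out short noise bursts and silence_threshold tolerates brief pauses before declaring the turn over.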

Configuration

Basic Turn Detection Settings

import { useVoice } from '@nextevi/voice-react';

function VoiceChatWithTurnDetection() {
  const { connect } = useVoice();
  
  const handleConnect = async () => {
    await connect({
      auth: {
        apiKey: "oak_your_api_key",
        projectId: "your_project_id", 
        configId: "your_config_id"
      },
      sessionSettings: {
        turn_detection: {
          enabled: true,
          silence_threshold: 0.5,        // Seconds of silence to end turn
          min_speaking_time: 0.3,        // Minimum speech duration
          speech_threshold: 0.3,         // Voice activity threshold
          interrupt_threshold: 0.8,      // Interrupt sensitivity
          natural_pauses: true           // Allow natural conversation pauses
        }
      }
    });
  };
  
  return (
    <div>
      <button onClick={handleConnect}>
        Start Turn-Aware Chat
      </button>
    </div>
  );
}

Advanced Configuration

{
  "type": "session_settings",
  "data": {
    "turn_detection": {
      "enabled": true,
      "mode": "adaptive",
      "silence_threshold": 0.8,
      "min_speaking_time": 0.5,
      "max_turn_duration": 30.0,
      "speech_threshold": 0.4,
      "interrupt_threshold": 0.7,
      "natural_pauses": true,
      "background_noise_adaptation": true,
      "speaking_rate_adaptation": true,
      "context_awareness": {
        "enabled": true,
        "conversation_style": "conversational",
        "user_preferences": "standard"
      }
    }
  }
}
The turn_detection object supports the following parameters:

  • enabled (boolean, default: true): Enable or disable turn detection
  • mode (string, default: "standard"): Detection mode: "standard", "adaptive", "aggressive", or "conservative"
  • silence_threshold (number, default: 0.5): Seconds of silence before ending the user's turn
  • min_speaking_time (number, default: 0.3): Minimum speech duration, in seconds, to register as valid input
  • max_turn_duration (number, default: 30.0): Maximum duration, in seconds, for a single turn
  • speech_threshold (number, default: 0.3): Voice activity detection sensitivity (0-1)
  • interrupt_threshold (number, default: 0.8): Sensitivity for detecting interruptions (0-1)
  • natural_pauses (boolean, default: true): Allow natural pauses without ending the turn
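
The defaults and ranges above can be folded into a small client-side helper that merges user overrides and bounds-checks the 0-1 sensitivity fields. This is a hypothetical utility, not part of the NextEVI SDK:

```javascript
// Hypothetical helper: merge overrides with the documented defaults and
// clamp the 0-1 sensitivity fields. Not part of the NextEVI SDK.
const TURN_DETECTION_DEFAULTS = {
  enabled: true,
  mode: 'standard',
  silence_threshold: 0.5,
  min_speaking_time: 0.3,
  max_turn_duration: 30.0,
  speech_threshold: 0.3,
  interrupt_threshold: 0.8,
  natural_pauses: true
};

const VALID_MODES = ['standard', 'adaptive', 'aggressive', 'conservative'];

const clamp01 = (x) => Math.min(Math.max(x, 0), 1);

function normalizeTurnDetection(overrides = {}) {
  const settings = { ...TURN_DETECTION_DEFAULTS, ...overrides };
  // speech_threshold and interrupt_threshold are documented as 0-1 ranges
  settings.speech_threshold = clamp01(settings.speech_threshold);
  settings.interrupt_threshold = clamp01(settings.interrupt_threshold);
  if (!VALID_MODES.includes(settings.mode)) settings.mode = 'standard';
  return settings;
}
```

Running settings through a normalizer like this before sending session_settings keeps out-of-range values from reaching the server.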

Turn Detection Events

React SDK Integration

Monitor turn detection events:
import { useVoice, useVoiceTurnDetection } from '@nextevi/voice-react';

function TurnAwareInterface() {
  const { messages, isRecording, isTTSPlaying } = useVoice();
  const { 
    isUserTurn, 
    isAssistantTurn, 
    turnDuration,
    silenceDuration 
  } = useVoiceTurnDetection();
  
  return (
    <div className="turn-aware-chat">
      <div className="turn-indicator">
        <div className={`user-turn ${isUserTurn ? 'active' : ''}`}>
          👤 Your turn {isUserTurn && `(${turnDuration.toFixed(1)}s)`}
        </div>
        
        <div className={`assistant-turn ${isAssistantTurn ? 'active' : ''}`}>
          🤖 Assistant turn {isTTSPlaying && '(speaking)'}
        </div>
        
        <div className="silence-indicator">
          {silenceDuration > 0 && (
            <span>Silence: {silenceDuration.toFixed(1)}s</span>
          )}
        </div>
      </div>
      
      <div className="recording-status">
        <div className={`mic-indicator ${isRecording ? 'recording' : ''}`}>
          {isRecording ? '🎤 Listening' : '🎤 Ready'}
        </div>
      </div>
      
      <div className="messages">
        {messages.map(message => (
          <MessageWithTurnInfo key={message.id} message={message} />
        ))}
      </div>
    </div>
  );
}

function MessageWithTurnInfo({ message }) {
  const turnInfo = message.metadata?.turnInfo;
  
  return (
    <div className="message">
      <div className="content">{message.content}</div>
      
      {turnInfo && (
        <div className="turn-metadata">
          <span>Duration: {turnInfo.duration}s</span>
          <span>Confidence: {turnInfo.confidence}</span>
          {turnInfo.interrupted && <span>⚠️ Interrupted</span>}
        </div>
      )}
    </div>
  );
}

WebSocket Events

Listen for turn detection events:
const ws = new WebSocket('wss://api.nextevi.com/ws/voice/conn-123?api_key=oak_your_api_key&config_id=your_config_id');

ws.onmessage = (event) => {
  const message = JSON.parse(event.data);
  
  switch (message.type) {
    case 'turn_start':
      handleTurnStart(message.data);
      break;
      
    case 'turn_end':
      handleTurnEnd(message.data);
      break;
      
    case 'turn_interrupted':
      handleTurnInterrupted(message.data);
      break;
      
    case 'voice_activity':
      handleVoiceActivity(message.data);
      break;
  }
};

function handleTurnStart(data) {
  console.log('User started speaking');
  
  // Stop any ongoing TTS if user interrupts
  if (data.interrupted_tts) {
    stopTTSPlayback();
  }
  
  // Show visual feedback
  showRecordingIndicator();
}

function handleTurnEnd(data) {
  console.log('User finished speaking');
  console.log('Turn duration:', data.duration);
  console.log('Speech confidence:', data.confidence);
  
  // Hide recording indicator
  hideRecordingIndicator();
  
  // Process the completed turn
  if (data.transcript) {
    displayUserMessage(data.transcript);
  }
}

function handleTurnInterrupted(data) {
  console.log('Turn was interrupted');
  console.log('Interruption type:', data.interruption_type);
  
  // Handle different interruption types
  switch (data.interruption_type) {
    case 'user_speech':
      // User started speaking while assistant was talking
      stopAssistantSpeech();
      break;
      
    case 'system_timeout':
      // Turn exceeded maximum duration
      processTurnTimeout();
      break;
      
    case 'silence_timeout':
      // Long pause detected
      processSilenceTimeout();
      break;
  }
}

function handleVoiceActivity(data) {
  // Real-time voice activity updates
  updateVoiceVisualization(data.activity_level);
  
  if (data.background_noise_level > 0.7) {
    showNoiseWarning();
  }
}

Interruption Handling

Types of Interruptions

User Interruption

User starts speaking while AI is responding

System Timeout

Turn exceeds maximum duration limit

Silence Timeout

Extended silence detected during turn

Audio Quality

Poor audio quality interrupts processing
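
These four cases can be centralized as a lookup from interruption type to a recovery strategy. This is a sketch: the first three type strings match the turn_interrupted event shown earlier, while 'audio_quality' is an assumed name for the fourth case.

```javascript
// Illustrative mapping from interruption type to a recovery strategy.
// 'user_speech', 'system_timeout', and 'silence_timeout' follow the
// turn_interrupted event; 'audio_quality' is an assumed name here.
const RECOVERY_STRATEGIES = {
  user_speech:     'stop_assistant_speech', // yield the floor immediately
  system_timeout:  'process_partial_turn',  // act on what was captured
  silence_timeout: 'prompt_user',           // gently re-engage the user
  audio_quality:   'request_audio_check'    // ask the user to fix their setup
};

function recoveryAction(interruptionType) {
  return RECOVERY_STRATEGIES[interruptionType] ?? 'prompt_user';
}
```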

Intelligent Interruption Response

import { useState, useEffect, useCallback } from 'react';
import { useVoice } from '@nextevi/voice-react';

function SmartInterruptionHandler() {
  const { messages } = useVoice();
  const [interruptionCount, setInterruptionCount] = useState(0);
  const [adaptiveSettings, setAdaptiveSettings] = useState({
    interrupt_threshold: 0.8,
    silence_threshold: 0.5
  });
  
  useEffect(() => {
    const recentInterruptions = messages
      .filter(msg => msg.metadata?.interrupted)
      .slice(-5);
    
    setInterruptionCount(recentInterruptions.length);
    
    // Adapt sensitivity based on interruption patterns
    if (recentInterruptions.length > 3) {
      // User interrupts frequently - make less sensitive
      setAdaptiveSettings(prev => ({
        ...prev,
        interrupt_threshold: Math.min(prev.interrupt_threshold + 0.1, 1.0)
      }));
    } else if (recentInterruptions.length === 0) {
      // No interruptions - make more responsive
      setAdaptiveSettings(prev => ({
        ...prev,
        interrupt_threshold: Math.max(prev.interrupt_threshold - 0.1, 0.3)
      }));
    }
  }, [messages]);
  
  const updateTurnSettings = useCallback(async () => {
    // Send updated settings to session
    const settingsUpdate = {
      type: "session_settings",
      data: {
        turn_detection: adaptiveSettings
      }
    };
    
    // Send via WebSocket or SDK method
  }, [adaptiveSettings]);
  
  useEffect(() => {
    updateTurnSettings();
  }, [adaptiveSettings, updateTurnSettings]);
  
  return (
    <div className="interruption-handler">
      <div className="stats">
        Recent interruptions: {interruptionCount}
      </div>
      
      <div className="adaptive-settings">
        Interrupt sensitivity: {adaptiveSettings.interrupt_threshold}
      </div>
    </div>
  );
}

Advanced Features

Context-Aware Turn Detection

import { useState, useEffect } from 'react';
import { useVoice } from '@nextevi/voice-react';

function ContextAwareTurnDetection() {
  const { messages } = useVoice();
  const [conversationContext, setConversationContext] = useState('casual');
  
  useEffect(() => {
    // Analyze conversation to determine context
    const context = analyzeConversationContext(messages);
    setConversationContext(context);
    
    // Adjust turn detection based on context
    const contextSettings = getTurnSettingsForContext(context);
    updateTurnDetectionSettings(contextSettings); // app-specific: sends the session_settings update
  }, [messages]);
  
  const analyzeConversationContext = (messages) => {
    // Analyze recent messages for context clues
    const recentContent = messages
      .slice(-5)
      .map(msg => msg.content)
      .join(' ');
    
    if (recentContent.includes('urgent') || recentContent.includes('emergency')) {
      return 'urgent';
    } else if (recentContent.includes('explain') || recentContent.includes('help')) {
      return 'educational';
    } else if (recentContent.includes('sorry') || recentContent.includes('problem')) {
      return 'support';
    }
    
    return 'casual';
  };
  
  const getTurnSettingsForContext = (context) => {
    const settings = {
      casual: {
        silence_threshold: 0.8,
        interrupt_threshold: 0.7,
        natural_pauses: true
      },
      urgent: {
        silence_threshold: 0.3,
        interrupt_threshold: 0.9,
        natural_pauses: false
      },
      educational: {
        silence_threshold: 1.2,
        interrupt_threshold: 0.5,
        natural_pauses: true
      },
      support: {
        silence_threshold: 0.6,
        interrupt_threshold: 0.8,
        natural_pauses: true
      }
    };
    
    return settings[context] || settings.casual;
  };
  
  // This component only tunes settings; it renders nothing itself
  return null;
}

Multi-Speaker Detection

import { useState, useCallback } from 'react';

function MultiSpeakerHandler() {
  const [speakers, setSpeakers] = useState([]);
  const [activeSpeaker, setActiveSpeaker] = useState(null);
  
  const handleSpeakerDetection = useCallback((data) => {
    if (data.type === 'speaker_change') {
      setActiveSpeaker(data.speaker_id);
      
      // Update speaker list if new speaker detected
      if (!speakers.find(s => s.id === data.speaker_id)) {
        setSpeakers(prev => [...prev, {
          id: data.speaker_id,
          confidence: data.confidence,
          characteristics: data.voice_characteristics
        }]);
      }
    }
  }, [speakers]);
  
  return (
    <div className="multi-speaker-interface">
      <div className="speakers">
        <h4>Detected Speakers:</h4>
        {speakers.map(speaker => (
          <div key={speaker.id} className={`speaker ${
            activeSpeaker === speaker.id ? 'active' : ''
          }`}>
            Speaker {speaker.id}
            {activeSpeaker === speaker.id && ' (speaking)'}
          </div>
        ))}
      </div>
    </div>
  );
}

Turn Analytics

import { useState, useEffect } from 'react';
import { useVoice } from '@nextevi/voice-react';

function TurnAnalytics() {
  const { messages } = useVoice();
  const [analytics, setAnalytics] = useState({});
  
  useEffect(() => {
    const turnData = messages
      .filter(msg => msg.metadata?.turnInfo)
      .map(msg => msg.metadata.turnInfo);
    
    if (turnData.length === 0) return;
    
    // calculateAverageDuration and the other helpers below are app-specific
    const stats = {
      averageTurnDuration: calculateAverageDuration(turnData),
      interruptionRate: calculateInterruptionRate(turnData),
      silencePatterns: analyzeSilencePatterns(turnData),
      conversationPacing: analyzeConversationPacing(turnData)
    };
    
    setAnalytics(stats);
  }, [messages]);
  
  return (
    <div className="turn-analytics">
      <h3>Conversation Analytics</h3>
      
      <div className="stat">
        Average turn duration: {analytics.averageTurnDuration}s
      </div>
      
      <div className="stat">
        Interruption rate: {analytics.interruptionRate}%
      </div>
      
      <div className="stat">
        Conversation pacing: {analytics.conversationPacing}
      </div>
    </div>
  );
}

Best Practices

Configuration

  • Start with conservative settings and adjust based on user behavior
  • Account for different speaking styles and speeds
  • Consider background noise levels in the user's environment
  • Test with diverse accents and languages

User Experience

  • Provide clear visual feedback for turn states
  • Handle interruptions gracefully without jarring stops
  • Allow users to adjust sensitivity settings
  • Provide helpful error messages for turn detection issues

Performance

  • Use efficient voice activity detection algorithms
  • Minimize latency between turn detection and response
  • Optimize for mobile devices with limited processing power
  • Monitor and adjust settings based on connection quality

Accessibility

  • Support users with different speech patterns
  • Provide alternative input methods for users who cannot speak clearly
  • Consider cognitive load in turn timing decisions
  • Allow customization for users with disabilities
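
The "start conservative, then adjust" practice could look like the following tuning loop. This is hypothetical client-side logic, not an SDK feature; only the field names come from the configuration reference above.

```javascript
// Sketch: start with conservative turn-detection settings and tighten them
// only while the user is not being cut off. Hypothetical client-side logic.
function createTurnTuner() {
  const settings = {
    silence_threshold: 1.0,    // generous pause allowance to start
    interrupt_threshold: 0.9,  // hard to interrupt by accident
    speech_threshold: 0.4      // ignore quiet background noise
  };

  return {
    current: () => ({ ...settings }),
    // Call after each completed turn, reporting whether the user was cut off.
    recordTurn(userWasCutOff) {
      if (userWasCutOff) {
        // Back off: wait longer before ending the user's turn.
        settings.silence_threshold =
          Math.min(settings.silence_threshold + 0.1, 1.5);
      } else {
        // Tighten gradually for snappier responses.
        settings.silence_threshold =
          Math.max(settings.silence_threshold - 0.05, 0.3);
      }
      return { ...settings };
    }
  };
}
```

Each adjusted snapshot can then be sent to the session as a session_settings update, as shown in the configuration section.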

Troubleshooting

Common Issues

Problem: Turn detection triggers on background noise

Solutions:
  • Increase speech_threshold value
  • Enable background_noise_adaptation
  • Increase min_speaking_time
  • Use noise cancellation on client side
Problem: System doesn't detect when the user stops speaking

Solutions:
  • Decrease silence_threshold
  • Adjust for user’s natural speaking pace
  • Check audio quality and connection stability
  • Verify microphone sensitivity
Problem: AI gets interrupted too frequently

Solutions:
  • Increase interrupt_threshold
  • Enable natural_pauses
  • Adjust TTS speed and pacing
  • Implement interrupt recovery strategies
Problem: Turn detection fails with poor audio

Solutions:
  • Implement audio quality monitoring
  • Provide audio setup guidance to users
  • Use adaptive thresholds based on audio quality
  • Enable client-side audio preprocessing
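
The threshold adjustments above can be gathered into one diagnostic helper that returns suggested setting changes for a given symptom. This is a hypothetical utility; the symptom names are illustrative, while the setting fields match the configuration reference.

```javascript
// Hypothetical troubleshooting helper: suggest setting changes for a
// diagnosed symptom, mirroring the solutions listed above.
function suggestAdjustments(symptom, current) {
  switch (symptom) {
    case 'triggers_on_noise':
      return {
        speech_threshold: Math.min(current.speech_threshold + 0.1, 1.0),
        min_speaking_time: current.min_speaking_time + 0.1,
        background_noise_adaptation: true
      };
    case 'misses_turn_end':
      return {
        silence_threshold: Math.max(current.silence_threshold - 0.1, 0.2)
      };
    case 'ai_interrupted_too_often':
      return {
        interrupt_threshold: Math.min(current.interrupt_threshold + 0.1, 1.0),
        natural_pauses: true
      };
    default:
      return {}; // unknown symptom: suggest no changes
  }
}
```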
Turn detection performance varies significantly with the user's environment, speaking style, and audio quality. Always provide fallback mechanisms and user controls for an optimal experience.