WebSocket Message Protocol
This guide covers the complete message protocol for NextEVI’s WebSocket API, including message types, data formats, and communication patterns.
Messages share a common JSON envelope:
{
  "type": "message_type",
  "timestamp": 1645123456.789,
  "message_id": "uuid-string",
  "data": {
    // Message-specific payload
  }
}
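type (string): Message type, such as "session_settings" or "transcription"
timestamp (number): Unix timestamp in seconds with millisecond precision
message_id (string): Unique identifier for this message (UUID recommended)
data (object): Message-specific payload
Payloads normally go in data. The keep_alive message has no payload, and tts_chunk and tts_interruption carry theirs in a top-level content field instead.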
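The envelope above can be produced by a small helper. The sketch below is a minimal JavaScript version. createMessage and newMessageId are hypothetical names and not part of any NextEVI SDK. The ID comes from crypto.randomUUID where that exists, with a time-plus-counter fallback elsewhere.

```javascript
// Hypothetical helpers for building protocol envelopes (not a NextEVI SDK API).
let fallbackCounter = 0;

// Prefer a UUID from the Web Crypto API (browsers, recent Node.js);
// otherwise fall back to a time-plus-counter identifier.
function newMessageId() {
  if (globalThis.crypto && typeof globalThis.crypto.randomUUID === "function") {
    return globalThis.crypto.randomUUID();
  }
  fallbackCounter += 1;
  return `msg-${Date.now()}-${fallbackCounter}`;
}

// Wrap a payload in the common envelope. Date.now() is in milliseconds,
// so dividing by 1000 gives Unix seconds with millisecond precision.
function createMessage(type, data) {
  const message = {
    type,
    timestamp: Date.now() / 1000,
    message_id: newMessageId(),
  };
  if (data !== undefined) {
    message.data = data; // omitted for payload-free messages such as keep_alive
  }
  return message;
}
```

For example, websocket.send(JSON.stringify(createMessage("keep_alive"))) sends an envelope with no data field.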
Client-to-Server Messages
Session Settings
Configure audio settings and enable features:
{
  "type": "session_settings",
  "timestamp": 1645123456.789,
  "message_id": "settings-1",
  "data": {
    "emotion_detection": { "enabled": true },
    "turn_detection": { "enabled": true, "silence_threshold": 0.5 },
    "audio": {
      "sample_rate": 24000,
      "channels": 1,
      "encoding": "linear16"
    }
  }
}
Session settings parameters:
emotion_detection.enabled (boolean): Enable real-time emotion detection
turn_detection.enabled (boolean): Enable intelligent turn detection
turn_detection.silence_threshold (number): Duration of silence, in seconds, that marks the end of a turn
audio.sample_rate (number): Audio sample rate in Hz (24000 recommended)
audio.channels (number): Number of audio channels (1 for mono)
audio.encoding (string): Audio encoding format ("linear16")
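The sketch below builds a session_settings message and sends it once the socket opens. This section does not say exactly when settings must be sent, so sending on open is an assumption. The helper and option names (buildSessionSettings, sendSessionSettingsOnOpen, emotionDetection, turnDetection, silenceThreshold, messageId) are hypothetical. The defaults mirror the example message above.

```javascript
// Hypothetical helper: build a session_settings message from simple options.
// Defaults mirror the documented example message.
function buildSessionSettings({
  emotionDetection = true,
  turnDetection = true,
  silenceThreshold = 0.5, // seconds of silence that end a turn
  messageId = "settings-1",
} = {}) {
  if (!(silenceThreshold > 0)) {
    throw new RangeError("silenceThreshold must be a positive number of seconds");
  }
  return {
    type: "session_settings",
    timestamp: Date.now() / 1000,
    message_id: messageId,
    data: {
      emotion_detection: { enabled: emotionDetection },
      turn_detection: { enabled: turnDetection, silence_threshold: silenceThreshold },
      audio: { sample_rate: 24000, channels: 1, encoding: "linear16" },
    },
  };
}

// Send the settings once the socket opens. `socket` can be a browser
// WebSocket or any object with addEventListener and send.
function sendSessionSettingsOnOpen(socket, options) {
  socket.addEventListener("open", () => {
    socket.send(JSON.stringify(buildSessionSettings(options)));
  });
}
```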
Audio Input
Send base64-encoded audio data for processing:
{
  "type": "audio_input",
  "timestamp": 1645123456.789,
  "message_id": "audio-1",
  "data": {
    "audio": "base64-encoded-audio-data",
    "chunk_id": "chunk-001"
  }
}
Audio input parameters:
audio (string): Base64-encoded PCM audio data (16-bit, mono, 24 kHz)
chunk_id (string, optional): Identifier used to order audio chunks
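The sketch below base64-encodes Int16 PCM samples and wraps them in an audio_input message. It writes little-endian bytes explicitly with a DataView. That byte order is an assumption (the usual convention for linear16), because this section does not state it. int16ToBase64 and buildAudioInput are hypothetical names.

```javascript
// Hypothetical helper: base64-encode Int16 PCM samples as little-endian bytes.
function int16ToBase64(samples) {
  const view = new DataView(new ArrayBuffer(samples.length * 2));
  samples.forEach((s, i) => view.setInt16(i * 2, s, true)); // true = little-endian
  const bytes = new Uint8Array(view.buffer);
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary); // btoa is available in browsers and Node.js 16+
}

// Hypothetical helper: build an audio_input message. chunk_id carries a
// zero-padded sequence number so chunks can be ordered.
function buildAudioInput(samples, sequence) {
  return {
    type: "audio_input",
    timestamp: Date.now() / 1000,
    message_id: `audio-${sequence}`,
    data: {
      audio: int16ToBase64(samples),
      chunk_id: `chunk-${String(sequence).padStart(3, "0")}`,
    },
  };
}
```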
Keep Alive
Maintain connection during idle periods:
{
  "type": "keep_alive",
  "timestamp": 1645123456.789,
  "message_id": "ping-1"
}
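This section does not specify how often keep_alive should be sent. The sketch below therefore sends one only after a period with no other outgoing traffic, using an assumed 15-second default. createKeepAlive is a hypothetical helper. The injectable now clock exists only to make the helper testable.

```javascript
// Hypothetical helper: send keep_alive only after idleMs milliseconds with no
// other outgoing traffic. The 15-second default is an assumption.
function createKeepAlive(socket, { idleMs = 15000, now = () => Date.now() } = {}) {
  let lastSent = now();
  let count = 0;
  return {
    // Call whenever any other message is sent on the socket.
    markActivity() {
      lastSent = now();
    },
    // Call periodically (e.g., from setInterval); returns true if a ping was sent.
    tick() {
      if (now() - lastSent < idleMs) return false;
      count += 1;
      socket.send(JSON.stringify({
        type: "keep_alive",
        timestamp: now() / 1000,
        message_id: `ping-${count}`,
      }));
      lastSent = now();
      return true;
    },
  };
}
```

In a browser this might be driven by const timer = setInterval(() => keepAlive.tick(), 5000), with clearInterval(timer) in the socket's close handler.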
Server-to-Client Messages
Connection Metadata
Sent immediately after a successful connection:
{
  "type": "connection_metadata",
  "timestamp": 1645123456.789,
  "message_id": "meta-1",
  "data": {
    "connection_id": "conn-xyz789",
    "status": "connected",
    "config": {
      "audio_format": "pcm_24khz_16bit_mono",
      "encoding": "linear16",
      "sample_rate": 24000,
      "channels": 1
    },
    "project_id": "project-123",
    "config_id": "config-abc"
  }
}
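A client can use connection_metadata to confirm that the server's reported audio config matches the format it plans to send. The hypothetical JavaScript helper below compares only encoding, sample_rate, and channels. audioConfigMismatches and CLIENT_AUDIO are illustrative names.

```javascript
// Format the client intends to send (matches the documented defaults).
const CLIENT_AUDIO = { encoding: "linear16", sample_rate: 24000, channels: 1 };

// Hypothetical helper: list differences between the server-reported config
// and the client's format. An empty array means they agree.
function audioConfigMismatches(message, expected = CLIENT_AUDIO) {
  if (message.type !== "connection_metadata") {
    throw new TypeError(`expected connection_metadata, got ${message.type}`);
  }
  const config = (message.data && message.data.config) || {};
  return Object.keys(expected)
    .filter((key) => config[key] !== expected[key])
    .map((key) => `${key}: server=${config[key]}, client=${expected[key]}`);
}
```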
Transcription
Real-time speech-to-text results:
{
  "type": "transcription",
  "timestamp": 1645123456.789,
  "message_id": "transcript-1",
  "data": {
    "transcript": "Hello, how can I help you today?",
    "confidence": 0.95,
    "is_final": true,
    "is_speech_final": true,
    "session_id": "conn-xyz789",
    "words": [
      {
        "word": "Hello",
        "start": 1.2,
        "end": 1.6,
        "confidence": 0.98
      }
    ],
    "accumulated_transcript": "Hello, how can I help you today?",
    "is_turn_incomplete": false,
    "original_fragment": "Hello, how can I help you today?"
  }
}
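Transcription parameters:
transcript (string): Transcribed text from speech
confidence (number): Transcription confidence score (0-1)
is_final (boolean): Whether this transcription is final or partial
is_speech_final (boolean): Whether the user has finished speaking
words (array): Word-level timing and confidence information
accumulated_transcript (string): Complete accumulated text for this conversation turn
is_turn_incomplete (boolean): Whether the user's turn is still continuing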
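Partial and final results can be combined for live display. The sketch below commits each final transcript and shows only the latest partial after it. createTranscriptView is a hypothetical helper. It relies on is_final alone. A client could instead display accumulated_transcript for the current turn.

```javascript
// Hypothetical helper: build display text from transcription payloads.
// Final results are committed; the latest partial result is shown after
// them and replaced by the next update.
function createTranscriptView() {
  const committed = [];
  let interim = "";
  return {
    update(data) {
      if (data.is_final) {
        committed.push(data.transcript);
        interim = "";
      } else {
        interim = data.transcript;
      }
      return [...committed, interim].filter((part) => part.length > 0).join(" ");
    },
  };
}
```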
LLM Response Chunk
Streaming text responses from the language model:
{
  "type": "llm_response_chunk",
  "timestamp": 1645123456.789,
  "message_id": "llm-chunk-1",
  "data": {
    "content": "I'd be happy to help you with",
    "is_final": false,
    "generation_id": "gen-abc123",
    "chunk_index": 1
  }
}
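Streaming chunks can be reassembled per generation_id. The hypothetical JavaScript sketch below orders chunks by chunk_index in case they arrive out of order. It returns the full text when a chunk with is_final set arrives. It concatenates content as-is, on the assumption that each chunk carries its own spacing.

```javascript
// Hypothetical helper: reassemble llm_response_chunk payloads per generation_id.
function createResponseAssembler() {
  const generations = new Map(); // generation_id -> [{ index, content }]
  return {
    add(data) {
      const parts = generations.get(data.generation_id) || [];
      parts.push({ index: data.chunk_index, content: data.content || "" });
      generations.set(data.generation_id, parts);
      if (!data.is_final) return null; // response still streaming
      generations.delete(data.generation_id);
      return parts
        .sort((a, b) => a.index - b.index)
        .map((p) => p.content)
        .join("");
    },
  };
}
```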
TTS Audio Chunk
Audio response chunks for playback. The base64-encoded audio is carried in the top-level content field rather than in data:
{
  "type": "tts_chunk",
  "timestamp": 1645123456.789,
  "message_id": "tts-1",
  "content": "base64-encoded-audio-data"
}
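The sketch below decodes a tts_chunk into Float32 samples for the Web Audio API. It assumes the audio is 16-bit little-endian PCM, mono, at 24 kHz, which is the format reported in connection_metadata. The tts_chunk message itself does not state its format. decodeTtsChunk is a hypothetical helper.

```javascript
// Hypothetical helper: decode base64 int16 PCM (little-endian assumed)
// into Float32 samples in [-1, 1) for Web Audio playback.
function decodeTtsChunk(message) {
  const binary = atob(message.content);
  const view = new DataView(new ArrayBuffer(binary.length));
  for (let i = 0; i < binary.length; i++) {
    view.setUint8(i, binary.charCodeAt(i));
  }
  const samples = new Float32Array(Math.floor(binary.length / 2));
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getInt16(i * 2, true) / 32768; // scale int16 to float
  }
  return samples;
}
```

In a browser, the samples can be copied into an AudioBuffer created with audioContext.createBuffer(1, samples.length, 24000) using copyToChannel(samples, 0), then played through an AudioBufferSourceNode. Scheduling consecutive chunks back to back avoids audible gaps.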
Emotion Update
Real-time emotion detection results:
{
  "type": "emotion_update",
  "timestamp": 1645123456.789,
  "message_id": "emotion-1",
  "data": {
    "top_emotions": [
      { "name": "Joy", "score": 0.85 },
      { "name": "Excitement", "score": 0.72 }
    ],
    "all_emotions": {
      "Joy": 0.85,
      "Sadness": 0.12,
      "Anger": 0.03,
      "Fear": 0.05,
      "Surprise": 0.15,
      "Disgust": 0.02,
      "Contempt": 0.01,
      "Excitement": 0.72,
      "Calmness": 0.45
    },
    "processing_time": 0.045,
    "utterance_duration": 2.3,
    "connection_id": "conn-xyz789",
    "session_id": "conn-xyz789"
  }
}
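top_emotions (array): Top detected emotions with confidence scores
all_emotions (object): Complete emotion analysis results, mapping each emotion name to its score
processing_time (number): Time taken to process emotion detection (seconds)
utterance_duration (number): Duration of the analyzed speech segment (seconds)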
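A common use of emotion_update is to pick a single dominant emotion. The hypothetical JavaScript helper below prefers top_emotions and falls back to all_emotions when that list is empty. It returns null when no score reaches minScore. The 0.5 default threshold is an arbitrary illustration.

```javascript
// Hypothetical helper: strongest emotion at or above minScore, or null.
function dominantEmotion(data, minScore = 0.5) {
  const candidates = data.top_emotions && data.top_emotions.length > 0
    ? data.top_emotions
    : Object.entries(data.all_emotions || {}).map(([name, score]) => ({ name, score }));
  let best = null;
  for (const emotion of candidates) {
    if (emotion.score >= minScore && (best === null || emotion.score > best.score)) {
      best = emotion;
    }
  }
  return best ? best.name : null;
}
```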
Turn Detection Events
The server marks the start and end of each conversation turn with turn_start and turn_end messages:
{
  "type": "turn_start",
  "timestamp": 1645123456.789,
  "message_id": "turn-1",
  "data": {
    "turn_id": "turn-abc123"
  }
}
{
  "type": "turn_end",
  "timestamp": 1645123456.789,
  "message_id": "turn-2",
  "data": {
    "turn_id": "turn-abc123",
    "duration": 3.2,
    "is_complete": true
  }
}
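The turn events can be paired by turn_id. The hypothetical JavaScript sketch below tracks which turns are open and records completed turns, using the duration and is_complete values from turn_end.

```javascript
// Hypothetical helper: track turns from turn_start / turn_end messages.
function createTurnTracker() {
  const open = new Set();
  const completed = [];
  return {
    handle(message) {
      const turnId = message.data && message.data.turn_id;
      if (message.type === "turn_start") {
        open.add(turnId);
      } else if (message.type === "turn_end") {
        open.delete(turnId);
        completed.push({
          turnId,
          duration: message.data.duration,
          isComplete: message.data.is_complete,
        });
      }
    },
    hasOpenTurn() {
      return open.size > 0;
    },
    completedTurns() {
      return completed.slice(); // copy so callers cannot mutate internal state
    },
  };
}
```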
TTS Interruption
Indicates that the AI's speech was interrupted; the content field is empty in this message:
{
  "type": "tts_interruption",
  "timestamp": 1645123456.789,
  "message_id": "interrupt-1",
  "content": ""
}
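Clients typically buffer tts_chunk audio before playback. The sketch below assumes the intended reaction to tts_interruption is to drop that buffered audio and stop the current playback. This section does not spell that behavior out. createTtsQueue is hypothetical, and onStop is a caller-supplied callback.

```javascript
// Hypothetical helper: queue tts_chunk payloads and clear them on interruption.
function createTtsQueue(onStop) {
  let pending = [];
  return {
    handle(message) {
      if (message.type === "tts_chunk") {
        pending.push(message.content);
      } else if (message.type === "tts_interruption") {
        pending = []; // drop audio that has not been played yet
        onStop();     // caller stops whatever is currently playing
      }
    },
    next() {
      return pending.shift(); // undefined when the queue is empty
    },
    size() {
      return pending.length;
    },
  };
}
```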
Status Messages
System status updates:
{
  "type": "status",
  "timestamp": 1645123456.789,
  "message_id": "status-1",
  "data": {
    "status": "ready",
    "details": {
      "session_settings": {
        "sample_rate": 24000,
        "channels": 1,
        "encoding": "linear16"
      }
    }
  }
}
Error Messages
Error notifications:
{
  "type": "error",
  "timestamp": 1645123456.789,
  "message_id": "error-1",
  "data": {
    "error_code": "AUDIO_PROCESSING_FAILED",
    "error_message": "Failed to process audio chunk",
    "details": {
      "chunk_id": "chunk-001"
    }
  }
}
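Error details such as chunk_id can be correlated with outgoing messages when logging. The hypothetical JavaScript sketch below remembers recent audio chunks by chunk_id, evicting the oldest beyond a size limit. It formats an error together with the message_id that carried the failing chunk.

```javascript
// Hypothetical helper: correlate error messages with sent audio chunks.
function createErrorCorrelator(maxEntries = 100) {
  const sent = new Map(); // chunk_id -> message_id, in insertion order
  return {
    recordSent(message) {
      const chunkId = message.data && message.data.chunk_id;
      if (!chunkId) return;
      sent.set(chunkId, message.message_id);
      if (sent.size > maxEntries) {
        sent.delete(sent.keys().next().value); // evict the oldest entry
      }
    },
    describe(errorMessage) {
      const { error_code, error_message, details = {} } = errorMessage.data;
      const related = details.chunk_id ? sent.get(details.chunk_id) : undefined;
      const suffix = related ? ` (chunk ${details.chunk_id}, message ${related})` : "";
      return `${error_code}: ${error_message}${suffix}`;
    },
  };
}
```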
Binary Audio Messages
Audio can also be sent as binary WebSocket messages instead of base64-encoded JSON, which avoids base64's roughly 33% size overhead and the cost of encoding. Send raw PCM audio data (16-bit, mono, 24 kHz) directly as binary frames.
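// Send raw 16-bit PCM (mono, 24 kHz) as a binary frame.
// audioSamples must already hold 16-bit integer samples; Float32 data
// from the Web Audio API must be scaled to the int16 range first.
const audioBuffer = new Int16Array(audioSamples);
websocket.send(audioBuffer.buffer);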
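Microphone audio from the Web Audio API arrives as Float32 samples in [-1, 1], so it must be converted before it is sent as 16-bit PCM. floatTo16BitPCM below is a hypothetical helper that clamps and scales each sample. Resampling to 24 kHz is not shown; microphone input often runs at 44.1 or 48 kHz. The resulting .buffer holds samples in the platform's native byte order, which is little-endian on common hardware.

```javascript
// Hypothetical helper: convert Float32 samples in [-1, 1] to Int16 PCM.
function floatTo16BitPCM(float32Samples) {
  const out = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    const s = Math.max(-1, Math.min(1, float32Samples[i])); // clamp out-of-range input
    // Negative values scale to -32768, positive values to 32767.
    out[i] = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
  }
  return out;
}
```

Usage: websocket.send(floatTo16BitPCM(samples).buffer).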
Message Flow Examples
Basic Voice Conversation
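Based on the message types above, a basic conversation plausibly runs as follows:

1. The client connects and receives connection_metadata.
2. The client sends session_settings.
3. The client streams audio as audio_input messages or binary frames.
4. The server returns transcription, then llm_response_chunk and tts_chunk messages, along with emotion_update and turn events.

That ordering is inferred, not specified. The hypothetical router below dispatches each incoming frame by its type field. It parses text frames as JSON and passes non-string frames to onBinary.

```javascript
// Hypothetical helper: route incoming frames to handlers keyed by message type.
function createMessageRouter(handlers, { onUnknown = () => {}, onBinary = () => {} } = {}) {
  return function route(frame) {
    if (typeof frame !== "string") {
      onBinary(frame); // ArrayBuffer or Blob, depending on websocket.binaryType
      return;
    }
    const message = JSON.parse(frame);
    // Own-property check so types like "toString" don't hit Object.prototype.
    if (Object.prototype.hasOwnProperty.call(handlers, message.type)) {
      handlers[message.type](message);
    } else {
      onUnknown(message);
    }
  };
}
```

Usage: websocket.addEventListener("message", (event) => route(event.data)).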
Error Handling
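When the connection drops, a common pattern is to reconnect with exponential backoff. The sketch below uses "full jitter": a random delay up to a cap that doubles per attempt. backoffDelay and keepConnected are hypothetical helpers. The 500 ms base and 30 s ceiling are illustrative values, not NextEVI recommendations. connect is a caller-supplied function that opens and returns a new socket.

```javascript
// Delay before reconnect attempt `attempt` (0-based), in milliseconds.
function backoffDelay(attempt, { baseMs = 500, maxMs = 30000, random = Math.random } = {}) {
  const cap = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * cap);
}

// Reopen the socket after unexpected closes; stop() ends the loop.
function keepConnected(connect, options = {}) {
  let attempt = 0;
  let stopped = false;
  let socket = null;
  function open() {
    if (stopped) return;
    socket = connect();
    socket.addEventListener("open", () => { attempt = 0; }); // reset after success
    socket.addEventListener("close", () => {
      if (stopped) return; // closed on purpose: do not reconnect
      setTimeout(open, backoffDelay(attempt, options));
      attempt += 1;
    });
  }
  open();
  return {
    stop() {
      stopped = true;
      if (socket) socket.close();
    },
  };
}
```

In a browser, keepConnected(() => new WebSocket(url)) would apply this to a real connection, where url is a placeholder for the NextEVI endpoint.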
Best Practices
Message IDs
Use UUIDs for message_id fields
Include sequence numbers in audio chunk IDs (for example, chunk-001, chunk-002)
Track message correlation for debugging
Error Handling
Implement exponential backoff for reconnections
Handle partial results, such as transcription and llm_response_chunk messages with is_final set to false
Log all error messages for debugging
Audio Streaming
Send audio in 100-200 ms chunks for low latency (2,400-4,800 samples, or 4,800-9,600 bytes, at 24 kHz 16-bit mono)
Use binary messages for audio when possible
Implement client-side audio buffering
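To send audio in fixed-duration chunks, a buffer of 24 kHz mono Int16 samples can be split into frames. pcmFrames below is a hypothetical generator. At the default 100 ms, each frame is 2,400 samples (4,800 bytes).

```javascript
// Hypothetical helper: yield fixed-duration frames from Int16 PCM samples.
function* pcmFrames(samples, { sampleRate = 24000, frameMs = 100 } = {}) {
  const frameSize = Math.round((sampleRate * frameMs) / 1000);
  for (let start = 0; start < samples.length; start += frameSize) {
    // subarray shares memory with the source; the last frame may be shorter.
    yield samples.subarray(start, start + frameSize);
  }
}
```

Pass each frame itself to websocket.send(frame) rather than frame.buffer. A subarray's .buffer is the whole underlying buffer, while sending the typed array transmits only that view's bytes.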