WebSocket Message Protocol

This guide covers the complete message protocol for NextEVI’s WebSocket API, including message types, data formats, and communication patterns.

Message Format

All messages use a consistent JSON structure:

{
  "type": "message_type",
  "timestamp": 1645123456.789,
  "message_id": "uuid-string",
  "data": {
    // Message-specific payload
  }
}

type (string, required): Message type identifier
timestamp (number, required): Unix timestamp in seconds with millisecond precision
message_id (string, required): Unique identifier for this message (UUID recommended)
data (object): Message-specific data payload
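As a minimal sketch of the envelope described above, the helper below wraps a payload in the four documented fields. The name `buildMessage` and the counter-based `message_id` are assumptions made for illustration; the protocol recommends a UUID, so swap in a UUID generator where one is available.

```javascript
// Hypothetical helper (not part of the NextEVI API): builds the common envelope.
let nextId = 0;

function buildMessage(type, data, nowMs = Date.now()) {
  if (typeof type !== "string" || type.length === 0) {
    throw new TypeError("type must be a non-empty string");
  }
  const message = {
    type,
    // Date.now() returns milliseconds; the protocol expects seconds.
    timestamp: nowMs / 1000,
    // A UUID is recommended; a counter keeps this sketch dependency-free.
    message_id: `${type}-${++nextId}`,
  };
  if (data !== undefined) {
    message.data = data; // data is omitted for messages such as keep_alive
  }
  return message;
}
```

Calling `JSON.stringify(buildMessage("keep_alive"))` yields a string that can be passed to `websocket.send`.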

Client-to-Server Messages

Session Settings

Configure audio settings and enable features:

{
  "type": "session_settings",
  "timestamp": 1645123456.789,
  "message_id": "settings-1",
  "data": {
    "emotion_detection": { "enabled": true },
    "turn_detection": { "enabled": true, "silence_threshold": 0.5 },
    "audio": { 
      "sample_rate": 24000, 
      "channels": 1, 
      "encoding": "linear16" 
    }
  }
}

Session Settings Parameters

emotion_detection (object)
enabled (boolean): Enable real-time emotion detection

turn_detection (object)
enabled (boolean): Enable intelligent turn detection
silence_threshold (number): Silence duration, in seconds, that marks the end of a turn

audio (object)
sample_rate (number): Audio sample rate in Hz (24000 recommended)
channels (number): Number of audio channels (1 for mono)
encoding (string): Audio encoding format ("linear16")
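The settings above could be assembled with a small builder that merges caller overrides into the example values. This is a sketch under assumptions: `buildSessionSettings` is a hypothetical name, the defaults simply mirror the example message (the server's own defaults are not documented here), and the positive-threshold check is a client-side sanity check rather than a documented rule.

```javascript
// Defaults copied from the example message; server-side defaults are unknown.
const DEFAULT_SETTINGS = {
  emotion_detection: { enabled: true },
  turn_detection: { enabled: true, silence_threshold: 0.5 },
  audio: { sample_rate: 24000, channels: 1, encoding: "linear16" },
};

function buildSessionSettings(overrides = {}) {
  const data = {};
  for (const key of Object.keys(DEFAULT_SETTINGS)) {
    // Merge one level deep so a caller can override a single field.
    data[key] = { ...DEFAULT_SETTINGS[key], ...(overrides[key] || {}) };
  }
  // Assumed sanity check: the threshold is a duration in seconds.
  if (!(data.turn_detection.silence_threshold > 0)) {
    throw new RangeError("silence_threshold must be a positive number of seconds");
  }
  return {
    type: "session_settings",
    timestamp: Date.now() / 1000,
    message_id: `settings-${Date.now()}`,
    data,
  };
}
```

For example, `buildSessionSettings({ turn_detection: { silence_threshold: 0.8 } })` changes only the silence threshold and leaves the other fields at the example values.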

Audio Input

Send audio data for processing:

{
  "type": "audio_input",
  "timestamp": 1645123456.789,
  "message_id": "audio-1",
  "data": {
    "audio": "base64-encoded-audio-data",
    "chunk_id": "chunk-001"
  }
}

Audio Input Parameters

audio (string, required): Base64-encoded PCM audio data (16-bit, mono, 24 kHz)
chunk_id (string): Optional identifier for audio chunk ordering
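One way to produce an `audio_input` message from raw samples is sketched below. It assumes a Node.js environment (for `Buffer`), an `Int16Array` of 16-bit mono samples at 24 kHz, and a little-endian host, which is the usual byte order for linear16 but is not stated in this guide. The function name and the zero-padded `chunk_id` scheme are illustrative only.

```javascript
// Hypothetical helper: base64-encodes 16-bit PCM samples into an audio_input message.
function buildAudioInput(samples, chunkNumber) {
  // View the samples' bytes without copying; byte order follows the host CPU.
  const bytes = Buffer.from(samples.buffer, samples.byteOffset, samples.byteLength);
  return {
    type: "audio_input",
    timestamp: Date.now() / 1000,
    message_id: `audio-${chunkNumber}`,
    data: {
      audio: bytes.toString("base64"),
      // Zero-padded so chunk ids sort in send order (an assumed convention).
      chunk_id: `chunk-${String(chunkNumber).padStart(3, "0")}`,
    },
  };
}
```

In a browser, the same bytes would need a different base64 routine, since `Buffer` is a Node.js API.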

Keep Alive

Maintain connection during idle periods:

{
  "type": "keep_alive",
  "timestamp": 1645123456.789,
  "message_id": "ping-1"
}
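A keep-alive loop might look like the sketch below. The 15-second interval is an assumption, since this guide does not state the server's idle timeout, and `sendKeepAlive` and `startKeepAlive` are hypothetical names. `socket` is any object with a `send(string)` method, such as an open WebSocket.

```javascript
// Sends one keep_alive message; note that it carries no data field.
function sendKeepAlive(socket, sequence) {
  socket.send(JSON.stringify({
    type: "keep_alive",
    timestamp: Date.now() / 1000,
    message_id: `ping-${sequence}`,
  }));
}

// Sends keep_alive on a fixed interval (assumed value) and returns a stop function.
function startKeepAlive(socket, intervalMs = 15000) {
  let sequence = 0;
  const timer = setInterval(() => sendKeepAlive(socket, ++sequence), intervalMs);
  return () => clearInterval(timer);
}
```

Call the returned function when the connection closes so the timer does not keep firing against a dead socket.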

Server-to-Client Messages

Connection Metadata

Sent immediately after successful connection:

{
  "type": "connection_metadata",
  "timestamp": 1645123456.789,
  "message_id": "meta-1",
  "data": {
    "connection_id": "conn-xyz789",
    "status": "connected",
    "config": {
      "audio_format": "pcm_24khz_16bit_mono",
      "encoding": "linear16",
      "sample_rate": 24000,
      "channels": 1
    },
    "project_id": "project-123",
    "config_id": "config-abc"
  }
}

Transcription

Real-time speech-to-text results:

{
  "type": "transcription",
  "timestamp": 1645123456.789,
  "message_id": "transcript-1",
  "data": {
    "transcript": "Hello, how can I help you today?",
    "confidence": 0.95,
    "is_final": true,
    "is_speech_final": true,
    "session_id": "conn-xyz789",
    "words": [
      {
        "word": "Hello",
        "start": 1.2,
        "end": 1.6,
        "confidence": 0.98
      }
    ],
    "accumulated_transcript": "Hello, how can I help you today?",
    "is_turn_incomplete": false,
    "original_fragment": "Hello, how can I help you today?"
  }
}

Transcription Parameters

transcript (string): Transcribed text from speech
confidence (number): Transcription confidence score (0-1)
is_final (boolean): true when this transcription is final, false when it is a partial result
is_speech_final (boolean): Whether the user has finished speaking
words (array): Word-level timing and confidence information
accumulated_transcript (string): Complete accumulated text for this conversation turn
is_turn_incomplete (boolean): Whether the user’s turn is still continuing
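The flags above suggest a simple client-side state machine: show partial results as they arrive, and commit text once the turn is over. The sketch below is one interpretation, not documented behavior. It assumes a turn is complete when `is_final` and `is_speech_final` are true and `is_turn_incomplete` is false, and it prefers `accumulated_transcript` over `transcript` when both are present. `createTranscriptTracker` is a hypothetical name.

```javascript
// Hypothetical tracker for the data payload of transcription messages.
function createTranscriptTracker() {
  let interim = "";
  const turns = [];
  return {
    handle(data) {
      if (!data.is_final) {
        interim = data.transcript; // partial result, may still be revised
        return null;
      }
      interim = "";
      if (data.is_speech_final && !data.is_turn_incomplete) {
        // Prefer the server's accumulated text for the whole turn.
        const text = data.accumulated_transcript || data.transcript;
        turns.push(text);
        return text;
      }
      return null; // final fragment, but the turn is still open
    },
    get interim() { return interim; },
    get turns() { return turns.slice(); },
  };
}
```

`handle` returns the committed turn text when a turn completes and `null` otherwise, so a caller can update a live caption from `interim` and append to a transcript only on a non-null return.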

LLM Response Chunk

Streaming text responses from the language model:

{
  "type": "llm_response_chunk",
  "timestamp": 1645123456.789,
  "message_id": "llm-chunk-1",
  "data": {
    "content": "I'd be happy to help you with",
    "is_final": false,
    "generation_id": "gen-abc123",
    "chunk_index": 1
  }
}
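Streaming chunks can be stitched back together by `generation_id` and `chunk_index`, as in the sketch below. It assumes that the chunk with `is_final: true` is the last one for its generation and may itself carry content; neither point is confirmed by this guide. `createResponseAssembler` is a hypothetical name.

```javascript
// Hypothetical assembler for the data payload of llm_response_chunk messages.
function createResponseAssembler() {
  const generations = new Map(); // generation_id -> array of chunk texts
  return function handleChunk(data) {
    const chunks = generations.get(data.generation_id) || [];
    // Store by index so out-of-order arrival still joins correctly.
    chunks[data.chunk_index] = data.content;
    generations.set(data.generation_id, chunks);
    if (!data.is_final) return null;
    generations.delete(data.generation_id);
    // Skip indexes that never arrived, whether numbering starts at 0 or 1.
    return chunks.filter((c) => c !== undefined).join("");
  };
}
```

The handler returns `null` while a generation is still streaming and the full text once the final chunk arrives.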

TTS Audio Chunk

Audio response chunks for playback:

{
  "type": "tts_chunk",
  "timestamp": 1645123456.789,
  "message_id": "tts-1",
  "content": "base64-encoded-audio-data"
}
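Before playback, the base64 payload has to be decoded into samples. The sketch below assumes Node.js (for `Buffer`) and that TTS audio uses the same 16-bit little-endian PCM format as input audio; the guide does not state the output format explicitly. Note that the example above carries the audio in a top-level `content` field rather than under `data`, so the code reads `content` first and falls back to `data.content` only as a precaution.

```javascript
// Hypothetical decoder: tts_chunk message -> Int16Array of PCM samples.
function decodeTtsChunk(message) {
  const base64 = message.content ?? (message.data && message.data.content);
  if (typeof base64 !== "string") {
    throw new TypeError("tts_chunk has no audio content");
  }
  const bytes = Buffer.from(base64, "base64");
  // Read sample by sample: a Buffer may sit at an odd offset in a shared pool,
  // so wrapping it directly in an Int16Array is not always possible.
  const samples = new Int16Array(Math.floor(bytes.length / 2));
  for (let i = 0; i < samples.length; i++) {
    samples[i] = bytes.readInt16LE(i * 2);
  }
  return samples;
}
```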

Emotion Update

Real-time emotion detection results:

{
  "type": "emotion_update",
  "timestamp": 1645123456.789,
  "message_id": "emotion-1",
  "data": {
    "top_emotions": [
      { "name": "Joy", "score": 0.85 },
      { "name": "Excitement", "score": 0.72 }
    ],
    "all_emotions": {
      "Joy": 0.85,
      "Sadness": 0.12,
      "Anger": 0.03,
      "Fear": 0.05,
      "Surprise": 0.15,
      "Disgust": 0.02,
      "Contempt": 0.01,
      "Excitement": 0.72,
      "Calmness": 0.45
    },
    "processing_time": 0.045,
    "utterance_duration": 2.3,
    "connection_id": "conn-xyz789",
    "session_id": "conn-xyz789"
  }
}

Emotion Parameters

top_emotions (array): Top detected emotions with confidence scores
all_emotions (object): Complete emotion analysis results
processing_time (number): Time taken to process emotion detection (seconds)
utterance_duration (number): Duration of analyzed speech segment (seconds)
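A client will often reduce the full score map to something simpler, such as a dominant emotion plus every emotion above a cutoff. The sketch below shows one way to do that from `all_emotions`; the function name and the 0.4 threshold are arbitrary choices for illustration, not values defined by the API.

```javascript
// Hypothetical summary of the data payload of an emotion_update message.
function summarizeEmotions(data, threshold = 0.4) {
  // Rank every score in all_emotions, highest first.
  const ranked = Object.entries(data.all_emotions || {})
    .map(([name, score]) => ({ name, score }))
    .sort((a, b) => b.score - a.score);
  return {
    dominant: ranked.length > 0 ? ranked[0].name : null,
    aboveThreshold: ranked.filter((e) => e.score >= threshold).map((e) => e.name),
  };
}
```

With the example payload above, `dominant` is `"Joy"` and `aboveThreshold` is `["Joy", "Excitement", "Calmness"]`.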

Turn Detection Events

Conversation turn management:

{
  "type": "turn_start",
  "timestamp": 1645123456.789,
  "message_id": "turn-1",
  "data": {
    "turn_id": "turn-abc123"
  }
}

{
  "type": "turn_end",
  "timestamp": 1645123456.789,
  "message_id": "turn-2",
  "data": {
    "turn_id": "turn-abc123",
    "duration": 3.2,
    "is_complete": true
  }
}
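The two events above pair up through `turn_id`, so a client can track which turns are open and which have finished. The sketch below is a minimal tracker with a hypothetical name; it assumes every `turn_end` refers to a `turn_id` announced by an earlier `turn_start`, and it records only turns whose `is_complete` flag is true.

```javascript
// Hypothetical tracker for turn_start and turn_end messages.
function createTurnTracker() {
  const open = new Set();
  const completed = [];
  return {
    handle(message) {
      const { turn_id } = message.data;
      if (message.type === "turn_start") {
        open.add(turn_id);
      } else if (message.type === "turn_end") {
        open.delete(turn_id);
        if (message.data.is_complete) {
          completed.push({ turn_id, duration: message.data.duration });
        }
      }
    },
    isTurnOpen: () => open.size > 0,
    completedTurns: () => completed.slice(),
  };
}
```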

TTS Interruption

Indicates AI speech was interrupted:

{
  "type": "tts_interruption",
  "timestamp": 1645123456.789,
  "message_id": "interrupt-1",
  "content": ""
}
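A typical reaction to an interruption is to discard audio that has been received but not yet played. The sketch below assumes that is the intended client behavior, which this guide does not spell out, and uses a hypothetical in-memory queue of base64 chunks taken from the top-level `content` field of `tts_chunk` messages.

```javascript
// Hypothetical playback queue that is flushed on tts_interruption.
function createPlaybackQueue() {
  let pending = [];
  return {
    // Returns the number of queued chunks dropped by this message.
    handle(message) {
      if (message.type === "tts_chunk") {
        pending.push(message.content);
      } else if (message.type === "tts_interruption") {
        const dropped = pending.length;
        pending = []; // discard audio that has not been played yet
        return dropped;
      }
      return 0;
    },
    next: () => pending.shift(),
    size: () => pending.length,
  };
}
```

A real player would also stop the chunk that is currently sounding; that step depends on the audio API in use and is left out here.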

Status Messages

System status updates:

{
  "type": "status",
  "timestamp": 1645123456.789,
  "message_id": "status-1",
  "data": {
    "status": "ready",
    "details": {
      "session_settings": {
        "sample_rate": 24000,
        "channels": 1,
        "encoding": "linear16"
      }
    }
  }
}

Error Messages

Error notifications:

{
  "type": "error",
  "timestamp": 1645123456.789,
  "message_id": "error-1",
  "data": {
    "error_code": "AUDIO_PROCESSING_FAILED",
    "error_message": "Failed to process audio chunk",
    "details": {
      "chunk_id": "chunk-001"
    }
  }
}
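Because every server message carries a `type`, incoming frames can be routed through a small dispatch table, with `error` messages converted into `Error` objects. The sketch below is one possible shape; `createRouter`, the handler map, and the returned `{ handled, reason }` object are all assumptions made for illustration.

```javascript
// Hypothetical router: parses a raw text frame and calls the matching handler.
function createRouter(handlers) {
  return function route(raw) {
    let message;
    try {
      message = JSON.parse(raw);
    } catch (parseError) {
      return { handled: false, reason: "invalid_json" };
    }
    if (message === null || typeof message !== "object") {
      return { handled: false, reason: "invalid_json" };
    }
    // Own-property check so a type like "constructor" cannot hit the prototype.
    if (!Object.prototype.hasOwnProperty.call(handlers, message.type)) {
      return { handled: false, reason: "unknown_type" };
    }
    if (message.type === "error") {
      const { error_code, error_message, details } = message.data || {};
      const err = new Error(`${error_code}: ${error_message}`);
      err.code = error_code;
      err.details = details;
      handlers.error(err);
    } else {
      handlers[message.type](message);
    }
    return { handled: true, reason: null };
  };
}
```

A typical use is `websocket.onmessage = (event) => route(event.data)`, with one handler per message type the application cares about.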

Binary Audio Messages

Base64 encoding inflates audio payloads by about a third, so audio can instead be sent as binary WebSocket messages. Send raw PCM audio data (16-bit, mono, 24 kHz) directly as binary frames.

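// Send binary audio.
// audioSamples must already hold 16-bit integer PCM samples (mono, 24 kHz).
const audioBuffer = new Int16Array(audioSamples);
// websocket is an open WebSocket; the ArrayBuffer is sent as one binary frame.
websocket.send(audioBuffer.buffer);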
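Audio captured in a browser usually arrives as floating-point samples in the range -1 to 1, which must be converted to 16-bit integers before being sent as shown above. The sketch below converts and then sends the audio in fixed-size binary frames. The function names and the 20 ms frame length are assumptions; this guide does not specify a preferred frame size.

```javascript
// Converts float samples in [-1, 1] to signed 16-bit PCM.
function floatToPcm16(floatSamples) {
  const pcm = new Int16Array(floatSamples.length);
  for (let i = 0; i < floatSamples.length; i++) {
    // Clamp first so out-of-range input cannot wrap around.
    const s = Math.max(-1, Math.min(1, floatSamples[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}

// Sends PCM as binary frames of frameMs each (assumed size); returns the frame count.
function sendBinaryAudio(socket, pcm, frameMs = 20, sampleRate = 24000) {
  const samplesPerFrame = Math.floor((sampleRate * frameMs) / 1000);
  let frames = 0;
  for (let start = 0; start < pcm.length; start += samplesPerFrame) {
    // slice() copies, so each frame's buffer holds exactly its own bytes.
    socket.send(pcm.slice(start, start + samplesPerFrame).buffer);
    frames++;
  }
  return frames;
}
```

At 24 kHz, a 20 ms frame holds 480 samples, or 960 bytes.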
