WebSocket API Reference

Complete reference documentation for NextEVI’s real-time Speech-to-Speech WebSocket voice API.

Base URL

wss://api.nextevi.com/ws/voice/{connection_id}

Connection

Endpoint

wss://api.nextevi.com/ws/voice/{connection_id}
connection_id
string
required
Unique connection identifier. Generate a UUID v4 for each new connection.
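
For example, a browser or recent Node.js client can generate the identifier with the built-in crypto API (a minimal sketch; any string that is unique per connection works):

const connectionId = crypto.randomUUID(); // UUID v4
const url = `wss://api.nextevi.com/ws/voice/${connectionId}`;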

Authentication

Pass your organization API key as a query parameter:
wss://api.nextevi.com/ws/voice/{connection_id}?api_key=oak_your_api_key&config_id=your_config_id

Query Parameters

api_key
string
Organization API key (starts with oak_). Required if not using JWT authentication.
config_id
string
required
Voice configuration identifier from your NextEVI dashboard.
project_id
string
Project identifier (optional, auto-detected from config if not provided)
authorization
string
JWT passed as ‘Bearer <token>’; a query-parameter alternative to the Authorization header

Response

Connection establishment follows the standard WebSocket handshake. Upon successful connection, the server sends:
  1. Connection Metadata - Connection details and configuration
  2. Ready for Messages - Client can now send session settings and audio

Connection Flow

  1. WebSocket Handshake: Client initiates WebSocket connection
  2. Authentication: Server validates API key or JWT token
  3. Connection Metadata: Server sends connection details
  4. Session Settings: Client configures audio and feature settings
  5. Ready: Connection ready for voice communication
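
A minimal client-side sketch of this flow, waiting for connection metadata before configuring the session (assumes the url built in the earlier sketch):

const ws = new WebSocket(url);

ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);
  if (msg.type === 'connection_metadata') {
    // Steps 3-4: server confirmed the connection, now configure the session
    ws.send(JSON.stringify({
      type: 'session_settings',
      timestamp: Date.now() / 1000,
      message_id: crypto.randomUUID(),
      data: { audio: { sample_rate: 24000, channels: 1, encoding: 'linear16' } }
    }));
  }
};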

Message Format

All WebSocket messages use a consistent JSON structure:
{
  "type": "message_type",
  "timestamp": 1645123456.789,
  "message_id": "uuid-string",
  "data": {
    // Message-specific payload
  }
}
type
string
required
Message type identifier (see message types below)
timestamp
number
required
Unix timestamp in seconds with millisecond precision
message_id
string
required
Unique identifier for this message (UUID recommended)
data
object
Message-specific data payload (varies by message type)
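
Since every message shares this envelope, a small helper keeps client code consistent (a sketch; crypto.randomUUID assumes a modern browser or Node.js runtime):

function makeMessage(type, data) {
  return JSON.stringify({
    type,
    timestamp: Date.now() / 1000, // Unix seconds with millisecond precision
    message_id: crypto.randomUUID(),
    data
  });
}

// Usage: ws.send(makeMessage('session_settings', { audio: { sample_rate: 24000, channels: 1, encoding: 'linear16' } }));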

Client Messages

Messages sent from client to server.

Session Settings

Configure audio settings and enable features for the connection.
{
  "type": "session_settings",
  "timestamp": 1645123456.789,
  "message_id": "settings-1",
  "data": {
    "emotion_detection": { "enabled": true },
    "turn_detection": { "enabled": true, "silence_threshold": 0.5 },
    "audio": { 
      "sample_rate": 24000, 
      "channels": 1, 
      "encoding": "linear16" 
    }
  }
}
data.emotion_detection
object
  • enabled (boolean): Enable real-time emotion detection
data.turn_detection
object
  • enabled (boolean): Enable intelligent turn detection
  • silence_threshold (number): Silence duration to detect turn end (seconds)
data.audio
object
required
  • sample_rate (number): Audio sample rate (24000 recommended)
  • channels (number): Audio channels (1 for mono)
  • encoding (string): Audio encoding format (“linear16”)

Audio Input

Send audio data for speech processing.
{
  "type": "audio_input", 
  "timestamp": 1645123456.789,
  "message_id": "audio-1",
  "data": {
    "audio": "base64-encoded-audio-data",
    "chunk_id": "chunk-001"
  }
}
data.audio
string
required
Base64-encoded PCM audio data (16-bit, mono, 24kHz)
data.chunk_id
string
Optional identifier for audio chunk ordering
Alternative: Binary Audio

For efficiency, send raw PCM audio data (16-bit, mono, 24kHz) as binary WebSocket frames:
const audioBuffer = new Int16Array(audioSamples);
websocket.send(audioBuffer.buffer);
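
If you use the JSON audio_input message instead, the captured samples must be converted to 16-bit PCM and base64-encoded first. A browser-side sketch, assuming Float32Array samples from the Web Audio API already at 24kHz mono:

// Convert float samples in [-1, 1] to 16-bit PCM, then base64 for audio_input
function encodeAudioChunk(float32Samples) {
  const pcm = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
  }
  // btoa expects a binary string, so build one from the PCM bytes
  let binary = '';
  const bytes = new Uint8Array(pcm.buffer);
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}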

Keep Alive

Maintain the connection during idle periods.
{
  "type": "keep_alive",
  "timestamp": 1645123456.789, 
  "message_id": "ping-1"
}
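
For example, a client might send one on a timer while the microphone is muted (a sketch; the 20-second interval is an assumption, not a documented requirement):

const keepAlive = setInterval(() => {
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(JSON.stringify({
      type: 'keep_alive',
      timestamp: Date.now() / 1000,
      message_id: crypto.randomUUID()
    }));
  }
}, 20000);

// Remember to clearInterval(keepAlive) when the connection closes.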

Server Messages

Messages sent from server to client.

Connection Metadata

Sent immediately after successful connection establishment.
{
  "type": "connection_metadata",
  "timestamp": 1645123456.789,
  "message_id": "meta-1", 
  "data": {
    "connection_id": "conn-xyz789",
    "status": "connected",
    "config": {
      "audio_format": "pcm_24khz_16bit_mono",
      "encoding": "linear16", 
      "sample_rate": 24000,
      "channels": 1
    },
    "project_id": "project-123",
    "config_id": "config-abc"
  }
}
data.connection_id
string
Confirmed connection identifier
data.status
string
Connection status (“connected”)
data.config
object
Audio configuration details
data.project_id
string
Associated project identifier
data.config_id
string
Voice configuration identifier

Transcription

Real-time speech-to-text results from user audio input.
{
  "type": "transcription",
  "timestamp": 1645123456.789,
  "message_id": "transcript-1",
  "data": {
    "transcript": "Hello, how can I help you today?",
    "confidence": 0.95,
    "is_final": true,
    "is_speech_final": true,
    "session_id": "conn-xyz789",
    "words": [
      {
        "word": "Hello",
        "start": 1.2,
        "end": 1.6, 
        "confidence": 0.98
      }
    ],
    "accumulated_transcript": "Hello, how can I help you today?",
    "is_turn_incomplete": false,
    "original_fragment": "Hello, how can I help you today?"
  }
}
data.transcript
string
Transcribed text from speech input
data.confidence
number
Transcription confidence score (0-1)
data.is_final
boolean
Whether this transcription is final (true) or partial (false)
data.is_speech_final
boolean
Whether the user has finished speaking this utterance
data.session_id
string
Session identifier for this connection
data.words
array
Word-level timing and confidence information
  • word (string): The word
  • start (number): Start time in seconds
  • end (number): End time in seconds
  • confidence (number): Word confidence score (0-1)
data.accumulated_transcript
string
Complete accumulated text for this conversation turn
data.is_turn_incomplete
boolean
Whether the user’s conversation turn is still continuing
data.original_fragment
string
Original transcript fragment before accumulation
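
A common captioning pattern is to overwrite the display on partial results and commit the text once is_final is true (a sketch; updateLiveCaption and commitCaption are hypothetical UI helpers):

ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);
  if (msg.type !== 'transcription') return;
  if (msg.data.is_final) {
    commitCaption(msg.data.transcript);     // finalized text
  } else {
    updateLiveCaption(msg.data.transcript); // replaced by the next partial
  }
};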

LLM Response Chunk

Streaming text responses from the language model.
{
  "type": "llm_response_chunk",
  "timestamp": 1645123456.789,
  "message_id": "llm-chunk-1",
  "data": {
    "content": "I'd be happy to help you with",
    "is_final": false,
    "generation_id": "gen-abc123", 
    "chunk_index": 1
  }
}
data.content
string
Text content chunk from language model
data.is_final
boolean
Whether this is the final chunk in the response
data.generation_id
string
Unique identifier for this response generation
data.chunk_index
number
Sequential index of this chunk in the response
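
Chunks belonging to one response share a generation_id and arrive in chunk_index order, so the client can accumulate them per generation (a sketch):

const generations = new Map(); // generation_id -> accumulated text

function onLlmChunk({ content, is_final, generation_id }) {
  const text = (generations.get(generation_id) || '') + content;
  generations.set(generation_id, text);
  if (is_final) {
    console.log('Full response:', text);
    generations.delete(generation_id);
  }
}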

TTS Audio Chunk

Audio response chunks for playback to the user.
{
  "type": "tts_chunk",
  "timestamp": 1645123456.789,
  "message_id": "tts-1", 
  "content": "base64-encoded-audio-data"
}
content
string
Base64-encoded audio data (WAV format) for playback
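
Because each chunk is a complete base64-encoded WAV payload, a browser client can decode and play it with the Web Audio API (a sketch; a production client would queue chunks so they play back to back):

const audioCtx = new AudioContext();

async function playTtsChunk(base64Wav) {
  // base64 -> bytes -> decoded audio buffer
  const bytes = Uint8Array.from(atob(base64Wav), (c) => c.charCodeAt(0));
  const buffer = await audioCtx.decodeAudioData(bytes.buffer);
  const source = audioCtx.createBufferSource();
  source.buffer = buffer;
  source.connect(audioCtx.destination);
  source.start();
}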

Emotion Update

Real-time emotion detection results from user speech.
{
  "type": "emotion_update",
  "timestamp": 1645123456.789,
  "message_id": "emotion-1",
  "data": {
    "top_emotions": [
      { "name": "Joy", "score": 0.85 },
      { "name": "Excitement", "score": 0.72 }
    ],
    "all_emotions": {
      "Joy": 0.85,
      "Sadness": 0.12,
      "Anger": 0.03,
      "Fear": 0.05, 
      "Surprise": 0.15,
      "Disgust": 0.02,
      "Contempt": 0.01,
      "Excitement": 0.72,
      "Calmness": 0.45
    },
    "processing_time": 0.045,
    "utterance_duration": 2.3,
    "connection_id": "conn-xyz789",
    "session_id": "conn-xyz789"
  }
}
data.top_emotions
array
Top detected emotions with confidence scores
  • name (string): Emotion name
  • score (number): Confidence score (0-1)
data.all_emotions
object
Complete emotion analysis results with scores for all emotions
data.processing_time
number
Time taken to process emotion detection (seconds)
data.utterance_duration
number
Duration of analyzed speech segment (seconds)
data.connection_id
string
Connection identifier
data.session_id
string
Session identifier
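
For example, to surface the single strongest emotion from a full update (a sketch):

function dominantEmotion(allEmotions) {
  // Object.entries gives [name, score] pairs; sort descending by score
  return Object.entries(allEmotions).sort(([, a], [, b]) => b - a)[0];
}

// dominantEmotion(msg.data.all_emotions) -> ['Joy', 0.85]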

Turn Detection Events

Conversation turn management events.

Turn Start
{
  "type": "turn_start", 
  "timestamp": 1645123456.789,
  "message_id": "turn-1",
  "data": {
    "turn_id": "turn-abc123"
  }
}
Turn End
{
  "type": "turn_end",
  "timestamp": 1645123456.789,
  "message_id": "turn-2", 
  "data": {
    "turn_id": "turn-abc123",
    "duration": 3.2,
    "is_complete": true
  }
}
data.turn_id
string
Unique identifier for this conversation turn
data.duration
number
Duration of the turn in seconds (turn_end only)
data.is_complete
boolean
Whether the turn was completed naturally (turn_end only)

TTS Interruption

Indicates that AI speech was interrupted by the user.
{
  "type": "tts_interruption",
  "timestamp": 1645123456.789,
  "message_id": "interrupt-1",
  "content": ""
}
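
On receiving this event, the client should stop playback at once and discard any queued audio (a sketch building on the playback example above; activeSources is a hypothetical list of started buffer sources):

if (msg.type === 'tts_interruption') {
  activeSources.forEach((source) => source.stop()); // halt current playback
  activeSources.length = 0;                         // drop queued audio
}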

Status Messages

System status updates and confirmations.
{
  "type": "status",
  "timestamp": 1645123456.789,
  "message_id": "status-1",
  "data": {
    "status": "ready", 
    "details": {
      "session_settings": {
        "sample_rate": 24000,
        "channels": 1,
        "encoding": "linear16"
      }
    }
  }
}
data.status
string
Current system status
  • ready: System ready for voice communication
  • processing: Processing audio or generating response
  • error: Error state
data.details
object
Additional status details and configuration

Error Messages

Error notifications and debugging information.
{
  "type": "error",
  "timestamp": 1645123456.789, 
  "message_id": "error-1",
  "data": {
    "error_code": "AUDIO_PROCESSING_FAILED",
    "error_message": "Failed to process audio chunk",
    "details": {
      "chunk_id": "chunk-001"
    }
  }
}
data.error_code
string
Standardized error code (see Error Reference)
data.error_message
string
Human-readable error message
data.details
object
Additional error context and debugging information
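
Clients should log the structured fields rather than the raw frame (a sketch):

if (msg.type === 'error') {
  console.error(`[${msg.data.error_code}] ${msg.data.error_message}`, msg.data.details);
}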

Response Codes

WebSocket connections use standard HTTP status codes during the handshake, then WebSocket close codes:

HTTP Status Codes (Handshake)

Code | Description
-----|--------------------------------------------
101  | Switching Protocols - Connection successful
400  | Bad Request - Invalid connection parameters
401  | Unauthorized - Authentication failed
403  | Forbidden - Access denied
404  | Not Found - Invalid endpoint
429  | Too Many Requests - Rate limited
500  | Internal Server Error - Server error

WebSocket Close Codes

Code | Description                               | Retry
-----|-------------------------------------------|------
1000 | Normal Closure - Clean disconnect         | No
1001 | Going Away - Server restart               | Yes
1002 | Protocol Error - Invalid message format   | No
1003 | Unsupported Data - Invalid data type      | No
1006 | Abnormal Closure - Network error          | Yes
1011 | Internal Error - Server error             | Yes
4001 | Unauthorized - Authentication failed      | No
4002 | Invalid Config - Config not found         | No
4003 | Access Denied - Insufficient permissions  | No
4004 | Rate Limited - Too many connections       | Yes

Rate Limits

Connection Limits

Limit Type              | Limit | Window
------------------------|-------|---------
Connections per API Key | 100   | 1 minute
Connections per IP      | 50    | 1 minute
Audio Messages          | 1000  | 1 minute
Text Messages           | 100   | 1 minute

Audio Limits

Metric                  | Limit
------------------------|----------------
Max Audio Chunk Size    | 1 MB
Max Message Rate        | 100/second
Max Session Duration    | 60 minutes
Max Concurrent Sessions | 10 per API key
Rate limits are enforced per API key and IP address. Exceeded limits result in HTTP 429 or WebSocket close code 4004.

Best Practices

Connection Management

  • Generate unique connection IDs (UUID v4 recommended)
  • Implement exponential backoff for reconnections (see the sketch after this list)
  • Handle connection lifecycle properly (open/message/error/close)
  • Use keep-alive messages for long idle periods
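
A minimal reconnection sketch with exponential backoff, retrying only on the close codes the table above marks as retryable (assumes the connection url from the earlier examples):

const RETRYABLE = new Set([1001, 1006, 1011, 4004]);
let attempt = 0;

function connect() {
  const ws = new WebSocket(url);
  ws.onopen = () => { attempt = 0; }; // reset backoff on success
  ws.onclose = (event) => {
    if (!RETRYABLE.has(event.code)) return; // never retry auth/config failures
    const delay = Math.min(30000, 1000 * 2 ** attempt++); // 1s, 2s, 4s... capped at 30s
    setTimeout(connect, delay);
  };
}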

Audio Streaming

  • Send audio in 100-200ms chunks for optimal latency (see the sketch after this list)
  • Use 24kHz, 16-bit, mono PCM format
  • Implement audio buffering on client side
  • Use binary WebSocket frames for audio when possible
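
At 24kHz, 16-bit mono, a 100ms chunk is 2,400 samples (4,800 bytes). A sketch of slicing a capture buffer into such frames for binary sending:

const SAMPLE_RATE = 24000;
const CHUNK_SAMPLES = SAMPLE_RATE / 10; // 100ms of audio

function sendInChunks(ws, pcm /* Int16Array at 24kHz mono */) {
  for (let i = 0; i < pcm.length; i += CHUNK_SAMPLES) {
    // slice() copies, so each frame has its own ArrayBuffer to send
    ws.send(pcm.slice(i, i + CHUNK_SAMPLES).buffer);
  }
}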

Error Handling

  • Always handle WebSocket error and close events
  • Implement retry logic with backoff for network errors
  • Don’t retry authentication or configuration failures (close codes 4001-4003)
  • Log errors with sufficient context for debugging

Performance

  • Minimize message payloads where possible
  • Use efficient audio encoding (binary vs base64)
  • Implement client-side audio processing (noise reduction)
  • Monitor connection health and latency

Security

  • Use secure WebSocket connections (wss://) only
  • Validate all message payloads
  • Implement proper authentication token refresh
  • Don’t log sensitive data in error messages

Code Examples

JavaScript Connection

const ws = new WebSocket(
  'wss://api.nextevi.com/ws/voice/conn-123?' + 
  new URLSearchParams({
    api_key: 'oak_your_api_key',
    config_id: 'your_config_id'
  })
);

ws.onopen = () => {
  // Send session settings
  ws.send(JSON.stringify({
    type: 'session_settings',
    timestamp: Date.now() / 1000,
    message_id: 'settings-1',
    data: {
      emotion_detection: { enabled: true },
      audio: { sample_rate: 24000, channels: 1, encoding: 'linear16' }
    }
  }));
};

ws.onmessage = (event) => {
  const message = JSON.parse(event.data);
  console.log('Received:', message);
};

Python Connection

import asyncio
import websockets
import json
import time

async def connect():
    uri = "wss://api.nextevi.com/ws/voice/conn-123?api_key=oak_your_api_key&config_id=your_config_id"
    
    async with websockets.connect(uri) as websocket:
        # Send session settings
        await websocket.send(json.dumps({
            "type": "session_settings",
            "timestamp": time.time(),
            "message_id": "settings-1", 
            "data": {
                "emotion_detection": {"enabled": True},
                "audio": {"sample_rate": 24000, "channels": 1, "encoding": "linear16"}
            }
        }))
        
        # Listen for messages
        async for message in websocket:
            data = json.loads(message)
            print("Received:", data)

asyncio.run(connect())

cURL Connection Test

# Test the WebSocket handshake with cURL (the upgrade request is sent over https://)
curl -i -N -H "Connection: Upgrade" \
     -H "Upgrade: websocket" \
     -H "Sec-WebSocket-Version: 13" \
     -H "Sec-WebSocket-Key: $(openssl rand -base64 16)" \
     -H "Authorization: Bearer your_jwt_token" \
     "https://api.nextevi.com/ws/voice/conn-123?config_id=your_config_id"