Sogni Voice API Documentation

Authentication (Optional)

API key authentication is optional and can be enabled for production deployments. When it is enabled, every endpoint (including Kokoro TTS, Pocket TTS, Qwen3 TTS, and transcription) requires an API key, except the health check and auth status endpoints.

Check Auth Status

GET /auth/status

Check if authentication is enabled on this server.

Response

{
  "authEnabled": true
}

Authenticating Requests

When authentication is enabled, include your API key using one of these methods:

Option 1: X-API-Key Header (recommended)

curl -X POST https://voice.sogni.ai/tts \
  -H "X-API-Key: your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world"}' \
  --output output.wav

Option 2: Authorization Bearer Header

curl -X POST https://voice.sogni.ai/tts \
  -H "Authorization: Bearer your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world"}' \
  --output output.wav

JavaScript Example

const API_KEY = 'your_api_key_here';

const response = await fetch('https://voice.sogni.ai/tts', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-API-Key': API_KEY
  },
  body: JSON.stringify({
    text: 'Hello, world!'
  })
});

Python Example

import requests

API_KEY = 'your_api_key_here'

response = requests.post(
    'https://voice.sogni.ai/tts',
    headers={'X-API-Key': API_KEY},
    json={'text': 'Hello, world!'}
)
response.raise_for_status()  # stop on 401 rather than saving the error body as audio

with open('output.wav', 'wb') as f:
    f.write(response.content)

Public Endpoints

These endpoints are always accessible without authentication:

GET /health - Health check for monitoring
GET /auth/status - Check if auth is enabled

Error Response

When authentication fails, the API returns a 401 Unauthorized response:

{
  "statusCode": 401,
  "error": "Unauthorized",
  "message": "Missing API key. Provide X-API-Key header or Authorization: Bearer <key>"
}
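
The two header options and the 401 response above can be wrapped in a pair of small helpers. This is a minimal Python sketch, not part of the API itself: the names auth_headers, raise_for_auth, and AuthError are hypothetical, and the error body is assumed to have the shape shown in the example above.

```python
from typing import Optional


class AuthError(Exception):
    """Raised when the server answers with 401 Unauthorized (hypothetical helper type)."""


def auth_headers(api_key: Optional[str], use_bearer: bool = False) -> dict:
    """Return the headers for one of the two documented authentication methods."""
    if not api_key:
        # No key: suitable for public endpoints or servers with auth disabled.
        return {}
    if use_bearer:
        return {"Authorization": f"Bearer {api_key}"}
    return {"X-API-Key": api_key}


def raise_for_auth(status_code: int, body: dict) -> None:
    """Raise AuthError carrying the server's message when the status is 401."""
    if status_code == 401:
        # Assumes the JSON error body contains a "message" field, as in the sample.
        raise AuthError(body.get("message", "Unauthorized"))
```

With requests, the helpers would be used as headers=auth_headers(API_KEY) on the call, followed by raise_for_auth(response.status_code, response.json()) when the status is 401.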

1. STT (Speech-to-Text) - Parakeet

Upload an audio file and receive a text transcript.

POST /transcribe

Request

Send audio as multipart/form-data. Parameters marked with * are required.

Parameter	Type	Description
`file` *	File	Audio file to transcribe (supports common formats: mp3, wav, webm, m4a, etc.)
`timestamps` (optional)	string	Set to "true" to include sentence-level timings with start/end times for each segment
`wordTimestamps` (optional)	string	Set to "true" to include word-level timings with start/end times for each word (overrides timestamps)

Response (default)

{
  "success": true,
  "transcript": "The transcribed text appears here.",
  "filename": "recording.mp3"
}

Response (with sentence timestamps)

When timestamps=true, the response includes sentence-level timing data for subtitle generation:

{
  "success": true,
  "timestamps": [
    { "start": 0.00, "end": 2.34, "text": "Hello and welcome" },
    { "start": 2.34, "end": 5.67, "text": "to our presentation today" },
    { "start": 5.67, "end": 8.90, "text": "we will cover several topics" }
  ]
}

Response (with word timestamps)

When wordTimestamps=true, the response includes word-level timing data for precise subtitle synchronization:

{
  "success": true,
  "timestamps": [
    { "start": 0.00, "end": 0.48, "text": "Hello" },
    { "start": 0.48, "end": 0.72, "text": "and" },
    { "start": 0.72, "end": 1.20, "text": "welcome" },
    { "start": 1.20, "end": 1.44, "text": "to" },
    { "start": 1.44, "end": 1.68, "text": "our" },
    { "start": 1.68, "end": 2.34, "text": "presentation" }
  ]
}

Each timestamp object contains:

start - Start time in seconds
end - End time in seconds
text - The transcribed text (sentence or word depending on mode)
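
Because each timestamp object carries start, end, and text, the list maps directly onto subtitle cues. The following Python sketch converts a timestamps array into SRT text; the function names srt_time and to_srt are illustrative, and the input is assumed to match the response shape shown above.

```python
def srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timecode (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(timestamps: list) -> str:
    """Turn a list of {start, end, text} objects into SRT subtitle text."""
    cues = []
    for index, item in enumerate(timestamps, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_time(item['start'])} --> {srt_time(item['end'])}\n"
            f"{item['text']}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```

Passing data['timestamps'] from a /transcribe response to to_srt and writing the result to a .srt file gives one cue per sentence or per word, depending on which option was requested.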

cURL Examples

# Basic transcription
curl -X POST https://voice.sogni.ai/transcribe \
  -F "file=@/path/to/audio.mp3"

# With sentence-level timings
curl -X POST https://voice.sogni.ai/transcribe \
  -F "file=@/path/to/audio.mp3" \
  -F "timestamps=true"

# With word-level timings
curl -X POST https://voice.sogni.ai/transcribe \
  -F "file=@/path/to/audio.mp3" \
  -F "wordTimestamps=true"

JavaScript Example

// audioFile is a File or Blob, for example from an <input type="file"> element
const formData = new FormData();
formData.append('file', audioFile);

const response = await fetch('https://voice.sogni.ai/transcribe', {
  method: 'POST',
  body: formData
});

const data = await response.json();
console.log(data.transcript);

Python Example

import requests

with open('audio.mp3', 'rb') as f:
    response = requests.post(
        'https://voice.sogni.ai/transcribe',
        files={'file': f}
    )

data = response.json()
print(data['transcript'])

2. TTS (Text-to-Speech) - Kokoro

Convert text to spoken audio using Kokoro TTS. Returns a WAV audio file by default.

POST /tts

Request

Send JSON with the text and optional parameters.

Parameter	Type	Default	Description
`text` *	string	-	Text to convert to speech (1-10,000 characters)
`voice` (optional)	string	af_heart	Voice to use (see /tts/voices for available voices)
`speed` (optional)	number	1.0	Speech speed multiplier (0.5 to 2.0)
`format` (optional)	string	wav	Output format: "wav", "opus" (returns audio file) or "buffer" (returns base64 JSON)
`timestamps` (optional)	boolean	false	Include word-level timestamps for subtitle generation (forces JSON response)
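
The limits in the table can be checked on the client before a request is sent. This is a minimal Python sketch under stated assumptions: build_tts_payload is a hypothetical helper, and it assumes the server counts characters the same way Python's len() does, which the documentation does not confirm.

```python
def build_tts_payload(text, voice="af_heart", speed=1.0, fmt="wav", timestamps=False):
    """Validate arguments against the documented limits and return the JSON body."""
    if not 1 <= len(text) <= 10_000:
        raise ValueError("text must be 1-10,000 characters")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    if fmt not in ("wav", "opus", "buffer"):
        raise ValueError('format must be "wav", "opus", or "buffer"')
    return {
        "text": text,
        "voice": voice,
        "speed": speed,
        "format": fmt,
        "timestamps": timestamps,
    }
```

The returned dictionary can be passed as the json argument of requests.post for /tts. The voice name is not validated here, since the available voices come from /tts/voices.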

Response (format: wav)

Returns binary WAV audio data with Content-Type: audio/wav

Response (format: buffer)

Returns JSON in which the audio field holds the base64-encoded WAV data:

{
  "success": true,
  "audio": "UklGRiQAAABXQVZFZm10IBAA...",
  "voice": "af_heart",
  "speed": 1.0,
  "format": "wav"
}
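
To use a buffer response, the audio field has to be base64-decoded back into bytes. A minimal Python sketch follows; decode_buffer_response is a hypothetical helper name, and the payload is assumed to have the shape shown above.

```python
import base64


def decode_buffer_response(payload: dict) -> bytes:
    """Return the raw audio bytes from a format=buffer JSON response."""
    if not payload.get("success"):
        raise ValueError("response does not report success")
    # The "audio" field is base64 text; decoding yields the binary audio data.
    return base64.b64decode(payload["audio"])
```

The resulting bytes can be written to a file opened in 'wb' mode, in the same way as response.content in the binary examples.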

cURL Example

curl -X POST https://voice.sogni.ai/tts \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, world!", "voice": "af_heart", "speed": 1.0}' \
  --output output.wav

JavaScript Example

const response = await fetch('https://voice.sogni.ai/tts', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    text: 'Hello, world!',
    voice: 'af_heart',
    speed: 1.0
  })
});

const audioBlob = await response.blob();
const audioUrl = URL.createObjectURL(audioBlob);
const audio = new Audio(audioUrl);
audio.play();

Python Example

import requests

response = requests.post(
    'https://voice.sogni.ai/tts',
    json={'text': 'Hello, world!', 'voice': 'af_heart', 'speed': 1.0}
)

with open('output.wav', 'wb') as f:
    f.write(response.content)

List Available Voices

GET /tts/voices

Response

{
  "voices": ["af_heart", "af_bella", "am_adam", ...],
  "default": "af_heart"
}

3. TTS (Text-to-Speech) - Pocket

Kyutai Pocket TTS is a lightweight, 100M-parameter, English-only TTS model that runs on CPU only, with roughly 200 ms latency and voice cloning support. Requires POCKET_TTS_ENABLED=true.

Generate Speech

POST /pocket-tts

Parameter	Type	Default	Description
`text` *	string	-	Text to convert to speech (1-10,000 characters)
`voice` (optional)	string	alba	Built-in voice: alba, marius, javert, jean, fantine, cosette, eponine, azelma
`format` (optional)	string	wav	Output format: "wav", "opus" (returns audio file) or "buffer" (returns base64 JSON)

cURL Example

curl -X POST https://voice.sogni.ai/pocket-tts \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, world!", "voice": "alba"}' \
  --output output.wav

List Voices

GET /pocket-tts/voices

Response

{
  "voices": ["alba", "marius", "javert", "jean", "fantine", "cosette", "eponine", "azelma"],
  "clones": ["my_clone"],
  "default": "alba"
}

Create Voice Clone

POST /pocket-tts/voices/clone

Upload a reference audio file to create a voice clone. No transcript is needed.

Parameter	Type	Description
`audio` *	File	Reference audio file (WAV, MP3, OGG)
`cloneId` (optional)	string	Custom name for the clone (alphanumeric, underscore, hyphen)
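
The character rule for cloneId can be checked locally before uploading. This Python sketch encodes the rule as a regular expression; is_valid_clone_id is a hypothetical helper, and any length limit the server may enforce is not documented here, so none is checked.

```python
import re

# Letters, digits, underscore, and hyphen, as listed in the parameter table.
CLONE_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_clone_id(clone_id: str) -> bool:
    """Return True if clone_id uses only the documented characters and is non-empty."""
    return CLONE_ID_PATTERN.fullmatch(clone_id) is not None
```

For example, "my_voice" and "voice-2" pass, while "my voice" fails because of the space.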

cURL Example

curl -X POST https://voice.sogni.ai/pocket-tts/voices/clone \
  -F "audio=@/path/to/reference.wav" \
  -F "cloneId=my_voice"

Response

{
  "success": true,
  "cloneId": "my_voice",
  "message": "Voice clone created successfully"
}

Generate with Cloned Voice

POST /pocket-tts/voices/clone/{cloneId}/generate

cURL Example

curl -X POST https://voice.sogni.ai/pocket-tts/voices/clone/my_voice/generate \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello from my cloned voice!"}' \
  --output output.wav

Delete Voice Clone

DELETE /pocket-tts/voices/clone/{cloneId}

cURL Example

curl -X DELETE https://voice.sogni.ai/pocket-tts/voices/clone/my_voice

Download Voice Clone

GET /pocket-tts/voices/clone/{cloneId}/download

Download a voice clone as a ZIP file containing the reference audio and metadata. Useful for backup or transferring clones between servers.

Response

Returns a ZIP file (Content-Type: application/zip) containing:

reference.wav - The reference audio file
metadata.json - Clone metadata (clone ID, source audio filename)

cURL Example

curl https://voice.sogni.ai/pocket-tts/voices/clone/my_voice/download \
  --output my_voice.zip

JavaScript Example

const response = await fetch(
  'https://voice.sogni.ai/pocket-tts/voices/clone/my_voice/download'
);

const blob = await response.blob();
const url = URL.createObjectURL(blob);
const a = document.createElement('a');
a.href = url;
a.download = 'my_voice.zip';
a.click();

Python Example

import requests

response = requests.get(
    'https://voice.sogni.ai/pocket-tts/voices/clone/my_voice/download'
)

with open('my_voice.zip', 'wb') as f:
    f.write(response.content)

Import Voice Clone

POST /pocket-tts/voices/clone/import

Import a previously exported voice clone from a ZIP file. The ZIP must contain a reference.wav file.

Parameter	Type	Description
`file` *	File	ZIP file containing the voice clone
`cloneId` (optional)	string	Custom name for the imported clone. If omitted, uses the name from metadata.
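
Before uploading, a ZIP file can be checked locally for the required reference.wav. The sketch below uses Python's standard zipfile module; inspect_clone_zip is a hypothetical helper, and it assumes both files sit at the root of the archive, which the documentation does not state explicitly.

```python
import io
import json
import zipfile


def inspect_clone_zip(data: bytes) -> dict:
    """Check that a clone ZIP contains reference.wav and return its metadata, if any."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = archive.namelist()
        if "reference.wav" not in names:
            raise ValueError("ZIP does not contain reference.wav")
        if "metadata.json" in names:
            # metadata.json is optional here; its exact fields are not assumed.
            return json.loads(archive.read("metadata.json"))
    return {}
```

Reading the file with open('my_voice.zip', 'rb').read() and passing the bytes to inspect_clone_zip raises ValueError early instead of waiting for the server to reject the upload.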

cURL Example

curl -X POST https://voice.sogni.ai/pocket-tts/voices/clone/import \
  -F "file=@my_voice.zip" \
  -F "cloneId=restored_voice"

Response

{
  "success": true,
  "cloneId": "restored_voice",
  "message": "Voice clone imported successfully"
}

JavaScript Example

// zipFile is a File or Blob, for example from an <input type="file"> element
const formData = new FormData();
formData.append('file', zipFile);
formData.append('cloneId', 'my_imported_voice');

const response = await fetch('https://voice.sogni.ai/pocket-tts/voices/clone/import', {
  method: 'POST',
  body: formData
});

const data = await response.json();
console.log('Imported clone:', data.cloneId);

Python Example

import requests

with open('my_voice.zip', 'rb') as f:
    response = requests.post(
        'https://voice.sogni.ai/pocket-tts/voices/clone/import',
        files={'file': f},
        data={'cloneId': 'restored_voice'}
    )

data = response.json()
print(f"Imported clone: {data['cloneId']}")

4. TTS (Text-to-Speech) - Qwen3

Qwen3-TTS is a multilingual TTS model with voice cloning, emotion control, and voice design capabilities. Requires QWEN_TTS_ENABLED=true. It supports 11 languages, including English, Chinese, Japanese, Korean, French, German, and Spanish.

Generate Speech

POST /qwen-tts

Parameter	Type	Default	Description
`text` *	string	-	Text to convert to speech (1-10,000 characters)
`voice` (optional)	string	Chelsie	Voice to use (see /qwen-tts/voices for available voices)
`language` (optional)	string	English	Language for synthesis
`format` (optional)	string	wav	Output format: "wav", "opus", or "buffer" (base64 JSON)

cURL Example

curl -X POST https://voice.sogni.ai/qwen-tts \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, world!", "voice": "Chelsie"}' \
  --output output.wav

Custom Voice (Emotion/Style Control)

POST /qwen-tts/custom-voice

Generate speech with emotion and style instructions. Requires the CustomVoice model variant.

Parameter	Type	Description
`text` *	string	Text to convert to speech
`speaker` (optional)	string	Speaker voice to use (default: Chelsie)
`instruct` *	string	Emotion/style instruction (e.g., "Very happy and excited", "Speak slowly with a calm tone")
`format` (optional)	string	Output format: "wav", "opus", or "buffer"

cURL Example

curl -X POST https://voice.sogni.ai/qwen-tts/custom-voice \
  -H "Content-Type: application/json" \
  -d '{"text": "I am so excited!", "speaker": "Chelsie", "instruct": "Very happy and enthusiastic"}' \
  --output excited.wav

Voice Design (Create Voice from Description)

POST /qwen-tts/voice-design

Generate speech using a voice created from a text description. Requires the VoiceDesign model variant.

Parameter	Type	Description
`text` *	string	Text to convert to speech
`instruct` *	string	Voice description (e.g., "A deep male voice with a warm, calm tone")
`format` (optional)	string	Output format: "wav", "opus", or "buffer"

cURL Example

curl -X POST https://voice.sogni.ai/qwen-tts/voice-design \
  -H "Content-Type: application/json" \
  -d '{"text": "Welcome to our service.", "instruct": "A deep male voice with calm, professional tone"}' \
  --output designed_voice.wav

List Voices

GET /qwen-tts/voices

Response

{
  "voices": ["Chelsie", "Ethan", "Serena", "Vivian", "Ryan", "Aiden", "Eric", "Dylan"],
  "clones": ["my_clone"],
  "default": "Chelsie",
  "defaultLanguage": "English",
  "modelVariants": {
    "base": "base-0.6b",
    "customVoice": "custom-voice"
  },
  "features": ["voice_cloning", "custom_voice"]
}
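
The features list can be used to decide which Qwen3 endpoints to call on a given server. The Python sketch below is illustrative only: the mapping from feature names to endpoints is an assumption inferred from the names in the sample response, and usable_endpoints is a hypothetical helper.

```python
# Assumed mapping from feature names in the sample response to endpoints.
FEATURE_ENDPOINTS = {
    "voice_cloning": "/qwen-tts/voices/clone",
    "custom_voice": "/qwen-tts/custom-voice",
}


def usable_endpoints(voices_response: dict) -> list:
    """Return the endpoints whose (assumed) feature flag appears in the response."""
    features = voices_response.get("features", [])
    return [FEATURE_ENDPOINTS[name] for name in features if name in FEATURE_ENDPOINTS]
```

Feature names that are not in the mapping are ignored, so the helper keeps working if the server reports additional features.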

Create Voice Clone

POST /qwen-tts/voices/clone

Upload a reference audio file with its transcript to create a voice clone. Requires the Base model variant.

Parameter	Type	Description
`audio` *	File	Reference audio file (3-10 seconds, WAV/MP3/OGG)
`transcript` *	string	Exact text spoken in the reference audio
`cloneId` (optional)	string	Custom name for the clone (alphanumeric, underscore, hyphen)
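
The 3-10 second requirement for the reference audio can be checked locally before uploading. This Python sketch uses the standard wave module, so it handles uncompressed WAV files only; MP3 and OGG would need a separate decoder. The helper names are hypothetical.

```python
import io
import wave


def wav_duration_seconds(data: bytes) -> float:
    """Return the duration of a PCM WAV file given as bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_reference_length(data: bytes) -> bool:
    """Return True if the WAV duration is within the documented 3-10 second range."""
    return 3.0 <= wav_duration_seconds(data) <= 10.0
```

Reading the reference file in 'rb' mode and passing its bytes to is_valid_reference_length catches clips that are too short or too long before the request is made.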

cURL Example

curl -X POST https://voice.sogni.ai/qwen-tts/voices/clone \
  -F "audio=@/path/to/reference.wav" \
  -F "transcript=Hello, this is my voice sample." \
  -F "cloneId=my_voice"

Response

{
  "success": true,
  "cloneId": "my_voice",
  "message": "Voice clone created successfully"
}

Generate with Cloned Voice

POST /qwen-tts/voices/clone/{cloneId}/generate

Parameter	Type	Description
`text` *	string	Text to convert to speech
`language` (optional)	string	Language for synthesis (default: English)
`format` (optional)	string	Output format: "wav", "opus", or "buffer"

cURL Example

curl -X POST https://voice.sogni.ai/qwen-tts/voices/clone/my_voice/generate \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello from my cloned voice!"}' \
  --output cloned_output.wav

Rename Voice Clone

PATCH /qwen-tts/voices/clone/{cloneId}

Parameter	Type	Description
`newCloneId` *	string	New name for the voice clone

cURL Example

curl -X PATCH https://voice.sogni.ai/qwen-tts/voices/clone/my_voice \
  -H "Content-Type: application/json" \
  -d '{"newCloneId": "renamed_voice"}'

Delete Voice Clone

DELETE /qwen-tts/voices/clone/{cloneId}

cURL Example

curl -X DELETE https://voice.sogni.ai/qwen-tts/voices/clone/my_voice

Response

{
  "success": true,
  "cloneId": "my_voice",
  "message": "Voice clone deleted successfully"
}

5. Health Check

Check if the API server is running and healthy.

GET /health

Response

{
  "status": "healthy",
  "timestamp": "2024-01-15T10:30:00.000Z",
  "uptime": 3600
}

cURL Example

curl https://voice.sogni.ai/health
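
For monitoring scripts, the health response can be reduced to a boolean and a readable uptime. This Python sketch assumes the uptime field is in seconds, which the sample value suggests but the documentation does not state; the helper names are hypothetical.

```python
def is_healthy(payload: dict) -> bool:
    """Return True if the health response reports status "healthy"."""
    return payload.get("status") == "healthy"


def format_uptime(seconds: int) -> str:
    """Format an uptime value (assumed to be in seconds) as hours, minutes, seconds."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h {minutes}m {secs}s"
```

A monitor could call is_healthy(requests.get('https://voice.sogni.ai/health').json()) and alert when it returns False.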