API Docs
Convert multi-speaker audio into structured text with speaker attribution and timestamps.
/api/ms-stt
This endpoint converts multi-speaker audio into structured, speaker-attributed text using Narris Multi-Speaker Speech to Text models. Speaker diarization is applied automatically.
The request is processed asynchronously. Once accepted, the API returns a unique log_id which can be used to track transcription progress and retrieve the final output.
Request Body
project_title (string, required)
A human-readable title to identify the multi-speaker speech to text project.
Example: "My Project"
file (file, optional)
Audio file to be transcribed. Required if youtube_url is not provided. Do not provide both file and youtube_url.
Example: "audio.wav"
youtube_url (string, optional)
YouTube video URL to transcribe audio from. Required if file is not provided. Do not provide both file and youtube_url.
Example: "https://www.youtube.com/watch?v=XXXX"
stt_model (string, required)
Speech to Text model to be used for multi-speaker transcription.
Example: "narris_fast"
language (string, optional)
Language of the input audio, as used in the request example and echoed in the response.
Example: "hindi"
output_file_type (string, optional)
Type of output to generate. Allowed values are audio or video. Defaults to audio if not provided.
Example: "video"
Notes:
• You must provide either an audio file or a youtube_url. Providing both or neither will result in a validation error.
• The output_file_type field controls the generated output format. If omitted, the API defaults to audio.
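The mutual-exclusion rule above can be checked client-side before submitting. A minimal sketch in Python (the helper name and error type are illustrative, not part of the API):

```python
def validate_source(file_path=None, youtube_url=None):
    """Enforce the endpoint's rule: exactly one of file or youtube_url."""
    if bool(file_path) == bool(youtube_url):
        raise ValueError(
            "Provide exactly one of 'file' or 'youtube_url', not both or neither."
        )
    return {"file": file_path} if file_path else {"youtube_url": youtube_url}
```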
Response
On successful submission, the API returns a unique log_id.
Use this log_id with the Fetch Multi-Speaker Speech to Text By ID endpoint to retrieve speaker-attributed transcripts and output files.
{
"log_id": "69504cea7d5247d58c029ec9"
}

curl
curl --location 'https://api.dev.narris.io/api/ms-stt' \
  --header 'x-api-key: YOUR_API_KEY' \
  --form 'project_title="My Project"' \
  --form 'language="hindi"' \
  --form 'stt_model="narris_fast"' \
  --form 'file=@"/path/to/audio.wav"' \
  --form 'output_file_type="video"'
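Because jobs are processed asynchronously, a client typically polls the by-ID endpoint with the returned log_id until the status leaves pending or processing. A hedged sketch, where fetch_log is any callable wrapping GET /api/ms-stt/{log_id} (the function name, retry settings, and terminal-status set are illustrative assumptions):

```python
import time

# Statuses observed in this documentation's examples.
TERMINAL_STATUSES = {"finished", "failed"}

def wait_for_transcript(log_id, fetch_log, interval_s=5.0, max_attempts=60):
    """Poll fetch_log(log_id) until the job reaches a terminal status.

    fetch_log must return the log as a dict with at least a 'status' key,
    mirroring the response of GET /api/ms-stt/{log_id}.
    """
    for _attempt in range(max_attempts):
        log = fetch_log(log_id)
        if log.get("status") in TERMINAL_STATUSES:
            return log
        time.sleep(interval_s)
    raise TimeoutError(f"Job {log_id} still running after {max_attempts} polls")
```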
/api/ms-stt/logs
This endpoint allows you to fetch a paginated list of previously created multi-speaker speech to text requests.
Each entry represents a diarized transcription job and includes its current status and creation timestamp.
Query Parameters
page (number, optional)
Page number for pagination.
Example: 1
limit (number, optional)
Number of records to return per page.
Example: 10
Response
On success, the API returns a paginated list of multi-speaker speech to text logs.
Each log contains a unique _id which can be used with the Fetch Multi-Speaker Speech to Text By ID endpoint to retrieve speaker-attributed transcripts and output metadata.
{
"total": 10,
"page": 1,
"limit": 10,
"logs": [
{
"_id": "69504cea7d5247d58c029ec9",
"project_title": "My Project",
"status": "finished",
"createdAt": "2025-12-27T21:17:30.352Z"
},
{
"_id": "694b07cd28310102bd38547a",
"project_title": "My Project",
"status": "failed",
"createdAt": "2025-12-23T21:21:17.791Z"
},
{
"_id": "694b0639767af4524d81c784",
"project_title": "My Project",
"status": "processing",
"createdAt": "2025-12-23T21:14:33.599Z"
}
]
}

curl
curl --location 'https://api.dev.narris.io/api/ms-stt/logs?page=1&limit=10' \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: YOUR_API_KEY'
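Given the total/page/limit shape of the response above, every log can be collected by advancing page until total is exhausted. A minimal sketch, assuming fetch_page is a callable wrapping GET /api/ms-stt/logs (the helper names are illustrative):

```python
def iter_logs(fetch_page, limit=10):
    """Yield every log entry, paging through /api/ms-stt/logs.

    fetch_page(page, limit) must return a dict shaped like the response
    above: {"total": ..., "page": ..., "limit": ..., "logs": [...]}.
    """
    page = 1
    seen = 0
    while True:
        body = fetch_page(page, limit)
        for entry in body["logs"]:
            yield entry
            seen += 1
        # Stop once all records are consumed or the server returns an empty page.
        if seen >= body["total"] or not body["logs"]:
            return
        page += 1
```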
/api/ms-stt/{log_id}
This endpoint allows you to fetch the complete details of a multi-speaker speech to text request using its unique log_id.
The response includes diarized speaker segments, transcription metadata, timestamps, and output configuration once the job is completed.
Path Parameters
log_id (string, required)
Unique identifier of the multi-speaker speech to text request returned during creation or from logs.
Example: "69504cea7d5247d58c029ec9"
Response
On success, the API returns detailed information about the multi-speaker speech to text request.
Each segment includes speaker labels, timestamps, and transcribed text.
{
"_id": "69504cea7d5247d58c029ec9",
"project_title": "My Project",
"output_file_type": "audio",
"input_file": "https://lingui-dev.s3.amazonaws.com/input/20251218195657_20251218195649_1766870248756.wav",
"language": "hindi",
"stt_model": "narris_fast",
"status": "finished",
"createdAt": "2025-12-27T21:17:30.352Z",
"updatedAt": "2025-12-27T21:17:35.853Z",
"segments": [
{
"original_speaker": "SPEAKER_01",
"speed": null,
"pitch": null,
"original_text": "आपका पसंदीदा जानवर कौन सा है?",
"start_time": "00:00:00,290",
"end_time": "00:00:02,210"
},
{
"original_speaker": "SPEAKER_00",
"speed": null,
"pitch": null,
"original_text": "मेरे पसंदीदा जानवर कुट्टे, बिल्लिया डॉल्फिन है.",
"start_time": "00:00:02,750",
"end_time": "00:00:05,720"
}
]
}

curl
curl --location 'https://api.dev.narris.io/api/ms-stt/69504cea7d5247d58c029ec9' \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: YOUR_API_KEY'
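The start_time and end_time values use SRT-style HH:MM:SS,mmm timestamps, so finished segments can be rendered directly as a SubRip (.srt) subtitle file with speaker labels. A sketch under that assumption (the function name is illustrative):

```python
def segments_to_srt(segments):
    """Render diarized segments into SubRip (.srt) text.

    Expects each segment to carry 'original_speaker', 'original_text',
    'start_time', and 'end_time', as in GET /api/ms-stt/{log_id}.
    """
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{seg['start_time']} --> {seg['end_time']}\n"
            f"{seg['original_speaker']}: {seg['original_text']}\n"
        )
    return "\n".join(blocks)
```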
• Multi-speaker speech to text requests are processed asynchronously. Always store the returned log_id to track transcription status.
• Speaker diarization automatically separates and labels speakers (for example: SPEAKER_00, SPEAKER_01).
• Requests may remain in pending or processing state depending on audio length, number of speakers, and system load.
• Use the Fetch Multi-Speaker Speech to Text List endpoint to view all jobs, and the Fetch Multi-Speaker Speech to Text By ID endpoint to retrieve speaker-attributed transcripts and output files.