API Docs
Multi-Speaker Speech to Text

Convert multi-speaker audio into structured text with speaker attribution and timestamps.

Create Multi-Speaker Speech to Text Request

POST

/api/ms-stt

This endpoint converts multi-speaker audio into structured, speaker-attributed text using Narris Multi-Speaker Speech to Text models. Speaker diarization is applied automatically.

The request is processed asynchronously. Once accepted, the API returns a unique log_id which can be used to track transcription progress and retrieve the final output.


Request Body

project_title (string, required)

A human-readable title to identify the multi-speaker speech to text project.

Example: "My Project"

file (file, optional)

Audio file to be transcribed. Required if youtube_url is not provided. Do not provide both file and youtube_url.

Example: "audio.wav"

youtube_url (string, optional)

YouTube video URL to transcribe audio from. Required if file is not provided. Do not provide both file and youtube_url.

Example: "https://www.youtube.com/watch?v=XXXX"

language (string, required)

Language spoken in the audio or video.

Example: "hindi"

stt_model (string, required)

Speech to Text model to be used for multi-speaker transcription.

Example: "narris_fast"

output_file_type (string, optional)

Type of output to generate. Allowed values are audio or video. Defaults to audio if not provided.

Example: "video"

Notes:
• You must provide either an audio file or a youtube_url. Providing both or neither will result in a validation error.
• The output_file_type field controls the generated output format. If omitted, the API defaults to audio.
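The mutual-exclusion rule above can be sketched client-side (an illustrative helper, not part of the API; the server performs the authoritative validation):

```python
# Client-side sketch of the rule: exactly one of `file` or `youtube_url`
# must be supplied. Providing both or neither raises, mirroring the API's
# validation error.
def pick_source(file_path=None, youtube_url=None):
    if bool(file_path) == bool(youtube_url):
        raise ValueError("Provide exactly one of file or youtube_url")
    return ("file", file_path) if file_path else ("youtube_url", youtube_url)
```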


Response

On successful submission, the API returns a unique log_id.
Use this log_id with the Fetch Multi-Speaker Speech to Text By ID endpoint to retrieve speaker-attributed transcripts and output files.

{
  "log_id": "69504cea7d5247d58c029ec9"
}

curl

curl --location 'https://api.dev.narris.io/api/ms-stt' \
--header 'x-api-key: YOUR_API_KEY' \
--form 'project_title="My Project"' \
--form 'language="hindi"' \
--form 'stt_model="narris_fast"' \
--form 'file=@"/path/to/audio.wav"' \
--form 'output_file_type="video"'

Fetch Multi-Speaker Speech to Text List

GET

/api/ms-stt/logs

This endpoint allows you to fetch a paginated list of previously created multi-speaker speech to text requests.

Each entry represents a diarized transcription job and includes its current status and creation timestamp.


Query Parameters

page (number, optional)

Page number for pagination.

Example: 1

limit (number, optional)

Number of records to return per page.

Example: 10


Response

On success, the API returns a paginated list of multi-speaker speech to text logs.
Each log contains a unique _id which can be used with the Fetch Multi-Speaker Speech to Text By ID endpoint to retrieve speaker-attributed transcripts and output metadata.

{
  "total": 10,
  "page": 1,
  "limit": 10,
  "logs": [
    {
      "_id": "69504cea7d5247d58c029ec9",
      "project_title": "My Project",
      "status": "finished",
      "createdAt": "2025-12-27T21:17:30.352Z"
    },
    {
      "_id": "694b07cd28310102bd38547a",
      "project_title": "My Project",
      "status": "failed",
      "createdAt": "2025-12-23T21:21:17.791Z"
    },
    {
      "_id": "694b0639767af4524d81c784",
      "project_title": "My Project",
      "status": "processing",
      "createdAt": "2025-12-23T21:14:33.599Z"
    }
  ]
}
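A small helper (illustrative, assuming the response shape shown above) that walks a logs page and collects the IDs of finished jobs, ready to pass to the Fetch By ID endpoint:

```python
# `logs_page` is a dict shaped like the example response above.
# Returns the _id of every log whose status is "finished".
def finished_log_ids(logs_page):
    return [log["_id"] for log in logs_page["logs"] if log["status"] == "finished"]

page = {
    "total": 10, "page": 1, "limit": 10,
    "logs": [
        {"_id": "69504cea7d5247d58c029ec9", "status": "finished"},
        {"_id": "694b07cd28310102bd38547a", "status": "failed"},
        {"_id": "694b0639767af4524d81c784", "status": "processing"},
    ],
}
print(finished_log_ids(page))  # → ["69504cea7d5247d58c029ec9"]
```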

curl

curl --location 'https://api.dev.narris.io/api/ms-stt/logs?page=1&limit=10' \
--header 'Content-Type: application/json' \
--header 'x-api-key: YOUR_API_KEY'

Fetch Multi-Speaker Speech to Text By ID

GET

/api/ms-stt/{log_id}

This endpoint allows you to fetch the complete details of a multi-speaker speech to text request using its unique log_id.

The response includes diarized speaker segments, transcription metadata, timestamps, and output configuration once the job is completed.


Path Parameters

log_id (string, required)

Unique identifier of the multi-speaker speech to text request returned during creation or from logs.

Example: "69504cea7d5247d58c029ec9"


Response

On success, the API returns detailed information about the multi-speaker speech to text request.
Each segment includes speaker labels, timestamps, and transcribed text.

{
  "_id": "69504cea7d5247d58c029ec9",
  "project_title": "My Project",
  "output_file_type": "audio",
  "input_file": "https://lingui-dev.s3.amazonaws.com/input/20251218195657_20251218195649_1766870248756.wav",
  "language": "hindi",
  "stt_model": "narris_fast",
  "status": "finished",
  "createdAt": "2025-12-27T21:17:30.352Z",
  "updatedAt": "2025-12-27T21:17:35.853Z",
  "segments": [
    {
      "original_speaker": "SPEAKER_01",
      "speed": null,
      "pitch": null,
      "original_text": "आपका पसंदीदा जानवर कौन सा है?",
      "start_time": "00:00:00,290",
      "end_time": "00:00:02,210"
    },
    {
      "original_speaker": "SPEAKER_00",
      "speed": null,
      "pitch": null,
      "original_text": "मेरे पसंदीदा जानवर कुट्टे, बिल्लिया डॉल्फिन है.",
      "start_time": "00:00:02,750",
      "end_time": "00:00:05,720"
    }
  ]
}
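The segment timestamps use the SRT-style "HH:MM:SS,mmm" format. A minimal sketch (field names taken from the example response above) that converts them to seconds and totals speaking time per diarized speaker:

```python
# Parse an "HH:MM:SS,mmm" timestamp into seconds as a float.
def to_seconds(ts):
    hms, ms = ts.split(",")
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s + int(ms) / 1000

# Sum (end_time - start_time) per original_speaker across all segments.
def speaker_durations(segments):
    totals = {}
    for seg in segments:
        dur = to_seconds(seg["end_time"]) - to_seconds(seg["start_time"])
        totals[seg["original_speaker"]] = totals.get(seg["original_speaker"], 0.0) + dur
    return totals
```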

curl

curl --location 'https://api.dev.narris.io/api/ms-stt/69504cea7d5247d58c029ec9' \
--header 'Content-Type: application/json' \
--header 'x-api-key: YOUR_API_KEY'

Notes for Developers

• Multi-speaker speech to text requests are processed asynchronously. Always store the returned log_id to track transcription status.
• Speaker diarization automatically separates and labels speakers (for example: SPEAKER_00, SPEAKER_01).
• Requests may remain in pending or processing state depending on audio length, number of speakers, and system load.
• Use the Fetch Multi-Speaker Speech to Text List endpoint to view all jobs and Fetch Multi-Speaker Speech to Text By ID to retrieve speaker-attributed transcripts and output files.
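Since jobs are asynchronous, a client typically polls until the status leaves pending/processing. A minimal polling sketch; `fetch_status` is any callable returning the job's current status string (for example, a wrapper around GET /api/ms-stt/{log_id} that reads the status field), injected here so the loop stays self-contained:

```python
import time

# Poll until the job reaches a terminal state ("finished" or "failed"),
# sleeping `interval` seconds between attempts, up to `max_attempts` tries.
def wait_for_job(fetch_status, interval=5.0, max_attempts=60):
    for _ in range(max_attempts):
        status = fetch_status()
        if status in ("finished", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError("job did not complete in time")
```

Interval and attempt limits are illustrative defaults; tune them to expected audio length and system load.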