API Docs

Multi-Speaker Text to Text

Translate multi-speaker conversational text segment by segment and return structured, speaker-attributed output.

Create Multi-Speaker Text to Text Request

POST

/api/ms-ttt

This endpoint creates a multi-speaker text to text request where each speaker segment is processed independently while preserving speaker attribution.

The request is processed asynchronously. Once accepted, the API returns a unique log_id that can be used to fetch the translated, speaker-separated output.


Request Body

project_title (string, required)

A human-readable title to identify the multi-speaker text to text project.

Example: "My Project"

segments (array, required)

List of speaker segments containing original text and timing information.

Example: [ { "original_speaker": "SPEAKER_00", "original_text": "regardless of my", "start_time": "00:00:00,070", "end_time": "00:00:00,910" } ]

segments[].original_speaker (string, required)

Identifier of the speaker for the segment (for example: SPEAKER_00, SPEAKER_01).

segments[].original_text (string, required)

Original text spoken by the speaker in this segment.

segments[].start_time (string, required)

Start timestamp of the segment in HH:MM:SS,mmm format (milliseconds after the comma).

Example: "00:00:00,070"

segments[].end_time (string, required)

End timestamp of the segment in HH:MM:SS,mmm format (milliseconds after the comma).

Example: "00:00:00,910"
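The timestamp layout used by start_time and end_time can be produced from a millisecond offset with a small helper. This is a minimal Python sketch, not part of the API: the function names format_timestamp and is_valid_timestamp are hypothetical, and it assumes the three-digit millisecond field shown in the examples above. Whether the API accepts other widths is not stated here.

```python
import re

# Matches the HH:MM:SS,mmm layout shown in the examples (assumed 3-digit ms).
_TIMESTAMP_RE = re.compile(r"^\d{2}:[0-5]\d:[0-5]\d,\d{3}$")


def format_timestamp(total_ms):
    """Convert a non-negative millisecond offset to an HH:MM:SS,mmm string."""
    if total_ms < 0:
        raise ValueError("timestamp must be non-negative")
    seconds, ms = divmod(total_ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{ms:03d}"


def is_valid_timestamp(value):
    """Return True if value has the HH:MM:SS,mmm shape."""
    return _TIMESTAMP_RE.match(value) is not None


print(format_timestamp(70))   # 00:00:00,070
print(format_timestamp(910))  # 00:00:00,910
```

Checking segments locally with is_valid_timestamp before submission may catch formatting mistakes early, since the job is processed asynchronously.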

segments[].speed (number | null, optional)

Speech speed metadata for the segment. May be null or omitted.

segments[].pitch (number | null, optional)

Pitch metadata for the segment. May be null or omitted.

input_language (string, required)

Language of the input speaker segments.

Example: "english"

output_language (string, required)

Target language for the translated speaker segments.

Example: "hindi"

model (string, required)

Translation model used for processing multi-speaker segments.

Example: "narris"

Response

On successful submission, the API returns a unique log_id.
Use this log_id with the Fetch Multi-Speaker Text to Text By ID endpoint to retrieve the translated, speaker-attributed output.

{
  "log_id": "695031597d5247d58c029e93"
}

curl

curl --location 'https://api.narris.io/api/ms-ttt' \
--header 'Content-Type: application/json' \
--header 'x-api-key: YOUR_API_KEY' \
--data '{
  "project_title": "My Project",
  "segments": [
    {
      "original_speaker": "SPEAKER_00",
      "speed": null,
      "pitch": null,
      "original_text": "regardless of my",
      "start_time": "00:00:00,070",
      "end_time": "00:00:00,910"
    }
  ],
  "input_language": "english",
  "output_language": "hindi",
  "model": "narris"
}'
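
The same request can be assembled in Python using only the standard library. This is a sketch mirroring the curl command above, not an official client: build_create_request is a hypothetical helper name, and the host, path, headers, and field values are copied from the curl example. The request is built but not sent, so the snippet runs without network access.

```python
import json
import urllib.request

API_BASE = "https://api.narris.io"  # host taken from the curl example above


def build_create_request(api_key, project_title, segments,
                         input_language, output_language, model):
    """Build (but do not send) the POST /api/ms-ttt request."""
    payload = {
        "project_title": project_title,
        "segments": segments,
        "input_language": input_language,
        "output_language": output_language,
        "model": model,
    }
    return urllib.request.Request(
        url=f"{API_BASE}/api/ms-ttt",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


request = build_create_request(
    api_key="YOUR_API_KEY",
    project_title="My Project",
    segments=[{
        "original_speaker": "SPEAKER_00",
        "speed": None,   # serialized as JSON null
        "pitch": None,
        "original_text": "regardless of my",
        "start_time": "00:00:00,070",
        "end_time": "00:00:00,910",
    }],
    input_language="english",
    output_language="hindi",
    model="narris",
)

# To send the request and read the log_id:
# with urllib.request.urlopen(request) as response:
#     log_id = json.load(response)["log_id"]
```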

Fetch Multi-Speaker Text to Text List

GET

/api/ms-ttt/logs

This endpoint allows you to fetch a paginated list of previously created multi-speaker text to text requests.

Each entry represents a multi-speaker processing job and includes its current status and creation timestamp.


Query Parameters

page (number, optional)

Page number for pagination.

Example: 1

limit (number, optional)

Number of records to return per page.

Example: 20


Response

On success, the API returns a paginated list of multi-speaker text to text logs.
Each log contains a unique _id. Pass this value as log_id to the Fetch Multi-Speaker Text to Text By ID endpoint to retrieve the speaker-attributed translated output.

{
  "total": 10,
  "page": 1,
  "limit": 20,
  "logs": [
    {
      "_id": "695031597d5247d58c029e93",
      "project_title": "My Project",
      "status": "finished",
      "createdAt": "2025-12-27T19:19:53.378Z"
    },
    {
      "_id": "694459d4d408648d0d9efe0c",
      "project_title": "My Project",
      "status": "failed",
      "createdAt": "2025-12-18T19:45:24.359Z"
    }
  ]
}

curl

curl --location 'https://api.narris.io/api/ms-ttt/logs?page=1&limit=20' \
--header 'Content-Type: application/json' \
--header 'x-api-key: YOUR_API_KEY'
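
Paging through the list can be sketched in Python as follows. The helper names are hypothetical, and the sketch assumes that total in the response is the number of logs across all pages, which the sample response suggests but does not state outright.

```python
import math
from urllib.parse import urlencode

LOGS_URL = "https://api.narris.io/api/ms-ttt/logs"  # from the curl example


def build_logs_url(page=1, limit=20):
    """Return the list URL with page and limit as query parameters."""
    return f"{LOGS_URL}?{urlencode({'page': page, 'limit': limit})}"


def page_count(total, limit):
    """Number of pages needed, assuming total counts logs across all pages."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return math.ceil(total / limit)


def ids_with_status(response, status):
    """Collect the _id of every log in one response page with the given status."""
    return [log["_id"] for log in response["logs"] if log["status"] == status]


# Abbreviated copy of the sample response shown above.
sample = {
    "total": 10, "page": 1, "limit": 20,
    "logs": [
        {"_id": "695031597d5247d58c029e93", "status": "finished"},
        {"_id": "694459d4d408648d0d9efe0c", "status": "failed"},
    ],
}
print(build_logs_url(page=1, limit=20))
print(ids_with_status(sample, "finished"))
```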

Fetch Multi-Speaker Text to Text By ID

GET

/api/ms-ttt/{log_id}

This endpoint allows you to fetch the complete details of a multi-speaker text to text request using its unique log_id.

The response preserves speaker attribution and returns translated text for each speaker segment along with timing metadata.


Path Parameters

log_id (string, required)

Unique identifier of the multi-speaker text to text request. This is the log_id returned by the create endpoint, or the _id of an entry returned by the list endpoint.

Example: "694b08eebca52cafa955756d"


Response

On success, the API returns detailed information about the multi-speaker text to text request.
Each segment includes both the original and translated text while preserving speaker labels and timestamps.

{
  "_id": "694b08eebca52cafa955756d",
  "project_title": "My Project",
  "segments": [
    {
      "original_speaker": "SPEAKER_00",
      "speed": null,
      "pitch": null,
      "original_text": "regardless of my",
      "start_time": "00:00:00,070",
      "end_time": "00:00:00,910",
      "translated_text": "मेरी परवाह किए बिना"
    }
  ],
  "input_language": "english",
  "output_language": "hindi",
  "model": "narris",
  "status": "finished",
  "createdAt": "2025-12-23T21:26:06.579Z",
  "updatedAt": "2025-12-23T21:26:21.750Z"
}

curl

curl --location 'https://api.narris.io/api/ms-ttt/694b08eebca52cafa955756d' \
--header 'Content-Type: application/json' \
--header 'x-api-key: YOUR_API_KEY'
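
Once the response is parsed, the segments can be turned into a readable transcript. The sketch below is illustrative only: render_transcript is a hypothetical helper, and falling back to an empty string when translated_text is missing is an assumption about jobs that have not finished, not documented behavior.

```python
def render_transcript(job):
    """Return one '[start --> end] SPEAKER: text' line per segment."""
    lines = []
    for segment in job["segments"]:
        # Assumption: translated_text may be absent until the job finishes.
        text = segment.get("translated_text") or ""
        lines.append(
            f"[{segment['start_time']} --> {segment['end_time']}] "
            f"{segment['original_speaker']}: {text}"
        )
    return lines


# Abbreviated copy of the sample response shown above.
sample_job = {
    "_id": "694b08eebca52cafa955756d",
    "status": "finished",
    "segments": [{
        "original_speaker": "SPEAKER_00",
        "original_text": "regardless of my",
        "start_time": "00:00:00,070",
        "end_time": "00:00:00,910",
        "translated_text": "मेरी परवाह किए बिना",
    }],
}

for line in render_transcript(sample_job):
    print(line)
```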

Notes for Developers

• Multi-speaker text to text requests are processed asynchronously. Always store the returned log_id to track processing status.
• Each request may contain multiple speaker segments, and the final output will preserve speaker boundaries and labels.
• Jobs may remain in a pending or processing state for some time, depending on conversation length and system load.
• Use the Fetch Multi-Speaker Text to Text List endpoint to view all jobs and Fetch Multi-Speaker Text to Text By ID to retrieve the final structured output.
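
The asynchronous flow described in these notes can be handled with a simple polling loop. This Python sketch makes several assumptions: wait_for_job is a hypothetical helper, finished and failed are treated as the only terminal statuses (they are the ones that appear in the sample responses), and the interval and attempt limit are arbitrary defaults. The fetch_job argument is any callable that returns the parsed Fetch By ID response for a log_id, so the loop can be exercised without network access.

```python
import time

# Assumption: these are the only statuses after which a job stops changing.
TERMINAL_STATUSES = {"finished", "failed"}


def wait_for_job(fetch_job, log_id, interval_seconds=5.0,
                 max_attempts=60, sleep=time.sleep):
    """Poll fetch_job(log_id) until the job reaches a terminal status."""
    for attempt in range(max_attempts):
        job = fetch_job(log_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if attempt < max_attempts - 1:
            sleep(interval_seconds)  # wait before the next poll
    raise TimeoutError(
        f"job {log_id} not finished after {max_attempts} attempts"
    )


# Usage with a stand-in fetcher that simulates a job progressing over time.
_states = iter(["pending", "processing", "finished"])


def fake_fetch(log_id):
    return {"_id": log_id, "status": next(_states)}


result = wait_for_job(fake_fetch, "694b08eebca52cafa955756d",
                      sleep=lambda seconds: None)
print(result["status"])  # finished
```

Callers should still check whether the returned status is finished or failed before reading translated_text from the segments.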