Batch Transcription

Understand how KoeIQ processes uploaded audio files through the AmiVoice async transcription pipeline.

Processing pipeline

Batch transcription follows this sequence of steps:

Browser/API → FastAPI → AWS S3 → SQS → Worker → AmiVoice async HTTP → PostgreSQL
  1. Upload: File sent to FastAPI via the browser UI or Ingest API.
  2. S3 storage: Audio file saved to AWS S3.
  3. SQS queue: Processing job added to the SQS queue.
  4. Worker: Worker service picks up the job and submits it to the AmiVoice async HTTP API.
  5. Result storage: Transcript saved to PostgreSQL.
  6. Auto-analytics: If auto-generate is on, OpenAI analytics run after transcription completes.
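
The sequence above can be sketched as a small in-memory simulation. This is illustrative only and not KoeIQ's actual code: plain dictionaries stand in for AWS S3 and PostgreSQL, `queue.Queue` stands in for SQS, and `fake_transcribe` is a hypothetical placeholder for the AmiVoice async HTTP call. All function and variable names are assumptions, and the optional analytics step (step 6) is left out.

```python
import queue
from typing import Callable

# In-memory stand-ins (assumptions for illustration, not real services).
object_store: dict[str, bytes] = {}  # stands in for AWS S3
job_queue: queue.Queue = queue.Queue()  # stands in for SQS
transcripts: dict[str, str] = {}  # stands in for PostgreSQL
status: dict[str, str] = {}  # per-call status shown to the user


def upload(call_id: str, audio: bytes) -> None:
    """Steps 1-3: receive the file, store it, and enqueue a job."""
    object_store[call_id] = audio
    job_queue.put(call_id)
    status[call_id] = "Queued"


def run_worker_once(transcribe: Callable[[bytes], str]) -> None:
    """Steps 4-5: take one job, transcribe it, and store the result."""
    call_id = job_queue.get()
    status[call_id] = "Processing"
    try:
        transcripts[call_id] = transcribe(object_store[call_id])
        status[call_id] = "Done"
    except Exception:
        # Any transcription error marks the call as failed.
        status[call_id] = "Failed"
    finally:
        job_queue.task_done()


def fake_transcribe(audio: bytes) -> str:
    """Hypothetical placeholder for the AmiVoice async HTTP call."""
    return f"<transcript of {len(audio)} bytes>"


upload("call-001", b"\x00" * 1024)
run_worker_once(fake_transcribe)
print(status["call-001"], transcripts["call-001"])
```

Decoupling upload from transcription through a queue means the upload request can return as soon as the job is enqueued, while the worker handles the slower transcription step in the background.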

Processing time

Processing time depends on file length and server load.

Call length    Typical processing time
3 minutes      3–6 minutes
5 minutes      5–10 minutes
10 minutes     10–20 minutes
30 minutes     30–60 minutes
ℹ️ Expect processing to take roughly 1–2× the length of the recording. It may take longer during peak load.
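
As a rough planning aid, the 1–2× rule of thumb can be written as a small helper. This is a sketch of the arithmetic only; `estimate_processing_minutes` is a hypothetical name, and it does not model the extra delay possible at peak load.

```python
def estimate_processing_minutes(call_minutes: float) -> tuple[float, float]:
    """Return a (low, high) estimate using the 1-2x real-time rule of thumb."""
    if call_minutes < 0:
        raise ValueError("call_minutes must be non-negative")
    return (call_minutes * 1.0, call_minutes * 2.0)


for minutes in (3, 5, 10, 30):
    low, high = estimate_processing_minutes(minutes)
    print(f"{minutes} min call: about {low:g} to {high:g} min")
```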

Status polling

Monitor progress in the Dashboard or Call Logs page (auto-refreshes every 5 seconds).

Status        Meaning
Queued        Uploaded to S3 and waiting in the SQS queue.
Processing    Worker is transcribing with AmiVoice.
Done          Transcription (and analytics if enabled) complete.
Failed        An error occurred. Check server logs.
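
For scripted monitoring, the same statuses can be polled in a loop. The sketch below assumes a caller-supplied `fetch_status` callable that returns one of the four status strings; this page does not document a status endpoint, so how that callable obtains the status is left open. The 5-second default mirrors the page refresh interval, and the timeout is an arbitrary assumption.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"Done", "Failed"}


def wait_for_call(
    fetch_status: Callable[[], str],
    interval_seconds: float = 5.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the call reaches Done or Failed, or raise on timeout.

    fetch_status is a hypothetical callable supplied by the caller.
    """
    waited = 0.0
    while True:
        current = fetch_status()
        if current in TERMINAL_STATUSES:
            return current
        if waited >= timeout_seconds:
            raise TimeoutError(f"still {current!r} after {waited:g} seconds")
        sleep(interval_seconds)
        waited += interval_seconds


# Usage with a canned sequence of statuses and no real sleeping.
canned = iter(["Queued", "Processing", "Processing", "Done"])
result = wait_for_call(lambda: next(canned), sleep=lambda _seconds: None)
print(result)
```

Passing `sleep` as a parameter keeps the loop easy to test without waiting in real time.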

Language codes

  • ja-JP — Japanese (AmiVoice Japanese engine)
  • en-US — English
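
A client could check a language code against this list before uploading. The helper below is a hypothetical sketch: `normalise_language_code` is not a KoeIQ function, and accepting variants such as `ja_jp` is an assumption made here for convenience, not documented behaviour.

```python
SUPPORTED_LANGUAGES = {
    "ja-JP": "Japanese (AmiVoice Japanese engine)",
    "en-US": "English",
}


def normalise_language_code(code: str) -> str:
    """Return the canonical code (for example 'ja-JP') or raise ValueError."""
    language, _, region = code.strip().replace("_", "-").partition("-")
    canonical = f"{language.lower()}-{region.upper()}"
    if canonical not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {code!r}")
    return canonical


print(normalise_language_code("ja_jp"))
```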

Stereo channel splitting

Set the environment variable TRANSCRIBE_BY_CHANNEL=true to split stereo files before transcription:

  • Channel 0 (left): Agent audio
  • Channel 1 (right): Customer audio

Each channel is transcribed separately and speaker labels are assigned automatically. This suits contact centre recording systems that capture each party on a dedicated channel.

💡 Stereo channel splitting often produces more accurate speaker separation than mono diarisation.
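
To illustrate what channel splitting involves, here is a minimal sketch that de-interleaves a stereo PCM WAV file into two mono files using only the Python standard library. It is not KoeIQ's implementation: the function names are hypothetical, the way the `TRANSCRIBE_BY_CHANNEL` value is parsed is an assumption, and only uncompressed PCM WAV input is handled.

```python
import io
import os
import wave


def channel_split_enabled() -> bool:
    """Read TRANSCRIBE_BY_CHANNEL; the parsing rule here is an assumption."""
    return os.environ.get("TRANSCRIBE_BY_CHANNEL", "").strip().lower() == "true"


def split_stereo_wav(stereo_wav: bytes) -> tuple[bytes, bytes]:
    """Split a stereo PCM WAV into (channel 0, channel 1) mono WAV files."""
    with wave.open(io.BytesIO(stereo_wav), "rb") as src:
        if src.getnchannels() != 2:
            raise ValueError("expected a 2-channel (stereo) WAV file")
        width = src.getsampwidth()
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    frame_size = 2 * width  # each frame holds one sample per channel
    left = bytearray()   # channel 0: agent audio
    right = bytearray()  # channel 1: customer audio
    for start in range(0, len(frames) - frame_size + 1, frame_size):
        left += frames[start:start + width]
        right += frames[start + width:start + frame_size]

    def to_mono_wav(samples: bytes) -> bytes:
        buffer = io.BytesIO()
        with wave.open(buffer, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(width)
            dst.setframerate(rate)
            dst.writeframes(samples)
        return buffer.getvalue()

    return to_mono_wav(bytes(left)), to_mono_wav(bytes(right))
```

Each returned byte string is a complete mono WAV file, so the two channels can be submitted for transcription independently and labelled as agent and customer by channel index.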
