Batch Transcription
Understand how KoeIQ processes uploaded audio files through the AmiVoice async transcription pipeline.
Processing pipeline
Batch transcription follows this sequence of steps:
Browser/API → FastAPI → AWS S3 → SQS → Worker → AmiVoice async HTTP → PostgreSQL
- Upload: File sent to FastAPI via the browser UI or Ingest API.
- S3 storage: Audio file saved to AWS S3.
- SQS queue: Processing job added to the SQS queue.
- Worker: Worker service picks up the job and submits it to the AmiVoice async HTTP API.
- Result storage: Transcript saved to PostgreSQL.
- Auto-analytics: If auto-generate is on, OpenAI analytics run after transcription completes.
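The hand-off between stages can be sketched with in-memory stand-ins. This is only an illustration of the flow described above, not KoeIQ's actual code: the dictionaries stand in for S3 and PostgreSQL, `queue.Queue` stands in for SQS, and every name here (`upload`, `run_worker_once`, `fake_transcribe`, the `uploads/` key layout) is hypothetical.

```python
import queue
import uuid

# In-memory stand-ins (assumptions): dicts replace S3 and PostgreSQL,
# and a Queue replaces SQS.
object_store = {}
transcripts = {}
jobs = queue.Queue()


def upload(audio: bytes, language: str = "ja-JP") -> str:
    """Upload, S3 storage, SQS queue: store the file and enqueue a job."""
    call_id = str(uuid.uuid4())
    key = f"uploads/{call_id}.wav"  # hypothetical key layout
    object_store[key] = audio
    jobs.put({"call_id": call_id, "key": key, "language": language})
    return call_id


def fake_transcribe(audio: bytes, language: str) -> str:
    """Placeholder for the AmiVoice async HTTP call; returns dummy text."""
    return f"<{len(audio)} bytes transcribed as {language}>"


def run_worker_once(auto_generate: bool = False) -> dict:
    """Worker, result storage: take one job, transcribe it, save the text."""
    job = jobs.get_nowait()
    audio = object_store[job["key"]]
    transcripts[job["call_id"]] = fake_transcribe(audio, job["language"])
    # Analytics would run at this point, and only when auto-generate is on.
    return {"call_id": job["call_id"], "analytics_ran": auto_generate}
```

The point of the queue in the middle is that `upload` returns as soon as the job is enqueued; transcription happens later, whenever a worker picks the job up.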
Processing time
Processing time depends on file length and server load.
| Call length | Typical processing time |
|---|---|
| 3 minutes | 3–6 minutes |
| 5 minutes | 5–10 minutes |
| 10 minutes | 10–20 minutes |
| 30 minutes | 30–60 minutes |
ℹ️ Expect roughly 1–2× real time. Peak load periods may take longer.
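The 1–2× rule of thumb can be written as a small helper. This is a rough sketch; the function name is hypothetical, and the result is an estimate that ignores peak-load delays.

```python
def estimate_processing_minutes(call_minutes: float) -> tuple[float, float]:
    """Return (low, high) estimated processing time at 1x and 2x real time."""
    if call_minutes < 0:
        raise ValueError("call length must not be negative")
    return (call_minutes * 1.0, call_minutes * 2.0)
```

For example, `estimate_processing_minutes(5)` gives `(5.0, 10.0)`, matching the 5-minute row in the table.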
Status polling
Monitor progress in the Dashboard or Call Logs page (auto-refreshes every 5 seconds).
| Status | Meaning |
|---|---|
| Queued | Uploaded to S3 and waiting in the SQS queue. |
| Processing | Worker is transcribing with AmiVoice. |
| Done | Transcription (and analytics if enabled) complete. |
| Failed | An error occurred. Check server logs. |
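A script that needs to wait for a result could poll in the same way the UI refreshes. The sketch below assumes you supply your own `fetch_status` callable that returns one of the status strings from the table; this page does not describe a status endpoint, so how that callable gets its value is left open. The 5-second default mirrors the page refresh interval, and `max_polls` is an arbitrary safety limit.

```python
import time
from typing import Callable

# Statuses after which no further change is expected.
TERMINAL_STATUSES = {"Done", "Failed"}


def wait_for_completion(
    fetch_status: Callable[[], str],
    interval_seconds: float = 5.0,
    max_polls: int = 720,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reaches Done or Failed, then return that status."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval_seconds)
    raise TimeoutError(f"job still not finished after {max_polls} polls")
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.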
Language codes
- ja-JP: Japanese (AmiVoice Japanese engine)
- en-US: English
Stereo channel splitting
Set the environment variable TRANSCRIBE_BY_CHANNEL=true to split stereo files before transcription:
- Channel 0 (left): Agent audio
- Channel 1 (right): Customer audio
Each channel is transcribed separately and speaker labels are assigned automatically. Ideal for contact centre recording systems with dedicated per-party channels.
💡 Stereo channel splitting often produces more accurate speaker separation than mono diarisation.
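To show what splitting by channel involves, here is a minimal sketch using only Python's standard `wave` module. It is not KoeIQ's implementation: it assumes uncompressed PCM WAV input, the function names are hypothetical, and treating only the value `true` (in any letter case) as enabled is an assumption about how the variable is parsed.

```python
import os
import wave


def channel_split_enabled() -> bool:
    """Read TRANSCRIBE_BY_CHANNEL; only 'true' counts as on (assumption)."""
    return os.environ.get("TRANSCRIBE_BY_CHANNEL", "").strip().lower() == "true"


def split_stereo(src_path: str, agent_path: str, customer_path: str) -> None:
    """Write channel 0 (left) and channel 1 (right) as separate mono files."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2:
            raise ValueError("expected a stereo file")
        width = src.getsampwidth()  # bytes per sample
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    # Each stereo frame is one left sample followed by one right sample.
    frame_size = 2 * width
    left = bytearray()
    right = bytearray()
    for i in range(0, len(frames), frame_size):
        left += frames[i:i + width]
        right += frames[i + width:i + frame_size]

    for path, data in ((agent_path, left), (customer_path, right)):
        with wave.open(path, "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(width)
            out.setframerate(rate)
            out.writeframes(bytes(data))
```

Each resulting mono file could then be transcribed on its own, with the agent or customer label taken from the channel it came from.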