Whisper-1 Audio Transcription
- Supports speech recognition in 99 languages
- Multiple output formats: json, text, srt, verbose_json, vtt
- Maximum file size 25 MB
POST
Authorizations
All interfaces require Bearer Token authentication.
Get API Key: Visit the API Key Management Page to get your API Key.
Add to request header: Authorization: Bearer YOUR_API_KEY
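As a minimal sketch of the authentication step, the helper below builds the header for a Bearer token. The function name auth_headers is hypothetical and not part of any SDK. The header layout follows the standard Bearer scheme rather than anything specific to this service.

```python
def auth_headers(api_key: str) -> dict[str, str]:
    """Return the HTTP header that carries a Bearer token.

    auth_headers is a hypothetical helper used only for illustration.
    """
    key = api_key.strip()
    if not key:
        raise ValueError("API key is empty")
    # Standard Bearer scheme: the word "Bearer", one space, then the token.
    return {"Authorization": f"Bearer {key}"}
```

The returned dictionary can be merged into the headers of whichever HTTP client is in use.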
Body
⚠️ Online testing (Try it) is not supported for this endpoint. Due to file upload limitations, please test using:
- Apifox / Postman - Manually change the file parameter to file type after importing
- cURL - Refer to the cURL code examples
- SDK - Use the SDK examples in various languages
Audio file to transcribe (File type)
⚠️ Note: When testing with Apifox or similar tools:
- After importing, manually change this parameter type to file
- Ensure the request Content-Type is multipart/form-data
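To show what a multipart/form-data request body looks like, here is a rough sketch that encodes one file plus a few text fields using only the standard library. The helper name encode_multipart is hypothetical. The field name file comes from the note above; any other field names passed in (such as model) are assumptions. Most HTTP clients do this encoding for you, so treat this as an illustration of the wire format only.

```python
import uuid


def encode_multipart(
    fields: dict[str, str],
    file_name: str,
    file_bytes: bytes,
    file_field: str = "file",
) -> tuple[bytes, str]:
    """Encode text fields and one file as multipart/form-data.

    Returns (body, content_type). Hypothetical helper for illustration;
    it assumes file_name contains no double quotes or line breaks.
    """
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []
    for name, value in fields.items():
        # Each text field is its own part, introduced by the boundary line.
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        parts.append(f"{value}\r\n".encode())
    # The file part also carries a filename and its own Content-Type.
    parts.append(f"--{boundary}\r\n".encode())
    parts.append(
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{file_name}"\r\n'.encode()
    )
    parts.append(b"Content-Type: application/octet-stream\r\n\r\n")
    parts.append(file_bytes)
    parts.append(b"\r\n")
    # The closing boundary ends with two extra dashes.
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The second return value is what belongs in the request's Content-Type header, since the server needs the boundary string to split the parts.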
Speech recognition model name. Example: "whisper-1"
Language code of the audio (ISO-639-1 format). Specifying the language can improve accuracy and speed. Supported languages include zh (Chinese), en (English), ja (Japanese), and ko (Korean), among 99 in total. Example: "en"
Optional text prompt to guide the transcription style or continue from previous audio. Maximum 224 tokens.
Output format. Supported formats:
- json - JSON format (text only)
- text - Plain text
- srt - SRT subtitle format
- verbose_json - Verbose JSON format (includes timestamps and metadata)
- vtt - WebVTT subtitle format
Sampling temperature, range 0 to 1. Higher values (like 0.8) make the output more random; lower values (like 0.2) make it more deterministic and consistent.
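The constraints described for the request body can be checked on the client before uploading. The sketch below is one possible way to do that. The field names model, language, prompt, response_format and temperature are assumptions (this page does not list them), and build_form_fields is a hypothetical helper. Reading "25 MB" as 25 MiB is also an assumption.

```python
from typing import Optional

# Limits and options taken from the parameter descriptions above.
MAX_FILE_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" is read as 25 MiB
RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def build_form_fields(
    file_size: int,
    model: str = "whisper-1",
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    response_format: Optional[str] = None,
    temperature: Optional[float] = None,
) -> dict[str, str]:
    """Validate the options and return the non-file form fields as strings.

    The field names used as keys are assumed, not confirmed by this page.
    """
    if file_size > MAX_FILE_BYTES:
        raise ValueError(f"file is {file_size} bytes; the limit is {MAX_FILE_BYTES}")
    fields = {"model": model}
    if language is not None:
        # ISO-639-1 codes are two lowercase ASCII letters, e.g. "en" or "zh".
        ok = len(language) == 2 and language.isascii() and language.isalpha()
        if not (ok and language.islower()):
            raise ValueError(f"not an ISO-639-1 code: {language!r}")
        fields["language"] = language
    if prompt is not None:
        # The 224-token cap is not checked: that would need the model's tokenizer.
        fields["prompt"] = prompt
    if response_format is not None:
        if response_format not in RESPONSE_FORMATS:
            raise ValueError(f"unsupported response_format: {response_format!r}")
        fields["response_format"] = response_format
    if temperature is not None:
        if not 0.0 <= temperature <= 1.0:
            raise ValueError("temperature must be between 0 and 1")
        fields["temperature"] = str(temperature)
    return fields
```

Optional parameters that are left as None are omitted from the result, so the server's defaults apply.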
Response
Transcribed text content
Task type, fixed as transcribe. Only returned in verbose_json format.
Detected or specified language code. Only returned in verbose_json format.
Audio duration (seconds). Only returned in verbose_json format.
Array of text segments. Only returned in verbose_json format.
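Because four of the five response fields appear only in verbose_json output, client code has to treat them as optional. The sketch below parses a JSON response under that assumption. The key names text, task, language, duration and segments are assumed (this page describes the fields but does not name them), and both Transcription and parse_transcription are hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Transcription:
    """Parsed response; every field except text may be absent."""
    text: str
    task: Optional[str] = None
    language: Optional[str] = None
    duration: Optional[float] = None
    segments: list[dict[str, Any]] = field(default_factory=list)


def parse_transcription(raw: str) -> Transcription:
    """Parse a json or verbose_json response body (key names are assumed)."""
    data = json.loads(raw)
    return Transcription(
        text=data["text"],                     # present in both formats
        task=data.get("task"),                 # verbose_json only
        language=data.get("language"),         # verbose_json only
        duration=data.get("duration"),         # verbose_json only
        segments=data.get("segments") or [],   # verbose_json only
    )
```

This applies to the json and verbose_json formats only; text, srt and vtt responses are not JSON and would need to be read as plain strings.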