Whisper-1 Audio Transcription

curl --request POST \
  --url https://api.apimart.ai/v1/audio/transcriptions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form 'file=@/path/to/audio.mp3' \
  --form 'model=whisper-1' \
  --form 'language=en' \
  --form 'response_format=json'

{
  "text": "This is a transcribed text from the test audio."
}

Authorizations

Authorization

string

required

All endpoints require Bearer token authentication.

To get an API key, visit the API Key Management page. Then add the key to the request header:

Authorization: Bearer YOUR_API_KEY
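This minimal Bash sketch supplies the key without pasting it into every command. It assumes you store the key in an environment variable. The variable name APIMART_API_KEY and the helper auth_header are hypothetical; this API does not define them.

```shell
# Bash. Hypothetical helper: prints the Authorization header built from the
# APIMART_API_KEY environment variable (an assumed name, not part of the API).
auth_header() {
  if [ -z "${APIMART_API_KEY:-}" ]; then
    echo "APIMART_API_KEY is not set" >&2
    return 1
  fi
  printf 'Authorization: Bearer %s\n' "$APIMART_API_KEY"
}

# Usage (needs network access and a valid key):
#   export APIMART_API_KEY='YOUR_API_KEY'
#   curl --request POST \
#     --url https://api.apimart.ai/v1/audio/transcriptions \
#     --header "$(auth_header)" \
#     --form 'file=@/path/to/audio.mp3' \
#     --form 'model=whisper-1'
```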

Body

⚠️ Online testing (Try it) is not supported for this endpoint. Because of file upload limitations, please test with one of the following:

Apifox / Postman - After importing, manually change the file parameter's type to file
cURL - Refer to the cURL example at the top of this page
SDK - Use SDK examples in various languages

file

string

required

Audio file to transcribe (file type).

⚠️ Note: When testing with Apifox or similar tools:

After importing, manually change this parameter's type to file
Ensure the request Content-Type is multipart/form-data

Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, webm

Maximum file size: 25 MB
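You can check the documented format and size limits locally before uploading, so you don't spend a request on a file the server will reject. This is a sketch with a few assumptions. check_audio_file is a hypothetical helper. It inspects only the file extension, not the actual encoding. It also assumes that "25 MB" means 25 * 1024 * 1024 bytes; the server's exact limit may differ.

```shell
# Bash. Hypothetical pre-upload check for the documented constraints.
# Assumption: 25 MB = 25 * 1024 * 1024 bytes.
MAX_AUDIO_BYTES=$((25 * 1024 * 1024))

check_audio_file() {
  local path="$1"
  if [ ! -f "$path" ]; then
    echo "not a regular file: $path" >&2
    return 1
  fi

  # Compare the extension case-insensitively, so "Clip.WAV" is accepted.
  local ext
  ext=$(printf '%s' "${path##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp3|mp4|mpeg|mpga|m4a|wav|webm) ;;
    *)
      echo "unsupported format: .$ext" >&2
      return 1
      ;;
  esac

  # wc -c works on both GNU and BSD; $(( )) strips the padding BSD adds.
  local size
  size=$(( $(wc -c < "$path") ))
  if [ "$size" -gt "$MAX_AUDIO_BYTES" ]; then
    echo "file too large: $size bytes (limit $MAX_AUDIO_BYTES)" >&2
    return 1
  fi
}
```

Usage: `check_audio_file /path/to/audio.mp3 && curl ...` runs the upload only when the check passes.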

model

string

default:"whisper-1"

required

Speech recognition model name.

Example: "whisper-1"

language

string

Language code of the audio, in ISO-639-1 format. Specifying the language can improve accuracy and speed.

Supported languages include zh (Chinese), en (English), ja (Japanese), ko (Korean), and 99 other languages.

Example: "en"

prompt

string

Optional text prompt that guides the transcription style or continues from a previous audio segment.

Maximum length: 224 tokens.

response_format

string

default:"json"

Output format. Supported formats:

json - JSON format (text only)
text - Plain text
srt - SRT subtitle format
verbose_json - Verbose JSON format (includes timestamps and metadata)
vtt - WebVTT subtitle format
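For text, srt, and vtt, the response body is presumably the raw text or subtitle file rather than JSON. Saving it with a matching file extension is therefore convenient. The helper output_extension and its extension mapping below are our own convention, not part of the API.

```shell
# Bash. Hypothetical helper: maps a response_format value to a file extension
# for saving the response body with curl --output (our own convention).
output_extension() {
  case "$1" in
    json|verbose_json) echo "json" ;;
    text) echo "txt" ;;
    srt) echo "srt" ;;
    vtt) echo "vtt" ;;
    *)
      echo "unknown response_format: $1" >&2
      return 1
      ;;
  esac
}

# Usage sketch (needs network access and a valid key): request SRT subtitles.
#   fmt=srt
#   curl --request POST \
#     --url https://api.apimart.ai/v1/audio/transcriptions \
#     --header "Authorization: Bearer $APIMART_API_KEY" \
#     --form 'file=@/path/to/audio.mp3' \
#     --form 'model=whisper-1' \
#     --form "response_format=$fmt" \
#     --output "transcript.$(output_extension "$fmt")"
```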

temperature

number

default:"0"

Sampling temperature, in the range 0 to 1. Higher values (such as 0.8) make the output more random; lower values (such as 0.2) make it more deterministic and consistent.
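This is a small client-side guard for the documented 0 to 1 range. check_temperature is a hypothetical helper, and the server may apply its own validation regardless. The sketch uses awk because the shell's [ ] test compares only integers.

```shell
# Bash. Hypothetical helper: exits 0 only if $1 is a plain decimal in [0, 1].
check_temperature() {
  # awk compares decimals such as 0.2, which the shell's [ -le ] cannot.
  awk -v t="$1" 'BEGIN {
    if (t !~ /^[0-9]+(\.[0-9]+)?$/) exit 1   # reject empty or non-numeric input
    exit !(t + 0 >= 0 && t + 0 <= 1)         # status 0 only when 0 <= t <= 1
  }'
}
```

Usage: `check_temperature 0.2 && curl ... --form 'temperature=0.2'`.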

Response

text

string

Transcribed text content

task

string

Task type; always "transcribe".

Only returned in verbose_json format.

language

string

Detected or specified language code.

Only returned in verbose_json format.

duration

number

Audio duration, in seconds.

Only returned in verbose_json format.

segments

array

Array of text segments.

Only returned in verbose_json format.

Each segment object has the following properties:

id

integer

Segment ID

start

number

Segment start time (seconds)

end

number

Segment end time (seconds)

text

string

Segment text content

temperature

number

Sampling temperature used

avg_logprob

number

Average log probability

compression_ratio

number

Compression ratio

no_speech_prob

number

No speech probability
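The segment start and end fields are plain seconds. If you request verbose_json but want SRT-style timestamps (HH:MM:SS,mmm) for individual segments, you can convert them as shown below. This is a sketch: srt_timestamp is a hypothetical helper, and rounding to the nearest millisecond is our choice.

```shell
# Bash. Hypothetical helper: formats a time in seconds (e.g. a segment's
# "start" or "end" value) as an SRT timestamp HH:MM:SS,mmm.
srt_timestamp() {
  awk -v s="$1" 'BEGIN {
    ms = int(s * 1000 + 0.5)            # round to the nearest millisecond
    h = int(ms / 3600000); ms -= h * 3600000
    m = int(ms / 60000);   ms -= m * 60000
    sec = int(ms / 1000);  ms -= sec * 1000
    printf "%02d:%02d:%02d,%03d\n", h, m, sec, ms
  }'
}
```

For example, `srt_timestamp 3661.25` prints `01:01:01,250`.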
