Kling v3 Omni Video Generation

curl --request POST \
  --url https://api.apimart.ai/v1/videos/generations \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "kling-v3-omni",
    "prompt": "Make the person in <<<image_1>>> wave at the camera",
    "image_urls": ["https://upload.apimart.ai/f/models/9998230426123070-e9d6af04-cb5e-4731-8ae7-abf144cb0d29-9998230586368386-29641169-f698-4ab9-9b6d-380899e6521e-9998230593110693-c1741a3a-.webp"],
    "mode": "std",
    "duration": 5,
    "aspect_ratio": "16:9"
  }'

{
  "code": 200,
  "data": [
    {
      "status": "submitted",
      "task_id": "task_xxxxxxxxxx"
    }
  ]
}
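
For callers working from Python rather than the shell, the curl request above could be built roughly as follows. This is a minimal sketch using only the standard library; the helper name `build_generation_request` and the example image URL are illustrative assumptions, not part of the API.

```python
import json
import urllib.request

API_URL = "https://api.apimart.ai/v1/videos/generations"


def build_generation_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for a generation task.

    Hypothetical helper: it only mirrors the headers and body of the curl example.
    """
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = {
    "model": "kling-v3-omni",
    "prompt": "Make the person in <<<image_1>>> wave at the camera",
    "image_urls": ["https://example.com/photo.jpg"],  # placeholder URL
    "mode": "std",
    "duration": 5,  # a plain integer, not the string "5"
    "aspect_ratio": "16:9",
}
request = build_generation_request("YOUR_API_KEY", payload)
# To submit for real: urllib.request.urlopen(request), then json.load() the body.
```

Building the request separately from sending it keeps the payload easy to inspect before any credit is spent.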

POST /v1/videos/generations

Authorization

string

required

All API endpoints require Bearer token authentication.

Get your API key: visit the API Key Management Page.

Add it to the request header:

Authorization: Bearer YOUR_API_KEY

Request Parameters

model

string

required

Video generation model name.

Supported models:

kling-v3-omni - Kling v3 Omni (unified interface)

prompt

string

required

Positive text prompt. Supports referencing images from image_urls using <<<image_N>>> syntax, where N starts from 1.

Example: "Make the person in <<<image_1>>> wave at the camera"

If images are provided but the prompt does not contain any <<<image_N>>> reference, the system will automatically prepend <<<image_1>>> to the prompt.
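
The auto-prepend rule can be mirrored on the client to preview the prompt the service would end up with. The sketch below is an assumption-laden illustration: the function name `with_default_image_ref` is hypothetical, and the real normalization happens server-side.

```python
import re

# Matches references such as <<<image_1>>> or <<<image_12>>>.
IMAGE_REF = re.compile(r"<<<image_(\d+)>>>")


def with_default_image_ref(prompt: str, image_urls: list[str]) -> str:
    """Prepend <<<image_1>>> when images are given but the prompt references none."""
    if image_urls and not IMAGE_REF.search(prompt):
        return "<<<image_1>>>" + prompt
    return prompt


preview = with_default_image_ref(
    "The person slowly turns and smiles",
    ["https://example.com/photo.jpg"],  # placeholder URL
)
```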

negative_prompt

string

Negative prompt used to exclude unwanted content. Maximum length is 2500 characters.

mode

string

default:"std"

Generation mode.

Options:

std - Standard mode (720P)
pro - Professional mode (1080P)
4k - 4K ultra HD mode

Default: std

duration

integer

default:"5"

Video duration in seconds.

Range: 3-15 (minimum 3 seconds, maximum 15 seconds)

Default: 5

⚠️ Note: Must be a plain number (e.g. 6). Do not add quotes, otherwise an error will occur.
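
A client-side guard for the duration rule might look like the sketch below. The function name `check_duration` is hypothetical; it only encodes the range and the "plain number" requirement stated above.

```python
def check_duration(duration) -> int:
    """Return duration unchanged if it is an integer in the range 3..15."""
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(duration, bool) or not isinstance(duration, int):
        raise TypeError('duration must be an integer such as 6, not "6"')
    if not 3 <= duration <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")
    return duration
```

Running this before serializing the payload catches the quoted-number mistake locally instead of through an API error.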

aspect_ratio

string

default:"16:9"

Video aspect ratio.

Options:

16:9 - Landscape
9:16 - Portrait
1:1 - Square

Default: 16:9

image_urls

array<url>

Image URL array for image referencing. Reference corresponding images in the prompt using <<<image_N>>> syntax (N starts from 1).

Example: ["https://example.com/photo.jpg"]

Image URLs must be publicly accessible without hotlink protection
In image-to-video mode, aspect_ratio may be overridden by the actual image ratio

image_with_roles

array<object>

Role-based image array, recommended for image-to-video.

Each item format: { "url": "...", "role": "..." }

Supported role values:

first_frame: first frame
last_frame: last frame
reference: reference image

image_urls and image_with_roles are mutually exclusive. Use only one.
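
The image input rules can be checked before sending a request. The following is a sketch only: `check_image_inputs` is a hypothetical helper, and it combines the mutual-exclusivity rule above with the last-frame-only rule listed under Parameter Constraints and Boundaries.

```python
VALID_ROLES = {"first_frame", "last_frame", "reference"}


def check_image_inputs(payload: dict) -> None:
    """Raise ValueError if the image fields break the documented rules."""
    if payload.get("image_urls") and payload.get("image_with_roles"):
        raise ValueError("use image_urls or image_with_roles, not both")

    items = payload.get("image_with_roles") or []
    for item in items:
        if not item.get("url"):
            raise ValueError("each image_with_roles item needs a url")
        if item.get("role") not in VALID_ROLES:
            raise ValueError(f"unknown role: {item.get('role')!r}")

    roles = {item["role"] for item in items}
    # A last frame on its own is documented as invalid.
    if "last_frame" in roles and "first_frame" not in roles:
        raise ValueError("last_frame requires a first_frame")


check_image_inputs({
    "image_with_roles": [
        {"url": "https://example.com/start.jpg", "role": "first_frame"},
        {"url": "https://example.com/end.jpg", "role": "last_frame"},
    ]
})
```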

video_list

array<object>

Reference video list (URL-based), up to 1 video.

Use refer_type to distinguish types:

base: video to be edited (default)
feature: feature reference video

Use keep_original_sound to control original audio:

no: do not keep (default)
yes: keep original sound

Request format:

"video_list":[
  { "video_url": "video_url", "refer_type": "base", "keep_original_sound": "no" }
]

video_url cannot be empty, and the video URL must be accessible
When refer_type=base:
- Start/end frames cannot be defined
- Reference video must be 3-10 seconds
- Generated video duration follows the uploaded video
When refer_type=feature and video_url is not empty:
- image_urls can only include a first-frame image
Video requirements: MP4/MOV only; duration at least 3 seconds; resolution 720px-2160px; frame rate 24-60fps (output is 24fps); size no more than 200MB
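
The structural rules for video_list can be sketched as a pre-flight check. `check_video_list` is a hypothetical name, and the check deliberately ignores the file-level requirements (format, resolution, frame rate, size), which would require inspecting the video itself.

```python
def check_video_list(payload: dict) -> None:
    """Raise ValueError if video_list breaks the documented structural rules."""
    videos = payload.get("video_list") or []
    if len(videos) > 1:
        raise ValueError("video_list supports at most 1 video")

    for video in videos:
        if not video.get("video_url"):
            raise ValueError("video_url cannot be empty")
        refer_type = video.get("refer_type", "base")  # base is the default
        if refer_type not in {"base", "feature"}:
            raise ValueError(f"unknown refer_type: {refer_type!r}")
        if video.get("keep_original_sound", "no") not in {"yes", "no"}:
            raise ValueError("keep_original_sound must be 'yes' or 'no'")

        if refer_type == "base":
            # Video editing and start/end frames are mutually exclusive.
            roles = {i.get("role") for i in payload.get("image_with_roles") or []}
            if roles & {"first_frame", "last_frame"}:
                raise ValueError("start/end frames are not allowed with refer_type=base")


check_video_list({
    "video_list": [
        {"video_url": "https://example.com/clip.mp4", "refer_type": "base", "keep_original_sound": "no"}
    ]
})
```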

multi_shot

boolean

default:"false"

Whether to enable multi-shot mode.

shot_type

string

Shot split method: customize or intelligence. Required when multi_shot=true.

multi_prompt

array<object>

Multi-shot list, each item is { index, prompt, duration }.

Minimum 1 shot, maximum 6 shots
Each shot duration must be an integer and >= 1
Sum of all shot durations must equal top-level duration
index must start from 1 and increase continuously
Required when multi_shot=true and shot_type=customize

Example:

[
  { "index": 1, "prompt": "a happy dog in running@element_cat", "duration": 3 },
  { "index": 2, "prompt": "a happy dog play with a cat@element_dog", "duration": 3 }
]
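
The multi_prompt constraints lend themselves to a small validator. This is a sketch under the rules listed above; `check_multi_prompt` is a hypothetical helper name. Note that the two-shot example above sums to 6 seconds, so it would need a top-level duration of 6.

```python
def check_multi_prompt(shots: list[dict], total_duration: int) -> None:
    """Raise ValueError if the shot list breaks the documented rules."""
    if not 1 <= len(shots) <= 6:
        raise ValueError("multi_prompt needs between 1 and 6 shots")

    for position, shot in enumerate(shots, start=1):
        # index must start at 1 and increase by 1 for each shot.
        if shot.get("index") != position:
            raise ValueError(f"shot {position} has index {shot.get('index')!r}")
        duration = shot.get("duration")
        if isinstance(duration, bool) or not isinstance(duration, int) or duration < 1:
            raise ValueError(f"shot {position} duration must be an integer >= 1")

    if sum(shot["duration"] for shot in shots) != total_duration:
        raise ValueError("shot durations must add up to the top-level duration")


shots = [
    {"index": 1, "prompt": "first shot", "duration": 3},
    {"index": 2, "prompt": "second shot", "duration": 3},
]
check_multi_prompt(shots, total_duration=6)
```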

element_list

array<object>

Reference subject list, up to 3 subjects. Supports:

Create subjects on the fly with name, description, element_input_urls

Common format:

[
  {
    "name": "element_dog",
    "description": "a golden retriever, fluffy fur, friendly expression",
    "element_input_urls": [
      "https://example.com/image1.png",
      "https://example.com/image2.png"
    ]
  },
  {
    "name": "element_cat",
    "description": "an orange tabby cat, round face, bright eyes",
    "element_input_urls": [
      "https://example.com/image1.png",
      "https://example.com/image2.png"
    ]
  }
]

Notes:

For on-the-fly creation, name, description, element_input_urls are required
element_input_urls: 2 to 4 images per subject (first as frontal image, others as references)
Use @name in prompt, e.g. "@element_dog and @element_cat are playing on the grass"
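
A possible client-side check for element_list is sketched below. `check_elements` is a hypothetical helper, and the `@(\w+)` pattern assumes subject names consist of letters, digits, and underscores, which the source does not state. Because the API's handling of an undefined @name is not documented here, the function reports such names rather than raising.

```python
import re


def check_elements(element_list: list[dict], prompt: str) -> list[str]:
    """Validate subjects; return @names used in the prompt but not defined."""
    if len(element_list) > 3:
        raise ValueError("element_list supports up to 3 subjects")

    defined = set()
    for element in element_list:
        for key in ("name", "description", "element_input_urls"):
            if not element.get(key):
                raise ValueError(f"element is missing {key}")
        if not 2 <= len(element["element_input_urls"]) <= 4:
            raise ValueError("each subject needs 2 to 4 images")
        defined.add(element["name"])

    # Assumption: names match \w+ (letters, digits, underscore).
    mentioned = set(re.findall(r"@(\w+)", prompt))
    return sorted(mentioned - defined)


dog = {
    "name": "element_dog",
    "description": "a golden retriever, fluffy fur, friendly expression",
    "element_input_urls": ["https://example.com/image1.png", "https://example.com/image2.png"],
}
undefined = check_elements([dog], "@element_dog and @element_cat are playing on the grass")
```

Here `undefined` lists `element_cat`, since only the dog subject was supplied.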

watermark

boolean

Whether to add watermark

audio

boolean

default:"false"

Whether to generate video with audio

This parameter is mutually exclusive with video_list. When video_list has a value, the audio parameter is not needed.

Parameter Constraints and Boundaries

image_urls and image_with_roles are mutually exclusive
mode=4k is available for kling-v3-omni
Last-frame-only input (last_frame without first frame) is invalid
Start/end frames and video edit are mutually exclusive: when video_list.refer_type=base (or omitted), start/end frames are not allowed
When video_list is present, audio is ignored
video_list supports at most 1 video
multi_prompt supports up to 6 shots, with index starting from 1 and increasing continuously

Image Reference Syntax

The Omni model uses <<<image_N>>> syntax to reference images in prompts, providing a unified text-to-video/image-to-video experience:

Syntax	Description
`<<<image_1>>>`	References the 1st image in the `image_urls` array
`<<<image_2>>>`	References the 2nd image in the `image_urls` array

Auto Reference: If image_urls is provided but the prompt does not contain any <<<image_N>>> reference, the system will automatically prepend <<<image_1>>> to the prompt.
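
Since N is a 1-based position in image_urls, a prompt can be checked for references that point past the end of the array. The sketch below is illustrative: `out_of_range_refs` is a hypothetical helper, and how the API treats such a reference is not stated in this document.

```python
import re


def out_of_range_refs(prompt: str, image_urls: list[str]) -> list[int]:
    """Return the image numbers in the prompt that have no matching image_urls entry."""
    indices = {int(n) for n in re.findall(r"<<<image_(\d+)>>>", prompt)}
    # Valid references run from 1 to len(image_urls), inclusive.
    return sorted(i for i in indices if not 1 <= i <= len(image_urls))


prompt = "The character in <<<image_1>>> walks toward the scene in <<<image_2>>>"
missing = out_of_range_refs(prompt, ["https://example.com/character.jpg"])
```

With only one URL supplied, `missing` contains 2, flagging the second reference before the request is sent.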

Response

code

integer

Response status code, 200 on success

data

array

Response data array

status

string

Task status, submitted when initially submitted

task_id

string

Unique task identifier for querying task status and results
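
Reading the task_id out of a submission response could look like the sketch below. `extract_task_ids` is a hypothetical helper; treating any code other than 200 as a failure is an assumption, because the shape of error responses is not described here.

```python
import json


def extract_task_ids(body: str) -> list[str]:
    """Parse a submission response and return the task_id of each data entry."""
    response = json.loads(body)
    # Assumption: anything other than code 200 means the submission failed.
    if response.get("code") != 200:
        raise RuntimeError(f"request failed with code {response.get('code')}")
    return [item["task_id"] for item in response.get("data", [])]


sample = '{"code": 200, "data": [{"status": "submitted", "task_id": "task_xxxxxxxxxx"}]}'
task_ids = extract_task_ids(sample)
```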

Use Cases

Case 1: Text-to-Video (Standard Mode)

{
  "model": "kling-v3-omni",
  "prompt": "A golden retriever running on the beach, sunset, cinematic",
  "mode": "std",
  "duration": 5,
  "aspect_ratio": "16:9"
}

Case 2: Image Reference (Single Image)

{
  "model": "kling-v3-omni",
  "prompt": "Make the person in <<<image_1>>> wave at the camera",
  "image_urls": ["https://upload.apimart.ai/f/models/9998230426123070-e9d6af04-cb5e-4731-8ae7-abf144cb0d29-9998230586368386-29641169-f698-4ab9-9b6d-380899e6521e-9998230593110693-c1741a3a-.webp"],
  "mode": "pro",
  "duration": 5
}

Case 3: Multiple Image References

{
  "model": "kling-v3-omni",
  "prompt": "The character in <<<image_1>>> walks toward the scene in <<<image_2>>>",
  "image_urls": [
    "https://example.com/character.jpg",
    "https://example.com/scene.jpg"
  ],
  "mode": "pro",
  "duration": 5
}

Case 4: Image Provided Without Explicit Reference (Auto-added)

{
  "model": "kling-v3-omni",
  "prompt": "The person slowly turns and smiles",
  "image_urls": ["https://upload.apimart.ai/f/models/9998230426123070-e9d6af04-cb5e-4731-8ae7-abf144cb0d29-9998230586368386-29641169-f698-4ab9-9b6d-380899e6521e-9998230593110693-c1741a3a-.webp"],
  "mode": "std",
  "duration": 5
}

The system will automatically prepend <<<image_1>>> to the prompt, equivalent to "<<<image_1>>>The person slowly turns and smiles".

Case 5: Generate Video with Audio

{
  "model": "kling-v3-omni",
  "prompt": "A yellow canary singing on a branch",
  "audio": true,
  "mode": "std",
  "duration": 5
}

Note: audio is mutually exclusive with video_list. When video_list has a value, the audio parameter is not needed.

Query Task Results

Video generation is an async task that returns a task_id upon submission. Use the Get Task Status endpoint to query generation progress and results.
