Kling v3 Omni Video Generation
- Async processing mode, returns task ID for subsequent queries
- Unified text-to-video/image-to-video interface with image reference syntax
- Supports standard mode (720P), professional mode (1080P), and 4K mode
- Reference images in prompts using <<<image_N>>> syntax
- Supports generating videos with audio (mutually exclusive with video_list)
POST
Authorization
All API endpoints require Bearer Token authentication.
Get your API Key: visit the API Key Management Page to get your API Key.
Add it to the request header as a Bearer token.
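A minimal sketch of the header this describes, using the standard Bearer scheme. The helper name `build_headers` is hypothetical, and the `Content-Type` value assumes the request body is JSON, which this page does not state outright.

```python
def build_headers(api_key: str) -> dict[str, str]:
    """Return HTTP headers that carry the API key as a Bearer token."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return {
        "Authorization": f"Bearer {api_key}",
        # Assumption: the endpoint accepts a JSON request body.
        "Content-Type": "application/json",
    }


headers = build_headers("YOUR_API_KEY")
```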
Request Parameters
Video generation model name. Supported models:
- kling-v3-omni - Kling v3 Omni (unified interface)

prompt: Positive text prompt. Supports referencing images from image_urls using <<<image_N>>> syntax, where N starts from 1.
Example: "Make the person in <<<image_1>>> wave at the camera"
If images are provided but the prompt does not contain any <<<image_N>>> reference, the system will automatically prepend <<<image_1>>> to the prompt.

Negative prompt used to exclude unwanted content. Maximum length is 2500 characters.

mode: Generation mode. Options:
- std - Standard mode (720P)
- pro - Professional mode (1080P)
- 4k - 4K ultra HD mode
Default: std

duration: Video duration (seconds). Range: 3-15 (minimum 3 seconds, maximum 15 seconds). Default: 5
⚠️ Note: Must be a plain number (e.g. 6). Do not add quotes, otherwise an error will occur.

Video aspect ratio. Options:
- 16:9 - Landscape
- 9:16 - Portrait
- 1:1 - Square
Default: 16:9

image_urls: Image URL array for image referencing. Reference the corresponding images in the prompt using <<<image_N>>> syntax (N starts from 1).
Example: ["https://example.com/photo.jpg"]

image_with_roles: Role-based image array, recommended for image-to-video. Each item has the format { "url": "...", "role": "..." }, where role is one of:
- first_frame: first frame
- last_frame: last frame
- reference: reference image
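The mode and duration rules above can be checked client-side before a request is sent. The sketch below is one way to do it; the helper name `check_mode_and_duration` is hypothetical, and accepting floats as well as integers for duration is an assumption, since the page only says "plain number".

```python
import json

# Mode values and their resolutions as listed on this page.
MODES = {"std": "720P", "pro": "1080P", "4k": "4K"}


def check_mode_and_duration(mode="std", duration=5):
    """Validate mode and duration, returning them as request fields."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}")
    # A quoted value such as "6" is a string and is rejected here.
    if isinstance(duration, bool) or not isinstance(duration, (int, float)):
        raise TypeError("duration must be a plain number, not a string")
    if not 3 <= duration <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")
    return {"mode": mode, "duration": duration}


# The number is serialized without quotes.
print(json.dumps(check_mode_and_duration("pro", 6)))  # {"mode": "pro", "duration": 6}
```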

video_list: Reference video list (URL-based), up to 1 video.
Use refer_type to distinguish types:
- base: video to be edited (default)
- feature: feature reference video
Use keep_original_sound to control the original audio:
- no: do not keep (default)
- yes: keep original sound

multi_shot: Whether to enable multi-shot mode.

shot_type: Shot split method, either customize or intelligence. Required when multi_shot=true.

multi_prompt: Multi-shot list, each item is { index, prompt, duration }.
- Minimum 1 shot, maximum 6 shots
- Each shot duration must be an integer and >= 1
- Sum of all shot durations must equal top-level duration
- index must start from 1 and increase continuously
- Required when multi_shot=true and shot_type=customize
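The multi-shot rules above are easy to get wrong by one second, so a local check before submitting can help. This is a sketch only: `validate_multi_prompt` is a hypothetical helper, and the error messages are illustrative rather than anything the API returns.

```python
def validate_multi_prompt(multi_prompt, duration):
    """Return a list of rule violations for a multi_prompt list (empty if valid)."""
    errors = []
    if not 1 <= len(multi_prompt) <= 6:
        errors.append("multi_prompt must contain 1 to 6 shots")

    durations_ok = True
    for position, shot in enumerate(multi_prompt, start=1):
        # index must be 1, 2, 3, ... with no gaps.
        if shot.get("index") != position:
            errors.append(f"shot {position}: index must be {position}")
        d = shot.get("duration")
        if isinstance(d, bool) or not isinstance(d, int) or d < 1:
            errors.append(f"shot {position}: duration must be an integer >= 1")
            durations_ok = False

    # Only compare totals when every shot duration is itself valid.
    if durations_ok and sum(s["duration"] for s in multi_prompt) != duration:
        errors.append("shot durations must add up to the top-level duration")
    return errors


shots = [
    {"index": 1, "prompt": "Wide shot of a harbor at dawn", "duration": 2},
    {"index": 2, "prompt": "Close-up of a fisherman's hands", "duration": 3},
]
print(validate_multi_prompt(shots, duration=5))  # []
```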

Reference subject list, up to 3 subjects. Supports:
- Create subjects on the fly with name, description, element_input_urls
Notes:
- For on-the-fly creation, name, description, element_input_urls are required
- element_input_urls: 2 to 4 images per subject (first as frontal image, others as references)
- Use @name in prompt, e.g. "@element_dog and @element_cat are playing on the grass"
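As a rough illustration of the subject rules above, the sketch below builds subject entries and reports which ones the prompt never mentions. The helper names are hypothetical, the example URLs are placeholders, and matching `@name` with a word-character pattern is an assumption about how names are written, not a description of server-side parsing.

```python
import re


def make_subject(name, description, element_input_urls):
    """Build one on-the-fly subject entry, enforcing the required fields."""
    if not name or not description:
        raise ValueError("name and description are required")
    if not 2 <= len(element_input_urls) <= 4:
        raise ValueError("element_input_urls needs 2 to 4 images")
    return {
        "name": name,
        "description": description,
        # The first URL is treated as the frontal image, the rest as references.
        "element_input_urls": list(element_input_urls),
    }


def unmentioned_subjects(prompt, subjects):
    """Return names of subjects that the prompt never references as @name."""
    if len(subjects) > 3:
        raise ValueError("up to 3 subjects are supported")
    mentioned = set(re.findall(r"@(\w+)", prompt))
    return [s["name"] for s in subjects if s["name"] not in mentioned]


dog = make_subject("element_dog", "A small brown dog",
                   ["https://example.com/dog_front.jpg", "https://example.com/dog_side.jpg"])
cat = make_subject("element_cat", "A grey tabby cat",
                   ["https://example.com/cat_front.jpg", "https://example.com/cat_side.jpg"])
print(unmentioned_subjects("@element_dog and @element_cat are playing on the grass", [dog, cat]))  # []
```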
Whether to add watermark
audio: Whether to generate video with audio
Parameter Constraints and Boundaries
- image_urls and image_with_roles are mutually exclusive
- mode=4k is available for kling-v3-omni
- Last-frame-only input (last_frame without first frame) is invalid
- Start/end frames and video edit are mutually exclusive: when video_list.refer_type=base (or omitted), start/end frames are not allowed
- When video_list is present, audio is ignored
- video_list supports at most 1 video
- multi_prompt supports up to 6 shots, with index starting from 1 and increasing continuously
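Several of these constraints involve more than one field, so they can be checked together on the request body. The following is a sketch under assumptions: `find_conflicts` is a hypothetical helper, the request is assumed to be a flat dictionary keyed by the parameter names used on this page, and the messages are illustrative.

```python
def find_conflicts(payload: dict) -> list[str]:
    """Return human-readable conflicts between fields of a request body."""
    problems = []
    roles = [item.get("role") for item in payload.get("image_with_roles") or []]
    videos = payload.get("video_list") or []

    if payload.get("image_urls") and payload.get("image_with_roles"):
        problems.append("image_urls and image_with_roles are mutually exclusive")
    if "last_frame" in roles and "first_frame" not in roles:
        problems.append("last_frame requires a first_frame")
    if len(videos) > 1:
        problems.append("video_list supports at most 1 video")

    # refer_type defaults to base when omitted.
    editing_base = any(v.get("refer_type", "base") == "base" for v in videos)
    has_frames = "first_frame" in roles or "last_frame" in roles
    if has_frames and editing_base:
        problems.append("start/end frames are not allowed when editing a base video")
    if videos and "audio" in payload:
        problems.append("audio is ignored when video_list is present")
    return problems


request = {
    "prompt": "Make the person in <<<image_1>>> wave at the camera",
    "image_urls": ["https://example.com/photo.jpg"],
    "image_with_roles": [{"url": "https://example.com/photo.jpg", "role": "first_frame"}],
}
print(find_conflicts(request))
```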
Image Reference Syntax
The Omni model uses <<<image_N>>> syntax to reference images in prompts, providing a unified text-to-video/image-to-video experience:
| Syntax | Description |
|---|---|
| <<<image_1>>> | References the 1st image in the image_urls array |
| <<<image_2>>> | References the 2nd image in the image_urls array |
Auto Reference: If image_urls is provided but the prompt does not contain any <<<image_N>>> reference, the system will automatically prepend <<<image_1>>> to the prompt.
Response
Response status code, 200 on success
Response data array
Use Cases
Case 1: Text-to-Video (Standard Mode)
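A request body for this case might look like the sketch below. The "model" key name and the prompt text are assumptions made for illustration; "prompt", "mode" and "duration" are the names used on this page.

```python
import json

# Hypothetical text-to-video request body in standard mode.
payload = {
    "model": "kling-v3-omni",  # assumed key name for the model parameter
    "prompt": "A paper boat drifts down a rain-filled gutter",  # illustrative prompt
    "mode": "std",             # standard mode (720P)
    "duration": 5,             # plain number, not "5"
}

body = json.dumps(payload)
print(body)
```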
Case 2: Image Reference (Single Image)
Case 3: Multiple Image References
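With several images, each <<<image_N>>> in the prompt should point at an existing entry in image_urls. The sketch below checks that locally; `unresolved_references` is a hypothetical helper, the URLs are placeholders, and treating an out-of-range N as an error is an assumption, since this page does not say how the API handles it.

```python
import re

IMAGE_REF = re.compile(r"<<<image_(\d+)>>>")


def unresolved_references(prompt, image_urls):
    """Return referenced image numbers that have no matching image_urls entry."""
    referenced = {int(n) for n in IMAGE_REF.findall(prompt)}
    # N starts from 1, so valid numbers are 1..len(image_urls).
    return sorted(n for n in referenced if not 1 <= n <= len(image_urls))


image_urls = ["https://example.com/person.jpg", "https://example.com/beach.jpg"]
prompt = "Place the person in <<<image_1>>> on the beach from <<<image_2>>>"
print(unresolved_references(prompt, image_urls))  # []
```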
Case 4: Image Provided Without Explicit Reference (Auto-added)
The system will automatically prepend <<<image_1>>> to the prompt, equivalent to "<<<image_1>>>The person slowly turns and smiles".
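The auto-reference behavior can be mimicked locally to preview the prompt the system would end up with. This is a sketch of the documented rule, not the service's actual implementation; `with_auto_reference` is a hypothetical name.

```python
import re

IMAGE_REF = re.compile(r"<<<image_\d+>>>")


def with_auto_reference(prompt, image_urls):
    """Prepend <<<image_1>>> when images are given but none is referenced."""
    if image_urls and not IMAGE_REF.search(prompt):
        return "<<<image_1>>>" + prompt
    return prompt


print(with_auto_reference("The person slowly turns and smiles",
                          ["https://example.com/photo.jpg"]))
# <<<image_1>>>The person slowly turns and smiles
```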
Case 5: Generate Video with Audio
Note: audio is mutually exclusive with video_list. When video_list has a value, the audio parameter is not needed.
Query Task Results
Video generation is an async task that returns a task_id upon submission. Use the Get Task Status endpoint to query generation progress and results.
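A generic polling loop for this submit-then-query flow could be sketched as follows. Everything here is hypothetical: this page does not give the Get Task Status URL or its response shape, so the HTTP call and the completion test are passed in as callables rather than hard-coded.

```python
import time


def wait_for_task(task_id, fetch_status, is_finished,
                  interval_s=5.0, timeout_s=600.0, sleep=time.sleep):
    """Poll fetch_status(task_id) until is_finished(result) is true or time runs out.

    fetch_status: caller-supplied function that calls the Get Task Status endpoint.
    is_finished:  caller-supplied predicate over whatever fetch_status returns.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch_status(task_id)
        if is_finished(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout_s} seconds")
        sleep(interval_s)
```

Injecting `fetch_status` and `is_finished` keeps the loop independent of the actual endpoint and status values, which should be taken from the Get Task Status documentation.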