All API endpoints require Bearer Token authentication.
Get your API Key: visit the API Key Management Page to get your API Key.
Add it to the request header:
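The header itself is not shown in this section. As a minimal sketch, assuming the conventional `Authorization: Bearer <key>` form and a JSON request body, the headers could be built as follows. The helper name `build_auth_headers` and the `YOUR_API_KEY` placeholder are illustrative and not part of the API.

```python
def build_auth_headers(api_key: str) -> dict:
    """Return request headers carrying the API key as a Bearer token.

    Assumes the standard "Authorization: Bearer <key>" scheme; the exact
    header expected by this API is not shown in the source text.
    """
    if not api_key or not api_key.strip():
        raise ValueError("api_key must be a non-empty string")
    return {
        "Authorization": f"Bearer {api_key.strip()}",
        "Content-Type": "application/json",  # request bodies here are JSON
    }


# Placeholder key for illustration only.
headers = build_auth_headers("YOUR_API_KEY")
```

The resulting dictionary can be passed as the headers of whichever HTTP client is used to call the endpoints.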
Video content description
Required for text-to-video; optional for image-to-video or video-reference-to-video.
For better results, clearly specify the subject, action, camera movement, and style.
Example: "A kitten yawning at the camera"
Whether to return the last frame image
When set to true, the task result additionally returns the URL of the video’s last frame image, which can be used for continuous video generation.
Default: false
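Continuous generation can be sketched as feeding the returned last-frame URL into the next request as its input image. The example below only builds the follow-up request body. The parameter name `return_last_frame` is an assumption (this section does not name the parameter), and the helper `build_continuation_payload` and the example URL are hypothetical.

```python
import json


def build_continuation_payload(last_frame_url, prompt,
                               model="doubao-seedance-2.0", duration=5):
    """Build the request body for the next segment of a continuous video.

    The previous segment's last frame becomes the input image of the
    next image-to-video request.
    """
    return {
        "model": model,
        "prompt": prompt,
        "image_urls": [last_frame_url],
        "duration": duration,
        # Assumed parameter name; ask for the last frame again so that
        # further segments can be chained in the same way.
        "return_last_frame": True,
    }


# Hypothetical last-frame URL taken from a previous task result.
payload = build_continuation_payload(
    "https://example.com/segment1_last_frame.jpg",
    "The kitten stands up and walks toward the camera",
)
body = json.dumps(payload)
```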
{ "model": "doubao-seedance-2.0", "prompt": "The kitten stands up and walks toward the camera", "image_urls": ["https://example.com/cat.jpg"], "duration": 5}
{ "model": "doubao-seedance-2.0", "prompt": "A scene of a person speaking", "video_urls": ["https://example.com/reference.mp4"], "audio_urls": ["https://example.com/speech.wav"], "size": "16:9", "duration": 11}
{ "model": "doubao-seedance-2.0", "prompt": "A man stops a woman and says: \"Remember, you must never point your finger at the moon.\"", "generate_audio": true}
Case 9: Reference Images + Reference Video + Reference Audio (Multi-Modal Video)
Combine reference images, reference video, and reference audio to generate an immersive first-person perspective advertisement video. Ideal for product promotions, brand ads, and other scenarios requiring multi-source material fusion.
{ "model": "doubao-seedance-2.0", "prompt": "Use video 1's first-person perspective throughout, and use audio 1 as the background music throughout. First-person POV fruit tea advertisement for seedance brand 'Peace Apple' apple fruit tea limited edition. First frame is image 1: your hand picks a dewy Aksu red apple with a crisp apple collision sound. 2-4s: quick cut, your hand drops apple chunks into a shaker cup, adds ice and tea base, shakes vigorously, ice collision and shaking sounds sync with upbeat drum beats, background voice: 'Fresh-cut, fresh-shaken'. 4-6s: first-person close-up of the finished product, layered fruit tea poured into a clear cup, your hand gently squeezes cream cap spreading on top, sticks a pink label on the cup, camera zooms in on the layered texture of cream cap and fruit tea. 6-8s: first-person handheld cup raise, you lift the fruit tea from image 2 toward the camera (simulating handing it to the viewer), cup label clearly visible, background voice 'Take a sip of freshness', final frame freezes on image 2. Background voice consistently uses a female tone.", "image_urls": [ "https://example.com/tea_pic1.jpg", "https://example.com/tea_pic2.jpg" ], "video_urls": ["https://example.com/tea_video1.mp4"], "audio_urls": ["https://example.com/tea_audio1.mp3"], "generate_audio": true, "size": "16:9", "duration": 11}
Query Task Results
Video generation is an async task that returns a task_id upon submission. Use the Get Task Status endpoint to query generation progress and results.
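Because the result is not available at submission time, a client typically polls until the task reaches a final state. The following is a minimal polling sketch, not the API's client. The status values (`pending`, `processing`, `completed`, `failed`) and the `video_url` field are assumptions, and `fetch_status` stands in for a call to the Get Task Status endpoint, whose URL and response shape are not given in this section.

```python
import time


def wait_for_task(fetch_status, task_id, interval=5.0, timeout=600.0,
                  sleep=time.sleep):
    """Poll fetch_status(task_id) until the task finishes or time runs out.

    fetch_status is any callable returning a dict with a "status" key.
    The terminal status names below are assumed, not documented here.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch_status(task_id)
        if result.get("status") in ("completed", "failed"):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish in {timeout}s")
        sleep(interval)


# Stand-in for the real endpoint: three canned responses in order.
_responses = iter([
    {"status": "pending"},
    {"status": "processing"},
    {"status": "completed", "video_url": "https://example.com/out.mp4"},
])


def fake_fetch(task_id):
    return next(_responses)


final = wait_for_task(fake_fetch, "task-123", interval=0.0,
                      sleep=lambda seconds: None)
```

Passing the fetch function in as an argument keeps the loop independent of any particular HTTP client and makes it testable without network access.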