Combines an audio file (a song or a human voice) with a photo to generate a lip-synced singing video. The system automatically drives the mouth shapes, expressions, and movements of the character in the photo based on the audio content, producing a natural singing effect.
URL of the audio file. Supported formats: MP3, WAV, M4A. Quality: clear voice, minimal noise. Duration must be between 2 and 30 seconds (enforced by business validation).
URL of the photo. Supported formats: JPG, PNG, JFIF. Recommended: a clear, frontal face. Maximum size: 4.7 MB (enforced by business validation).
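The constraints above can be checked on the client before submitting a request, which avoids a round trip that business validation would reject. The following is a minimal sketch, not part of the API: the function names `url_extension` and `check_inputs` are hypothetical, the caller is assumed to already know the audio duration and photo size, the format is assumed to be visible as a file extension in the URL path, the duration bounds are assumed to be inclusive, and 1 MB is assumed to mean 1,000,000 bytes.

```python
import posixpath
from urllib.parse import urlparse

# Limits copied from the parameter descriptions.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a"}
PHOTO_EXTENSIONS = {".jpg", ".png", ".jfif"}
MIN_AUDIO_SECONDS = 2.0
MAX_AUDIO_SECONDS = 30.0
MAX_PHOTO_BYTES = 4_700_000  # assumption: 4.7 MB with 1 MB = 1,000,000 bytes


def url_extension(url):
    """Return the lowercase file extension of the URL path, ignoring the query string."""
    path = urlparse(url).path
    return posixpath.splitext(path)[1].lower()


def check_inputs(audio_url, audio_seconds, photo_url, photo_bytes):
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = []
    if url_extension(audio_url) not in AUDIO_EXTENSIONS:
        problems.append("audio format must be MP3, WAV, or M4A")
    # Assumption: both bounds are inclusive.
    if not MIN_AUDIO_SECONDS <= audio_seconds <= MAX_AUDIO_SECONDS:
        problems.append("audio duration must be between 2 and 30 seconds")
    if url_extension(photo_url) not in PHOTO_EXTENSIONS:
        problems.append("photo format must be JPG, PNG, or JFIF")
    if photo_bytes > MAX_PHOTO_BYTES:
        problems.append("photo must be at most 4.7 MB")
    return problems


if __name__ == "__main__":
    issues = check_inputs(
        "https://example.com/song.mp3", 12.5,
        "https://example.com/face.png", 1_200_000,
    )
    print(issues or "inputs look valid")
```

A check like this only mirrors the documented limits; the server-side validation remains authoritative, and it may apply rules (for example, inspecting the actual file content rather than the URL) that this sketch does not cover.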
Successful Response