Viral Empire State Proposal Inspires AI Trend: How to Turn Your Photo Into a Cinematic Video, Prompt Here
The viral proposal by Russian couple Ivan Kuznetsov and Angela Nikolau at the Empire State Building has sparked a new AI trend. Many people have already recreated the skyscraper picture using AI, and the latest trend goes a step further by turning the static picture into a moving video.

AI-generated summary, reviewed by editors
Thanks to Google's new version of Gemini AI, users can now transform a static picture into a high-quality video with realistic camera movements and fabric dynamics.
AI Video Trend Is Taking Over Social Media
Influencers on Instagram, X, and YouTube Shorts have been posting cinematic AI videos that appear to show people standing on top of large skyscrapers or communication masts.
Instead of hiring pricey drones or taking risks at dangerous locations, all one needs to do is upload an AI-generated image and write a detailed video prompt to bring the image to life.
The result looks like a scene from a movie, with slow camera movement, fluttering flags, and subtle environmental motion.
How to Turn Your AI Photo Into a Video
Once you have created your AI image, upload it into the upgraded version of Google Gemini, which supports AI-powered image-to-video generation.
Then use the following prompt as written, replacing only [USER_TEXT] with your own message:
Viral Video Prompt
Create a cinematic, photorealistic video inspired by the uploaded image. The scene shows two masked people standing on top of a tall beacon tower between modern high-rise buildings in a hazy urban environment. A large black flag extends dramatically from the tower.
The camera performs a slow, smooth zoom-out, maintaining a stable, cinematic feel with no sudden movements or cuts.
The flag moves naturally in the wind with realistic cloth physics. Display "[USER_TEXT]" in large, bold white letters across the flag. The text should appear as if printed directly onto the fabric, naturally following the folds, ripples, and movement of the flag while remaining clear and readable.
The two people remain calm and largely stationary, with only subtle natural balance adjustments typical of standing still. Preserve their overall appearance, clothing, and positions throughout the shot.
Maintain the same dramatic urban atmosphere, realistic lighting, soft haze, muted color palette, and cinematic depth of field. Keep the beacon tower and surrounding buildings consistent with the reference while allowing only subtle environmental motion such as the flag gently waving.
Style: Ultra-realistic, cinematic, high-end film look, natural motion, physically accurate cloth simulation.
Duration: 5-8 seconds.
Why This Prompt Works
The prompt gives the AI detailed instructions about every element of the scene. It specifies:
- Slow cinematic zoom-out camera movement
- Realistic flag movement with cloth physics
- Stable character positions with subtle body movement
- Natural lighting and atmospheric haze
- Photorealistic textures and cinematic depth of field
- A short 5-8 second duration ideal for Instagram Reels and YouTube Shorts
Because the instructions are highly descriptive, the AI is able to generate a much more realistic and polished video.
Tips Before You Generate
For the best results:
- Upload a high-resolution AI-generated image.
- Use the upgraded version of Google Gemini, which supports image-to-video generation.
- Replace [USER_TEXT] with your preferred message before generating the video.
- Keep the aspect ratio vertical if you plan to post the final output on Instagram Reels, YouTube Shorts or other short-video platforms.
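If you reuse the prompt for several messages, the [USER_TEXT] substitution can be scripted before pasting the result into Gemini. The following is a minimal sketch in Python: the function name `fill_prompt`, its validation rules, and the sample message are assumptions for illustration, and the template is just the flag sentence excerpted from the prompt above. It only prepares the text and does not call any Gemini API.

```python
# Hypothetical helper for illustration; not part of Gemini or any Google tool.
PLACEHOLDER = "[USER_TEXT]"


def fill_prompt(template: str, user_text: str) -> str:
    """Return the template with every [USER_TEXT] replaced by the message."""
    # Collapse stray spaces and line breaks so the flag text stays on one line.
    message = " ".join(user_text.split())
    if not message:
        raise ValueError("user_text must not be empty")
    if PLACEHOLDER not in template:
        raise ValueError(f"template does not contain {PLACEHOLDER}")
    return template.replace(PLACEHOLDER, message)


# Excerpt of the flag sentence from the prompt above.
template = 'Display "[USER_TEXT]" in large, bold white letters across the flag.'

# Sample message (an assumption); messy whitespace is cleaned up.
prompt = fill_prompt(template, "  WILL YOU\nMARRY ME?  ")
print(prompt)
```

In practice you would pass the full prompt text as the template. The check for a leftover placeholder is there because a video generated with a literal "[USER_TEXT]" on the flag is an easy mistake to make.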
The more detailed your reference image and prompt are, the more cinematic and realistic the final AI-generated video is likely to appear.
Note: This image-to-video feature requires the upgraded version of Google Gemini. It is not available in the standard free version.











