May 24, 2026

Running NVIDIA SANA-WM on RTX 5090 and H100

Comparing NVIDIA's SANA-WM world model video generation on RTX 5090 (32GB) vs H100 (80GB) GPUs — with walkthrough videos, 360° panoramas, and generation benchmarks.

NVIDIA SANA-WM is a 2.6B-parameter open-source world model that generates minute-scale 720p videos with precise 6-DoF camera control. Unlike standard text-to-video models, SANA-WM takes a single image and camera trajectory as input, producing immersive first-person walkthrough videos.

We tested SANA-WM on two GPU configurations:

  • RTX 5090 (32GB VRAM) — RunPod community cloud
  • H100 SXM (80GB VRAM) — RunPod secure cloud

Our orchestration framework sana_WM handles the full setup, inference, and result download over SSH from a local WSL environment.


Input Images

We used four diverse scenes to test SANA-WM’s capabilities:

Mansion Interior

Mansion — Grand interior entrance

Hiking Trail

Hiking Trail — Mountain forest path

Mars Outpost

Mars Outpost — Futuristic research station

Modern House

Modern House — Minimalist interior


RTX 5090 Results (32GB VRAM)

The RTX 5090 can only run stage-1 generation due to VRAM constraints. The second-stage refiner requires ~75GB of VRAM, so it must be disabled with --no_refiner; we also pass --offload_vae to reduce memory use.
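As a rough illustration of that decision, the sketch below picks the extra flags from the GPU's total VRAM. The helper name `inference_flags` and the exact 75 GB cutoff are assumptions made for this example (the cutoff comes from the approximate refiner requirement quoted above); only the two flag names come from our runs.

```python
# Assumption: the refiner needs roughly 75 GB of VRAM, as quoted in this post.
REFINER_MIN_VRAM_GB = 75.0


def inference_flags(total_vram_gb: float) -> list[str]:
    """Return extra CLI flags for a GPU with the given total VRAM (hypothetical helper)."""
    if total_vram_gb >= REFINER_MIN_VRAM_GB:
        # Enough memory for the full two-stage pipeline: no extra flags.
        return []
    # Below the threshold: skip the refiner and offload the VAE,
    # which is the configuration used on the RTX 5090.
    return ["--no_refiner", "--offload_vae"]


print(inference_flags(32))  # RTX 5090
print(inference_flags(80))  # H100 SXM
```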

Mansion Walkthrough — Progressive Duration Test

5 Seconds (81 frames)

Prompt: “Cinematic first-person walk through grand mansion entrance” | Action: w-40 | Gen time: ~35s

10 Seconds (161 frames)

Prompt: “Cinematic first-person walk through grand mansion entrance” | Action: w-80,d-40,w-40 | Gen time: ~1m 15s

20 Seconds (321 frames)

Prompt: “Cinematic first-person walk through grand mansion entrance” | Action: w-100,d-30,w-60 | Gen time: ~3m 30s
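The frame counts in these tests follow a simple pattern. The sketch below assumes 16 frames per second plus one initial frame, which is inferred from the numbers in this post (81 frames at 5 s, 161 at 10 s, 321 at 20 s) rather than taken from SANA-WM's documentation; `frames_for_duration` is a hypothetical helper.

```python
# Assumption: 16 fps plus one starting frame, inferred from the frame
# counts reported in this post, not from SANA-WM's own documentation.
FPS = 16


def frames_for_duration(seconds: int, fps: int = FPS) -> int:
    """Number of frames for a clip of the given length under the assumed pattern."""
    return seconds * fps + 1


for s in (5, 10, 20, 30, 40, 60):
    print(f"{s:>2}s -> {frames_for_duration(s)} frames")
```

The same formula reproduces the 481, 641, and 961 frame counts quoted for the 30, 40, and 60 second runs later in this post.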

Why RTX 5090 Fails at 30+ Seconds

Attempting a 30-second generation (481 frames) on the RTX 5090 fails with a CUDA out-of-memory error:

torch.OutOfMemoryError: CUDA out of memory. 
Tried to allocate 1.2 GiB. GPU has 31.74 GiB total, 
28.6 GiB used by model + latents.

Key limitations:

  • Stage-1 only (no refiner) → visible artifacts and temporal drift
  • 32GB VRAM exhausted at roughly 400 frames (the 321-frame run succeeded, the 481-frame run did not)
  • Longer videos show increasing blur and scene morphing

H100 Results (80GB VRAM)

The H100 SXM runs the full two-stage pipeline including the 17B-parameter refiner, producing significantly higher quality output.

Mansion Walkthrough — Progressive Duration Test

5 Seconds (81 frames) — With Refiner

Prompt: “Cinematic first-person walk through grand mansion entrance” | Action: w-40 | Gen time: ~1m (stage-1: 31s, refiner: 1s)

10 Seconds (161 frames) — With Refiner

Prompt: “Cinematic first-person walk through grand mansion entrance” | Action: w-80,d-40,w-40 | Gen time: ~2m (stage-1: 55s, refiner: 5s)

30 Seconds (481 frames) — With Refiner ✓

Prompt: “Cinematic first-person walk through grand mansion entrance” | Action: w-100,d-20,w-100,a-20,w-100 | Gen time: ~4m (stage-1: 1m 45s, refiner: 14s)

Key improvements over RTX 5090:

  • ✅ Sharper textures with refiner
  • ✅ Reduced temporal drift
  • ✅ 30-second generation possible
  • ✅ Better scene consistency

H100 — Other Scenes (20 Seconds)

Hiking Trail

Prompt: “First-person hike along mountain trail with forest views” | Action: w-100,a-10,w-60 | Gen time: ~2m

Mars Outpost

Prompt: “Approach to futuristic Mars research station on red planet surface” | Action: w-120,d-20,w-40 | Gen time: ~2m

Modern House

Prompt: “Smooth forward walk through modern minimalist house interior” | Action: w-80,d-15,w-80 | Gen time: ~2m


Experimental: Long-Duration Video (40s)

We attempted a 60-second Mars interior walkthrough, but hit OOM during the refiner stage (~80GB required for 961 frames). Reducing to 40 seconds (641 frames) succeeded.

Mars Interior — 40 Seconds

Prompt: “First-person camera approach to Mars outpost entrance, door slides open revealing modern interior lounge with panoramic windows, glide through living area toward open-plan kitchen, pan right to wall-mounted board with sticky notes” | Action: w-150,d-30,w-100,d-60,w-100 | Gen time: ~5m

60s OOM Analysis:

VRAM breakdown for 60s (961 frames):
- Stage-1 model: ~15GB
- Refiner (17B LTX-2): ~25GB
- Latents for 961 frames: ~40GB
- Total: ~80GB (at limit!)
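To see why 40 seconds fits where 60 does not, the breakdown above can be extrapolated to other clip lengths. This is a back-of-the-envelope sketch that assumes latent memory scales linearly with frame count, which we did not measure directly; the per-frame figure is derived from the 60 s breakdown and `estimated_vram_gb` is a hypothetical helper.

```python
# Figures from the 60 s breakdown above (approximate).
STAGE1_GB = 15.0
REFINER_GB = 25.0
# Assumption: latent memory grows linearly with frame count.
LATENT_GB_PER_FRAME = 40.0 / 961


def estimated_vram_gb(frames: int) -> float:
    """Rough peak VRAM estimate for the full two-stage pipeline."""
    return STAGE1_GB + REFINER_GB + frames * LATENT_GB_PER_FRAME


for frames in (481, 641, 961):
    print(f"{frames} frames: ~{estimated_vram_gb(frames):.1f} GB")
```

Under this assumption the 641-frame run lands at roughly 67 GB, which leaves some headroom on an 80GB card, while 961 frames sits right at the limit.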

Experimental: 360° Panoramic Rotation

We tested whether SANA-WM could generate a stationary 360° rotation — the camera fixed in place, rotating to reveal the full environment.

Hiking Trail — 360° Attempt

Prompt: “Stationary camera fixed at one point on mountain hiking trail, smooth 360-degree panoramic rotation revealing complete surrounding environment, seamless circular sweep returning to exact starting viewpoint with no forward or backward movement” | Action: d-90,d-90,d-90,d-90

Modern House — 360° Attempt

Prompt: “Stationary camera positioned in center of modern minimalist house interior, smooth 360-degree panoramic rotation from fixed viewpoint, revealing complete open-plan layout, seamless circular sweep returning to exact starting position with no camera movement” | Action: d-90,d-90,d-90,d-90

Why 360° Rotation Fails

Both attempts show the camera drifting horizontally rather than rotating in place. We see three likely causes:

  1. Training data bias: SANA-WM was trained primarily on egocentric walkthrough videos (forward movement + turns), not tripod-style panoramas
  2. Action string semantics: The d-N action encodes “turn right N degrees while walking” — there’s no pure “rotate in place” primitive
  3. Temporal model limitations: The model hallucinates forward motion to maintain temporal coherence with its training distribution
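To make the action strings concrete, here is a minimal parser sketch. The `key-amount` token format is inferred from the examples in this post, and reading the amount after `d` as degrees follows our interpretation above; SANA-WM's own parsing may differ, and both function names are hypothetical.

```python
def parse_actions(action: str) -> list[tuple[str, int]]:
    """Split an action string such as 'w-80,d-40,w-40' into (key, amount) pairs."""
    steps = []
    for token in action.split(","):
        key, _, amount = token.strip().partition("-")
        if not key or not amount.isdigit():
            raise ValueError(f"malformed action token: {token!r}")
        steps.append((key, int(amount)))
    return steps


def total_right_turn(action: str) -> int:
    """Sum the amounts of all 'd' steps (right turns, under this post's reading)."""
    return sum(n for key, n in parse_actions(action) if key == "d")


print(parse_actions("w-80,d-40,w-40"))
print(total_right_turn("d-90,d-90,d-90,d-90"))
```

The 360° attempts use four `d-90` steps, which add up to a full turn on paper even though the generated camera path drifts.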

Workaround: For true 360° panoramas, consider NVIDIA Lyra 2.0, which explicitly supports orbital trajectories.


Summary: RTX 5090 vs H100

Metric | RTX 5090 (32GB) | H100 SXM (80GB)
Max video duration | ~20s | ~40s
Refiner enabled | ❌ No | ✅ Yes
Typical quality | Blurry, drift | Sharp, stable
Cost (RunPod) | ~$1.50/hr | ~$3.00/hr
20s video cost | ~$0.10 | ~$0.10
Best use case | Quick tests | Production

Recommendation: Use H100 for any video over 10 seconds or where quality matters. RTX 5090 is viable for rapid prototyping under 20 seconds.
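The per-video cost figures are just the hourly rate multiplied by the generation time. The sketch below reproduces them from the approximate 20-second generation times reported above (~3m 30s on the RTX 5090, ~2m on the H100). It counts generation time only, so pod startup and model loading are not included; `generation_cost_usd` is a hypothetical helper.

```python
def generation_cost_usd(hourly_rate_usd: float, gen_seconds: float) -> float:
    """Cost of one generation, billed pro rata at the given hourly rate."""
    return hourly_rate_usd * gen_seconds / 3600


# Approximate 20 s generation times from this post.
rtx_5090 = generation_cost_usd(1.50, 3 * 60 + 30)
h100 = generation_cost_usd(3.00, 2 * 60)
print(f"RTX 5090: ${rtx_5090:.2f}, H100: ${h100:.2f}")
```

The H100 costs twice as much per hour but finishes the same clip in a little over half the time, so the two end up at roughly the same price per video.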


Getting Started

Full setup instructions and config examples: github.com/tech-microcosm/sana_WM

# Clone and configure
git clone https://github.com/tech-microcosm/sana_WM.git
cd sana_WM
cp .env.example .env
# Edit .env with your RunPod pod IP and SSH port

# Run inference
python main.py infer -c config/examples/mansion_wm_simple.yaml \
    --host <pod-ip> --port <ssh-port>
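If you want to script several runs, the command above can be assembled programmatically. This is a minimal sketch that only builds the argument list for the `infer` command shown above; the host and port values are placeholders and `build_infer_command` is a hypothetical helper, not part of the repository.

```python
import shlex


def build_infer_command(config: str, host: str, port: int) -> list[str]:
    """Build the argv list for the infer command shown above."""
    return [
        "python", "main.py", "infer",
        "-c", config,
        "--host", host,
        "--port", str(port),
    ]


# Placeholder host and port: substitute your own pod's values.
cmd = build_infer_command("config/examples/mansion_wm_simple.yaml", "203.0.113.10", 22)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from inside the cloned repository directory.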