
May 15, 2026

Running NVIDIA Lyra 2.0 on RunPod H100

How to set up and run NVIDIA Lyra 2.0 world generation on a RunPod H100 GPU pod using the lyra3d orchestration framework — with example outputs from real inference runs.


NVIDIA Lyra 2.0 is a generative model that creates immersive 3D world videos from a single image. Given one photo and a text caption, it generates a camera-controlled video that explores the scene (by default zooming in and then back out), building a coherent 3D world around it. We tested it on a RunPod H100 pod using lyra3d, a lightweight orchestration framework that handles the full bootstrap, inference, and result download over SSH.

Results

Both tests used 81 frames per direction at 480×832 resolution on an H100 80GB.

Test 1 — Desert Military Outpost (Zoom In / Zoom Out)

Input image: Desert Military Outpost

Output: Lyra 2.0 video, zooming in and then back out

Caption used: “A cinematic aerial view of a massive futuristic desert military outpost with modular domed buildings, solar panel arrays, glass skyscrapers, pipelines, and military vehicles rising from golden sand dunes under a warm hazy sky.”


Test 2 — Urban Glass Skyscraper (Orbit Horizontal)

Input image: Urban Glass Skyscraper

Output: Lyra 2.0 video following a horizontal orbit trajectory

Caption used: “A modern glass curtain wall skyscraper reflecting an ornate classical building, lush green trees in the foreground, urban city scene with bright blue sky.”


Setup Overview

The full setup runs on a fresh RunPod H100 pod with nothing installed beyond the base template. The lyra3d bootstrap script (lyra2_bootstrap.sh) sets up everything else automatically.

1. Launch a RunPod pod

Go to runpod.io and launch a pod:

  • GPU: H100 80GB (needed because the model uses ~75GB of VRAM)
  • Template: RunPod PyTorch (Ubuntu 24.04, CUDA 12.8)
  • Add your SSH public key under Settings → SSH Public Keys
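Before committing to the ~38 min bootstrap, it can be worth confirming the pod really received an 80GB card. The following is a minimal preflight sketch, assuming the standard `nvidia-smi --query-gpu` interface is available on the pod. The 76800 MiB threshold (75 × 1024) is our own rough translation of the ~75GB usage figure above, not a value published for Lyra.

```shell
#!/bin/sh
# Preflight sketch: does each GPU have enough memory for Lyra 2.0?
# Assumption: ~75GB of VRAM usage, expressed here as 75 * 1024 MiB.
REQUIRED_MIB=$((75 * 1024))

# vram_ok <memory_total_MiB>: print "ok" or "too small".
vram_ok() {
    if [ "$1" -ge "$REQUIRED_MIB" ]; then
        echo ok
    else
        echo "too small"
    fi
}

# Only query the hardware when nvidia-smi exists (i.e. on the pod itself).
if command -v nvidia-smi >/dev/null 2>&1; then
    # One line per GPU, e.g. "<name>, <total memory in MiB>"
    nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits |
    while IFS=, read -r name mem; do
        mem=$(echo "$mem" | tr -d ' ')   # strip the space after the comma
        echo "$name: ${mem} MiB -> $(vram_ok "$mem")"
    done
fi
```

Run it on the pod right after launch, before uploading anything; a "too small" result means the model will not fit.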

2. Clone lyra3d and configure

git clone https://github.com/tech-microcosm/lyra3d.git
cd lyra3d
cp .env.example .env
# Edit .env with the pod IP and SSH port from the RunPod dashboard
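The commands in steps 3 to 5 repeat the same key, host, and port, and `ssh` takes the port as lowercase `-p` while `scp` takes uppercase `-P`, which is easy to mix up. Below is a minimal sketch of wrapper functions, assuming you export the pod address as `POD_IP` and `POD_PORT`. These are hypothetical variable names: lyra3d's `.env.example` may define different ones. The sketch also assumes your key is `~/.ssh/id_ed25519`, as in the commands below.

```shell
#!/bin/sh
# Hypothetical variable names: adjust to whatever your .env actually defines.
# Defaults mirror the <IP>/<PORT> placeholders used in the steps.
POD_IP="${POD_IP:-<IP>}"
POD_PORT="${POD_PORT:-<PORT>}"
SSH_KEY="${SSH_KEY:-$HOME/.ssh/id_ed25519}"

# Execute a command, or only print it when DRY_RUN=1.
pod_exec() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run a command on the pod. Note: ssh takes the port as lowercase -p.
pod_ssh() {
    pod_exec ssh -i "$SSH_KEY" -p "$POD_PORT" "root@$POD_IP" "$@"
}

# Copy a local file/directory to the pod. Note: scp takes the port as uppercase -P.
pod_put() {
    pod_exec scp -i "$SSH_KEY" -P "$POD_PORT" -r "$1" "root@$POD_IP:$2"
}

# Copy a file/directory from the pod to a local path.
pod_get() {
    pod_exec scp -i "$SSH_KEY" -P "$POD_PORT" -r "root@$POD_IP:$1" "$2"
}
```

If `.env` holds plain `KEY=value` lines, running `set -a; . ./.env; set +a` exports them before you source these functions. Running `DRY_RUN=1 pod_put my_image.png /workspace/lyra3d/inputs/` prints the scp command without executing it.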

3. Upload and run the bootstrap

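# Create the target directories on the pod first (scp will not create them on a fresh pod)
ssh -i ~/.ssh/id_ed25519 -p <PORT> root@<IP> "mkdir -p /workspace/lyra3d/inputs"

# Upload the bootstrap script to the pod
scp -i ~/.ssh/id_ed25519 -P <PORT> remote/bootstrap/lyra2_bootstrap.sh root@<IP>:/workspace/lyra3d/

# SSH in and run it in a detached screen session (~38 min on a fresh pod)
ssh -i ~/.ssh/id_ed25519 -p <PORT> root@<IP>
# If screen is not installed: apt-get update && apt-get install -y screen
screen -dmS lyra_boot bash /workspace/lyra3d/lyra2_bootstrap.sh
screen -r lyra_boot   # reattach to monitor; detach again with Ctrl-A then D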
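Because the session is started with `screen -dmS`, it disappears as soon as the bootstrap script exits, so you can poll for it instead of staying attached. This is a minimal sketch to run on the pod. The sample `screen -ls` line in the comment is illustrative, and the helper only tells you that the session ended, not whether the bootstrap succeeded.

```shell
#!/bin/sh
SESSION=lyra_boot

# session_listed: succeed if `screen -ls` output (on stdin) lists $SESSION.
# Session lines look like: "<TAB>12345.lyra_boot<TAB>(Detached)"
session_listed() {
    grep -q "[0-9][0-9]*\.$SESSION[[:space:]]"
}

# wait_for_bootstrap: block until the screen session is gone, checking once a minute.
wait_for_bootstrap() {
    if ! command -v screen >/dev/null 2>&1; then
        echo "screen is not installed" >&2
        return 1
    fi
    while screen -ls 2>/dev/null | session_listed; do
        sleep 60
    done
    echo "screen session '$SESSION' has ended"
}
```

The session also ends if the script fails partway, so check the bootstrap's results (for example, that `/workspace/lyra3d/venv/bin/python` exists) before moving on.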

4. Place your input and run inference

# Upload your image
scp -i ~/.ssh/id_ed25519 -P <PORT> my_image.png root@<IP>:/workspace/lyra3d/inputs/

# On the pod: set up sample dir and launch inference
LYRA=/workspace/lyra3d/Lyra/Lyra-2
mkdir -p ${LYRA}/assets/my_sample
cp /workspace/lyra3d/inputs/my_image.png ${LYRA}/assets/my_sample/00.png
echo "Your scene description here." > ${LYRA}/assets/my_sample/00.txt

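# Run inference (zoom in + out, minimum valid frame count)
# Disable fused attention to avoid a cuDNN error on H100 (see Key Notes)
export NVTE_FUSED_ATTN=0
VENV=/workspace/lyra3d/venv
# --checkpoint_dir is a relative path; this assumes checkpoints/ lives under ${LYRA}
cd ${LYRA}
PYTHONPATH=${LYRA} ${VENV}/bin/python -m lyra_2._src.inference.lyra2_zoomgs_inference \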
    --input_image_path ${LYRA}/assets/my_sample \
    --sample_id 0 \
    --experiment lyra2 \
    --checkpoint_dir checkpoints/model \
    --prompt_dir ${LYRA}/assets/my_sample \
    --output_path /workspace/lyra3d/outputs/my_run \
    --num_frames_zoom_in 81 \
    --num_frames_zoom_out 81
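Step 4 uses a specific sample-directory convention: an image named `00.png` plus a caption in `00.txt`, selected with `--sample_id 0`. You can wrap this in a small helper so each new input gets its own folder. Here is a minimal sketch. `prepare_sample` is a hypothetical helper that reproduces only the file layout shown above, not any other input requirements Lyra may have. It rejects non-`.png` files because it copies the image to `00.png` without converting it.

```shell
#!/bin/sh
# Hypothetical helper: prepare_sample <lyra_dir> <sample_name> <image.png> <caption>
# Creates <lyra_dir>/assets/<sample_name>/00.png and 00.txt (the layout from step 4)
# and prints the sample directory to pass as --input_image_path / --prompt_dir.
prepare_sample() {
    lyra_dir=$1 name=$2 image=$3 caption=$4
    case "$image" in
        *.png) ;;
        *) echo "expected a .png image, got: $image" >&2; return 1 ;;
    esac
    if [ ! -f "$image" ]; then
        echo "image not found: $image" >&2
        return 1
    fi
    sample_dir="$lyra_dir/assets/$name"
    mkdir -p "$sample_dir" || return 1
    cp "$image" "$sample_dir/00.png" || return 1
    printf '%s\n' "$caption" > "$sample_dir/00.txt" || return 1
    echo "$sample_dir"
}
```

For example, `prepare_sample "$LYRA" my_sample /workspace/lyra3d/inputs/my_image.png "Your scene description here."` reproduces the three setup commands above.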

5. Download results

mkdir -p ./outputs
scp -i ~/.ssh/id_ed25519 -P <PORT> -r root@<IP>:/workspace/lyra3d/outputs/my_run ./outputs/

Timing (H100 80GB)

Stage                          Time
Bootstrap (fresh pod)          ~38 min
Checkpoint loading             ~13 min
Inference (81 + 81 frames)     ~22 min
Total first run                ~73 min

Subsequent runs on the same pod skip the bootstrap and checkpoint stages, so each additional run takes roughly the ~22 min inference time.
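For budgeting pod time, the table implies a simple estimate: a one-time setup cost of 38 + 13 = 51 min plus ~22 min per run. This minimal sketch of that arithmetic assumes that later runs cost only the inference time, as described above. Actual times will vary with the pod and settings.

```shell
#!/bin/sh
# Approximate stage times in minutes, taken from the H100 80GB table.
BOOTSTRAP_MIN=38
CHECKPOINT_MIN=13
INFERENCE_MIN=22   # per run of 81 + 81 frames

# estimate_minutes <runs>: one-time setup plus one inference per run.
estimate_minutes() {
    echo $((BOOTSTRAP_MIN + CHECKPOINT_MIN + INFERENCE_MIN * $1))
}
```

`estimate_minutes 1` gives the 73 min first-run total from the table, and `estimate_minutes 3` gives 117 min.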


Trajectory Options

Lyra supports several camera trajectory modes via --zoom_in_trajectory:

Mode               Effect
horizontal_zoom    Default zoom in/out with horizontal drift
orbit_horizontal   Horizontal orbital sweep around the scene
orbit_vertical     Vertical orbital arc
spiral             Inward spiral toward the scene center
spiral_outwards    Outward spiral away from the center
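The inference command in step 4 does not pass `--zoom_in_trajectory`, so it runs with the script's default, while Test 2 above used `orbit_horizontal`. This is a minimal sketch that checks a mode name against the table before building the flag. The mode list is copied from the table, and we assume the flag takes the mode name as a single argument, like the other flags in step 4.

```shell
#!/bin/sh
# Trajectory modes as listed in the table above.
TRAJECTORIES="horizontal_zoom orbit_horizontal orbit_vertical spiral spiral_outwards"

# valid_trajectory <mode>: succeed if <mode> is one of the listed modes.
valid_trajectory() {
    for t in $TRAJECTORIES; do
        [ "$t" = "$1" ] && return 0
    done
    return 1
}

# trajectory_flag <mode>: print the flag to append to the inference command.
trajectory_flag() {
    if valid_trajectory "$1"; then
        echo "--zoom_in_trajectory $1"
    else
        echo "unknown trajectory: $1 (expected one of: $TRAJECTORIES)" >&2
        return 1
    fi
}
```

To reproduce the Test 2 setup, append `$(trajectory_flag orbit_horizontal)` unquoted (so it splits into flag and value) to the `python -m lyra_2._src.inference.lyra2_zoomgs_inference` command.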

Frame counts must satisfy one rule: (frames − 1) must be divisible by 80, so the valid values are 81, 161, 241, and so on. The minimum valid value is 81.
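The constraint is cheap to check before launching a 22-minute run. This is a minimal sketch: `valid_frame_count` and `next_valid_frame_count` are hypothetical helpers that encode only the rule stated above ((frames − 1) divisible by 80, minimum 81).

```shell
#!/bin/sh
# valid_frame_count <n>: succeed if n >= 81 and (n - 1) is a multiple of 80.
valid_frame_count() {
    case "$1" in
        ''|*[!0-9]*|0*) return 1 ;;   # reject empty, non-numeric, or zero-padded input
    esac
    [ "$1" -ge 81 ] && [ $(( ($1 - 1) % 80 )) -eq 0 ]
}

# next_valid_frame_count <n>: smallest valid count that is >= n (at least 81).
next_valid_frame_count() {
    n=$1
    if [ "$n" -lt 81 ]; then
        n=81
    fi
    # Round (n - 1) up to a multiple of 80, then add the 1 back.
    echo $(( (n - 1 + 79) / 80 * 80 + 1 ))
}
```

For example, `next_valid_frame_count 100` prints 161, and `valid_frame_count 80` fails.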


Key Notes

  • NVTE_FUSED_ATTN=0 disables Transformer Engine's fused attention backend; it is required to avoid a cuDNN error on the H100
  • The model uses ~75GB of VRAM, so it needs an 80GB GPU; the H100 80GB is the card we tested
  • Checkpoints are ~40GB and are downloaded once from Hugging Face (nvidia/Lyra-2.0)
  • Full source and bootstrap scripts: github.com/tech-microcosm/lyra3d
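The first notes above can be turned into a quick check on the pod before inference. This is a minimal sketch. It assumes GNU `df` (as on the Ubuntu-based RunPod template) and assumes the checkpoints are stored under `/workspace`. The 40GB threshold comes from the checkpoint size above and ignores space for outputs and the virtualenv; it only matters before the first download.

```shell
#!/bin/sh
# Checkpoint download size from the notes above, in GB.
CHECKPOINT_GB=40

# disk_ok <available_GB>: succeed if there is room for the checkpoints.
disk_ok() {
    [ "$1" -ge "$CHECKPOINT_GB" ]
}

# nvte_ok: succeed if fused attention is disabled as the notes require.
nvte_ok() {
    [ "${NVTE_FUSED_ATTN:-}" = "0" ]
}

# Run the real checks only where /workspace exists (i.e. on the pod).
if [ -d /workspace ]; then
    # GNU df: available space on the /workspace filesystem, in whole GiB.
    avail_gb=$(df -BG --output=avail /workspace | tail -n 1 | tr -dc '0-9')
    if disk_ok "$avail_gb"; then
        echo "disk: ${avail_gb}GB free, ok"
    else
        echo "disk: only ${avail_gb}GB free"
    fi
    if nvte_ok; then
        echo "NVTE_FUSED_ATTN=0 is set"
    else
        echo "warning: export NVTE_FUSED_ATTN=0 before running inference"
    fi
fi
```

Because `NVTE_FUSED_ATTN` is read from the environment, export it in the same shell before running the check.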