Fix vLLM CUDA Version Mismatch and undefined symbol Errors

vLLM crashes on startup with undefined symbol, no kernel image, or CUDA mismatch. Install into a clean env with uv --torch-backend=auto and align driver, CUDA, and PyTorch.

Published: May 25, 2026 Updated: Jun 21, 2026 Author: AI Productivity Guide Team

You run python -m vllm.entrypoints.openai.api_server --model meta-llama/Meta-Llama-3.1-8B-Instruct and the process dies before the server ever binds a port. The error is one of: ImportError: ... vllm/_C.abi3.so: undefined symbol: _ZN3c104cuda29c10_cuda_check_implementationEiPKcS2_jb, RuntimeError: CUDA error: no kernel image is available for execution on the device, or torch.cuda.is_available() quietly returns False. The root cause is almost always the same: the compiled vLLM wheel, PyTorch, and your CUDA driver are not built for each other.

Fastest fix (as of June 2026): start from a clean virtual environment and let the installer pick the matching CUDA backend, instead of hand-installing torch and vllm separately:

uv venv --python 3.12 && source .venv/bin/activate
uv pip install vllm --torch-backend=auto

--torch-backend=auto inspects your installed driver and pulls the PyTorch build that matches it. The vLLM wheel pins an exact PyTorch version as a dependency, so installing both in one step keeps the two in sync. If you don’t use uv, the pip path and the manual version-alignment steps are below.

Why “just pip install vllm” breaks: vLLM compiles its own CUDA kernels (_C.abi3.so). Those kernels are ABI-locked to one exact PyTorch build. If a stray torch from another install wins on the import path, you get the undefined symbol crash even though every individual package looks “installed.” This is the single most common reason vLLM won’t start on a fresh machine.

What changed recently

This article was first written against vLLM 0.4. As of June 2026 the toolchain looks different, and the old advice to “install PyTorch first, then vLLM” now causes the mismatch rather than preventing it:

vLLM is on the 0.11.x line. The wheel pins an exact PyTorch build (2.11) and pulls in all CUDA deps alongside it. Do not install torch separately first.
Official wheels are compiled with CUDA 12.9 by default, with prebuilt variants for CUDA 12.8 and 13.0.
NVIDIA Blackwell GPUs (B200, GB200, RTX 50-series) require CUDA >= 12.8. Older 12.1/12.4 wheels will not run on them.
The recommended installer is now uv with --torch-backend=auto.

Common causes

Ordered by hit rate, highest first.

1. A stray PyTorch shadows the one vLLM was built against (undefined symbol)

This is the top cause of undefined symbol: _ZN3c104cuda29c10_cuda_check_implementationEiPKcS2_jb. The vLLM wheel ships its own PyTorch, but an older pip install torch (or a conda pytorch) is earlier on the import path. vLLM’s compiled _C.abi3.so then loads against the wrong PyTorch C++ ABI and the symbol it needs isn’t there.

How to spot it: Run pip show torch | grep Version and python -c "import vllm; print(vllm.__version__)". Then check that only one torch is importable: python -c "import torch; print(torch.__file__, torch.__version__)". A torch version that doesn’t match what the vLLM wheel pinned (it pins an exact build) is the tell.
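
The "only one torch is importable" check can also be scripted. The following is a minimal sketch, assuming the stray copy was installed by pip or conda and therefore left package metadata behind; `find_copies` is a hypothetical helper name, not part of vLLM or PyTorch.

```python
import sys
from importlib import metadata


def find_copies(package, search_path=None):
    """Return sorted (version, location) pairs for each installed copy of `package`.

    Scans sys.path by default. Only copies that left .dist-info/.egg-info
    metadata behind are detected (an assumption of this sketch).
    """
    paths = list(sys.path if search_path is None else search_path)
    copies = set()  # a set, because sys.path can list the same directory twice
    for dist in metadata.distributions(path=paths):
        name = (dist.metadata or {}).get("Name") or ""
        if name.lower() == package.lower():
            copies.add((dist.version, str(dist.locate_file(""))))
    return sorted(copies)


# Example: report every torch visible to this interpreter.
for version, location in find_copies("torch"):
    print(f"torch {version} at {location}")
```

More than one line of output means two copies are visible to the interpreter, and whichever directory comes first on sys.path is the one that gets imported.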

2. CUDA build of the wheel doesn’t match your GPU architecture (no kernel image)

no kernel image is available for execution on the device means the wheel was compiled without kernels for your GPU’s compute capability. The most common 2026 case: a Blackwell card (RTX 5080/5090, B200) running a wheel built for CUDA 12.4 or older, which has no sm_100/sm_120 kernels.

How to spot it: Run python -c "import torch; print(torch.cuda.get_device_capability())". RTX 3090 = (8, 6), RTX 4090 = (8, 9), A100 = (8, 0), H100 = (9, 0), RTX 5090 = (12, 0). If you’re on (10, x)/(12, x) you need a CUDA 12.8+ wheel.
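
The Blackwell rule above can be written as a small check. This sketch encodes only that one rule (capability 10.x or 12.x needs a CUDA 12.8+ wheel); the function names are hypothetical, and it says nothing about whether a given vLLM release still ships kernels for older cards.

```python
def min_cuda_for_capability(capability):
    """Return the lowest wheel CUDA build implied by the rule above, or None.

    `capability` is the tuple printed by torch.cuda.get_device_capability().
    """
    major, _minor = capability
    if major >= 10:  # Blackwell: (10, x) and (12, x)
        return (12, 8)
    return None  # no Blackwell-specific floor applies


def wheel_covers_gpu(capability, wheel_cuda):
    """True if a wheel built for `wheel_cuda` (e.g. (12, 4)) clears the floor."""
    floor = min_cuda_for_capability(capability)
    return floor is None or tuple(wheel_cuda) >= floor


# An RTX 5090 reports (12, 0); a CUDA 12.4 wheel does not clear the floor.
print(wheel_covers_gpu((12, 0), (12, 4)))
print(wheel_covers_gpu((12, 0), (12, 8)))
```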

3. Driver too old for the CUDA build the wheel needs

The bundled CUDA runtime needs a minimum NVIDIA driver. As of June 2026 (Linux minimums):

CUDA build	Minimum NVIDIA driver
CUDA 12.4	>= 550
CUDA 12.8	>= 570
CUDA 13.0	>= 580

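If your driver is older than the wheel’s CUDA build requires, torch.cuda.is_available() silently returns False, usually with a PyTorch warning that the NVIDIA driver on your system is too old; some setups surface it as no kernel image is available instead.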

How to spot it: Run nvidia-smi and read the Driver Version field (top-left) and the CUDA Version field (top-right; this is the max the driver supports, not what’s installed). Cross-reference the table above.
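
The cross-reference against the table can be sketched as a lookup. The thresholds below are copied from the table in this section; `MIN_DRIVER` and `builds_supported_by` are illustrative names, and only the driver's major version is compared.

```python
# Minimum Linux driver (major version) per CUDA build, from the table above.
MIN_DRIVER = {(12, 4): 550, (12, 8): 570, (13, 0): 580}


def builds_supported_by(driver_version):
    """List the CUDA builds from the table that a driver version can run.

    `driver_version` is the "Driver Version" string from nvidia-smi,
    for example "570.86.15".
    """
    major = int(driver_version.split(".")[0])
    return sorted(cuda for cuda, needed in MIN_DRIVER.items() if major >= needed)


print(builds_supported_by("570.86.15"))
```

An empty list means the driver is older than every build in the table, so a driver update (or the forward-compatibility route from the FAQ) is the next step.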

4. Mixed conda / system CUDA on the import path

Conda environments often carry their own cuda-toolkit and pytorch. When the conda CUDA differs from what the wheel expects, the process links the system libcuda.so but tries to load mismatched kernels, producing undefined-symbol errors.

How to spot it: Inside the activated env, run conda list | grep -iE "cuda|torch" and python -c "import torch; print(torch.version.cuda)". If conda installed a pytorch package alongside the vLLM wheel’s torch, that’s the conflict.

5. Editable / source build out of sync with installed kernels

If you git pull a vLLM dev checkout or do uv pip install -e . after C++/kernel changes without rebuilding, the Python tree and the compiled .so drift apart. The same undefined symbol error appears.

How to spot it: You installed with -e / --editable or built from source. Confirmed by pip show vllm | grep Location pointing at your git checkout rather than site-packages.
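
An editable install can also be detected from package metadata rather than from the install location. This is a sketch under one assumption: the install was done by a pip or uv version that records `direct_url.json` (PEP 610). Legacy `setup.py develop` installs do not write that file and would be reported as non-editable. `is_editable` is a hypothetical helper.

```python
import json
from importlib import metadata


def is_editable(package, search_path=None):
    """Return True/False for an installed package, or None if it is not found."""
    kwargs = {} if search_path is None else {"path": list(search_path)}
    for dist in metadata.distributions(**kwargs):
        name = (dist.metadata or {}).get("Name") or ""
        if name.lower() != package.lower():
            continue
        raw = dist.read_text("direct_url.json")  # recorded per PEP 610
        if raw is None:
            return False  # ordinary wheel install from an index
        info = json.loads(raw)
        return bool(info.get("dir_info", {}).get("editable", False))
    return None


print("vllm editable:", is_editable("vllm"))
```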

6. Building from source against a CUDA the kernels don’t cover

When you must build from source (custom CUDA, unsupported platform), an incomplete build can skip kernels for your architecture and fall back to incompatible prebuilt ones.

How to spot it: Rebuild verbosely from your vLLM checkout and scan for skipped compiles: pip install -e . -v 2>&1 | grep -iE "nvcc|compile|skip|error".

Shortest path to fix

Step 1: Establish the ground-truth version chain

# Driver version + the MAX CUDA it supports (top corners of the table)
nvidia-smi
# Driver Version: 570.xx   |   CUDA Version: 12.8

# GPU compute capability (decides which kernels you need)
python -c "import torch; print('compute cap:', torch.cuda.get_device_capability())"

# What PyTorch is actually importable, and its CUDA build
python -c "import torch; print('torch:', torch.__version__, '| cuda:', torch.version.cuda, '| file:', torch.__file__)"

# Current vLLM (may be broken)
python -c "import vllm; print('vllm:', vllm.__version__)" 2>/dev/null || echo "vllm not importable"

Two conditions must hold: the driver’s max CUDA (from nvidia-smi) must be >= the wheel’s CUDA build, and exactly one torch must be importable.
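
The first condition can be checked mechanically from the nvidia-smi header. The sketch below assumes the header contains the usual "Driver Version: ..." and "CUDA Version: X.Y" fields; the function names are hypothetical. Versions are compared as integer tuples, because a string comparison would rank "12.10" below "12.9".

```python
import re


def parse_smi_header(text):
    """Extract (driver_version, (cuda_major, cuda_minor)) from nvidia-smi output."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", text)
    cuda = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", text)
    if not driver or not cuda:
        raise ValueError("nvidia-smi header fields not found")
    return driver.group(1), (int(cuda.group(1)), int(cuda.group(2)))


def driver_can_run(smi_text, wheel_cuda):
    """Apply the rule above: the driver's max CUDA must be >= the wheel's build."""
    _driver, max_cuda = parse_smi_header(smi_text)
    return max_cuda >= tuple(wheel_cuda)


sample = "| NVIDIA-SMI 570.86.15   Driver Version: 570.86.15   CUDA Version: 12.8 |"
print(parse_smi_header(sample))
print(driver_can_run(sample, (13, 0)))
```

In practice the text would come from `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout`.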

Step 2: Start from a clean environment (do not reuse a polluted one)

A fresh env is the single highest-leverage fix because it removes any stray torch from the import path.

# Recommended: uv (auto-selects the matching PyTorch backend)
uv venv --python 3.12 && source .venv/bin/activate
uv pip install vllm --torch-backend=auto

# Equivalent with plain pip + venv, pinning the CUDA index explicitly:
python3.12 -m venv .venv && source .venv/bin/activate
pip install --upgrade pip
# CUDA 12.9 (current default build):
pip install vllm --extra-index-url https://download.pytorch.org/whl/cu129
# CUDA 12.8 (Blackwell-minimum, e.g. RTX 50-series):
pip install vllm --extra-index-url https://download.pytorch.org/whl/cu128

Do not run pip install torch before this step. The vLLM wheel brings its own matched PyTorch; installing torch yourself is what reintroduces the mismatch.

Step 3: If you must keep an existing env, purge the conflicting torch first

pip uninstall -y vllm torch torchvision torchaudio flash-attn xformers
pip cache purge
# Then redo Step 2 in this same env
uv pip install vllm --torch-backend=auto

Step 4: Verify the full stack imports cleanly

python3 << 'EOF'
import torch
print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA build: {torch.version.cuda}")
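if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Compute cap: {torch.cuda.get_device_capability(0)}")
    print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    # Without this guard the device queries raise before the vLLM import is tested
    print("No usable CUDA device; check the driver against the table in cause 3")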

import vllm
print(f"vLLM: {vllm.__version__}")
print("All imports OK")
EOF

If this block prints All imports OK with CUDA available: True, the version chain is sound.

Step 5: Smoke-test the server with a tiny model

python -m vllm.entrypoints.openai.api_server \
  --model facebook/opt-125m \
  --max-model-len 512 \
  --host 127.0.0.1 --port 8000 &

sleep 15
curl http://127.0.0.1:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "facebook/opt-125m", "prompt": "Hello", "max_tokens": 10}'

A JSON completion back means the runtime, kernels, and driver all agree. Now swap in your real model.
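
The fixed sleep 15 can be too short on a first run, when the model still has to download. A polling loop is more forgiving. This is a minimal sketch, assuming the server answers GET /health with status 200 once it is ready; `wait_until` and `server_is_up` are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request


def wait_until(probe, timeout=300.0, interval=2.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call probe() until it returns True or `timeout` seconds have passed."""
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def server_is_up(url="http://127.0.0.1:8000/health"):
    """True once the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False  # not listening yet, or still loading


# Usage (blocks for up to five minutes, so it is left commented out here):
# if not wait_until(server_is_up):
#     raise SystemExit("server did not become ready in time")
```

The probe is passed in as a parameter so the loop can be exercised without a running server.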

Step 6: When in doubt, use the official Docker image

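# --ipc=host lets the container use host shared memory, which PyTorch relies on
# to pass data between processes (notably for tensor-parallel runs).
docker run --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p 8000:8000 \
  --ipc=host \
  vllm/vllm-openai:latest \
  --model meta-llama/Meta-Llama-3.1-8B-Instruct \
  --max-model-len 16384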

The official image ships a pre-matched CUDA runtime, PyTorch, and vLLM, so it sidesteps toolkit and Python version management on the host. The one host requirement left is an NVIDIA driver new enough for the image’s CUDA build (see the table in cause 3), plus the NVIDIA Container Toolkit so that --gpus all works.

How to confirm it’s fixed

You’re done when all three are true:

1. python -c "import vllm; import torch; print(torch.cuda.is_available())" prints True with no import error.
2. The facebook/opt-125m smoke test in Step 5 returns a JSON completion.
3. Your real model loads past the “Loading model weights” log line and binds port 8000.

If checks 1 and 2 pass but your real model still fails, that’s no longer a CUDA mismatch. It’s usually VRAM (OOM) or context length, covered in the related articles below.

Prevention

Always install into a fresh venv/conda env and let the wheel bring its own PyTorch. Never pip install torch before vLLM.
Prefer uv pip install vllm --torch-backend=auto so the PyTorch backend is chosen from your actual driver.
Pin in requirements.txt: the exact vllm==<version> plus --extra-index-url https://download.pytorch.org/whl/cu129 (or your CUDA build).
After any NVIDIA driver update, re-run the Step 4 verification before assuming the stack still works.
On Blackwell (RTX 50-series, B200), require CUDA >= 12.8 wheels from day one; 12.1/12.4 wheels have no kernels for those cards.
For production, pin and use the official vllm/vllm-openai Docker image to freeze the whole chain.

FAQ

Q: nvidia-smi shows “CUDA Version: 12.8” but my old notes mention nvcc 12.1 — which one matters for vLLM? A: For prebuilt wheels, neither nvcc nor a system CUDA toolkit matters — the wheel bundles its own CUDA runtime and PyTorch. What matters is that your driver (the nvidia-smi “CUDA Version” field is the max it supports) is recent enough for the wheel’s CUDA build: driver >= 570 for a CUDA 12.8 wheel, >= 580 for CUDA 13.0. You only need a matching nvcc toolkit when building vLLM from source.

Q: I fixed every version and still get undefined symbol on import — why? A: There’s a second torch on the import path. Run python -c "import torch; print(torch.__file__)"; if it points anywhere other than your active env’s site-packages, uninstall the stray copy (often a conda pytorch package) or rebuild the env from scratch. ABI undefined-symbol errors are almost always “wrong torch wins,” not a vLLM bug.

Q: My GPU is a V100 / RTX 2080 Ti — does current vLLM still support it? A: Compute capability 7.0 (V100) and 7.5 (RTX 2080 Ti) are increasingly second-class on the newest vLLM releases; some kernels ship only for sm_80+ (8.0/8.6 and up). If you hit no kernel image is available, pin an older vLLM release that still built sm_70/sm_75 kernels, or switch to llama.cpp/Ollama, which have mature support for older cards.

Q: Can I run vLLM CPU-only? A: There is an experimental CPU backend, but it’s far too slow for serving. For CPU inference use llama.cpp (llama-server) or Ollama instead.

Q: My driver is too old to update (locked cluster). Any way to run a newer CUDA build? A: NVIDIA’s forward-compatibility cuda-compat package plus VLLM_ENABLE_CUDA_COMPATIBILITY=1 lets a newer CUDA runtime run on an older datacenter driver in some cases. It’s a workaround, not a substitute for a driver that meets the minimum in the table above.

Tags: #local-llm #vllm #Troubleshooting