Running LLMs locally fails in completely different ways than calling cloud APIs: downloads stall mid-pull, the GPU is not detected, VRAM runs out (OOM), a quantized version loses quality, a chat template mismatch produces garbage output, a tool-calling model ignores your JSON schema. This hub covers the most common runtimes: Ollama, LM Studio, llama.cpp, vLLM, and MLX. Each article solves one symptom: "why did my model get dumber after I switched quants", "why is the GPU never being used", "why doesn't ollama list show the model I just pulled". It skips beginner "how to install Ollama" content and goes straight to failure modes, the shortest fix, and a verification checklist.

Common problems