#llamacpp - Tag | AI Tools Guidebook

Troubleshooting

llama.cpp mmap Fails on a Network Drive

llama.cpp crashes or stalls when loading a GGUF model from an NFS or SMB share. Fastest fix: add --no-mmap (and --no-direct-io if direct I/O is enabled), or copy the model to local disk.

May 25, 2026 #local-llm #llama.cpp

Troubleshooting

llama.cpp Quality Drops After Switching to a More Aggressive Quant

Responses degrade after moving from Q5_K_M or Q8_0 to Q4_0, IQ4_XS, or lower in llama.cpp. Pick the right quant tier, fix bad re-quants, and confirm the result with a perplexity check.

May 25, 2026 #local-llm #llama.cpp

Troubleshooting

Chat-Template Mismatch Produces Garbage Local LLM Output

A local LLM echoes your prompt, prints literal [INST] or <|im_start|> tags, or loops the same sentence. That is a chat-template mismatch. Find the model's real template and force the engine to use it.

May 25, 2026 #local-llm #llama.cpp

Troubleshooting

Misconfigured RoPE Scaling Garbles Long-Context Output

A local LLM stays coherent up to its native context length, then degenerates into repetition or gibberish. Diagnose and fix RoPE scaling (YaRN, llama3, rope_theta) in llama.cpp and vLLM.

May 25, 2026 #local-llm #llama.cpp

Troubleshooting

Tokenizer Drift: Local LLM Token Counts Don't Match

Your app's token count disagrees with the local llama.cpp or Ollama server, causing context overflow or silent truncation. Use the server's own tokenizer as ground truth to fix the drift.

May 25, 2026 #local-llm #llama.cpp