#ollama - Tag | AI Tools Guidebook

Troubleshooting

Local Embedding Server Crashes Under Batched Requests

Your local embedding server (Ollama, llama-server, vLLM, or sentence-transformers) crashes or runs out of memory on batched embedding requests. Tune batch size, num_batch, sequence length, and concurrency with the exact flags.

May 25, 2026 #local-llm #ollama

Troubleshooting

Multi-GPU Not Used — Local LLM Runs Only on GPU 0

Your local LLM uses one GPU while the others sit at 0%. Fix it with llama.cpp --split-mode, vLLM --tensor-parallel-size, Ollama auto-spread, and the NCCL flags PCIe rigs need.

May 25, 2026 #local-llm #ollama

Troubleshooting

Local LLM Output Truncated Mid-Token (Ollama / llama.cpp)

Your local model stops mid-word with no EOS token. Diagnose num_predict limits, the VRAM-based num_ctx default, stop sequences, proxy buffering, and UTF-8 byte splits.

May 25, 2026 #local-llm #ollama

Troubleshooting

Local LLM Very Slow on First Token After Cold Start

Your local LLM takes 30 to 120 seconds to produce its first token after loading, then runs fast. Diagnose disk I/O, model eviction, CUDA/Metal shader JIT compilation, and KV cache allocation, then keep the model pinned warm.

May 25, 2026 #local-llm #ollama

Troubleshooting

Local Model Ignores the Tool-Calling Format

Your local LLM writes tool names in prose instead of structured JSON, or ignores the tools list entirely. Fix it with a tool-capable model, the --jinja flag in llama-server, and Ollama's format parameter for JSON-schema constraints.

May 25, 2026 #local-llm #ollama

Troubleshooting

Local RAG Index Rebuild Is Unbearably Slow

Rebuilding a local vector index from thousands of documents takes hours instead of minutes. Tune the embedding batch size, skip unchanged documents, batch your vector-store writes, and right-size chunks.

May 25, 2026 #local-llm #ollama

Troubleshooting

Ollama Doesn't Detect the GPU, Falls Back to CPU

Ollama ignores your NVIDIA or AMD GPU and runs on CPU only. Read the inference-compute log line, fix driver, CUDA, and ROCm mismatches, and force GPU offloading.

May 25, 2026 #local-llm #ollama

Troubleshooting

Ollama Pull Stalls or Resets Mid-Download — Fixes

ollama pull freezes at a percentage, the bar runs backwards, or you see max retries exceeded: EOF. Diagnose network, disk, and partial-blob causes and resume cleanly.

May 25, 2026 #local-llm #ollama

Troubleshooting

Ollama Pull Succeeds but the Model Isn't in ollama list

ollama pull finishes with no error but the model is missing from ollama list. Fix the OLLAMA_MODELS path split, the ollama service-user mismatch, and corrupted manifests.

May 25, 2026 #local-llm #ollama

Troubleshooting

Ollama Modelfile SYSTEM Prompt Is Ignored

Your Ollama Modelfile SYSTEM directive has no effect on model behavior. Fix it fast: verify the template injects .System, check for RENDERER/PARSER inheritance, and stop your client from overriding the system message.

May 25, 2026 #local-llm #ollama

Troubleshooting

Fix Ollama Port Already in Use (11434)

Ollama won't start because port 11434 is already bound. Find the process holding it, free the port, or move Ollama to another port — exact commands for macOS, Linux, and Windows.

May 25, 2026 #local-llm #ollama