Ollama num_ctx: Why Setting It Higher Than the Model Supports Backfires


When running local LLMs with Ollama, you can set num_ctx to control the context window size. But there’s a ceiling you might not expect.

The Gotcha

Every model has an architectural limit baked into its training. Setting num_ctx higher than that limit doesn’t give you more context — it gives you garbage output or silent truncation:

# This model was trained with 8K context
ollama run llama3
>>> /set parameter num_ctx 32768
# Result: degraded output beyond 8K, not extended context

The num_ctx parameter allocates memory for the KV cache, but the model’s positional embeddings only know how to handle positions it saw during training.
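In practice, num_ctx is set per request (via the API options), in a Modelfile, or interactively, rather than as a run flag. Here is a minimal sketch using Ollama's REST API; it assumes the default endpoint at localhost:11434, and the model name and prompt are placeholders.

# Minimal sketch: setting num_ctx per request via Ollama's REST API.
# Assumes the default endpoint; "llama3" is a placeholder model name.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "Summarize this changelog: ...",
        "stream": False,
        # num_ctx sizes the KV cache for this request; keep it at or
        # below the model's trained context length.
        "options": {"num_ctx": 8192},
    },
)
print(resp.json()["response"])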

How to Check the Real Limit

# Check the model metadata for the trained context length
ollama show llama3
# Look for "context length" under the Model section

# The Modelfile only lists num_ctx if it was explicitly overridden
ollama show llama3 --modelfile | grep num_ctx

The model card or GGUF metadata will tell you the trained context length. That’s your actual ceiling.
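You can also read the trained limit programmatically. Below is a hedged sketch against the /api/show endpoint; the metadata key is architecture-dependent (for example llama.context_length), llama3 is a placeholder, and older Ollama versions expect "name" instead of "model" in the request body.

# Hedged sketch: reading the trained context length from /api/show.
# The exact key depends on the architecture reported in general.architecture.
import requests

info = requests.post(
    "http://localhost:11434/api/show",
    json={"model": "llama3"},
).json()

arch = info["model_info"]["general.architecture"]           # e.g. "llama"
trained_ctx = info["model_info"][f"{arch}.context_length"]  # e.g. 8192
print(f"Trained context length: {trained_ctx}")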

What About YaRN and RoPE Scaling?

Some models support extended context through YaRN (Yet another RoPE extensioN) or other RoPE scaling methods. These are baked into the model weights during fine-tuning — you can’t just enable them with a flag.

If a model advertises 128K context, it was trained or fine-tuned with RoPE scaling to handle that. If it advertises 8K, setting num_ctx=128000 won’t magically give you 128K.
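If you want to see whether a model ships with RoPE scaling baked in, the GGUF metadata is the place to look. The sketch below assumes the conventional {arch}.rope.* key names, which vary by model and are often absent, so treat an empty result as "no scaling advertised" rather than proof.

# Hedged sketch: listing any RoPE-related metadata a model exposes.
# Key names follow GGUF conventions ({arch}.rope.*) and vary by model.
import requests

info = requests.post(
    "http://localhost:11434/api/show",
    json={"model": "llama3"},
).json()

arch = info["model_info"]["general.architecture"]
rope_keys = {k: v for k, v in info["model_info"].items()
             if k.startswith(f"{arch}.rope.")}
print(rope_keys or "No RoPE scaling metadata reported")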

The Rule

Match num_ctx to what the model actually supports. Going lower saves memory. Going higher wastes memory and produces worse output. Check the model card, not your wishful thinking.
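One way to enforce the rule is to clamp whatever context you would like to what the model actually supports. This is a hypothetical helper, not an Ollama feature; it reuses the /api/show lookup from the earlier sketch and the same placeholder model name.

# Hypothetical helper: clamp a requested num_ctx to the model's trained limit.
import requests

OLLAMA = "http://localhost:11434"

def trained_context_length(model: str) -> int:
    """Read the trained context length from the model's metadata."""
    info = requests.post(f"{OLLAMA}/api/show", json={"model": model}).json()
    arch = info["model_info"]["general.architecture"]
    return int(info["model_info"][f"{arch}.context_length"])

def generate(model: str, prompt: str, requested_ctx: int) -> str:
    # Never ask for more context than the model was trained to handle.
    num_ctx = min(requested_ctx, trained_context_length(model))
    resp = requests.post(
        f"{OLLAMA}/api/generate",
        json={"model": model, "prompt": prompt, "stream": False,
              "options": {"num_ctx": num_ctx}},
    )
    return resp.json()["response"]

print(generate("llama3", "Hello!", requested_ctx=32768))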

