# Thinking / Reasoning Traces

Thinking-capable models emit a separate `thinking` field containing their reasoning trace, distinct from the final answer in `content`/`response`.

## Enabling Thinking

Set the `think` field on `/api/chat` or `/api/generate` requests:

| `think` value | Behavior |
|---|---|
| `true` | Enable thinking (most models) |
| `false` | Disable thinking |
| `"low"` / `"medium"` / `"high"` | GPT-OSS only — controls trace length; `true`/`false` is ignored for this model |

Thinking is **enabled by default** for supported models.
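
As a sketch, here are request payloads for the two styles of `think` values (the field names match the table above; the prompt text is illustrative, not from the docs):

```python
import json

# Illustrative request payloads for the two styles of `think` values.
# Model names are from the supported-models list; the prompt is made up.
qwen_request = {
    "model": "qwen3",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    "think": True,        # boolean works for most thinking models
    "stream": False,
}
gpt_oss_request = {
    "model": "gpt-oss",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    "think": "high",      # GPT-OSS takes a level string, not a boolean
    "stream": False,
}
print(json.dumps(gpt_oss_request, indent=2))
```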

## Response Fields

| Endpoint | Thinking field | Answer field |
|---|---|---|
| `/api/chat` | `message.thinking` | `message.content` |
| `/api/generate` | `thinking` | `response` |

### Non-streaming example

```bash
curl http://localhost:11434/api/chat -d '{
  "model": "qwen3",
  "messages": [{"role": "user", "content": "How many r'\''s in strawberry?"}],
  "think": true,
  "stream": false
}'
```

Response includes both `message.thinking` (reasoning) and `message.content` (final answer).
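
A minimal sketch of pulling the two fields apart from a decoded response body (the sample values below are illustrative, not real model output):

```python
# Sample /api/chat response body (illustrative values), showing where
# the reasoning trace and the final answer live.
sample = {
    "model": "qwen3",
    "message": {
        "role": "assistant",
        "thinking": "Spell it out: s-t-r-a-w-b-e-r-r-y ...",
        "content": "There are 3 r's in \"strawberry\".",
    },
    "done": True,
}

trace = sample["message"].get("thinking", "")   # absent when thinking is off
answer = sample["message"]["content"]
```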

## Streaming Thinking

When streaming with `think` enabled, chunks arrive in two phases:

1. **Thinking phase** — chunks have `message.thinking` (or `thinking`) populated, `content` empty.
2. **Answer phase** — chunks have `message.content` (or `response`) populated, `thinking` empty.

Detect the transition by checking which field is non-empty on each chunk. See `streaming.md` for accumulation details.
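
The two-phase pattern above can be sketched as a small accumulator over decoded chunks (the chunk dicts here are simulated, not real server output):

```python
def accumulate(chunks):
    """Collect the thinking trace and the answer from streamed chunks.

    During the thinking phase `message.thinking` is populated and
    `message.content` is empty; then the fields swap for the answer.
    """
    thinking, content = [], []
    for chunk in chunks:
        msg = chunk.get("message", {})
        if msg.get("thinking"):
            thinking.append(msg["thinking"])
        if msg.get("content"):
            content.append(msg["content"])
    return "".join(thinking), "".join(content)

# Simulated stream: two thinking chunks, then two answer chunks.
chunks = [
    {"message": {"role": "assistant", "thinking": "Count the r's "}},
    {"message": {"role": "assistant", "thinking": "one by one."}},
    {"message": {"role": "assistant", "content": "There are "}},
    {"message": {"role": "assistant", "content": "3 r's."}},
]
trace, answer = accumulate(chunks)
```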

## Supported Models

- Qwen 3 (`qwen3`)
- GPT-OSS (`gpt-oss`) — requires `"low"` / `"medium"` / `"high"` instead of a boolean
- DeepSeek V3.1 (`deepseek-v3.1`)
- DeepSeek R1 (`deepseek-r1`)
- Browse latest: [thinking models](https://ollama.com/search?c=thinking)

## Conversation History with Thinking

When maintaining chat history, include the `thinking` field in the assistant message so the model retains context of its reasoning:

```json
{
  "role": "assistant",
  "thinking": "<accumulated thinking>",
  "content": "<accumulated content>"
}
```
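
A sketch of carrying that assistant message forward in a follow-up request (`assistant_turn` is a hypothetical helper, not part of the Ollama API):

```python
def assistant_turn(thinking, content):
    # Hypothetical helper: package the accumulated trace and answer
    # into the assistant history message shown above.
    return {"role": "assistant", "thinking": thinking, "content": content}

# Follow-up request messages, with the prior assistant turn included.
messages = [
    {"role": "user", "content": "How many r's in strawberry?"},
    assistant_turn("<accumulated thinking>", "<accumulated content>"),
    {"role": "user", "content": "And in raspberry?"},
]
```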