Diffstat (limited to '.rules/docs/ollama')
-rw-r--r--  .rules/docs/ollama/chat.md         171
-rw-r--r--  .rules/docs/ollama/embed.md         56
-rw-r--r--  .rules/docs/ollama/generate.md     121
-rw-r--r--  .rules/docs/ollama/list-models.md   56
-rw-r--r--  .rules/docs/ollama/show-model.md    43
-rw-r--r--  .rules/docs/ollama/version.md       19
6 files changed, 466 insertions, 0 deletions
diff --git a/.rules/docs/ollama/chat.md b/.rules/docs/ollama/chat.md
new file mode 100644
index 0000000..874243d
--- /dev/null
+++ b/.rules/docs/ollama/chat.md
@@ -0,0 +1,171 @@
+# Generate a chat message
+
+`POST /api/chat` — Generate the next message in a conversation between a user and an assistant.
+
+**Server:** `http://localhost:11434`
+
+## Request
+
+| Field | Type | Required | Description |
+|---|---|---|---|
+| `model` | string | yes | Model name |
+| `messages` | ChatMessage[] | yes | Chat history (array of message objects) |
+| `tools` | ToolDefinition[] | no | Function tools the model may call |
+| `format` | `"json"` \| object | no | Response format — `"json"` or a JSON schema |
+| `options` | ModelOptions | no | Runtime generation options (see generate.md) |
+| `stream` | boolean | no | Stream partial responses (default: `true`) |
+| `think` | boolean \| string | no | Enable thinking output (`true`/`false` or `"high"`, `"medium"`, `"low"`) |
+| `keep_alive` | string \| number | no | Keep-alive duration (e.g. `"5m"` or `0` to unload) |
+
+### ChatMessage
+
+| Field | Type | Required | Description |
+|---|---|---|---|
+| `role` | string | yes | `"system"`, `"user"`, `"assistant"`, or `"tool"` |
+| `content` | string | yes | Message text |
+| `images` | string[] | no | Base64-encoded images (multimodal) |
+| `tool_calls` | ToolCall[] | no | Tool calls from the model |
+
+### ToolDefinition
+
+```json
+{
+ "type": "function",
+ "function": {
+ "name": "function_name",
+ "description": "What the function does",
+ "parameters": { /* JSON Schema */ }
+ }
+}
+```
+
+### ToolCall
+
+```json
+{
+ "function": {
+ "name": "function_name",
+ "arguments": { /* key-value args */ }
+ }
+}
+```
+
+## Response (non-streaming, `stream: false`)
+
+| Field | Type | Description |
+|---|---|---|
+| `model` | string | Model name |
+| `created_at` | string | ISO 8601 timestamp |
+| `message.role` | string | Always `"assistant"` |
+| `message.content` | string | Assistant reply text |
+| `message.thinking` | string | Thinking trace (when `think` enabled) |
+| `message.tool_calls` | ToolCall[] | Tool calls requested by assistant |
+| `done` | boolean | Whether the response finished |
+| `done_reason` | string | Why generation stopped (e.g. `"stop"`, `"length"`) |
+| `total_duration` | integer | Total time (nanoseconds) |
+| `load_duration` | integer | Model load time (nanoseconds) |
+| `prompt_eval_count` | integer | Input token count |
+| `prompt_eval_duration` | integer | Prompt eval time (nanoseconds) |
+| `eval_count` | integer | Output token count |
+| `eval_duration` | integer | Token generation time (nanoseconds) |
+
+## Streaming Response (`stream: true`, default)
+
+Returns `application/x-ndjson`. Each chunk has `message.content` (partial text). Final chunk has `done: true` with duration/count stats.
+
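+Illustrative chunk shapes (field values are examples, not actual model output):
+
+```json
+{"model":"gemma3","created_at":"2025-10-03T14:00:00Z","message":{"role":"assistant","content":"The"},"done":false}
+{"model":"gemma3","created_at":"2025-10-03T14:00:00Z","message":{"role":"assistant","content":" sky"},"done":false}
+{"model":"gemma3","created_at":"2025-10-03T14:00:01Z","message":{"role":"assistant","content":""},"done":true,"done_reason":"stop","total_duration":4935886791,"eval_count":298,"eval_duration":4709213000}
+```
+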
+## Examples
+
+### Basic (streaming)
+```bash
+curl http://localhost:11434/api/chat -d '{
+ "model": "gemma3",
+ "messages": [
+ {"role": "user", "content": "why is the sky blue?"}
+ ]
+}'
+```
+
+### Non-streaming
+```bash
+curl http://localhost:11434/api/chat -d '{
+ "model": "gemma3",
+ "messages": [
+ {"role": "user", "content": "why is the sky blue?"}
+ ],
+ "stream": false
+}'
+```
+
+### Structured output
+```bash
+curl http://localhost:11434/api/chat -d '{
+ "model": "gemma3",
+ "messages": [
+ {"role": "user", "content": "What are the populations of the United States and Canada?"}
+ ],
+ "stream": false,
+ "format": {
+ "type": "object",
+ "properties": {
+ "countries": {
+ "type": "array",
+ "items": {
+ "type": "object",
+ "properties": {
+ "country": {"type": "string"},
+ "population": {"type": "integer"}
+ },
+ "required": ["country", "population"]
+ }
+ }
+ },
+ "required": ["countries"]
+ }
+}'
+```
+
+### Tool calling
+```bash
+curl http://localhost:11434/api/chat -d '{
+ "model": "qwen3",
+ "messages": [
+ {"role": "user", "content": "What is the weather today in Paris?"}
+ ],
+ "stream": false,
+ "tools": [
+ {
+ "type": "function",
+ "function": {
+ "name": "get_current_weather",
+ "description": "Get the current weather for a location",
+ "parameters": {
+ "type": "object",
+ "properties": {
+ "location": {
+ "type": "string",
+ "description": "The location, e.g. San Francisco, CA"
+ },
+ "format": {
+ "type": "string",
+ "description": "celsius or fahrenheit",
+ "enum": ["celsius", "fahrenheit"]
+ }
+ },
+ "required": ["location", "format"]
+ }
+ }
+ }
+ ]
+}'
+```
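+
+When the model responds with `message.tool_calls`, execute the function yourself, append its output as a `"tool"` message, and call the endpoint again so the model can compose a final answer. A sketch of the follow-up request (the tool output value is made up):
+```bash
+curl http://localhost:11434/api/chat -d '{
+  "model": "qwen3",
+  "messages": [
+    {"role": "user", "content": "What is the weather today in Paris?"},
+    {"role": "assistant", "content": "", "tool_calls": [{"function": {"name": "get_current_weather", "arguments": {"location": "Paris, FR", "format": "celsius"}}}]},
+    {"role": "tool", "content": "18 degrees celsius"}
+  ],
+  "stream": false
+}'
+```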
+
+### Thinking
+```bash
+curl http://localhost:11434/api/chat -d '{
+ "model": "gpt-oss",
+ "messages": [
+ {"role": "user", "content": "What is 1+1?"}
+ ],
+ "think": "low"
+}'
+```
diff --git a/.rules/docs/ollama/embed.md b/.rules/docs/ollama/embed.md
new file mode 100644
index 0000000..9c81ebf
--- /dev/null
+++ b/.rules/docs/ollama/embed.md
@@ -0,0 +1,56 @@
+# Generate embeddings
+
+`POST /api/embed` — Creates vector embeddings representing the input text.
+
+**Server:** `http://localhost:11434`
+
+## Request
+
+| Field | Type | Required | Description |
+|---|---|---|---|
+| `model` | string | yes | Model name (e.g. `"embeddinggemma"`) |
+| `input` | string \| string[] | yes | Text or array of texts to embed |
+| `truncate` | boolean | no | Truncate inputs that exceed the context window (default: `true`). If `false`, an over-length input returns an error. |
+| `dimensions` | integer | no | Number of dimensions for the embedding vectors |
+| `keep_alive` | string \| number | no | Keep-alive duration (e.g. `"5m"` or `0` to unload) |
+| `options` | ModelOptions | no | Runtime options (see generate.md) |
+
+## Response
+
+| Field | Type | Description |
+|---|---|---|
+| `model` | string | Model that produced the embeddings |
+| `embeddings` | number[][] | Array of embedding vectors (one per input) |
+| `total_duration` | integer | Total time (nanoseconds) |
+| `load_duration` | integer | Model load time (nanoseconds) |
+| `prompt_eval_count` | integer | Number of input tokens processed |
+
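+An illustrative response for a single input (vector truncated; values are examples):
+
+```json
+{
+  "model": "embeddinggemma",
+  "embeddings": [[0.0107, -0.0244, 0.0712, ...]],
+  "total_duration": 14143917,
+  "load_duration": 1019500,
+  "prompt_eval_count": 8
+}
+```
+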
+## Examples
+
+### Single input
+```bash
+curl http://localhost:11434/api/embed -d '{
+ "model": "embeddinggemma",
+ "input": "Why is the sky blue?"
+}'
+```
+
+### Multiple inputs (batch)
+```bash
+curl http://localhost:11434/api/embed -d '{
+ "model": "embeddinggemma",
+ "input": [
+ "Why is the sky blue?",
+ "Why is the grass green?"
+ ]
+}'
+```
+
+### Custom dimensions
+```bash
+curl http://localhost:11434/api/embed -d '{
+ "model": "embeddinggemma",
+ "input": "Generate embeddings for this text",
+ "dimensions": 128
+}'
+```
diff --git a/.rules/docs/ollama/generate.md b/.rules/docs/ollama/generate.md
new file mode 100644
index 0000000..30534c2
--- /dev/null
+++ b/.rules/docs/ollama/generate.md
@@ -0,0 +1,121 @@
+# Generate a response
+
+`POST /api/generate` — Generates a response for a provided prompt.
+
+**Server:** `http://localhost:11434`
+
+## Request
+
+| Field | Type | Required | Description |
+|---|---|---|---|
+| `model` | string | yes | Model name |
+| `prompt` | string | no | Text for the model to generate a response from |
+| `suffix` | string | no | Text that comes after the generated response (fill-in-the-middle completion) |
+| `images` | string[] | no | Base64-encoded images (for multimodal models) |
+| `format` | string \| object | no | `"json"` or a JSON schema object for structured output |
+| `system` | string | no | System prompt |
+| `stream` | boolean | no | Stream partial responses (default: `true`) |
+| `think` | boolean \| string | no | Enable thinking output (`true`/`false` or `"high"`, `"medium"`, `"low"`) |
+| `raw` | boolean | no | Bypass the prompt template and send `prompt` to the model verbatim |
+| `keep_alive` | string \| number | no | Keep-alive duration (e.g. `"5m"` or `0` to unload immediately) |
+| `options` | ModelOptions | no | Runtime generation options (see below) |
+
+### ModelOptions
+
+| Field | Type | Description |
+|---|---|---|
+| `seed` | integer | Random seed for reproducible outputs |
+| `temperature` | float | Randomness (higher = more random) |
+| `top_k` | integer | Limit next token to K most likely |
+| `top_p` | float | Nucleus sampling cumulative probability threshold |
+| `min_p` | float | Minimum probability threshold |
+| `stop` | string \| string[] | Stop sequences |
+| `num_ctx` | integer | Context length (number of tokens) |
+| `num_predict` | integer | Max tokens to generate |
+
+## Response (non-streaming, `stream: false`)
+
+| Field | Type | Description |
+|---|---|---|
+| `model` | string | Model name |
+| `created_at` | string | ISO 8601 timestamp |
+| `response` | string | Generated text |
+| `thinking` | string | Thinking output (when `think` enabled) |
+| `done` | boolean | Whether generation finished |
+| `done_reason` | string | Why generation stopped |
+| `total_duration` | integer | Total time (nanoseconds) |
+| `load_duration` | integer | Model load time (nanoseconds) |
+| `prompt_eval_count` | integer | Number of input tokens |
+| `prompt_eval_duration` | integer | Prompt eval time (nanoseconds) |
+| `eval_count` | integer | Number of output tokens |
+| `eval_duration` | integer | Token generation time (nanoseconds) |
+
+## Streaming Response (`stream: true`, default)
+
+Returns `application/x-ndjson` — one JSON object per line. Each chunk has the same fields as the non-streaming response. The final chunk has `done: true` with duration/count stats.
+
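+Illustrative chunks (field values are examples, not actual model output):
+
+```json
+{"model":"gemma3","created_at":"2025-10-03T14:00:00Z","response":"The","done":false}
+{"model":"gemma3","created_at":"2025-10-03T14:00:01Z","response":"","done":true,"done_reason":"stop","total_duration":5043500667,"eval_count":290,"eval_duration":4799921000}
+```
+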
+## Examples
+
+### Basic (streaming)
+```bash
+curl http://localhost:11434/api/generate -d '{
+ "model": "gemma3",
+ "prompt": "Why is the sky blue?"
+}'
+```
+
+### Non-streaming
+```bash
+curl http://localhost:11434/api/generate -d '{
+ "model": "gemma3",
+ "prompt": "Why is the sky blue?",
+ "stream": false
+}'
+```
+
+### With options
+```bash
+curl http://localhost:11434/api/generate -d '{
+ "model": "gemma3",
+ "prompt": "Why is the sky blue?",
+ "options": {
+ "temperature": 0.8,
+ "top_p": 0.9,
+ "seed": 42
+ }
+}'
+```
+
+### Structured output (JSON schema)
+```bash
+curl http://localhost:11434/api/generate -d '{
+ "model": "gemma3",
+ "prompt": "What are the populations of the United States and Canada?",
+ "stream": false,
+ "format": {
+ "type": "object",
+ "properties": {
+ "countries": {
+ "type": "array",
+ "items": {
+ "type": "object",
+ "properties": {
+ "country": {"type": "string"},
+ "population": {"type": "integer"}
+ },
+ "required": ["country", "population"]
+ }
+ }
+ },
+ "required": ["countries"]
+ }
+}'
+```
+
+### Load / Unload model
+```bash
+# Load
+curl http://localhost:11434/api/generate -d '{"model": "gemma3"}'
+# Unload
+curl http://localhost:11434/api/generate -d '{"model": "gemma3", "keep_alive": 0}'
+```
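+
+Both calls return immediately with an empty `response`; in recent Ollama versions `done_reason` reports `"load"` or `"unload"` respectively (an illustrative load response, values are examples):
+```json
+{"model": "gemma3", "created_at": "2025-10-03T14:00:00Z", "response": "", "done": true, "done_reason": "load"}
+```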
diff --git a/.rules/docs/ollama/list-models.md b/.rules/docs/ollama/list-models.md
new file mode 100644
index 0000000..f5da57f
--- /dev/null
+++ b/.rules/docs/ollama/list-models.md
@@ -0,0 +1,56 @@
+# List models
+
+`GET /api/tags` — Fetch a list of locally available models and their details.
+
+**Server:** `http://localhost:11434`
+
+## Request
+
+No parameters required.
+
+```bash
+curl http://localhost:11434/api/tags
+```
+
+## Response
+
+| Field | Type | Description |
+|---|---|---|
+| `models` | ModelSummary[] | Array of available models |
+
+### ModelSummary
+
+| Field | Type | Description |
+|---|---|---|
+| `name` | string | Model name (including tag) |
+| `model` | string | Model identifier (same value as `name`) |
+| `modified_at` | string | Last modified (ISO 8601) |
+| `size` | integer | Size on disk (bytes) |
+| `digest` | string | SHA256 digest |
+| `details.format` | string | File format (e.g. `"gguf"`) |
+| `details.family` | string | Primary model family (e.g. `"llama"`) |
+| `details.families` | string[] | All families the model belongs to |
+| `details.parameter_size` | string | Parameter count label (e.g. `"7B"`) |
+| `details.quantization_level` | string | Quantization level (e.g. `"Q4_0"`) |
+
+### Example response
+```json
+{
+ "models": [
+ {
+ "name": "gemma3",
+ "model": "gemma3",
+ "modified_at": "2025-10-03T23:34:03.409490317-07:00",
+ "size": 3338801804,
+ "digest": "a2af6cc3eb7fa8be8504abaf9b04e88f17a119ec3f04a3addf55f92841195f5a",
+ "details": {
+ "format": "gguf",
+ "family": "gemma",
+ "families": ["gemma"],
+ "parameter_size": "4.3B",
+ "quantization_level": "Q4_K_M"
+ }
+ }
+ ]
+}
+```
diff --git a/.rules/docs/ollama/show-model.md b/.rules/docs/ollama/show-model.md
new file mode 100644
index 0000000..befbd22
--- /dev/null
+++ b/.rules/docs/ollama/show-model.md
@@ -0,0 +1,43 @@
+# Show model details
+
+`POST /api/show` — Get detailed information about a specific model.
+
+**Server:** `http://localhost:11434`
+
+## Request
+
+| Field | Type | Required | Description |
+|---|---|---|---|
+| `model` | string | yes | Model name to show |
+| `verbose` | boolean | no | Include large verbose fields in the response |
+
+## Response
+
+| Field | Type | Description |
+|---|---|---|
+| `parameters` | string | Model parameter settings (text) |
+| `modified_at` | string | Last modified (ISO 8601) |
+| `template` | string | Prompt template used by the model |
+| `capabilities` | string[] | Supported features (e.g. `"completion"`, `"vision"`) |
+| `details.format` | string | File format (e.g. `"gguf"`) |
+| `details.family` | string | Model family |
+| `details.families` | string[] | All families |
+| `details.parameter_size` | string | Parameter count label (e.g. `"4.3B"`) |
+| `details.quantization_level` | string | Quantization level (e.g. `"Q4_K_M"`) |
+| `model_info` | object | Architecture metadata (context length, embedding size, etc.) |
+
+## Examples
+
+```bash
+curl http://localhost:11434/api/show -d '{
+ "model": "gemma3"
+}'
+```
+
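+An abbreviated illustrative response (fields truncated for brevity; values are examples):
+
+```json
+{
+  "template": "...",
+  "capabilities": ["completion", "vision"],
+  "details": {
+    "format": "gguf",
+    "family": "gemma",
+    "parameter_size": "4.3B",
+    "quantization_level": "Q4_K_M"
+  },
+  "model_info": { "general.architecture": "gemma3" }
+}
+```
+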
+### Verbose
+```bash
+curl http://localhost:11434/api/show -d '{
+ "model": "gemma3",
+ "verbose": true
+}'
+```
diff --git a/.rules/docs/ollama/version.md b/.rules/docs/ollama/version.md
new file mode 100644
index 0000000..29fb757
--- /dev/null
+++ b/.rules/docs/ollama/version.md
@@ -0,0 +1,19 @@
+# Get version
+
+`GET /api/version` — Retrieve the Ollama server version.
+
+**Server:** `http://localhost:11434`
+
+## Request
+
+No parameters required.
+
+```bash
+curl http://localhost:11434/api/version
+```
+
+## Response
+
+| Field | Type | Description |
+|---|---|---|
+| `version` | string | Ollama version (e.g. `"0.12.6"`) |
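+
+### Example response
+```json
+{"version": "0.12.6"}
+```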