summaryrefslogtreecommitdiffhomepage
path: root/.rules/docs/ollama/vision.md
diff options
context:
space:
mode:
Diffstat (limited to '.rules/docs/ollama/vision.md')
-rw-r--r--.rules/docs/ollama/vision.md87
1 files changed, 87 insertions, 0 deletions
diff --git a/.rules/docs/ollama/vision.md b/.rules/docs/ollama/vision.md
new file mode 100644
index 0000000..bab9ea0
--- /dev/null
+++ b/.rules/docs/ollama/vision.md
@@ -0,0 +1,87 @@
+# Vision (Multimodal Image Input)
+
+Vision-capable models accept images alongside text prompts for description, classification, and visual Q&A.
+
+## How to Send Images
+
+Add an `images` array to a message in `/api/chat` or to the top-level request in `/api/generate`.
+
+**REST API:** Images must be **base64-encoded** strings (no file paths or URLs).
+
+### `/api/chat` example
+
+```json
+{
+ "model": "gemma3",
+ "messages": [{
+ "role": "user",
+ "content": "What is in this image?",
+ "images": ["<base64-encoded image data>"]
+ }],
+ "stream": false
+}
+```
+
+### `/api/generate` example
+
+```json
+{
+ "model": "gemma3",
+ "prompt": "Describe this image.",
+ "images": ["<base64-encoded image data>"],
+ "stream": false
+}
+```
+
+## Base64 Encoding
+
+In an Obsidian plugin context (TypeScript), convert an `ArrayBuffer` to base64:
+
+```typescript
+const buffer = await vault.readBinary(file);
+const bytes = new Uint8Array(buffer);
+let binary = '';
+for (const b of bytes) binary += String.fromCharCode(b);
+const base64 = btoa(binary);
+```
+
+## Multiple Images
+
+Pass multiple base64 strings in the `images` array. The model will consider all of them in context.
+
+```json
+"images": ["<base64_image_1>", "<base64_image_2>"]
+```
+
+## Combining with Structured Output
+
+Vision works with the `format` parameter — use a JSON schema to get structured descriptions:
+
+```json
+{
+ "model": "gemma3",
+ "messages": [{
+ "role": "user",
+ "content": "Describe the objects in this photo.",
+ "images": ["<base64>"]
+ }],
+ "stream": false,
+ "format": {
+ "type": "object",
+ "properties": {
+ "objects": {
+ "type": "array",
+ "items": {"type": "string"}
+ }
+ },
+ "required": ["objects"]
+ }
+}
+```
+
+## Supported Models
+
+Any model with vision/multimodal capability, e.g.:
+- `gemma3`
+- `llava`
+- Browse: [vision models](https://ollama.com/search?c=vision)