author: Adam Malczewski <[email protected]> 2026-04-30 18:06:07 +0900
committer: Adam Malczewski <[email protected]> 2026-04-30 18:06:07 +0900
commit: 9be8821368deff024eafedeea55a614f9a9468cf
tree: 43d70e2e8d6ac31e288f8f99b71555c051db0b19 /ENDPOINT_ROUTING.md
parent: 5c9b8f5142198bdf230d500b5101322a22235670
update to correctly use new api for newer models. (HEAD, main)
Diffstat (limited to 'ENDPOINT_ROUTING.md')
-rw-r--r-- ENDPOINT_ROUTING.md 189
1 files changed, 189 insertions, 0 deletions
diff --git a/ENDPOINT_ROUTING.md b/ENDPOINT_ROUTING.md
new file mode 100644
index 0000000..e4ab523
--- /dev/null
+++ b/ENDPOINT_ROUTING.md
@@ -0,0 +1,189 @@
+# Endpoint Routing — How the adapter picks `/v1/chat/completions` vs `/v1/responses`
+
+> **TL;DR** The decision is made by a single regex against the model id string.
+> No capability discovery, no flag, no per-request override.
+
+## The decision
+
+The full routing logic lives in **one method**:
+
+`lib/dispatch/adapter/copilot.rb`
+
+```ruby
+# Returns true when the selected model requires the /v1/responses endpoint.
+# This applies to GPT-5 reasoning models. These models reject tool calls on
+# /v1/chat/completions and return a 400 RequestError directing callers to
+# use /v1/responses instead.
+def uses_responses_api?
+  @model.match?(/\Agpt-5/)
+end
+```
+
+`\A` anchors at the start of the string, so any model id whose name begins
+with the literal `gpt-5` (case-sensitive) is routed to the Responses API.
+Everything else goes to Chat Completions.
+
+The check is invoked once per `#chat` call:
+
+```ruby
+# lib/dispatch/adapter/copilot.rb (inside #chat)
+if uses_responses_api?
+  if stream
+    chat_streaming_responses(...) # POST /v1/responses (SSE)
+  else
+    chat_non_streaming_responses(...) # POST /v1/responses
+  end
+else
+  # build chat-completions body
+  if stream
+    chat_streaming(...) # POST /v1/chat/completions (SSE)
+  else
+    chat_non_streaming(...) # POST /v1/chat/completions
+  end
+end
+```
+
+The four code paths are:
+
+| Path | Method | Endpoint | Streamed? |
+|---|---|---|---|
+| Responses, streaming | `chat_streaming_responses` | `POST /v1/responses` | yes |
+| Responses, blocking | `chat_non_streaming_responses` | `POST /v1/responses` | no |
+| Chat, streaming | `chat_streaming` | `POST /v1/chat/completions` | yes |
+| Chat, blocking | `chat_non_streaming` | `POST /v1/chat/completions` | no |
+
+All four live in `lib/dispatch/adapter/copilot.rb`.
+
+## Body-shape differences (what the adapter rewrites silently)
+
+| Concept | `/v1/chat/completions` body | `/v1/responses` body |
+|---|---|---|
+| Conversation | `messages: [...]` | `input: [...]` |
+| Token cap | `max_tokens` (or `max_completion_tokens` on o*/gpt-5/gemini) | `max_output_tokens` |
+| Reasoning effort | `reasoning_effort: "high"` | `reasoning: { effort: "high" }` |
+| Tool definition | `{ type: "function", function: { name, description, parameters } }` | `{ type: "function", name, description, parameters }` (no `function:` wrapper) |
+
+These transforms are handled inside the adapter — callers always pass the
+same `Dispatch::Adapter::ToolDefinition` / `Dispatch::Adapter::Message`
+structs and the same `thinking:` keyword.
+
+## Current model list and routing
+
+Source: `reference/models.txt` (lives one level up from this gem, in the
+parent `update-adapters/` workspace; format is `model_id,premium_multiplier`).
+
+| Model id | Premium multiplier | `\Agpt-5` match? | Endpoint |
+|---|---|---|---|
+| gpt-4.1 | 0.0 | ❌ | `/v1/chat/completions` |
+| gpt-4o | 0.0 | ❌ | `/v1/chat/completions` |
+| gpt-5-mini | 0.0 | ✅ | `/v1/responses` |
+| oswe-vscode-prime | 0.0 | ❌ | `/v1/chat/completions` |
+| grok-code-fast-1 | 0.25 | ❌ | `/v1/chat/completions` |
+| claude-haiku-4.5 | 0.33 | ❌ | `/v1/chat/completions` |
+| gemini-3-flash-preview | 0.33 | ❌ | `/v1/chat/completions` |
+| gpt-5.4-mini | 0.33 | ✅ | `/v1/responses` |
+| claude-sonnet-4 | 1.0 | ❌ | `/v1/chat/completions` |
+| claude-sonnet-4.5 | 1.0 | ❌ | `/v1/chat/completions` |
+| claude-sonnet-4.6 | 1.0 | ❌ | `/v1/chat/completions` |
+| gemini-2.5-pro | 1.0 | ❌ | `/v1/chat/completions` |
+| gemini-3.1-pro-preview | 1.0 | ❌ | `/v1/chat/completions` |
+| gpt-5.2 | 1.0 | ✅ | `/v1/responses` |
+| gpt-5.2-codex | 1.0 | ✅ | `/v1/responses` |
+| gpt-5.3-codex | 1.0 | ✅ | `/v1/responses` |
+| gpt-5.4 | 1.0 | ✅ | `/v1/responses` |
+| claude-opus-4.7 | 7.5 | ❌ | `/v1/chat/completions` |
+| gpt-5.5 | 7.5 | ✅ | `/v1/responses` |
+
+## Why a regex and not capability discovery?
+
+`GET https://api.githubcopilot.com/models` does NOT return a field that
+indicates which endpoint a given model accepts. A typical entry looks like:
+
+```json
+{
+  "id": "claude-3.7-sonnet",
+  "vendor": "Anthropic",
+  "model_picker_enabled": true,
+  "policy": { "state": "enabled" },
+  "capabilities": {
+    "family": "claude-3.7-sonnet",
+    "type": "chat",
+    "tokenizer": "o200k_base",
+    "limits": { "max_context_window_tokens": 200000, "max_output_tokens": 8192, "max_prompt_tokens": 90000 },
+    "supports": { "streaming": true, "tool_calls": true, "parallel_tool_calls": true, "vision": true }
+  }
+}
+```
+
+There is no `endpoints`, `api`, `responses_api`, or `chat_completions`
+flag. The only signal that a model needs `/v1/responses` is the **400 error
+string** Copilot returns when you send tools plus `reasoning_effort` to
+`/v1/chat/completions` for a GPT-5 family model:
+
+```
+Function tools with reasoning_effort are not supported for gpt-5.4 in
+/v1/chat/completions. Please use /v1/responses instead.
+```
+
+Hence the hardcoded `/\Agpt-5/` heuristic. See
+`GPT5_RESPONSES_API.md` for the original problem statement.
+
+## How to update this when GitHub adds new models
+
+When GitHub Copilot adds a new model that requires `/v1/responses`:
+
+1. **Edit the regex** in
+   `lib/dispatch/adapter/copilot.rb` at the `uses_responses_api?` method.
+   Add the new family to the alternation, e.g.:
+
+   ```ruby
+   def uses_responses_api?
+     @model.match?(/\A(?:gpt-5|gpt-6|codex-6|o5)/)
+   end
+   ```
+
+2. **Update the test expectations** in
+   `spec/dispatch/adapter/copilot_spec.rb`. Search for `uses_responses_api`
+   and `/\Agpt-5/` to find the relevant examples. Both the positive cases
+   (a model that should match) and the negative cases (a model that
+   shouldn't) need updating.
+
+3. **Update the model table** under "Current model list and routing" in
+   this file (`ENDPOINT_ROUTING.md`) so the documented routing matches the
+   code.
+
+4. **Update `reference/models.txt`** in the parent workspace if you also
+   want the new model listed for build/test scripts.
+
+5. **Bump the gem version** in
+   `lib/dispatch/adapter/version.rb` (minor bump for new model support,
+   patch for a regex tweak that just fixes routing for an existing
+   misclassified model).
+
+6. **Run the test gate** from inside this gem:
+   ```bash
+   bundle exec rubocop --autocorrect-all
+   bundle exec rspec
+   ```
+   Both must exit 0.
+
+## Alternative: probe-and-fallback (not currently implemented)
+
+A more durable design would catch the specific 400 error string from
+`/v1/chat/completions`, cache the offending model id, and retry the
+request on `/v1/responses`. Pros: zero hardcoded list. Cons: it adds
+latency to the first request per new model per process, and it depends
+on the upstream error wording staying stable. To be reliable, the probe
+must include a tool definition: a tool-less request to
+`/v1/chat/completions` succeeds for some GPT-5 variants, and only the
+tools + `reasoning_effort` combination triggers the rejection.
+
+## File reference (everything routing-related)
+
+| Path | What it contains |
+|---|---|
+| `lib/dispatch/adapter/copilot.rb` | `uses_responses_api?` (the regex), the `chat` dispatcher, all four code paths, body builders for both endpoints |
+| `lib/dispatch/adapter/version.rb` | Gem version constant |
+| `spec/dispatch/adapter/copilot_spec.rb` | Tests for both endpoint paths and the routing predicate |
+| `GPT5_RESPONSES_API.md` | Original problem statement — the 400 error from Copilot |
+| `ENDPOINT_ROUTING.md` | This file |
+| `../reference/models.txt` | Workspace-level list of model ids and premium multipliers |
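
The routing rule and the tool-definition rewrite that `ENDPOINT_ROUTING.md` describes can be tried outside the gem. What follows is a minimal sketch, not the adapter's code: `RESPONSES_MODEL_PATTERN`, `endpoint_for`, and `tool_for` are hypothetical names introduced here for illustration, and the only things taken from the document are the `/\Agpt-5/` pattern, the two endpoint paths, and the two tool-definition shapes from the body-shape table.

```ruby
# Hypothetical standalone helpers. In the gem, the equivalent logic lives in
# Dispatch::Adapter::Copilot#uses_responses_api? and the #chat dispatcher.
RESPONSES_MODEL_PATTERN = /\Agpt-5/

# Returns the endpoint path that the documented rule selects for a model id.
# The match is anchored at the start of the string and is case-sensitive.
def endpoint_for(model_id)
  model_id.match?(RESPONSES_MODEL_PATTERN) ? "/v1/responses" : "/v1/chat/completions"
end

# Sketch of the tool-definition rewrite from the body-shape table:
# Chat Completions nests the fields under :function, while Responses
# places them next to :type with no wrapper.
def tool_for(endpoint, name:, description:, parameters:)
  fields = { name: name, description: description, parameters: parameters }
  if endpoint == "/v1/responses"
    { type: "function" }.merge(fields)
  else
    { type: "function", function: fields }
  end
end

# Print the routing decision for a few ids from the model table, plus one
# uppercase id to show the effect of case sensitivity.
%w[gpt-4.1 gpt-5-mini gpt-5.2-codex claude-sonnet-4.5 GPT-5.4].each do |id|
  puts "#{id} -> #{endpoint_for(id)}"
end
```

Under this sketch an uppercase id such as `GPT-5.4` falls through to `/v1/chat/completions`, because the pattern has no `i` flag. That matches the document's description of the match as case-sensitive.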