diff options
| author | Adam Malczewski <[email protected]> | 2026-04-30 18:06:07 +0900 |
|---|---|---|
| committer | Adam Malczewski <[email protected]> | 2026-04-30 18:06:07 +0900 |
| commit | 9be8821368deff024eafedeea55a614f9a9468cf (patch) | |
| tree | 43d70e2e8d6ac31e288f8f99b71555c051db0b19 /ENDPOINT_ROUTING.md | |
| parent | 5c9b8f5142198bdf230d500b5101322a22235670 (diff) | |
| download | dispatch-adapter-copilot-9be8821368deff024eafedeea55a614f9a9468cf.tar.gz dispatch-adapter-copilot-9be8821368deff024eafedeea55a614f9a9468cf.zip | |
Diffstat (limited to 'ENDPOINT_ROUTING.md')
| -rw-r--r-- | ENDPOINT_ROUTING.md | 189 |
1 files changed, 189 insertions, 0 deletions
diff --git a/ENDPOINT_ROUTING.md b/ENDPOINT_ROUTING.md new file mode 100644 index 0000000..e4ab523 --- /dev/null +++ b/ENDPOINT_ROUTING.md @@ -0,0 +1,189 @@ +# Endpoint Routing — How the adapter picks `/v1/chat/completions` vs `/v1/responses` + +> **TL;DR** The decision is made by a single regex against the model id string. +> No capability discovery, no flag, no per-request override. + +## The decision + +The full routing logic lives in **one method**: + +`lib/dispatch/adapter/copilot.rb` + +```ruby +# Returns true when the selected model requires the /v1/responses endpoint. +# This applies to GPT-5 reasoning models. These models reject tool calls on +# /v1/chat/completions and return a 400 RequestError directing callers to +# use /v1/responses instead. +def uses_responses_api? + @model.match?(/\Agpt-5/) +end +``` + +`\A` anchors at the start of the string, so any model id whose name begins +with the literal `gpt-5` (case-sensitive) is routed to the Responses API. +Everything else goes to Chat Completions. + +The check is invoked once per `#chat` call: + +```ruby +# lib/dispatch/adapter/copilot.rb (inside #chat) +if uses_responses_api? + if stream + chat_streaming_responses(...) # POST /v1/responses (SSE) + else + chat_non_streaming_responses(...) # POST /v1/responses + end +else + # build chat-completions body + if stream + chat_streaming(...) # POST /v1/chat/completions (SSE) + else + chat_non_streaming(...) # POST /v1/chat/completions + end +end +``` + +The four code paths are: + +| Path | Method | Endpoint | Streamed? | +|---|---|---|---| +| Responses, streaming | `chat_streaming_responses` | `POST /v1/responses` | yes | +| Responses, blocking | `chat_non_streaming_responses` | `POST /v1/responses` | no | +| Chat, streaming | `chat_streaming` | `POST /v1/chat/completions` | yes | +| Chat, blocking | `chat_non_streaming` | `POST /v1/chat/completions` | no | + +All four live in `lib/dispatch/adapter/copilot.rb`. + +## Body-shape differences (what the adapter rewrites silently) + +| Concept | `/v1/chat/completions` body | `/v1/responses` body | +|---|---|---| +| Conversation | `messages: [...]` | `input: [...]` | +| Token cap | `max_tokens` (or `max_completion_tokens` on o*/gpt-5/gemini) | `max_output_tokens` | +| Reasoning effort | `reasoning_effort: "high"` | `reasoning: { effort: "high" }` | +| Tool definition | `{ type: "function", function: { name, description, parameters } }` | `{ type: "function", name, description, parameters }` (no `function:` wrapper) | + +These transforms are handled inside the adapter — callers always pass the +same `Dispatch::Adapter::ToolDefinition` / `Dispatch::Adapter::Message` +structs and the same `thinking:` keyword. + +## Current model list and routing + +Source: `reference/models.txt` (lives one level up from this gem, in the +parent `update-adapters/` workspace; format is `model_id,premium_multiplier`). + +| Model id | Premium multiplier | `\Agpt-5` match? | Endpoint | +|---|---|---|---| +| gpt-4.1 | 0.0 | ❌ | `/v1/chat/completions` | +| gpt-4o | 0.0 | ❌ | `/v1/chat/completions` | +| gpt-5-mini | 0.0 | ✅ | `/v1/responses` | +| oswe-vscode-prime | 0.0 | ❌ | `/v1/chat/completions` | +| grok-code-fast-1 | 0.25 | ❌ | `/v1/chat/completions` | +| claude-haiku-4.5 | 0.33 | ❌ | `/v1/chat/completions` | +| gemini-3-flash-preview | 0.33 | ❌ | `/v1/chat/completions` | +| gpt-5.4-mini | 0.33 | ✅ | `/v1/responses` | +| claude-sonnet-4 | 1.0 | ❌ | `/v1/chat/completions` | +| claude-sonnet-4.5 | 1.0 | ❌ | `/v1/chat/completions` | +| claude-sonnet-4.6 | 1.0 | ❌ | `/v1/chat/completions` | +| gemini-2.5-pro | 1.0 | ❌ | `/v1/chat/completions` | +| gemini-3.1-pro-preview | 1.0 | ❌ | `/v1/chat/completions` | +| gpt-5.2 | 1.0 | ✅ | `/v1/responses` | +| gpt-5.2-codex | 1.0 | ✅ | `/v1/responses` | +| gpt-5.3-codex | 1.0 | ✅ | `/v1/responses` | +| gpt-5.4 | 1.0 | ✅ | `/v1/responses` | +| claude-opus-4.7 | 7.5 | ❌ | `/v1/chat/completions` | +| gpt-5.5 | 7.5 | ✅ | `/v1/responses` | + +## Why a regex and not capability discovery? + +`GET https://api.githubcopilot.com/models` does NOT return a field that +indicates which endpoint a given model accepts. A typical entry looks like: + +```json +{ + "id": "claude-3.7-sonnet", + "vendor": "Anthropic", + "model_picker_enabled": true, + "policy": { "state": "enabled" }, + "capabilities": { + "family": "claude-3.7-sonnet", + "type": "chat", + "tokenizer": "o200k_base", + "limits": { "max_context_window_tokens": 200000, "max_output_tokens": 8192, "max_prompt_tokens": 90000 }, + "supports": { "streaming": true, "tool_calls": true, "parallel_tool_calls": true, "vision": true } + } +} +``` + +There is no `endpoints`, `api`, `responses_api`, or `chat_completions` +flag. The signal that a model needs `/v1/responses` is the **400 error +string** Copilot returns when you send tools + reasoning_effort to +`/v1/chat/completions` for a GPT-5 family model: + +``` +Function tools with reasoning_effort are not supported for gpt-5.4 in +/v1/chat/completions. Please use /v1/responses instead. +``` + +Hence the hardcoded `/\Agpt-5/` heuristic. See +`GPT5_RESPONSES_API.md` for the original problem statement. + +## How to update this when GitHub adds new models + +When GitHub Copilot adds a new model that requires `/v1/responses`: + +1. **Edit the regex** in + `lib/dispatch/adapter/copilot.rb` at the `uses_responses_api?` method. + Add the new family to the alternation, e.g.: + + ```ruby + def uses_responses_api? + @model.match?(/\A(?:gpt-5|gpt-6|codex-6|o5)/) + end + ``` + +2. **Update the test expectations** in + `spec/dispatch/adapter/copilot_spec.rb`. Search for `uses_responses_api` + and `/\Agpt-5/` to find the relevant examples; both positive (a model + that should match) and negative (a model that shouldn't) cases need + updating. + +3. **Update the table above** in this file + (`ENDPOINT_ROUTING.md`) so the documented routing matches the code. + +4. **Update `reference/models.txt`** in the parent workspace if you also + want the new model listed for build/test scripts. + +5. **Bump the gem version** in + `lib/dispatch/adapter/version.rb` (minor bump for new model support, + patch for a regex tweak that just fixes routing for an existing + misclassified model). + +6. **Run the test gate** from inside this gem: + ```bash + bundle exec rubocop --autocorrect-all + bundle exec rspec + ``` + Both must exit 0. + +## Alternative: probe-and-fallback (not currently implemented) + +A more durable design would catch the specific 400 error string from +`/v1/chat/completions`, cache the offending model id, and retransmit on +`/v1/responses`. Pros: zero hardcoded list. Cons: adds latency on the +first request per new model per process and depends on the upstream +error wording staying stable. The probe must include a tool definition +to be reliable — sending a tool-less request to `/v1/chat/completions` +will succeed for some GPT-5 variants and only the tools+reasoning combo +triggers the rejection. + +## File reference (everything routing-related) + +| Path | What it contains | +|---|---| +| `lib/dispatch/adapter/copilot.rb` | `uses_responses_api?` (the regex), the `chat` dispatcher, all four code paths, body builders for both endpoints | +| `lib/dispatch/adapter/version.rb` | Gem version constant | +| `spec/dispatch/adapter/copilot_spec.rb` | Tests for both endpoint paths and the routing predicate | +| `GPT5_RESPONSES_API.md` | Original problem statement — the 400 error from Copilot | +| `ENDPOINT_ROUTING.md` | This file | +| `../models.txt` | Workspace-level list of model ids and premium multipliers | |
