# Endpoint Routing: How the adapter picks `/v1/chat/completions` vs `/v1/responses`

> **TL;DR** The decision is made by a single regex against the model id string.
> No capability discovery, no flag, no per-request override.

## The decision

The full routing logic lives in **one method**:

`lib/dispatch/adapter/copilot.rb`

```ruby
# Returns true when the selected model requires the /v1/responses endpoint.
# This applies to GPT-5 reasoning models. These models reject tool calls on
# /v1/chat/completions and return a 400 RequestError directing callers to
# use /v1/responses instead.
def uses_responses_api?
  @model.match?(/\Agpt-5/)
end
```

`\A` anchors at the start of the string, so any model id whose name begins with the literal `gpt-5` (case-sensitive) is routed to the Responses API. Everything else goes to Chat Completions.

The check is invoked once per `#chat` call:

```ruby
# lib/dispatch/adapter/copilot.rb (inside #chat)
if uses_responses_api?
  if stream
    chat_streaming_responses(...)       # POST /v1/responses (SSE)
  else
    chat_non_streaming_responses(...)   # POST /v1/responses
  end
else
  # build chat-completions body
  if stream
    chat_streaming(...)                 # POST /v1/chat/completions (SSE)
  else
    chat_non_streaming(...)             # POST /v1/chat/completions
  end
end
```

The four code paths are:

| Path | Method | Endpoint | Streamed? |
|---|---|---|---|
| Responses, streaming | `chat_streaming_responses` | `POST /v1/responses` | yes |
| Responses, blocking | `chat_non_streaming_responses` | `POST /v1/responses` | no |
| Chat, streaming | `chat_streaming` | `POST /v1/chat/completions` | yes |
| Chat, blocking | `chat_non_streaming` | `POST /v1/chat/completions` | no |

All four live in `lib/dispatch/adapter/copilot.rb`.

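To make the prefix behaviour concrete, here is a minimal standalone sketch of the same predicate. `ROUTING_PATTERN` and `endpoint_for` are illustrative names introduced for this example and are not part of the adapter; only the regex itself comes from the code above.

```ruby
# Standalone sketch of the routing rule. ROUTING_PATTERN and endpoint_for are
# hypothetical names for illustration; the adapter keeps the regex inline in
# uses_responses_api?.
ROUTING_PATTERN = /\Agpt-5/

def endpoint_for(model_id)
  model_id.match?(ROUTING_PATTERN) ? "/v1/responses" : "/v1/chat/completions"
end

# A few ids from the model list plus some edge cases.
%w[gpt-5.4 gpt-5-mini gpt-4.1 claude-sonnet-4.5 GPT-5.4 my-gpt-5 gpt-50].each do |id|
  puts format("%-20s %s", id, endpoint_for(id))
end
```

Two edge cases are worth noticing: the match is case-sensitive and anchored, so `GPT-5.4` and `my-gpt-5` fall through to Chat Completions, while a hypothetical id such as `gpt-50` would be routed to the Responses API because it shares the `gpt-5` prefix.
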
## Body-shape differences (what the adapter rewrites silently)

| Concept | `/v1/chat/completions` body | `/v1/responses` body |
|---|---|---|
| Conversation | `messages: [...]` | `input: [...]` |
| Token cap | `max_tokens` (or `max_completion_tokens` on o*/gpt-5/gemini) | `max_output_tokens` |
| Reasoning effort | `reasoning_effort: "high"` | `reasoning: { effort: "high" }` |
| Tool definition | `{ type: "function", function: { name, description, parameters } }` | `{ type: "function", name, description, parameters }` (no `function:` wrapper) |

These transforms are handled inside the adapter; callers always pass the same `Dispatch::Adapter::ToolDefinition` / `Dispatch::Adapter::Message` structs and the same `thinking:` keyword.

## Current model list and routing

Source: `reference/models.txt` (lives one level up from this gem, in the parent `update-adapters/` workspace; format is `model_id,premium_multiplier`).

| Model id | Premium multiplier | `\Agpt-5` match? | Endpoint |
|---|---|---|---|
| gpt-4.1 | 0.0 | ❌ | `/v1/chat/completions` |
| gpt-4o | 0.0 | ❌ | `/v1/chat/completions` |
| gpt-5-mini | 0.0 | ✅ | `/v1/responses` |
| oswe-vscode-prime | 0.0 | ❌ | `/v1/chat/completions` |
| grok-code-fast-1 | 0.25 | ❌ | `/v1/chat/completions` |
| claude-haiku-4.5 | 0.33 | ❌ | `/v1/chat/completions` |
| gemini-3-flash-preview | 0.33 | ❌ | `/v1/chat/completions` |
| gpt-5.4-mini | 0.33 | ✅ | `/v1/responses` |
| claude-sonnet-4 | 1.0 | ❌ | `/v1/chat/completions` |
| claude-sonnet-4.5 | 1.0 | ❌ | `/v1/chat/completions` |
| claude-sonnet-4.6 | 1.0 | ❌ | `/v1/chat/completions` |
| gemini-2.5-pro | 1.0 | ❌ | `/v1/chat/completions` |
| gemini-3.1-pro-preview | 1.0 | ❌ | `/v1/chat/completions` |
| gpt-5.2 | 1.0 | ✅ | `/v1/responses` |
| gpt-5.2-codex | 1.0 | ✅ | `/v1/responses` |
| gpt-5.3-codex | 1.0 | ✅ | `/v1/responses` |
| gpt-5.4 | 1.0 | ✅ | `/v1/responses` |
| claude-opus-4.7 | 7.5 | ❌ | `/v1/chat/completions` |
| gpt-5.5 | 7.5 | ✅ | `/v1/responses` |

## Why a regex and not capability discovery?

`GET https://api.githubcopilot.com/models` does NOT return a field that indicates which endpoint a given model accepts. A typical entry looks like:

```json
{
  "id": "claude-3.7-sonnet",
  "vendor": "Anthropic",
  "model_picker_enabled": true,
  "policy": { "state": "enabled" },
  "capabilities": {
    "family": "claude-3.7-sonnet",
    "type": "chat",
    "tokenizer": "o200k_base",
    "limits": {
      "max_context_window_tokens": 200000,
      "max_output_tokens": 8192,
      "max_prompt_tokens": 90000
    },
    "supports": {
      "streaming": true,
      "tool_calls": true,
      "parallel_tool_calls": true,
      "vision": true
    }
  }
}
```

There is no `endpoints`, `api`, `responses_api`, or `chat_completions` flag. The signal that a model needs `/v1/responses` is the **400 error string** Copilot returns when you send tools + reasoning_effort to `/v1/chat/completions` for a GPT-5 family model:

```
Function tools with reasoning_effort are not supported for gpt-5.4 in /v1/chat/completions. Please use /v1/responses instead.
```

Hence the hardcoded `/\Agpt-5/` heuristic. See `GPT5_RESPONSES_API.md` for the original problem statement.

## How to update this when GitHub adds new models

When GitHub Copilot adds a new model that requires `/v1/responses`:

1. **Edit the regex** in `lib/dispatch/adapter/copilot.rb` at the `uses_responses_api?` method. Add the new family to the alternation, e.g.:

   ```ruby
   def uses_responses_api?
     @model.match?(/\A(?:gpt-5|gpt-6|codex-6|o5)/)
   end
   ```

2. **Update the test expectations** in `spec/dispatch/adapter/copilot_spec.rb`. Search for `uses_responses_api` and `/\Agpt-5/` to find the relevant examples; both positive (a model that should match) and negative (a model that shouldn't) cases need updating.
3. **Update the table above** in this file (`ENDPOINT_ROUTING.md`) so the documented routing matches the code.
4. **Update `reference/models.txt`** in the parent workspace if you also want the new model listed for build/test scripts.
5. **Bump the gem version** in `lib/dispatch/adapter/version.rb` (minor bump for new model support, patch for a regex tweak that just fixes routing for an existing misclassified model).
6. **Run the test gate** from inside this gem:

   ```bash
   bundle exec rubocop --autocorrect-all
   bundle exec rspec
   ```

   Both must exit 0.

## Alternative: probe-and-fallback (not currently implemented)

A more durable design would catch the specific 400 error string from `/v1/chat/completions`, cache the offending model id, and retransmit on `/v1/responses`.

Pros: zero hardcoded list.

Cons: adds latency on the first request per new model per process and depends on the upstream error wording staying stable.

The probe must include a tool definition to be reliable: sending a tool-less request to `/v1/chat/completions` will succeed for some GPT-5 variants, and only the tools+reasoning combo triggers the rejection.

## File reference (everything routing-related)

| Path | What it contains |
|---|---|
| `lib/dispatch/adapter/copilot.rb` | `uses_responses_api?` (the regex), the `chat` dispatcher, all four code paths, body builders for both endpoints |
| `lib/dispatch/adapter/version.rb` | Gem version constant |
| `spec/dispatch/adapter/copilot_spec.rb` | Tests for both endpoint paths and the routing predicate |
| `GPT5_RESPONSES_API.md` | Original problem statement: the 400 error from Copilot |
| `ENDPOINT_ROUTING.md` | This file |
| `../models.txt` | Workspace-level list of model ids and premium multipliers |
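
For the probe-and-fallback alternative described above, the control flow could look roughly like the sketch below. Nothing here exists in the adapter: `ProbingRouter`, `FakeRequestError`, `RESPONSES_HINT`, and the `transport` callable are all hypothetical names, and the sketch assumes the transport builds the endpoint-specific body itself and raises an error whose message carries the upstream 400 text.

```ruby
require "set"

# Hypothetical stand-in for the adapter's 400 RequestError.
class FakeRequestError < StandardError; end

# Matches the upstream wording quoted above; breaks if Copilot rewords it.
RESPONSES_HINT = %r{Please use /v1/responses instead}

# Hypothetical router: probes chat completions first, remembers rejections.
class ProbingRouter
  def initialize(transport)
    @transport = transport        # callable taking (endpoint, model)
    @responses_models = Set.new   # per-process cache of rejected model ids
  end

  def chat(model)
    return @transport.call("/v1/responses", model) if @responses_models.include?(model)

    begin
      @transport.call("/v1/chat/completions", model)
    rescue FakeRequestError => e
      raise unless e.message.match?(RESPONSES_HINT) # unrelated 400s propagate

      @responses_models << model
      @transport.call("/v1/responses", model)
    end
  end
end

# Stub transport that mimics the rejection for gpt-5* models.
calls = []
transport = lambda do |endpoint, model|
  calls << endpoint
  if endpoint == "/v1/chat/completions" && model.start_with?("gpt-5")
    raise FakeRequestError,
          "Function tools with reasoning_effort are not supported for #{model} " \
          "in /v1/chat/completions. Please use /v1/responses instead."
  end
  { endpoint: endpoint, model: model }
end

router = ProbingRouter.new(transport)
router.chat("gpt-5.4") # probe fails, retried on /v1/responses
router.chat("gpt-5.4") # cache hit, goes straight to /v1/responses
p calls # prints ["/v1/chat/completions", "/v1/responses", "/v1/responses"]
```

The extra round trip shows up only on the first call per model per process, which is the latency cost noted in the cons. A real implementation would also need to rebuild the request body for `/v1/responses` before retrying, since the two endpoints expect different shapes.
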