author Adam Malczewski <[email protected]> 2026-04-30 18:06:07 +0900
committer Adam Malczewski <[email protected]> 2026-04-30 18:06:07 +0900
commit 9be8821368deff024eafedeea55a614f9a9468cf
tree 43d70e2e8d6ac31e288f8f99b71555c051db0b19
parent 5c9b8f5142198bdf230d500b5101322a22235670
update to correctly use new api for newer models. (HEAD, main)
-rw-r--r--  .rubocop.yml                             4
-rw-r--r--  ENDPOINT_ROUTING.md                    189
-rw-r--r--  GPT5_RESPONSES_API.md                   68
-rw-r--r--  Gemfile.lock                             4
-rw-r--r--  dispatch-adapter-copilot.gemspec         3
-rw-r--r--  lib/dispatch/adapter/copilot.rb        386
-rw-r--r--  spec/dispatch/adapter/copilot_spec.rb  430
7 files changed, 1062 insertions, 22 deletions
diff --git a/.rubocop.yml b/.rubocop.yml
index ff78dd3..cc23375 100644
--- a/.rubocop.yml
+++ b/.rubocop.yml
@@ -47,3 +47,7 @@ Style/Documentation:
Style/RedundantStructKeywordInit:
Enabled: false
+
+Lint/EmptyBlock:
+ Exclude:
+ - "spec/**/*"
diff --git a/ENDPOINT_ROUTING.md b/ENDPOINT_ROUTING.md
new file mode 100644
index 0000000..e4ab523
--- /dev/null
+++ b/ENDPOINT_ROUTING.md
@@ -0,0 +1,189 @@
+# Endpoint Routing — How the adapter picks `/v1/chat/completions` vs `/v1/responses`
+
+> **TL;DR** The decision is made by a single regex against the model id string.
+> No capability discovery, no flag, no per-request override.
+
+## The decision
+
+The full routing logic lives in **one method**:
+
+`lib/dispatch/adapter/copilot.rb`
+
+```ruby
+# Returns true when the selected model requires the /v1/responses endpoint.
+# This applies to GPT-5 reasoning models. These models reject tool calls on
+# /v1/chat/completions and return a 400 RequestError directing callers to
+# use /v1/responses instead.
+def uses_responses_api?
+ @model.match?(/\Agpt-5/)
+end
+```
+
+`\A` anchors at the start of the string, so any model id whose name begins
+with the literal `gpt-5` (case-sensitive) is routed to the Responses API.
+Everything else goes to Chat Completions.
+
+The check is invoked once per `#chat` call:
+
+```ruby
+# lib/dispatch/adapter/copilot.rb (inside #chat)
+if uses_responses_api?
+ if stream
+ chat_streaming_responses(...) # POST /v1/responses (SSE)
+ else
+ chat_non_streaming_responses(...) # POST /v1/responses
+ end
+else
+ # build chat-completions body
+ if stream
+ chat_streaming(...) # POST /v1/chat/completions (SSE)
+ else
+ chat_non_streaming(...) # POST /v1/chat/completions
+ end
+end
+```
+
+The four code paths are:
+
+| Path | Method | Endpoint | Streamed? |
+|---|---|---|---|
+| Responses, streaming | `chat_streaming_responses` | `POST /v1/responses` | yes |
+| Responses, blocking | `chat_non_streaming_responses` | `POST /v1/responses` | no |
+| Chat, streaming | `chat_streaming` | `POST /v1/chat/completions` | yes |
+| Chat, blocking | `chat_non_streaming` | `POST /v1/chat/completions` | no |
+
+All four live in `lib/dispatch/adapter/copilot.rb`.
+
+## Body-shape differences (what the adapter rewrites silently)
+
+| Concept | `/v1/chat/completions` body | `/v1/responses` body |
+|---|---|---|
+| Conversation | `messages: [...]` | `input: [...]` |
+| Token cap | `max_tokens` (or `max_completion_tokens` on o*/gpt-5/gemini) | `max_output_tokens` |
+| Reasoning effort | `reasoning_effort: "high"` | `reasoning: { effort: "high" }` |
+| Tool definition | `{ type: "function", function: { name, description, parameters } }` | `{ type: "function", name, description, parameters }` (no `function:` wrapper) |
+
+These transforms are handled inside the adapter — callers always pass the
+same `Dispatch::Adapter::ToolDefinition` / `Dispatch::Adapter::Message`
+structs and the same `thinking:` keyword.
+
+## Current model list and routing
+
+Source: `reference/models.txt` (lives one level up from this gem, in the
+parent `update-adapters/` workspace; format is `model_id,premium_multiplier`).
+
+| Model id | Premium multiplier | `\Agpt-5` match? | Endpoint |
+|---|---|---|---|
+| gpt-4.1 | 0.0 | ❌ | `/v1/chat/completions` |
+| gpt-4o | 0.0 | ❌ | `/v1/chat/completions` |
+| gpt-5-mini | 0.0 | ✅ | `/v1/responses` |
+| oswe-vscode-prime | 0.0 | ❌ | `/v1/chat/completions` |
+| grok-code-fast-1 | 0.25 | ❌ | `/v1/chat/completions` |
+| claude-haiku-4.5 | 0.33 | ❌ | `/v1/chat/completions` |
+| gemini-3-flash-preview | 0.33 | ❌ | `/v1/chat/completions` |
+| gpt-5.4-mini | 0.33 | ✅ | `/v1/responses` |
+| claude-sonnet-4 | 1.0 | ❌ | `/v1/chat/completions` |
+| claude-sonnet-4.5 | 1.0 | ❌ | `/v1/chat/completions` |
+| claude-sonnet-4.6 | 1.0 | ❌ | `/v1/chat/completions` |
+| gemini-2.5-pro | 1.0 | ❌ | `/v1/chat/completions` |
+| gemini-3.1-pro-preview | 1.0 | ❌ | `/v1/chat/completions` |
+| gpt-5.2 | 1.0 | ✅ | `/v1/responses` |
+| gpt-5.2-codex | 1.0 | ✅ | `/v1/responses` |
+| gpt-5.3-codex | 1.0 | ✅ | `/v1/responses` |
+| gpt-5.4 | 1.0 | ✅ | `/v1/responses` |
+| claude-opus-4.7 | 7.5 | ❌ | `/v1/chat/completions` |
+| gpt-5.5 | 7.5 | ✅ | `/v1/responses` |
+
+## Why a regex and not capability discovery?
+
+`GET https://api.githubcopilot.com/models` does NOT return a field that
+indicates which endpoint a given model accepts. A typical entry looks like:
+
+```json
+{
+ "id": "claude-3.7-sonnet",
+ "vendor": "Anthropic",
+ "model_picker_enabled": true,
+ "policy": { "state": "enabled" },
+ "capabilities": {
+ "family": "claude-3.7-sonnet",
+ "type": "chat",
+ "tokenizer": "o200k_base",
+ "limits": { "max_context_window_tokens": 200000, "max_output_tokens": 8192, "max_prompt_tokens": 90000 },
+ "supports": { "streaming": true, "tool_calls": true, "parallel_tool_calls": true, "vision": true }
+ }
+}
+```
+
+There is no `endpoints`, `api`, `responses_api`, or `chat_completions`
+flag. The signal that a model needs `/v1/responses` is the **400 error
+string** Copilot returns when you send tools + reasoning_effort to
+`/v1/chat/completions` for a GPT-5 family model:
+
+```
+Function tools with reasoning_effort are not supported for gpt-5.4 in
+/v1/chat/completions. Please use /v1/responses instead.
+```
+
+Hence the hardcoded `/\Agpt-5/` heuristic. See
+`GPT5_RESPONSES_API.md` for the original problem statement.
+
+## How to update this when GitHub adds new models
+
+When GitHub Copilot adds a new model that requires `/v1/responses`:
+
+1. **Edit the regex** in
+ `lib/dispatch/adapter/copilot.rb` at the `uses_responses_api?` method.
+ Add the new family to the alternation, e.g.:
+
+ ```ruby
+ def uses_responses_api?
+ @model.match?(/\A(?:gpt-5|gpt-6|codex-6|o5)/)
+ end
+ ```
+
+2. **Update the test expectations** in
+ `spec/dispatch/adapter/copilot_spec.rb`. Search for `uses_responses_api`
+ and `/\Agpt-5/` to find the relevant examples; both positive (a model
+ that should match) and negative (a model that shouldn't) cases need
+ updating.
+
+3. **Update the table above** in this file
+ (`ENDPOINT_ROUTING.md`) so the documented routing matches the code.
+
+4. **Update `reference/models.txt`** in the parent workspace if you also
+ want the new model listed for build/test scripts.
+
+5. **Bump the gem version** in
+ `lib/dispatch/adapter/version.rb` (minor bump for new model support,
+ patch for a regex tweak that just fixes routing for an existing
+ misclassified model).
+
+6. **Run the test gate** from inside this gem:
+ ```bash
+ bundle exec rubocop --autocorrect-all
+ bundle exec rspec
+ ```
+ Both must exit 0.
+
+## Alternative: probe-and-fallback (not currently implemented)
+
+A more durable design would catch the specific 400 error string from
+`/v1/chat/completions`, cache the offending model id, and retransmit on
+`/v1/responses`. Pros: zero hardcoded list. Cons: adds latency on the
+first request per new model per process and depends on the upstream
+error wording staying stable. The probe must include a tool definition
+to be reliable — sending a tool-less request to `/v1/chat/completions`
+will succeed for some GPT-5 variants and only the tools+reasoning combo
+triggers the rejection.
+
+## File reference (everything routing-related)
+
+| Path | What it contains |
+|---|---|
+| `lib/dispatch/adapter/copilot.rb` | `uses_responses_api?` (the regex), the `chat` dispatcher, all four code paths, body builders for both endpoints |
+| `lib/dispatch/adapter/version.rb` | Gem version constant |
+| `spec/dispatch/adapter/copilot_spec.rb` | Tests for both endpoint paths and the routing predicate |
+| `GPT5_RESPONSES_API.md` | Original problem statement — the 400 error from Copilot |
+| `ENDPOINT_ROUTING.md` | This file |
+| `../reference/models.txt` | Workspace-level list of model ids and premium multipliers |
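The "probe-and-fallback" alternative described in ENDPOINT_ROUTING.md above is not implemented in this commit. As a rough illustration only, it could look something like the sketch below. Everything here is hypothetical: the `EndpointProbe` class, its nested `RequestError`, and the injected `transport` callable are stand-ins invented for the example, and the sketch assumes the upstream 400 message keeps the wording "Please use /v1/responses instead".

```ruby
# Hypothetical sketch of probe-and-fallback routing; not part of the adapter.
class EndpointProbe
  CHAT      = "/v1/chat/completions"
  RESPONSES = "/v1/responses"

  # Assumption: the upstream 400 error keeps this wording.
  RESPONSES_HINT = %r{Please use /v1/responses instead}

  # Stand-in for Dispatch::Adapter::RequestError in this sketch.
  class RequestError < StandardError; end

  # transport: any callable taking (endpoint, model) and returning a result,
  # raising RequestError when the server answers with a 400.
  def initialize(transport)
    @transport = transport
    @needs_responses = {} # per-process cache keyed by model id
  end

  def call(model)
    # Models already known to need /v1/responses skip the probe entirely.
    return @transport.call(RESPONSES, model) if @needs_responses[model]

    begin
      @transport.call(CHAT, model)
    rescue RequestError => e
      # Only the specific "use /v1/responses" rejection triggers a fallback.
      raise unless e.message.match?(RESPONSES_HINT)

      @needs_responses[model] = true
      @transport.call(RESPONSES, model)
    end
  end
end
```

The first request for a new model pays one extra round trip; later requests in the same process go straight to `/v1/responses`. As the document notes, the probe request would need to carry a tool definition for the rejection to fire reliably.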
diff --git a/GPT5_RESPONSES_API.md b/GPT5_RESPONSES_API.md
new file mode 100644
index 0000000..8dec8cf
--- /dev/null
+++ b/GPT5_RESPONSES_API.md
@@ -0,0 +1,68 @@
+# GPT-5.4 + Tool Calls — Requires `/v1/responses` API
+
+## Problem
+
+When `build.rb` selects the `gpt-5.4` model and sends a request with tool
+definitions, the Copilot API responds with:
+
+```
+Dispatch::Adapter::RequestError: Function tools with reasoning_effort are
+not supported for gpt-5.4 in /v1/chat/completions. Please use /v1/responses instead.
+```
+
+`dispatch-adapter-copilot` currently targets `/v1/chat/completions` for all
+models. GPT-5.4 is a reasoning model that requires the newer `/v1/responses`
+endpoint when tool calls are involved.
+
+---
+
+## Background
+
+### `/v1/chat/completions`
+OpenAI's original chat API. Stateless: you send the full `messages` history,
+get back `choices`. Tool calls via `tools` + `tool_calls` are supported. Works
+for all models up to GPT-4o.
+
+### `/v1/responses`
+Introduced for reasoning models (o1, o3, GPT-5+). Key differences:
+
+- Uses `input` instead of `messages` for the conversation history.
+- Sets reasoning effort via `reasoning: { effort: ... }` (`low` / `medium` / `high`) instead of a top-level `reasoning_effort`.
+- Optionally stateful via `previous_response_id` (server keeps history).
+- **Required** for tool use with `reasoning_effort` on GPT-5 models: Copilot's
+ `/v1/chat/completions` rejects that combination with a 400 error.
+
+GPT-5.4 was added to the GitHub Copilot model catalog but brings the
+Responses API requirement with it. The adapter was written before this model
+existed, so it has no Responses API support.
+
+---
+
+## What Needs to Be Done
+
+To support GPT-5.4 (and future reasoning models) with tool calls:
+
+1. **Detect reasoning models** — identify which model IDs require the
+ Responses API (e.g. anything matching `gpt-5.*` or carrying a
+ `reasoning` capability flag in the `/models` response).
+
+2. **Implement a Responses API code path** in `dispatch-adapter-copilot`:
+ - Endpoint: `POST /v1/responses` (not `/v1/chat/completions`).
+ - Request shape: `input` array instead of `messages`.
+ - Response shape: different structure — parse accordingly.
+ - Map `Dispatch::Adapter` tool definitions and result blocks to the
+ Responses API format.
+ - Handle `reasoning_effort` (expose as an adapter option or auto-set
+ to `medium`).
+
+3. **Route per model** — the adapter should check the model ID and choose
+ the correct endpoint at request time, keeping Chat Completions for all
+ non-reasoning models.
+
+---
+
+## Workaround (until implemented)
+
+Use `sonnet-4.6` instead of `gpt-5.4` in `build.rb`'s interactive menu.
+Claude Sonnet 4.6 (routed via Copilot's `/v1/chat/completions`) fully
+supports tool calls and has no Responses API requirement.
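To make the request-shape mapping from "What Needs to Be Done" concrete, here is a minimal standalone sketch of the two tool-definition shapes and a Responses-style body. It is an illustration under assumptions, not the adapter's code: `ToolDef` is a hypothetical stand-in for `Dispatch::Adapter::ToolDefinition`, and the helper names are invented for the example.

```ruby
require "json"

# Hypothetical stand-in for Dispatch::Adapter::ToolDefinition.
ToolDef = Struct.new(:name, :description, :parameters, keyword_init: true)

# Chat Completions nests the tool fields under a `function` key.
def chat_completions_tool(tool)
  { type: "function",
    function: { name: tool.name, description: tool.description, parameters: tool.parameters } }
end

# The Responses API puts the same fields directly on the tool object.
def responses_tool(tool)
  { type: "function", name: tool.name, description: tool.description, parameters: tool.parameters }
end

# Builds a Responses-style body: `input`, `max_output_tokens`, and a nested
# `reasoning: { effort: }` in place of `messages`, `max_tokens`, and
# `reasoning_effort`.
def responses_body(model:, input:, tools:, max_tokens:, effort: nil)
  body = { model: model, input: input, max_output_tokens: max_tokens }
  body[:tools] = tools.map { |tool| responses_tool(tool) } unless tools.empty?
  body[:reasoning] = { effort: effort } if effort
  body
end

weather = ToolDef.new(
  name: "get_weather",
  description: "Get weather",
  parameters: { "type" => "object", "properties" => { "city" => { "type" => "string" } } }
)

puts JSON.pretty_generate(
  responses_body(
    model: "gpt-5.4",
    input: [{ role: "user", content: "Weather?" }],
    tools: [weather],
    max_tokens: 4096,
    effort: "medium"
  )
)
```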
diff --git a/Gemfile.lock b/Gemfile.lock
index f7595b0..f3aa5a6 100644
--- a/Gemfile.lock
+++ b/Gemfile.lock
@@ -1,7 +1,7 @@
PATH
remote: ../dispatch-adapter-interface
specs:
- dispatch-adapter-interface (0.2.0)
+ dispatch-adapter-interface (0.3.0)
PATH
remote: .
@@ -114,7 +114,7 @@ CHECKSUMS
date (3.5.1) sha256=750d06384d7b9c15d562c76291407d89e368dda4d4fff957eb94962d325a0dc0
diff-lcs (1.6.2) sha256=9ae0d2cba7d4df3075fe8cd8602a8604993efc0dfa934cff568969efb1909962
dispatch-adapter-copilot (0.4.0)
- dispatch-adapter-interface (0.2.0)
+ dispatch-adapter-interface (0.3.0)
erb (6.0.2) sha256=9fe6264d44f79422c87490a1558479bd0e7dad4dd0e317656e67ea3077b5242b
hashdiff (1.2.1) sha256=9c079dbc513dfc8833ab59c0c2d8f230fa28499cc5efb4b8dd276cf931457cd1
io-console (0.8.2) sha256=d6e3ae7a7cc7574f4b8893b4fca2162e57a825b223a177b7afa236c5ef9814cc
diff --git a/dispatch-adapter-copilot.gemspec b/dispatch-adapter-copilot.gemspec
index 4dbdb1e..2ecf345 100644
--- a/dispatch-adapter-copilot.gemspec
+++ b/dispatch-adapter-copilot.gemspec
@@ -25,7 +25,8 @@ Gem::Specification.new do |spec|
(f == gemspec) ||
f.start_with?(*%w[bin/ Gemfile .gitignore .rspec spec/ .rubocop.yml])
end
- end.select { |f| File.exist?(File.join(__dir__, f)) }
+ end
+ spec.files.select! { |f| File.exist?(File.join(__dir__, f)) }
spec.bindir = "exe"
spec.executables = spec.files.grep(%r{\Aexe/}) { |f| File.basename(f) }
spec.require_paths = ["lib"]
diff --git a/lib/dispatch/adapter/copilot.rb b/lib/dispatch/adapter/copilot.rb
index 7355df8..445adff 100644
--- a/lib/dispatch/adapter/copilot.rb
+++ b/lib/dispatch/adapter/copilot.rb
@@ -82,29 +82,38 @@ module Dispatch
def chat(messages, system: nil, tools: [], stream: false, max_tokens: nil, thinking: :default, &)
ensure_authenticated!
- wire_messages = build_wire_messages(messages, system)
- wire_tools = build_wire_tools(tools)
effective_max_tokens = max_tokens || @default_max_tokens
effective_thinking = thinking == :default ? @default_thinking : thinking
validate_thinking_level!(effective_thinking)
- body = {
- model: @model,
- messages: wire_messages,
- stream: stream
- }
- if uses_max_completion_tokens?
- body[:max_completion_tokens] = effective_max_tokens
+ if uses_responses_api?
+ if stream
+ chat_streaming_responses(messages, system, tools, effective_max_tokens, effective_thinking, &)
+ else
+ chat_non_streaming_responses(messages, system, tools, effective_max_tokens, effective_thinking)
+ end
else
- body[:max_tokens] = effective_max_tokens
- end
- body[:tools] = wire_tools unless wire_tools.empty?
- body[:reasoning_effort] = effective_thinking if effective_thinking
+ wire_messages = build_wire_messages(messages, system)
+ wire_tools = build_wire_tools(tools)
- if stream
- chat_streaming(body, &)
- else
- chat_non_streaming(body)
+ body = {
+ model: @model,
+ messages: wire_messages,
+ stream: stream
+ }
+ if uses_max_completion_tokens?
+ body[:max_completion_tokens] = effective_max_tokens
+ else
+ body[:max_tokens] = effective_max_tokens
+ end
+ body[:tools] = wire_tools unless wire_tools.empty?
+ body[:reasoning_effort] = effective_thinking if effective_thinking
+
+ if stream
+ chat_streaming(body, &)
+ else
+ chat_non_streaming(body)
+ end
end
end
@@ -189,6 +198,14 @@ module Dispatch
@model.match?(/o[1-9]|gpt-5|gemini/)
end
+ # Returns true when the selected model requires the /v1/responses endpoint.
+ # This applies to GPT-5 reasoning models. These models reject tool calls on
+ # /v1/chat/completions and return a 400 RequestError directing callers to
+ # use /v1/responses instead.
+ def uses_responses_api?
+ @model.match?(/\Agpt-5/)
+ end
+
def default_token_path
File.join(Dir.home, ".config", "dispatch", "copilot_github_token")
end
@@ -440,6 +457,92 @@ module Dispatch
merge_consecutive_roles(wire)
end
+ # Converts canonical messages to the flat `input` array required by
+ # POST /v1/responses. System prompt is prepended as a system-role item.
+ # The Responses API does not support a top-level `system` parameter —
+ # the system message must be the first element of `input`.
+ def build_responses_api_input(messages, system)
+ input = []
+ input << { role: "system", content: system } if system
+
+ messages.each do |msg|
+ input.concat(convert_message_to_responses_input(msg))
+ end
+
+ input
+ end
+
+ # Converts a single canonical Message to one or more Responses API input
+ # items. Returns an Array (always) so results can be flat-concatenated.
+ def convert_message_to_responses_input(msg)
+ case msg.content
+ when String
+ [{ role: msg.role, content: msg.content }]
+ when Array
+ convert_content_blocks_to_responses_input(msg)
+ else
+ [{ role: msg.role, content: msg.content.to_s }]
+ end
+ end
+
+ # Converts an array of content blocks (TextBlock, ToolUseBlock,
+ # ToolResultBlock) from a single Message into Responses API input items.
+ #
+ # Key differences from the Chat Completions conversion:
+ # - ToolUseBlock → top-level {type: "function_call", ...} item (not nested
+ # under an assistant message role)
+ # - ToolResultBlock → top-level {type: "function_call_output", ...} item
+ # - TextBlock in assistant message → {role: "assistant", content: [{type:
+ # "output_text", text: "..."}]}
+ def convert_content_blocks_to_responses_input(msg)
+ items = []
+ text_parts = []
+
+ msg.content.each do |block|
+ case block
+ when TextBlock
+ text_parts << block.text
+ when ImageBlock
+ raise NotImplementedError, "ImageBlock is not yet supported by the Copilot adapter"
+ when ToolUseBlock
+ # Flush any accumulated text first as an assistant message
+ unless text_parts.empty?
+ items << {
+ role: "assistant",
+ content: [{ type: "output_text", text: text_parts.join("\n") }]
+ }
+ text_parts = []
+ end
+ items << {
+ type: "function_call",
+ call_id: block.id,
+ name: block.name,
+ arguments: JSON.generate(block.arguments)
+ }
+ when ToolResultBlock
+ items << {
+ type: "function_call_output",
+ call_id: block.tool_use_id,
+ output: tool_result_content(block)
+ }
+ end
+ end
+
+ # Flush any remaining text
+ unless text_parts.empty?
+ items << if msg.role == "assistant"
+ {
+ role: "assistant",
+ content: [{ type: "output_text", text: text_parts.join("\n") }]
+ }
+ else
+ { role: msg.role, content: text_parts.join("\n") }
+ end
+ end
+
+ items
+ end
+
def convert_message(msg)
case msg.content
when String
@@ -544,6 +647,45 @@ module Dispatch
end
end
+ # Assembles the full request body for POST /v1/responses.
+ #
+ # Key differences from the Chat Completions body:
+ # - Uses `input` instead of `messages`.
+ # - Uses `max_output_tokens` instead of `max_tokens`/`max_completion_tokens`.
+ # - Uses `reasoning: {effort:}` instead of `reasoning_effort`.
+ # - Tool definitions omit the `function` wrapper — name/description/parameters
+ # are top-level inside the tool object.
+ def build_responses_api_body(messages, system, tools, stream, max_tokens, thinking)
+ input = build_responses_api_input(messages, system)
+ wire_tools = build_responses_api_tools(tools)
+
+ body = {
+ model: @model,
+ input: input,
+ stream: stream,
+ max_output_tokens: max_tokens
+ }
+
+ body[:tools] = wire_tools unless wire_tools.empty?
+ body[:reasoning] = { effort: thinking } if thinking
+
+ body
+ end
+
+ # Converts ToolDefinition structs (or plain hashes) to the Responses API
+ # tool format. Unlike Chat Completions, there is no `function` wrapper —
+ # name, description, and parameters are direct keys on the tool object.
+ def build_responses_api_tools(tools)
+ tools.map do |td|
+ {
+ type: "function",
+ name: tool_attr(td, :name),
+ description: tool_attr(td, :description),
+ parameters: tool_attr(td, :parameters)
+ }
+ end
+ end
+
# --- Chat (non-streaming) ---
def chat_non_streaming(body)
@@ -603,6 +745,78 @@ module Dispatch
)
end
+ # Non-streaming chat via POST /v1/responses.
+ # Called when uses_responses_api? is true and stream is false.
+ def chat_non_streaming_responses(messages, system, tools, max_tokens, thinking)
+ @rate_limiter.wait!
+ body = build_responses_api_body(messages, system, tools, false, max_tokens, thinking)
+ wire_messages = build_responses_api_input(messages, system)
+
+ uri = URI("#{API_BASE}/responses")
+ request = Net::HTTP::Post.new(uri)
+ apply_headers!(request, initiator: x_initiator_for_responses(wire_messages))
+ request.body = JSON.generate(deep_utf8(body))
+
+ response = execute_request(uri, request)
+ data = parse_response!(response)
+ build_response_from_responses_api(data)
+ end
+
+ # Builds a canonical Response from a /v1/responses non-streaming body.
+ def build_response_from_responses_api(data)
+ output = data["output"] || []
+ text_parts = []
+ tool_calls = []
+
+ output.each do |item|
+ case item["type"]
+ when "message"
+ (item["content"] || []).each do |part|
+ text_parts << part["text"] if part["type"] == "output_text" && part["text"]
+ end
+ when "function_call"
+ tool_calls << ToolUseBlock.new(
+ id: item["call_id"] || item["id"],
+ name: item["name"],
+ arguments: parse_tool_arguments(item["arguments"])
+ )
+ end
+ end
+
+ stop_reason = tool_calls.any? ? :tool_use : :end_turn
+ content = text_parts.empty? ? nil : text_parts.join
+
+ usage_data = data["usage"] || {}
+ usage = Usage.new(
+ input_tokens: usage_data["input_tokens"] || 0,
+ output_tokens: usage_data["output_tokens"] || 0
+ )
+
+ Response.new(
+ content: content,
+ tool_calls: tool_calls,
+ model: data["model"] || @model,
+ stop_reason: stop_reason,
+ usage: usage
+ )
+ end
+
+ # Determines X-Initiator for a Responses API call.
+ # Same logic as x_initiator_for but operates on the already-built `input`
+ # array where items use `type: "function_call"` / `type: "function_call_output"`
+ # instead of role-based items.
+ def x_initiator_for_responses(input_items)
+ if input_items.any? do |item|
+ item[:role].to_s == "assistant" ||
+ item[:type].to_s == "function_call" ||
+ item[:type].to_s == "function_call_output"
+ end
+ "agent"
+ else
+ "user"
+ end
+ end
+
# Recursively coerces every String inside a wire-body to valid UTF-8.
#
# Tool results (grep output, file reads, shell stdout) frequently arrive
@@ -625,8 +839,6 @@ module Dispatch
obj.map { |v| deep_utf8(v) }
when Hash
obj.each_with_object({}) { |(k, v), h| h[k] = deep_utf8(v) }
- when Symbol
- obj
else
obj
end
@@ -772,6 +984,142 @@ module Dispatch
)
)
end
+
+ # Streaming chat via POST /v1/responses.
+ # Called when uses_responses_api? is true and stream is true.
+ def chat_streaming_responses(messages, system, tools, max_tokens, thinking, &block)
+ @rate_limiter.wait!
+ body = build_responses_api_body(messages, system, tools, true, max_tokens, thinking)
+ wire_input = build_responses_api_input(messages, system)
+
+ uri = URI("#{API_BASE}/responses")
+ request = Net::HTTP::Post.new(uri)
+ apply_headers!(request, initiator: x_initiator_for_responses(wire_input))
+ request.body = JSON.generate(deep_utf8(body))
+
+ collector = new_responses_stream_collector
+
+ execute_streaming_request(uri, request) do |response|
+ buffer = +""
+ response.read_body do |chunk|
+ buffer << chunk
+ process_responses_sse_buffer(buffer, collector, &block)
+ end
+ end
+
+ build_streaming_response_from_responses(collector)
+ end
+
+ def new_responses_stream_collector
+ {
+ # text_parts: Hash<output_index => String> — accumulated text fragments
+ text_parts: Hash.new { |h, k| h[k] = +"" },
+ # tool_calls: Hash<item_id => {call_id:, name:, arguments:}>
+ tool_calls: {},
+ # order: Array of [:text, output_index] or [:tool, item_id] in appearance order
+ order: [],
+ model: @model,
+ input_tokens: 0,
+ output_tokens: 0
+ }
+ end
+
+ def process_responses_sse_buffer(buffer, collector, &)
+ while (line_end = buffer.index("\n"))
+ line = buffer.slice!(0..line_end).strip
+ next if line.empty?
+ next unless line.start_with?("data: ")
+
+ data_str = line.delete_prefix("data: ")
+ next if data_str == "[DONE]"
+
+ data = JSON.parse(data_str)
+ process_responses_stream_event(data, collector, &)
+ end
+ rescue JSON::ParserError
+ nil
+ end
+
+ def process_responses_stream_event(data, collector, &block)
+ case data["type"]
+ when "response.output_item.added"
+ handle_responses_output_item_added(data, collector, &block)
+ when "response.output_text.delta"
+ output_index = data["output_index"] || 0
+ fragment = data["delta"].to_s
+ collector[:text_parts][output_index] << fragment
+ block.call(StreamDelta.new(type: :text_delta, text: fragment))
+ when "response.function_call_arguments.delta"
+ handle_responses_arguments_delta(data, collector, &block)
+ when "response.completed"
+ usage = data.dig("response", "usage") || {}
+ collector[:input_tokens] = usage["input_tokens"] || collector[:input_tokens]
+ collector[:output_tokens] = usage["output_tokens"] || collector[:output_tokens]
+ model = data.dig("response", "model")
+ collector[:model] = model if model
+ end
+ end
+
+ def handle_responses_output_item_added(data, collector, &block)
+ item = data["item"] || {}
+ case item["type"]
+ when "function_call"
+ item_id = item["id"]
+ collector[:tool_calls][item_id] = {
+ call_id: item["call_id"] || item_id,
+ name: item["name"] || "",
+ arguments: +""
+ }
+ collector[:order] << [:tool, item_id]
+ block.call(StreamDelta.new(
+ type: :tool_use_start,
+ tool_call_id: item["call_id"] || item_id,
+ tool_name: item["name"] || ""
+ ))
+ when "message"
+ output_index = data["output_index"] || 0
+ collector[:order] << [:text, output_index] unless collector[:order].any? { |t, i| t == :text && i == output_index }
+ end
+ end
+
+ def handle_responses_arguments_delta(data, collector, &block)
+ item_id = data["item_id"]
+ fragment = data["delta"].to_s
+ tc = collector[:tool_calls][item_id]
+ return unless tc
+
+ tc[:arguments] << fragment
+ block.call(StreamDelta.new(
+ type: :tool_use_delta,
+ tool_call_id: tc[:call_id],
+ argument_delta: fragment
+ ))
+ end
+
+ def build_streaming_response_from_responses(collector)
+ tool_calls = collector[:tool_calls].values.map do |tc|
+ ToolUseBlock.new(
+ id: tc[:call_id],
+ name: tc[:name],
+ arguments: parse_tool_arguments(tc[:arguments])
+ )
+ end
+
+ all_text = collector[:text_parts].keys.sort.map { |idx| collector[:text_parts][idx] }.join
+ content = all_text.empty? ? nil : all_text
+ stop_reason = tool_calls.any? ? :tool_use : :end_turn
+
+ Response.new(
+ content: content,
+ tool_calls: tool_calls,
+ model: collector[:model],
+ stop_reason: stop_reason,
+ usage: Usage.new(
+ input_tokens: collector[:input_tokens],
+ output_tokens: collector[:output_tokens]
+ )
+ )
+ end
end
end
end
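The streaming path above appends each HTTP chunk to a buffer and consumes only complete `data:` lines, so an event split across two chunks is parsed once its newline arrives. The sketch below isolates that buffering idea in a standalone form. It is illustrative, not the committed method: `drain_sse_events` is a hypothetical name, and unlike `process_responses_sse_buffer` it rescues `JSON::ParserError` per line, so a malformed event is skipped without pausing the drain. That is a design variation shown for comparison.

```ruby
require "json"

# Drains complete lines from `buffer` (mutated in place) and yields each
# parsed `data:` payload. Any incomplete trailing text stays in the buffer
# until a later chunk supplies the rest of the line.
def drain_sse_events(buffer)
  while (line_end = buffer.index("\n"))
    line = buffer.slice!(0..line_end).strip
    next unless line.start_with?("data: ")

    payload = line.delete_prefix("data: ")
    next if payload == "[DONE]"

    event = begin
      JSON.parse(payload)
    rescue JSON::ParserError
      nil # skip the malformed event and keep draining
    end
    yield event if event
  end
end

# Usage: feed chunks as they arrive; one event is split across two chunks.
buffer = +""
chunks = [
  "data: {\"type\":\"response.output_text.delta\",\"del",
  "ta\":\"Hi\"}\n\ndata: {\"type\":\"response.completed\"}\n"
]
chunks.each do |chunk|
  buffer << chunk
  drain_sse_events(buffer) { |event| puts event["type"] }
end
```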
diff --git a/spec/dispatch/adapter/copilot_spec.rb b/spec/dispatch/adapter/copilot_spec.rb
index 15bdcb0..6f54fef 100644
--- a/spec/dispatch/adapter/copilot_spec.rb
+++ b/spec/dispatch/adapter/copilot_spec.rb
@@ -1588,4 +1588,434 @@ RSpec.describe Dispatch::Adapter::Copilot do
expect { adapter.chat(messages) }.to raise_error(Dispatch::Adapter::ConnectionError)
end
end
+
+ describe "#chat via /v1/responses (reasoning models)" do
+ let(:responses_adapter) do
+ described_class.new(
+ model: "gpt-5.4",
+ github_token: github_token,
+ max_tokens: 4096,
+ thinking: "medium",
+ min_request_interval: 0
+ )
+ end
+
+ # Helper: build a well-formed /v1/responses non-streaming response body.
+ def responses_body(text: nil, tool_calls: [], model: "gpt-5.4",
+ input_tokens: 10, output_tokens: 5)
+ output = []
+ unless text.nil?
+ output << {
+ "type" => "message",
+ "id" => "msg_001",
+ "role" => "assistant",
+ "content" => [{ "type" => "output_text", "text" => text }]
+ }
+ end
+ tool_calls.each do |tc|
+ output << {
+ "type" => "function_call",
+ "id" => tc[:id],
+ "call_id" => tc[:id],
+ "name" => tc[:name],
+ "arguments" => JSON.generate(tc[:arguments])
+ }
+ end
+ {
+ "id" => "resp_001",
+ "object" => "response",
+ "model" => model,
+ "output" => output,
+ "usage" => {
+ "input_tokens" => input_tokens,
+ "output_tokens" => output_tokens,
+ "total_tokens" => input_tokens + output_tokens
+ }
+ }
+ end
+
+ context "with a text-only response" do
+ before do
+ stub_request(:post, "https://api.githubcopilot.com/responses")
+ .to_return(
+ status: 200,
+ body: JSON.generate(responses_body(text: "Hello from GPT-5!")),
+ headers: { "Content-Type" => "application/json" }
+ )
+ end
+
+ it "returns a Response with content" do
+ messages = [Dispatch::Adapter::Message.new(role: "user", content: "Hi")]
+ response = responses_adapter.chat(messages)
+
+ expect(response).to be_a(Dispatch::Adapter::Response)
+ expect(response.content).to eq("Hello from GPT-5!")
+ expect(response.tool_calls).to be_empty
+ expect(response.model).to eq("gpt-5.4")
+ expect(response.stop_reason).to eq(:end_turn)
+ expect(response.usage.input_tokens).to eq(10)
+ expect(response.usage.output_tokens).to eq(5)
+ end
+ end
+
+ context "with a tool call response" do
+ before do
+ stub_request(:post, "https://api.githubcopilot.com/responses")
+ .to_return(
+ status: 200,
+ body: JSON.generate(responses_body(
+ tool_calls: [{ id: "call_abc", name: "get_weather",
+ arguments: { "city" => "New York" } }]
+ )),
+ headers: { "Content-Type" => "application/json" }
+ )
+ end
+
+ it "returns a Response with tool_calls as ToolUseBlock array" do
+ messages = [Dispatch::Adapter::Message.new(role: "user", content: "Weather?")]
+ response = responses_adapter.chat(messages)
+
+ expect(response.content).to be_nil
+ expect(response.stop_reason).to eq(:tool_use)
+ expect(response.tool_calls.size).to eq(1)
+
+ tc = response.tool_calls.first
+ expect(tc).to be_a(Dispatch::Adapter::ToolUseBlock)
+ expect(tc.id).to eq("call_abc")
+ expect(tc.name).to eq("get_weather")
+ expect(tc.arguments).to eq({ "city" => "New York" })
+ end
+ end
+
+ context "with mixed text + tool call response" do
+ before do
+ stub_request(:post, "https://api.githubcopilot.com/responses")
+ .to_return(
+ status: 200,
+ body: JSON.generate(responses_body(
+ text: "Let me check.",
+ tool_calls: [{ id: "call_def", name: "search", arguments: { "q" => "test" } }]
+ )),
+ headers: { "Content-Type" => "application/json" }
+ )
+ end
+
+ it "returns both content and tool_calls" do
+ messages = [Dispatch::Adapter::Message.new(role: "user", content: "Search")]
+ response = responses_adapter.chat(messages)
+
+ expect(response.content).to eq("Let me check.")
+ expect(response.tool_calls.size).to eq(1)
+ expect(response.stop_reason).to eq(:tool_use)
+ end
+ end
+
+ context "request body shape" do
+ it "sends `input` not `messages`" do
+ stub = stub_request(:post, "https://api.githubcopilot.com/responses")
+ .with do |req|
+ body = JSON.parse(req.body)
+ body.key?("input") && !body.key?("messages")
+ end
+ .to_return(
+ status: 200,
+ body: JSON.generate(responses_body(text: "ok")),
+ headers: { "Content-Type" => "application/json" }
+ )
+
+ messages = [Dispatch::Adapter::Message.new(role: "user", content: "Hi")]
+ responses_adapter.chat(messages)
+
+ expect(stub).to have_been_requested
+ end
+
+ it "sends `max_output_tokens` not `max_tokens`" do
+ stub = stub_request(:post, "https://api.githubcopilot.com/responses")
+ .with do |req|
+ body = JSON.parse(req.body)
+ body["max_output_tokens"] == 4096 &&
+ !body.key?("max_tokens") &&
+ !body.key?("max_completion_tokens")
+ end
+ .to_return(
+ status: 200,
+ body: JSON.generate(responses_body(text: "ok")),
+ headers: { "Content-Type" => "application/json" }
+ )
+
+ messages = [Dispatch::Adapter::Message.new(role: "user", content: "Hi")]
+ responses_adapter.chat(messages)
+
+ expect(stub).to have_been_requested
+ end
+
+ it "sends `reasoning: {effort:}` not `reasoning_effort`" do
+ stub = stub_request(:post, "https://api.githubcopilot.com/responses")
+ .with do |req|
+ body = JSON.parse(req.body)
+ body["reasoning"] == { "effort" => "medium" } && !body.key?("reasoning_effort")
+ end
+ .to_return(
+ status: 200,
+ body: JSON.generate(responses_body(text: "ok")),
+ headers: { "Content-Type" => "application/json" }
+ )
+
+ messages = [Dispatch::Adapter::Message.new(role: "user", content: "Hi")]
+ responses_adapter.chat(messages)
+
+ expect(stub).to have_been_requested
+ end
+
+ it "sends tools without the `function` wrapper" do
+ tool = Dispatch::Adapter::ToolDefinition.new(
+ name: "get_weather",
+ description: "Get weather",
+ parameters: { "type" => "object", "properties" => { "city" => { "type" => "string" } } }
+ )
+
+ stub = stub_request(:post, "https://api.githubcopilot.com/responses")
+ .with do |req|
+ body = JSON.parse(req.body)
+ t = body["tools"]&.first
+ t && t["type"] == "function" &&
+ t["name"] == "get_weather" &&
+ t["description"] == "Get weather" &&
+ !t.key?("function")
+ end
+ .to_return(
+ status: 200,
+ body: JSON.generate(responses_body(text: "ok")),
+ headers: { "Content-Type" => "application/json" }
+ )
+
+ messages = [Dispatch::Adapter::Message.new(role: "user", content: "weather?")]
+ responses_adapter.chat(messages, tools: [tool])
+
+ expect(stub).to have_been_requested
+ end
+
+ it "converts tool results to function_call_output items in input" do
+ tool_use = Dispatch::Adapter::ToolUseBlock.new(
+ id: "call_1", name: "search", arguments: { "q" => "ruby" }
+ )
+ tool_result = Dispatch::Adapter::ToolResultBlock.new(
+ tool_use_id: "call_1", content: "some results"
+ )
+
+ messages = [
+ Dispatch::Adapter::Message.new(role: "user", content: "search ruby"),
+ Dispatch::Adapter::Message.new(role: "assistant", content: [tool_use]),
+ Dispatch::Adapter::Message.new(role: "user", content: [tool_result])
+ ]
+
+ stub = stub_request(:post, "https://api.githubcopilot.com/responses")
+ .with do |req|
+ body = JSON.parse(req.body)
+ input = body["input"]
+ fc = input.find { |i| i["type"] == "function_call" }
+ fco = input.find { |i| i["type"] == "function_call_output" }
+ fc && fc["call_id"] == "call_1" && fc["name"] == "search" &&
+ fco && fco["call_id"] == "call_1" && fco["output"] == "some results"
+ end
+ .to_return(
+ status: 200,
+ body: JSON.generate(responses_body(text: "done")),
+ headers: { "Content-Type" => "application/json" }
+ )
+
+ responses_adapter.chat(messages)
+
+ expect(stub).to have_been_requested
+ end
+ end
+
+ context "with system: parameter" do
+ it "prepends system item at start of input array" do
+ stub = stub_request(:post, "https://api.githubcopilot.com/responses")
+ .with do |req|
+ body = JSON.parse(req.body)
+ body["input"].first == { "role" => "system", "content" => "Be concise." }
+ end
+ .to_return(
+ status: 200,
+ body: JSON.generate(responses_body(text: "ok")),
+ headers: { "Content-Type" => "application/json" }
+ )
+
+ messages = [Dispatch::Adapter::Message.new(role: "user", content: "Hi")]
+ responses_adapter.chat(messages, system: "Be concise.")
+
+ expect(stub).to have_been_requested
+ end
+ end
+
+ context "X-Initiator header" do
+ let(:ok_resp) do
+ {
+ status: 200,
+ body: JSON.generate(responses_body(text: "ok")),
+ headers: { "Content-Type" => "application/json" }
+ }
+ end
+
+ it "sends X-Initiator: user for a fresh user message" do
+ stub = stub_request(:post, "https://api.githubcopilot.com/responses")
+ .with(headers: { "X-Initiator" => "user" })
+ .to_return(**ok_resp)
+
+ messages = [Dispatch::Adapter::Message.new(role: "user", content: "Hi")]
+ responses_adapter.chat(messages)
+
+ expect(stub).to have_been_requested
+ end
+
+ it "sends X-Initiator: agent when tool results are present" do
+ tool_use = Dispatch::Adapter::ToolUseBlock.new(id: "c1", name: "fn", arguments: {})
+ tool_result = Dispatch::Adapter::ToolResultBlock.new(tool_use_id: "c1", content: "res")
+
+ messages = [
+ Dispatch::Adapter::Message.new(role: "user", content: "go"),
+ Dispatch::Adapter::Message.new(role: "assistant", content: [tool_use]),
+ Dispatch::Adapter::Message.new(role: "user", content: [tool_result])
+ ]
+
+ stub = stub_request(:post, "https://api.githubcopilot.com/responses")
+ .with(headers: { "X-Initiator" => "agent" })
+ .to_return(**ok_resp)
+
+ responses_adapter.chat(messages)
+
+ expect(stub).to have_been_requested
+ end
+ end
+
+ context "error mapping" do
+ it "maps 400 to RequestError (the error gpt-5.4 would give on the wrong endpoint)" do
+ stub_request(:post, "https://api.githubcopilot.com/responses")
+ .to_return(status: 400, body: JSON.generate({ "error" => { "message" => "bad request" } }))
+
+ messages = [Dispatch::Adapter::Message.new(role: "user", content: "Hi")]
+ expect { responses_adapter.chat(messages) }.to raise_error(Dispatch::Adapter::RequestError)
+ end
+ end
+
+ context "streaming" do
+ def sse_events(*events)
+ all = events.map { |e| "data: #{JSON.generate(e)}\n\n" }
+ all << "data: [DONE]\n\n"
+ all.join
+ end
+
+ it "yields text StreamDeltas and returns Response" do
+ body = sse_events(
+ { "type" => "response.output_item.added", "output_index" => 0,
+ "item" => { "type" => "message", "id" => "msg_001", "role" => "assistant", "content" => [] } },
+ { "type" => "response.output_text.delta", "item_id" => "msg_001",
+ "output_index" => 0, "content_index" => 0, "delta" => "Hello" },
+ { "type" => "response.output_text.delta", "item_id" => "msg_001",
+ "output_index" => 0, "content_index" => 0, "delta" => " world" },
+ { "type" => "response.completed",
+ "response" => { "model" => "gpt-5.4",
+ "usage" => { "input_tokens" => 10, "output_tokens" => 2, "total_tokens" => 12 } } }
+ )
+
+ stub_request(:post, "https://api.githubcopilot.com/responses")
+ .with { |req| JSON.parse(req.body)["stream"] == true }
+ .to_return(status: 200, body: body, headers: { "Content-Type" => "text/event-stream" })
+
+ messages = [Dispatch::Adapter::Message.new(role: "user", content: "Hi")]
+ deltas = []
+ response = responses_adapter.chat(messages, stream: true) { |d| deltas << d }
+
+ text_deltas = deltas.select { |d| d.type == :text_delta }
+ expect(text_deltas.size).to eq(2)
+ expect(text_deltas[0].text).to eq("Hello")
+ expect(text_deltas[1].text).to eq(" world")
+
+ expect(response).to be_a(Dispatch::Adapter::Response)
+ expect(response.content).to eq("Hello world")
+ expect(response.stop_reason).to eq(:end_turn)
+ expect(response.model).to eq("gpt-5.4")
+ expect(response.usage.input_tokens).to eq(10)
+ expect(response.usage.output_tokens).to eq(2)
+ end
+
+ it "yields tool_use_start and tool_use_delta for function call streams" do
+ body = sse_events(
+ { "type" => "response.output_item.added", "output_index" => 0,
+ "item" => { "type" => "function_call", "id" => "fc_001", "call_id" => "call_001",
+ "name" => "get_weather" } },
+ { "type" => "response.function_call_arguments.delta",
+ "item_id" => "fc_001", "output_index" => 0, "delta" => "{\"city\":" },
+ { "type" => "response.function_call_arguments.delta",
+ "item_id" => "fc_001", "output_index" => 0, "delta" => "\"NYC\"}" },
+ { "type" => "response.completed",
+ "response" => { "model" => "gpt-5.4",
+ "usage" => { "input_tokens" => 15, "output_tokens" => 8, "total_tokens" => 23 } } }
+ )
+
+ stub_request(:post, "https://api.githubcopilot.com/responses")
+ .to_return(status: 200, body: body, headers: { "Content-Type" => "text/event-stream" })
+
+ messages = [Dispatch::Adapter::Message.new(role: "user", content: "weather?")]
+ deltas = []
+ response = responses_adapter.chat(messages, stream: true) { |d| deltas << d }
+
+ starts = deltas.select { |d| d.type == :tool_use_start }
+ arg_deltas = deltas.select { |d| d.type == :tool_use_delta }
+
+ expect(starts.size).to eq(1)
+ expect(starts.first.tool_call_id).to eq("call_001")
+ expect(starts.first.tool_name).to eq("get_weather")
+
+ expect(arg_deltas.size).to eq(2)
+ expect(arg_deltas[0].argument_delta).to eq("{\"city\":")
+ expect(arg_deltas[1].argument_delta).to eq("\"NYC\"}")
+
+ expect(response.stop_reason).to eq(:tool_use)
+ expect(response.tool_calls.size).to eq(1)
+ expect(response.tool_calls.first.name).to eq("get_weather")
+ expect(response.tool_calls.first.arguments).to eq({ "city" => "NYC" })
+ expect(response.usage.input_tokens).to eq(15)
+ expect(response.usage.output_tokens).to eq(8)
+ end
+
+ it "sends stream: true in the request body" do
+ body = sse_events(
+ { "type" => "response.completed",
+ "response" => { "model" => "gpt-5.4", "usage" => { "input_tokens" => 5, "output_tokens" => 1 } } }
+ )
+
+ stub = stub_request(:post, "https://api.githubcopilot.com/responses")
+ .with { |req| JSON.parse(req.body)["stream"] == true }
+ .to_return(status: 200, body: body, headers: { "Content-Type" => "text/event-stream" })
+
+ messages = [Dispatch::Adapter::Message.new(role: "user", content: "Hi")]
+ responses_adapter.chat(messages, stream: true) { |_d| }
+
+ expect(stub).to have_been_requested
+ end
+
+ it "returns nil content when there are no text deltas" do
+ body = sse_events(
+ { "type" => "response.output_item.added", "output_index" => 0,
+ "item" => { "type" => "function_call", "id" => "fc_001", "call_id" => "call_001", "name" => "fn" } },
+ { "type" => "response.function_call_arguments.delta",
+ "item_id" => "fc_001", "output_index" => 0, "delta" => "{}" },
+ { "type" => "response.completed",
+ "response" => { "model" => "gpt-5.4", "usage" => { "input_tokens" => 5, "output_tokens" => 1 } } }
+ )
+
+ stub_request(:post, "https://api.githubcopilot.com/responses")
+ .to_return(status: 200, body: body, headers: { "Content-Type" => "text/event-stream" })
+
+ messages = [Dispatch::Adapter::Message.new(role: "user", content: "do it")]
+ response = responses_adapter.chat(messages, stream: true) { |_d| }
+
+ expect(response.content).to be_nil
+ expect(response.tool_calls.size).to eq(1)
+ end
+ end
+ end
end