| -rw-r--r-- | .rubocop.yml | 4 |
| -rw-r--r-- | ENDPOINT_ROUTING.md | 189 |
| -rw-r--r-- | GPT5_RESPONSES_API.md | 68 |
| -rw-r--r-- | Gemfile.lock | 4 |
| -rw-r--r-- | dispatch-adapter-copilot.gemspec | 3 |
| -rw-r--r-- | lib/dispatch/adapter/copilot.rb | 386 |
| -rw-r--r-- | spec/dispatch/adapter/copilot_spec.rb | 430 |
7 files changed, 1062 insertions, 22 deletions
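
The core behavioral change in this diff is a prefix check on the model id that selects between the two endpoints. The following is a minimal standalone sketch of that rule, not the gem's actual interface: `RESPONSES_API_PATTERN` and `endpoint_for` are hypothetical names introduced here for illustration, and only the regex itself is taken from the diff.

```ruby
# Hypothetical standalone sketch of the routing rule; not part of the gem.
# The pattern is the one used by `uses_responses_api?` in the diff:
# `\A` anchors at the start of the string and the match is case-sensitive.
RESPONSES_API_PATTERN = /\Agpt-5/

# Returns :responses for model ids that begin with "gpt-5",
# and :chat_completions for everything else.
def endpoint_for(model_id)
  model_id.match?(RESPONSES_API_PATTERN) ? :responses : :chat_completions
end

# Print the routing decision for a few sample ids.
%w[gpt-5.4 gpt-5-mini gpt-4o claude-sonnet-4.6 GPT-5 my-gpt-5].each do |id|
  puts "#{id} -> #{endpoint_for(id)}"
end
```

Because the pattern is anchored and case-sensitive, ids such as `GPT-5` or `my-gpt-5` fall through to Chat Completions in this sketch, which is worth keeping in mind if model ids ever arrive with a vendor prefix or different casing.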
diff --git a/.rubocop.yml b/.rubocop.yml
index ff78dd3..cc23375 100644
--- a/.rubocop.yml
+++ b/.rubocop.yml
@@ -47,3 +47,7 @@ Style/Documentation:
 
 Style/RedundantStructKeywordInit:
   Enabled: false
+
+Lint/EmptyBlock:
+  Exclude:
+    - "spec/**/*"
diff --git a/ENDPOINT_ROUTING.md b/ENDPOINT_ROUTING.md
new file mode 100644
index 0000000..e4ab523
--- /dev/null
+++ b/ENDPOINT_ROUTING.md
@@ -0,0 +1,189 @@
+# Endpoint Routing — How the adapter picks `/v1/chat/completions` vs `/v1/responses`
+
+> **TL;DR** The decision is made by a single regex against the model id string.
+> No capability discovery, no flag, no per-request override.
+
+## The decision
+
+The full routing logic lives in **one method**:
+
+`lib/dispatch/adapter/copilot.rb`
+
+```ruby
+# Returns true when the selected model requires the /v1/responses endpoint.
+# This applies to GPT-5 reasoning models. These models reject tool calls on
+# /v1/chat/completions and return a 400 RequestError directing callers to
+# use /v1/responses instead.
+def uses_responses_api?
+  @model.match?(/\Agpt-5/)
+end
+```
+
+`\A` anchors at the start of the string, so any model id whose name begins
+with the literal `gpt-5` (case-sensitive) is routed to the Responses API.
+Everything else goes to Chat Completions.
+
+The check is invoked once per `#chat` call:
+
+```ruby
+# lib/dispatch/adapter/copilot.rb (inside #chat)
+if uses_responses_api?
+  if stream
+    chat_streaming_responses(...) # POST /v1/responses (SSE)
+  else
+    chat_non_streaming_responses(...) # POST /v1/responses
+  end
+else
+  # build chat-completions body
+  if stream
+    chat_streaming(...) # POST /v1/chat/completions (SSE)
+  else
+    chat_non_streaming(...) # POST /v1/chat/completions
+  end
+end
+```
+
+The four code paths are:
+
+| Path | Method | Endpoint | Streamed? |
+|---|---|---|---|
+| Responses, streaming | `chat_streaming_responses` | `POST /v1/responses` | yes |
+| Responses, blocking | `chat_non_streaming_responses` | `POST /v1/responses` | no |
+| Chat, streaming | `chat_streaming` | `POST /v1/chat/completions` | yes |
+| Chat, blocking | `chat_non_streaming` | `POST /v1/chat/completions` | no |
+
+All four live in `lib/dispatch/adapter/copilot.rb`.
+
+## Body-shape differences (what the adapter rewrites silently)
+
+| Concept | `/v1/chat/completions` body | `/v1/responses` body |
+|---|---|---|
+| Conversation | `messages: [...]` | `input: [...]` |
+| Token cap | `max_tokens` (or `max_completion_tokens` on o*/gpt-5/gemini) | `max_output_tokens` |
+| Reasoning effort | `reasoning_effort: "high"` | `reasoning: { effort: "high" }` |
+| Tool definition | `{ type: "function", function: { name, description, parameters } }` | `{ type: "function", name, description, parameters }` (no `function:` wrapper) |
+
+These transforms are handled inside the adapter — callers always pass the
+same `Dispatch::Adapter::ToolDefinition` / `Dispatch::Adapter::Message`
+structs and the same `thinking:` keyword.
+
+## Current model list and routing
+
+Source: `reference/models.txt` (lives one level up from this gem, in the
+parent `update-adapters/` workspace; format is `model_id,premium_multiplier`).
+
+| Model id | Premium multiplier | `\Agpt-5` match? | Endpoint |
+|---|---|---|---|
+| gpt-4.1 | 0.0 | ❌ | `/v1/chat/completions` |
+| gpt-4o | 0.0 | ❌ | `/v1/chat/completions` |
+| gpt-5-mini | 0.0 | ✅ | `/v1/responses` |
+| oswe-vscode-prime | 0.0 | ❌ | `/v1/chat/completions` |
+| grok-code-fast-1 | 0.25 | ❌ | `/v1/chat/completions` |
+| claude-haiku-4.5 | 0.33 | ❌ | `/v1/chat/completions` |
+| gemini-3-flash-preview | 0.33 | ❌ | `/v1/chat/completions` |
+| gpt-5.4-mini | 0.33 | ✅ | `/v1/responses` |
+| claude-sonnet-4 | 1.0 | ❌ | `/v1/chat/completions` |
+| claude-sonnet-4.5 | 1.0 | ❌ | `/v1/chat/completions` |
+| claude-sonnet-4.6 | 1.0 | ❌ | `/v1/chat/completions` |
+| gemini-2.5-pro | 1.0 | ❌ | `/v1/chat/completions` |
+| gemini-3.1-pro-preview | 1.0 | ❌ | `/v1/chat/completions` |
+| gpt-5.2 | 1.0 | ✅ | `/v1/responses` |
+| gpt-5.2-codex | 1.0 | ✅ | `/v1/responses` |
+| gpt-5.3-codex | 1.0 | ✅ | `/v1/responses` |
+| gpt-5.4 | 1.0 | ✅ | `/v1/responses` |
+| claude-opus-4.7 | 7.5 | ❌ | `/v1/chat/completions` |
+| gpt-5.5 | 7.5 | ✅ | `/v1/responses` |
+
+## Why a regex and not capability discovery?
+
+`GET https://api.githubcopilot.com/models` does NOT return a field that
+indicates which endpoint a given model accepts. A typical entry looks like:
+
+```json
+{
+  "id": "claude-3.7-sonnet",
+  "vendor": "Anthropic",
+  "model_picker_enabled": true,
+  "policy": { "state": "enabled" },
+  "capabilities": {
+    "family": "claude-3.7-sonnet",
+    "type": "chat",
+    "tokenizer": "o200k_base",
+    "limits": { "max_context_window_tokens": 200000, "max_output_tokens": 8192, "max_prompt_tokens": 90000 },
+    "supports": { "streaming": true, "tool_calls": true, "parallel_tool_calls": true, "vision": true }
+  }
+}
+```
+
+There is no `endpoints`, `api`, `responses_api`, or `chat_completions`
+flag. The signal that a model needs `/v1/responses` is the **400 error
+string** Copilot returns when you send tools + reasoning_effort to
+`/v1/chat/completions` for a GPT-5 family model:
+
+```
+Function tools with reasoning_effort are not supported for gpt-5.4 in
+/v1/chat/completions. Please use /v1/responses instead.
+```
+
+Hence the hardcoded `/\Agpt-5/` heuristic. See
+`GPT5_RESPONSES_API.md` for the original problem statement.
+
+## How to update this when GitHub adds new models
+
+When GitHub Copilot adds a new model that requires `/v1/responses`:
+
+1. **Edit the regex** in
+   `lib/dispatch/adapter/copilot.rb` at the `uses_responses_api?` method.
+   Add the new family to the alternation, e.g.:
+
+   ```ruby
+   def uses_responses_api?
+     @model.match?(/\A(?:gpt-5|gpt-6|codex-6|o5)/)
+   end
+   ```
+
+2. **Update the test expectations** in
+   `spec/dispatch/adapter/copilot_spec.rb`. Search for `uses_responses_api`
+   and `/\Agpt-5/` to find the relevant examples; both positive (a model
+   that should match) and negative (a model that shouldn't) cases need
+   updating.
+
+3. **Update the table above** in this file
+   (`ENDPOINT_ROUTING.md`) so the documented routing matches the code.
+
+4. **Update `reference/models.txt`** in the parent workspace if you also
+   want the new model listed for build/test scripts.
+
+5. **Bump the gem version** in
+   `lib/dispatch/adapter/version.rb` (minor bump for new model support,
+   patch for a regex tweak that just fixes routing for an existing
+   misclassified model).
+
+6. **Run the test gate** from inside this gem:
+   ```bash
+   bundle exec rubocop --autocorrect-all
+   bundle exec rspec
+   ```
+   Both must exit 0.
+
+## Alternative: probe-and-fallback (not currently implemented)
+
+A more durable design would catch the specific 400 error string from
+`/v1/chat/completions`, cache the offending model id, and retransmit on
+`/v1/responses`. Pros: zero hardcoded list. Cons: adds latency on the
+first request per new model per process and depends on the upstream
+error wording staying stable. The probe must include a tool definition
+to be reliable — sending a tool-less request to `/v1/chat/completions`
+will succeed for some GPT-5 variants and only the tools+reasoning combo
+triggers the rejection.
+
+## File reference (everything routing-related)
+
+| Path | What it contains |
+|---|---|
+| `lib/dispatch/adapter/copilot.rb` | `uses_responses_api?` (the regex), the `chat` dispatcher, all four code paths, body builders for both endpoints |
+| `lib/dispatch/adapter/version.rb` | Gem version constant |
+| `spec/dispatch/adapter/copilot_spec.rb` | Tests for both endpoint paths and the routing predicate |
+| `GPT5_RESPONSES_API.md` | Original problem statement — the 400 error from Copilot |
+| `ENDPOINT_ROUTING.md` | This file |
+| `../models.txt` | Workspace-level list of model ids and premium multipliers |
diff --git a/GPT5_RESPONSES_API.md b/GPT5_RESPONSES_API.md
new file mode 100644
index 0000000..8dec8cf
--- /dev/null
+++ b/GPT5_RESPONSES_API.md
@@ -0,0 +1,68 @@
+# GPT-5.4 + Tool Calls — Requires `/v1/responses` API
+
+## Problem
+
+When `build.rb` selects the `gpt-5.4` model and sends a request with tool
+definitions, the Copilot API responds with:
+
+```
+Dispatch::Adapter::RequestError: Function tools with reasoning_effort are
+not supported for gpt-5.4 in /v1/chat/completions. Please use /v1/responses instead.
+```
+
+`dispatch-adapter-copilot` currently targets `/v1/chat/completions` for all
+models. GPT-5.4 is a reasoning model that requires the newer `/v1/responses`
+endpoint when tool calls are involved.
+
+---
+
+## Background
+
+### `/v1/chat/completions`
+OpenAI's original chat API. Stateless: you send the full `messages` history,
+get back `choices`. Tool calls via `tools` + `tool_calls` are supported. Works
+for all models up to GPT-4o.
+
+### `/v1/responses`
+Introduced for reasoning models (o1, o3, GPT-5+). Key differences:
+
+- Uses `input` instead of `messages` for the conversation history.
+- Exposes a `reasoning_effort` parameter (`low` / `medium` / `high`).
+- Optionally stateful via `previous_response_id` (server keeps history).
+- **Required** for tool use on reasoning/GPT-5 models — OpenAI removed
+  function-call support from Chat Completions for these models.
+
+GPT-5.4 was added to the GitHub Copilot model catalog but brings the
+Responses API requirement with it. The adapter was written before this model
+existed, so it has no Responses API support.
+
+---
+
+## What Needs to Be Done
+
+To support GPT-5.4 (and future reasoning models) with tool calls:
+
+1. **Detect reasoning models** — identify which model IDs require the
+   Responses API (e.g. anything matching `gpt-5.*` or carrying a
+   `reasoning` capability flag in the `/models` response).
+
+2. **Implement a Responses API code path** in `dispatch-adapter-copilot`:
+   - Endpoint: `POST /v1/responses` (not `/v1/chat/completions`).
+   - Request shape: `input` array instead of `messages`.
+   - Response shape: different structure — parse accordingly.
+   - Map `Dispatch::Adapter` tool definitions and result blocks to the
+     Responses API format.
+   - Handle `reasoning_effort` (expose as an adapter option or auto-set
+     to `medium`).
+
+3. **Route per model** — the adapter should check the model ID and choose
+   the correct endpoint at request time, keeping Chat Completions for all
+   non-reasoning models.
+
+---
+
+## Workaround (until implemented)
+
+Use `sonnet-4.6` instead of `gpt-5.4` in `build.rb`'s interactive menu.
+Claude Sonnet 4.6 (routed via Copilot's `/v1/chat/completions`) fully
+supports tool calls and has no Responses API requirement.
diff --git a/Gemfile.lock b/Gemfile.lock
index f7595b0..f3aa5a6 100644
--- a/Gemfile.lock
+++ b/Gemfile.lock
@@ -1,7 +1,7 @@
 PATH
   remote: ../dispatch-adapter-interface
   specs:
-    dispatch-adapter-interface (0.2.0)
+    dispatch-adapter-interface (0.3.0)
 
 PATH
   remote: .
@@ -114,7 +114,7 @@ CHECKSUMS date (3.5.1) sha256=750d06384d7b9c15d562c76291407d89e368dda4d4fff957eb94962d325a0dc0 diff-lcs (1.6.2) sha256=9ae0d2cba7d4df3075fe8cd8602a8604993efc0dfa934cff568969efb1909962 dispatch-adapter-copilot (0.4.0) - dispatch-adapter-interface (0.2.0) + dispatch-adapter-interface (0.3.0) erb (6.0.2) sha256=9fe6264d44f79422c87490a1558479bd0e7dad4dd0e317656e67ea3077b5242b hashdiff (1.2.1) sha256=9c079dbc513dfc8833ab59c0c2d8f230fa28499cc5efb4b8dd276cf931457cd1 io-console (0.8.2) sha256=d6e3ae7a7cc7574f4b8893b4fca2162e57a825b223a177b7afa236c5ef9814cc diff --git a/dispatch-adapter-copilot.gemspec b/dispatch-adapter-copilot.gemspec index 4dbdb1e..2ecf345 100644 --- a/dispatch-adapter-copilot.gemspec +++ b/dispatch-adapter-copilot.gemspec @@ -25,7 +25,8 @@ Gem::Specification.new do |spec| (f == gemspec) || f.start_with?(*%w[bin/ Gemfile .gitignore .rspec spec/ .rubocop.yml]) end - end.select { |f| File.exist?(File.join(__dir__, f)) } + end + spec.files.select! { |f| File.exist?(File.join(__dir__, f)) } spec.bindir = "exe" spec.executables = spec.files.grep(%r{\Aexe/}) { |f| File.basename(f) } spec.require_paths = ["lib"] diff --git a/lib/dispatch/adapter/copilot.rb b/lib/dispatch/adapter/copilot.rb index 7355df8..445adff 100644 --- a/lib/dispatch/adapter/copilot.rb +++ b/lib/dispatch/adapter/copilot.rb @@ -82,29 +82,38 @@ module Dispatch def chat(messages, system: nil, tools: [], stream: false, max_tokens: nil, thinking: :default, &) ensure_authenticated! - wire_messages = build_wire_messages(messages, system) - wire_tools = build_wire_tools(tools) effective_max_tokens = max_tokens || @default_max_tokens effective_thinking = thinking == :default ? @default_thinking : thinking validate_thinking_level!(effective_thinking) - body = { - model: @model, - messages: wire_messages, - stream: stream - } - if uses_max_completion_tokens? - body[:max_completion_tokens] = effective_max_tokens + if uses_responses_api? 
+ if stream + chat_streaming_responses(messages, system, tools, effective_max_tokens, effective_thinking, &) + else + chat_non_streaming_responses(messages, system, tools, effective_max_tokens, effective_thinking) + end else - body[:max_tokens] = effective_max_tokens - end - body[:tools] = wire_tools unless wire_tools.empty? - body[:reasoning_effort] = effective_thinking if effective_thinking + wire_messages = build_wire_messages(messages, system) + wire_tools = build_wire_tools(tools) - if stream - chat_streaming(body, &) - else - chat_non_streaming(body) + body = { + model: @model, + messages: wire_messages, + stream: stream + } + if uses_max_completion_tokens? + body[:max_completion_tokens] = effective_max_tokens + else + body[:max_tokens] = effective_max_tokens + end + body[:tools] = wire_tools unless wire_tools.empty? + body[:reasoning_effort] = effective_thinking if effective_thinking + + if stream + chat_streaming(body, &) + else + chat_non_streaming(body) + end end end @@ -189,6 +198,14 @@ module Dispatch @model.match?(/o[1-9]|gpt-5|gemini/) end + # Returns true when the selected model requires the /v1/responses endpoint. + # This applies to GPT-5 reasoning models. These models reject tool calls on + # /v1/chat/completions and return a 400 RequestError directing callers to + # use /v1/responses instead. + def uses_responses_api? + @model.match?(/\Agpt-5/) + end + def default_token_path File.join(Dir.home, ".config", "dispatch", "copilot_github_token") end @@ -440,6 +457,92 @@ module Dispatch merge_consecutive_roles(wire) end + # Converts canonical messages to the flat `input` array required by + # POST /v1/responses. System prompt is prepended as a system-role item. + # The Responses API does not support a top-level `system` parameter — + # the system message must be the first element of `input`. 
+ def build_responses_api_input(messages, system) + input = [] + input << { role: "system", content: system } if system + + messages.each do |msg| + input.concat(convert_message_to_responses_input(msg)) + end + + input + end + + # Converts a single canonical Message to one or more Responses API input + # items. Returns an Array (always) so results can be flat-concatenated. + def convert_message_to_responses_input(msg) + case msg.content + when String + [{ role: msg.role, content: msg.content }] + when Array + convert_content_blocks_to_responses_input(msg) + else + [{ role: msg.role, content: msg.content.to_s }] + end + end + + # Converts an array of content blocks (TextBlock, ToolUseBlock, + # ToolResultBlock) from a single Message into Responses API input items. + # + # Key differences from the Chat Completions conversion: + # - ToolUseBlock → top-level {type: "function_call", ...} item (not nested + # under an assistant message role) + # - ToolResultBlock → top-level {type: "function_call_output", ...} item + # - TextBlock in assistant message → {role: "assistant", content: [{type: + # "output_text", text: "..."}]} + def convert_content_blocks_to_responses_input(msg) + items = [] + text_parts = [] + + msg.content.each do |block| + case block + when TextBlock + text_parts << block.text + when ImageBlock + raise NotImplementedError, "ImageBlock is not yet supported by the Copilot adapter" + when ToolUseBlock + # Flush any accumulated text first as an assistant message + unless text_parts.empty? + items << { + role: "assistant", + content: [{ type: "output_text", text: text_parts.join("\n") }] + } + text_parts = [] + end + items << { + type: "function_call", + call_id: block.id, + name: block.name, + arguments: JSON.generate(block.arguments) + } + when ToolResultBlock + items << { + type: "function_call_output", + call_id: block.tool_use_id, + output: tool_result_content(block) + } + end + end + + # Flush any remaining text + unless text_parts.empty? 
+ items << if msg.role == "assistant" + { + role: "assistant", + content: [{ type: "output_text", text: text_parts.join("\n") }] + } + else + { role: msg.role, content: text_parts.join("\n") } + end + end + + items + end + def convert_message(msg) case msg.content when String @@ -544,6 +647,45 @@ module Dispatch end end + # Assembles the full request body for POST /v1/responses. + # + # Key differences from the Chat Completions body: + # - Uses `input` instead of `messages`. + # - Uses `max_output_tokens` instead of `max_tokens`/`max_completion_tokens`. + # - Uses `reasoning: {effort:}` instead of `reasoning_effort`. + # - Tool definitions omit the `function` wrapper — name/description/parameters + # are top-level inside the tool object. + def build_responses_api_body(messages, system, tools, stream, max_tokens, thinking) + input = build_responses_api_input(messages, system) + wire_tools = build_responses_api_tools(tools) + + body = { + model: @model, + input: input, + stream: stream, + max_output_tokens: max_tokens + } + + body[:tools] = wire_tools unless wire_tools.empty? + body[:reasoning] = { effort: thinking } if thinking + + body + end + + # Converts ToolDefinition structs (or plain hashes) to the Responses API + # tool format. Unlike Chat Completions, there is no `function` wrapper — + # name, description, and parameters are direct keys on the tool object. + def build_responses_api_tools(tools) + tools.map do |td| + { + type: "function", + name: tool_attr(td, :name), + description: tool_attr(td, :description), + parameters: tool_attr(td, :parameters) + } + end + end + # --- Chat (non-streaming) --- def chat_non_streaming(body) @@ -603,6 +745,78 @@ module Dispatch ) end + # Non-streaming chat via POST /v1/responses. + # Called when uses_responses_api? is true and stream is false. + def chat_non_streaming_responses(messages, system, tools, max_tokens, thinking) + @rate_limiter.wait! 
+ body = build_responses_api_body(messages, system, tools, false, max_tokens, thinking) + wire_messages = build_responses_api_input(messages, system) + + uri = URI("#{API_BASE}/responses") + request = Net::HTTP::Post.new(uri) + apply_headers!(request, initiator: x_initiator_for_responses(wire_messages)) + request.body = JSON.generate(deep_utf8(body)) + + response = execute_request(uri, request) + data = parse_response!(response) + build_response_from_responses_api(data) + end + + # Builds a canonical Response from a /v1/responses non-streaming body. + def build_response_from_responses_api(data) + output = data["output"] || [] + text_parts = [] + tool_calls = [] + + output.each do |item| + case item["type"] + when "message" + (item["content"] || []).each do |part| + text_parts << part["text"] if part["type"] == "output_text" && part["text"] + end + when "function_call" + tool_calls << ToolUseBlock.new( + id: item["call_id"] || item["id"], + name: item["name"], + arguments: parse_tool_arguments(item["arguments"]) + ) + end + end + + stop_reason = tool_calls.any? ? :tool_use : :end_turn + content = text_parts.empty? ? nil : text_parts.join + + usage_data = data["usage"] || {} + usage = Usage.new( + input_tokens: usage_data["input_tokens"] || 0, + output_tokens: usage_data["output_tokens"] || 0 + ) + + Response.new( + content: content, + tool_calls: tool_calls, + model: data["model"] || @model, + stop_reason: stop_reason, + usage: usage + ) + end + + # Determines X-Initiator for a Responses API call. + # Same logic as x_initiator_for but operates on the already-built `input` + # array where items use `type: "function_call"` / `type: "function_call_output"` + # instead of role-based items. + def x_initiator_for_responses(input_items) + if input_items.any? 
do |item| + item[:role].to_s == "assistant" || + item[:type].to_s == "function_call" || + item[:type].to_s == "function_call_output" + end + "agent" + else + "user" + end + end + # Recursively coerces every String inside a wire-body to valid UTF-8. # # Tool results (grep output, file reads, shell stdout) frequently arrive @@ -625,8 +839,6 @@ module Dispatch obj.map { |v| deep_utf8(v) } when Hash obj.each_with_object({}) { |(k, v), h| h[k] = deep_utf8(v) } - when Symbol - obj else obj end @@ -772,6 +984,142 @@ module Dispatch ) ) end + + # Streaming chat via POST /v1/responses. + # Called when uses_responses_api? is true and stream is true. + def chat_streaming_responses(messages, system, tools, max_tokens, thinking, &block) + @rate_limiter.wait! + body = build_responses_api_body(messages, system, tools, true, max_tokens, thinking) + wire_input = build_responses_api_input(messages, system) + + uri = URI("#{API_BASE}/responses") + request = Net::HTTP::Post.new(uri) + apply_headers!(request, initiator: x_initiator_for_responses(wire_input)) + request.body = JSON.generate(deep_utf8(body)) + + collector = new_responses_stream_collector + + execute_streaming_request(uri, request) do |response| + buffer = +"" + response.read_body do |chunk| + buffer << chunk + process_responses_sse_buffer(buffer, collector, &block) + end + end + + build_streaming_response_from_responses(collector) + end + + def new_responses_stream_collector + { + # text_parts: Hash<output_index => String> — accumulated text fragments + text_parts: Hash.new { |h, k| h[k] = +"" }, + # tool_calls: Hash<item_id => {call_id:, name:, arguments:}> + tool_calls: {}, + # order: Array of [:text, output_index] or [:tool, item_id] in appearance order + order: [], + model: @model, + input_tokens: 0, + output_tokens: 0 + } + end + + def process_responses_sse_buffer(buffer, collector, &) + while (line_end = buffer.index("\n")) + line = buffer.slice!(0..line_end).strip + next if line.empty? 
+ next unless line.start_with?("data: ") + + data_str = line.delete_prefix("data: ") + next if data_str == "[DONE]" + + data = JSON.parse(data_str) + process_responses_stream_event(data, collector, &) + end + rescue JSON::ParserError + nil + end + + def process_responses_stream_event(data, collector, &block) + case data["type"] + when "response.output_item.added" + handle_responses_output_item_added(data, collector, &block) + when "response.output_text.delta" + output_index = data["output_index"] || 0 + fragment = data["delta"].to_s + collector[:text_parts][output_index] << fragment + block.call(StreamDelta.new(type: :text_delta, text: fragment)) + when "response.function_call_arguments.delta" + handle_responses_arguments_delta(data, collector, &block) + when "response.completed" + usage = data.dig("response", "usage") || {} + collector[:input_tokens] = usage["input_tokens"] || collector[:input_tokens] + collector[:output_tokens] = usage["output_tokens"] || collector[:output_tokens] + model = data.dig("response", "model") + collector[:model] = model if model + end + end + + def handle_responses_output_item_added(data, collector, &block) + item = data["item"] || {} + case item["type"] + when "function_call" + item_id = item["id"] + collector[:tool_calls][item_id] = { + call_id: item["call_id"] || item_id, + name: item["name"] || "", + arguments: +"" + } + collector[:order] << [:tool, item_id] + block.call(StreamDelta.new( + type: :tool_use_start, + tool_call_id: item["call_id"] || item_id, + tool_name: item["name"] || "" + )) + when "message" + output_index = data["output_index"] || 0 + collector[:order] << [:text, output_index] unless collector[:order].any? 
{ |t, i| t == :text && i == output_index } + end + end + + def handle_responses_arguments_delta(data, collector, &block) + item_id = data["item_id"] + fragment = data["delta"].to_s + tc = collector[:tool_calls][item_id] + return unless tc + + tc[:arguments] << fragment + block.call(StreamDelta.new( + type: :tool_use_delta, + tool_call_id: tc[:call_id], + argument_delta: fragment + )) + end + + def build_streaming_response_from_responses(collector) + tool_calls = collector[:tool_calls].values.map do |tc| + ToolUseBlock.new( + id: tc[:call_id], + name: tc[:name], + arguments: parse_tool_arguments(tc[:arguments]) + ) + end + + all_text = collector[:text_parts].keys.sort.map { |idx| collector[:text_parts][idx] }.join + content = all_text.empty? ? nil : all_text + stop_reason = tool_calls.any? ? :tool_use : :end_turn + + Response.new( + content: content, + tool_calls: tool_calls, + model: collector[:model], + stop_reason: stop_reason, + usage: Usage.new( + input_tokens: collector[:input_tokens], + output_tokens: collector[:output_tokens] + ) + ) + end end end end diff --git a/spec/dispatch/adapter/copilot_spec.rb b/spec/dispatch/adapter/copilot_spec.rb index 15bdcb0..6f54fef 100644 --- a/spec/dispatch/adapter/copilot_spec.rb +++ b/spec/dispatch/adapter/copilot_spec.rb @@ -1588,4 +1588,434 @@ RSpec.describe Dispatch::Adapter::Copilot do expect { adapter.chat(messages) }.to raise_error(Dispatch::Adapter::ConnectionError) end end + + describe "#chat via /v1/responses (reasoning models)" do + let(:responses_adapter) do + described_class.new( + model: "gpt-5.4", + github_token: github_token, + max_tokens: 4096, + thinking: "medium", + min_request_interval: 0 + ) + end + + # Helper: build a well-formed /v1/responses non-streaming response body. + def responses_body(text: nil, tool_calls: [], model: "gpt-5.4", + input_tokens: 10, output_tokens: 5) + output = [] + unless text.nil? 
+ output << { + "type" => "message", + "id" => "msg_001", + "role" => "assistant", + "content" => [{ "type" => "output_text", "text" => text }] + } + end + tool_calls.each do |tc| + output << { + "type" => "function_call", + "id" => tc[:id], + "call_id" => tc[:id], + "name" => tc[:name], + "arguments" => JSON.generate(tc[:arguments]) + } + end + { + "id" => "resp_001", + "object" => "response", + "model" => model, + "output" => output, + "usage" => { + "input_tokens" => input_tokens, + "output_tokens" => output_tokens, + "total_tokens" => input_tokens + output_tokens + } + } + end + + context "with a text-only response" do + before do + stub_request(:post, "https://api.githubcopilot.com/responses") + .to_return( + status: 200, + body: JSON.generate(responses_body(text: "Hello from GPT-5!")), + headers: { "Content-Type" => "application/json" } + ) + end + + it "returns a Response with content" do + messages = [Dispatch::Adapter::Message.new(role: "user", content: "Hi")] + response = responses_adapter.chat(messages) + + expect(response).to be_a(Dispatch::Adapter::Response) + expect(response.content).to eq("Hello from GPT-5!") + expect(response.tool_calls).to be_empty + expect(response.model).to eq("gpt-5.4") + expect(response.stop_reason).to eq(:end_turn) + expect(response.usage.input_tokens).to eq(10) + expect(response.usage.output_tokens).to eq(5) + end + end + + context "with a tool call response" do + before do + stub_request(:post, "https://api.githubcopilot.com/responses") + .to_return( + status: 200, + body: JSON.generate(responses_body( + tool_calls: [{ id: "call_abc", name: "get_weather", + arguments: { "city" => "New York" } }] + )), + headers: { "Content-Type" => "application/json" } + ) + end + + it "returns a Response with tool_calls as ToolUseBlock array" do + messages = [Dispatch::Adapter::Message.new(role: "user", content: "Weather?")] + response = responses_adapter.chat(messages) + + expect(response.content).to be_nil + 
expect(response.stop_reason).to eq(:tool_use)
+        expect(response.tool_calls.size).to eq(1)
+
+        tc = response.tool_calls.first
+        expect(tc).to be_a(Dispatch::Adapter::ToolUseBlock)
+        expect(tc.id).to eq("call_abc")
+        expect(tc.name).to eq("get_weather")
+        expect(tc.arguments).to eq({ "city" => "New York" })
+      end
+    end
+
+    context "with mixed text + tool call response" do
+      before do
+        stub_request(:post, "https://api.githubcopilot.com/responses")
+          .to_return(
+            status: 200,
+            body: JSON.generate(responses_body(
+              text: "Let me check.",
+              tool_calls: [{ id: "call_def", name: "search", arguments: { "q" => "test" } }]
+            )),
+            headers: { "Content-Type" => "application/json" }
+          )
+      end
+
+      it "returns both content and tool_calls" do
+        messages = [Dispatch::Adapter::Message.new(role: "user", content: "Search")]
+        response = responses_adapter.chat(messages)
+
+        expect(response.content).to eq("Let me check.")
+        expect(response.tool_calls.size).to eq(1)
+        expect(response.stop_reason).to eq(:tool_use)
+      end
+    end
+
+    context "request body shape" do
+      it "sends `input` not `messages`" do
+        stub = stub_request(:post, "https://api.githubcopilot.com/responses")
+               .with do |req|
+                 body = JSON.parse(req.body)
+                 body.key?("input") && !body.key?("messages")
+               end
+               .to_return(
+                 status: 200,
+                 body: JSON.generate(responses_body(text: "ok")),
+                 headers: { "Content-Type" => "application/json" }
+               )
+
+        messages = [Dispatch::Adapter::Message.new(role: "user", content: "Hi")]
+        responses_adapter.chat(messages)
+
+        expect(stub).to have_been_requested
+      end
+
+      it "sends `max_output_tokens` not `max_tokens`" do
+        stub = stub_request(:post, "https://api.githubcopilot.com/responses")
+               .with do |req|
+                 body = JSON.parse(req.body)
+                 body["max_output_tokens"] == 4096 &&
+                   !body.key?("max_tokens") &&
+                   !body.key?("max_completion_tokens")
+               end
+               .to_return(
+                 status: 200,
+                 body: JSON.generate(responses_body(text: "ok")),
+                 headers: { "Content-Type" => "application/json" }
+               )
+
+        messages = [Dispatch::Adapter::Message.new(role: "user", content: "Hi")]
+        responses_adapter.chat(messages)
+
+        expect(stub).to have_been_requested
+      end
+
+      it "sends `reasoning: {effort:}` not `reasoning_effort`" do
+        stub = stub_request(:post, "https://api.githubcopilot.com/responses")
+               .with do |req|
+                 body = JSON.parse(req.body)
+                 body["reasoning"] == { "effort" => "medium" } && !body.key?("reasoning_effort")
+               end
+               .to_return(
+                 status: 200,
+                 body: JSON.generate(responses_body(text: "ok")),
+                 headers: { "Content-Type" => "application/json" }
+               )
+
+        messages = [Dispatch::Adapter::Message.new(role: "user", content: "Hi")]
+        responses_adapter.chat(messages)
+
+        expect(stub).to have_been_requested
+      end
+
+      it "sends tools without the `function` wrapper" do
+        tool = Dispatch::Adapter::ToolDefinition.new(
+          name: "get_weather",
+          description: "Get weather",
+          parameters: { "type" => "object", "properties" => { "city" => { "type" => "string" } } }
+        )
+
+        stub = stub_request(:post, "https://api.githubcopilot.com/responses")
+               .with do |req|
+                 body = JSON.parse(req.body)
+                 t = body["tools"]&.first
+                 t && t["type"] == "function" &&
+                   t["name"] == "get_weather" &&
+                   t["description"] == "Get weather" &&
+                   !t.key?("function")
+               end
+               .to_return(
+                 status: 200,
+                 body: JSON.generate(responses_body(text: "ok")),
+                 headers: { "Content-Type" => "application/json" }
+               )
+
+        messages = [Dispatch::Adapter::Message.new(role: "user", content: "weather?")]
+        responses_adapter.chat(messages, tools: [tool])
+
+        expect(stub).to have_been_requested
+      end
+
+      it "converts tool results to function_call_output items in input" do
+        tool_use = Dispatch::Adapter::ToolUseBlock.new(
+          id: "call_1", name: "search", arguments: { "q" => "ruby" }
+        )
+        tool_result = Dispatch::Adapter::ToolResultBlock.new(
+          tool_use_id: "call_1", content: "some results"
+        )
+
+        messages = [
+          Dispatch::Adapter::Message.new(role: "user", content: "search ruby"),
+          Dispatch::Adapter::Message.new(role: "assistant", content: [tool_use]),
+          Dispatch::Adapter::Message.new(role: "user", content: [tool_result])
+        ]
+
+        stub = stub_request(:post, "https://api.githubcopilot.com/responses")
+               .with do |req|
+                 body = JSON.parse(req.body)
+                 input = body["input"]
+                 fc = input.find { |i| i["type"] == "function_call" }
+                 fco = input.find { |i| i["type"] == "function_call_output" }
+                 fc && fc["call_id"] == "call_1" && fc["name"] == "search" &&
+                   fco && fco["call_id"] == "call_1" && fco["output"] == "some results"
+               end
+               .to_return(
+                 status: 200,
+                 body: JSON.generate(responses_body(text: "done")),
+                 headers: { "Content-Type" => "application/json" }
+               )
+
+        responses_adapter.chat(messages)
+
+        expect(stub).to have_been_requested
+      end
+    end
+
+    context "with system: parameter" do
+      it "prepends system item at start of input array" do
+        stub = stub_request(:post, "https://api.githubcopilot.com/responses")
+               .with do |req|
+                 body = JSON.parse(req.body)
+                 body["input"].first == { "role" => "system", "content" => "Be concise." }
+               end
+               .to_return(
+                 status: 200,
+                 body: JSON.generate(responses_body(text: "ok")),
+                 headers: { "Content-Type" => "application/json" }
+               )
+
+        messages = [Dispatch::Adapter::Message.new(role: "user", content: "Hi")]
+        responses_adapter.chat(messages, system: "Be concise.")
+
+        expect(stub).to have_been_requested
+      end
+    end
+
+    context "X-Initiator header" do
+      let(:ok_resp) do
+        {
+          status: 200,
+          body: JSON.generate(responses_body(text: "ok")),
+          headers: { "Content-Type" => "application/json" }
+        }
+      end
+
+      it "sends X-Initiator: user for a fresh user message" do
+        stub = stub_request(:post, "https://api.githubcopilot.com/responses")
+               .with(headers: { "X-Initiator" => "user" })
+               .to_return(**ok_resp)
+
+        messages = [Dispatch::Adapter::Message.new(role: "user", content: "Hi")]
+        responses_adapter.chat(messages)
+
+        expect(stub).to have_been_requested
+      end
+
+      it "sends X-Initiator: agent when tool results are present" do
+        tool_use = Dispatch::Adapter::ToolUseBlock.new(id: "c1", name: "fn", arguments: {})
+        tool_result = Dispatch::Adapter::ToolResultBlock.new(tool_use_id: "c1", content: "res")
+
+        messages = [
+          Dispatch::Adapter::Message.new(role: "user", content: "go"),
+          Dispatch::Adapter::Message.new(role: "assistant", content: [tool_use]),
+          Dispatch::Adapter::Message.new(role: "user", content: [tool_result])
+        ]
+
+        stub = stub_request(:post, "https://api.githubcopilot.com/responses")
+               .with(headers: { "X-Initiator" => "agent" })
+               .to_return(**ok_resp)
+
+        responses_adapter.chat(messages)
+
+        expect(stub).to have_been_requested
+      end
+    end
+
+    context "error mapping" do
+      it "maps 400 to RequestError (the error gpt-5.4 would give on wrong endpoint)" do
+        stub_request(:post, "https://api.githubcopilot.com/responses")
+          .to_return(status: 400, body: JSON.generate({ "error" => { "message" => "bad request" } }))
+
+        messages = [Dispatch::Adapter::Message.new(role: "user", content: "Hi")]
+        expect { responses_adapter.chat(messages) }.to raise_error(Dispatch::Adapter::RequestError)
+      end
+    end
+
+    context "streaming" do
+      def sse_events(*events)
+        all = events.map { |e| "data: #{JSON.generate(e)}\n\n" }
+        all << "data: [DONE]\n\n"
+        all.join
+      end
+
+      it "yields text StreamDeltas and returns Response" do
+        body = sse_events(
+          { "type" => "response.output_item.added", "output_index" => 0,
+            "item" => { "type" => "message", "id" => "msg_001", "role" => "assistant", "content" => [] } },
+          { "type" => "response.output_text.delta", "item_id" => "msg_001",
+            "output_index" => 0, "content_index" => 0, "delta" => "Hello" },
+          { "type" => "response.output_text.delta", "item_id" => "msg_001",
+            "output_index" => 0, "content_index" => 0, "delta" => " world" },
+          { "type" => "response.completed",
+            "response" => { "model" => "gpt-5.4",
+                            "usage" => { "input_tokens" => 10, "output_tokens" => 2, "total_tokens" => 12 } } }
+        )
+
+        stub_request(:post, "https://api.githubcopilot.com/responses")
+          .with { |req| JSON.parse(req.body)["stream"] == true }
+          .to_return(status: 200, body: body, headers: { "Content-Type" => "text/event-stream" })
+
+        messages = [Dispatch::Adapter::Message.new(role: "user", content: "Hi")]
+        deltas = []
+        response = responses_adapter.chat(messages, stream: true) { |d| deltas << d }
+
+        text_deltas = deltas.select { |d| d.type == :text_delta }
+        expect(text_deltas.size).to eq(2)
+        expect(text_deltas[0].text).to eq("Hello")
+        expect(text_deltas[1].text).to eq(" world")
+
+        expect(response).to be_a(Dispatch::Adapter::Response)
+        expect(response.content).to eq("Hello world")
+        expect(response.stop_reason).to eq(:end_turn)
+        expect(response.model).to eq("gpt-5.4")
+        expect(response.usage.input_tokens).to eq(10)
+        expect(response.usage.output_tokens).to eq(2)
+      end
+
+      it "yields tool_use_start and tool_use_delta for function call streams" do
+        body = sse_events(
+          { "type" => "response.output_item.added", "output_index" => 0,
+            "item" => { "type" => "function_call", "id" => "fc_001", "call_id" => "call_001",
+                        "name" => "get_weather" } },
+          { "type" => "response.function_call_arguments.delta",
+            "item_id" => "fc_001", "output_index" => 0, "delta" => "{\"city\":" },
+          { "type" => "response.function_call_arguments.delta",
+            "item_id" => "fc_001", "output_index" => 0, "delta" => "\"NYC\"}" },
+          { "type" => "response.completed",
+            "response" => { "model" => "gpt-5.4",
+                            "usage" => { "input_tokens" => 15, "output_tokens" => 8, "total_tokens" => 23 } } }
+        )
+
+        stub_request(:post, "https://api.githubcopilot.com/responses")
+          .to_return(status: 200, body: body, headers: { "Content-Type" => "text/event-stream" })
+
+        messages = [Dispatch::Adapter::Message.new(role: "user", content: "weather?")]
+        deltas = []
+        response = responses_adapter.chat(messages, stream: true) { |d| deltas << d }
+
+        starts = deltas.select { |d| d.type == :tool_use_start }
+        arg_deltas = deltas.select { |d| d.type == :tool_use_delta }
+
+        expect(starts.size).to eq(1)
+        expect(starts.first.tool_call_id).to eq("call_001")
+        expect(starts.first.tool_name).to eq("get_weather")
+
+        expect(arg_deltas.size).to eq(2)
+        expect(arg_deltas[0].argument_delta).to eq("{\"city\":")
+        expect(arg_deltas[1].argument_delta).to eq("\"NYC\"}")
+
+        expect(response.stop_reason).to eq(:tool_use)
+        expect(response.tool_calls.size).to eq(1)
+        expect(response.tool_calls.first.name).to eq("get_weather")
+        expect(response.tool_calls.first.arguments).to eq({ "city" => "NYC" })
+        expect(response.usage.input_tokens).to eq(15)
+        expect(response.usage.output_tokens).to eq(8)
+      end
+
+      it "sends stream: true in the request body" do
+        body = sse_events(
+          { "type" => "response.completed",
+            "response" => { "model" => "gpt-5.4", "usage" => { "input_tokens" => 5, "output_tokens" => 1 } } }
+        )
+
+        stub = stub_request(:post, "https://api.githubcopilot.com/responses")
+               .with { |req| JSON.parse(req.body)["stream"] == true }
+               .to_return(status: 200, body: body, headers: { "Content-Type" => "text/event-stream" })
+
+        messages = [Dispatch::Adapter::Message.new(role: "user", content: "Hi")]
+        responses_adapter.chat(messages, stream: true) { |_d| }
+
+        expect(stub).to have_been_requested
+      end
+
+      it "returns nil content when there are no text deltas" do
+        body = sse_events(
+          { "type" => "response.output_item.added", "output_index" => 0,
+            "item" => { "type" => "function_call", "id" => "fc_001", "call_id" => "call_001", "name" => "fn" } },
+          { "type" => "response.function_call_arguments.delta",
+            "item_id" => "fc_001", "output_index" => 0, "delta" => "{}" },
+          { "type" => "response.completed",
+            "response" => { "model" => "gpt-5.4", "usage" => { "input_tokens" => 5, "output_tokens" => 1 } } }
+        )
+
+        stub_request(:post, "https://api.githubcopilot.com/responses")
+          .to_return(status: 200, body: body, headers: { "Content-Type" => "text/event-stream" })
+
+        messages = [Dispatch::Adapter::Message.new(role: "user", content: "do it")]
+        response = responses_adapter.chat(messages, stream: true) { |_d| }
+
+        expect(response.content).to be_nil
+        expect(response.tool_calls.size).to eq(1)
+      end
+    end
+  end
 end
