# Rate Limiting — Implementation Plan

Cross-process, per-account rate limiting for the Copilot adapter. All processes sharing the same GitHub account (same `token_path` directory) share a single rate limit state via the filesystem.

---

## Overview

Two rate limiting mechanisms, both enforced transparently (the adapter sleeps until allowed, never raises):

1. **Per-request cooldown** — Minimum interval between consecutive requests. Default: 3 seconds.
2. **Sliding window limit** — Maximum N requests within a time period. Default: disabled (`nil`).

Both are configured via constructor parameters. Rate limit state is stored in a file next to the persisted GitHub token, using `flock` for cross-process atomic access.

---

## Configuration

### Constructor Parameters

```ruby
Copilot.new(
  model: "gpt-4.1",
  github_token: nil,
  token_path: nil,
  max_tokens: 8192,
  thinking: nil,
  min_request_interval: 3.0,          # seconds between requests (Float/Integer, nil to disable)
  rate_limit: nil                      # sliding window config (Hash or nil to disable)
)
```

#### `min_request_interval:` (default: `3.0`)

- Minimum number of seconds that must elapse between the start of one request and the start of the next.
- Set to `nil` or `0` to disable.
- Applies system-wide across all processes sharing the same rate limit file.

#### `rate_limit:` (default: `nil` — disabled)

- A Hash with two keys: `{ requests: Integer, period: Integer }`.
  - `requests` — Maximum number of requests allowed within the window.
  - `period` — Window size in seconds.
- Example: `{ requests: 10, period: 60 }` means at most 10 requests per 60-second sliding window.
- Set to `nil` to disable sliding window limiting (only per-request cooldown applies).
- Validation: both `requests` and `period` must be positive integers when provided. Raises `ArgumentError` otherwise.

---

## Behaviour

When `chat` or `list_models` is called (any method that hits the Copilot API):

1. **Acquire the rate limit file lock** (`flock(File::LOCK_EX)`).
2. **Read the rate limit state** from the file.
3. **Check per-request cooldown**: If less than `min_request_interval` seconds have elapsed since the last request timestamp, calculate the remaining wait time.
4. **Check sliding window** (if configured): Count how many timestamps in the log fall within `[now - period, now]`. If the count >= `requests`, calculate the wait time until the oldest entry in the window expires.
5. **Take the maximum** of both wait times (they can overlap).
6. **Release the lock**, then **sleep** for the calculated wait time (if any).
7. **Re-acquire the lock**, re-read state, re-check (the state may have changed while sleeping — another process may have made a request during our sleep).
8. **Record the current timestamp** in the state file and release the lock.
9. **Proceed** with the API request.

The re-check-after-sleep loop is necessary because another process could slip in a request while we were sleeping. The loop converges quickly (at most a few iterations) because each process sleeps for the correct duration.

### Thread Safety

The existing `@mutex` protects the Copilot token refresh. Rate limiting uses a separate concern:

- **Cross-process**: `flock` on the rate limit file.
- **In-process threads**: The `flock` call itself is sufficient — Ruby's `File#flock` blocks the calling thread (does not hold the GVL while waiting), so concurrent threads in the same process will serialize correctly through the flock.

---

## File Format

### Path

```
{token_path_directory}/copilot_rate_limit
```

Where `token_path_directory` is `File.dirname(@token_path)`. Since `@token_path` defaults to `~/.config/dispatch/copilot_github_token`, the rate limit file defaults to `~/.config/dispatch/copilot_rate_limit`.

### Contents

JSON with two fields:

```json
{
  "last_request_at": 1743465600.123,
  "request_log": [1743465590.0, 1743465595.0, 1743465600.123]
}
```

- `last_request_at` — Unix timestamp (Float) of the most recent request. Used for per-request cooldown.
- `request_log` — Array of Unix timestamps (Float) for recent requests. Used for sliding window. Entries older than the window `period` are pruned on every write to keep the file small.

If sliding window is disabled, `request_log` is still maintained (empty array) so that enabling it later works immediately without losing the last-request timestamp.

When the file does not exist or is empty/corrupt, treat it as fresh state (no previous requests).

### File Permissions

Created with `0600` (same as the token file) to prevent other users from reading/tampering.

---

## Implementation Structure

### New File: `lib/dispatch/adapter/rate_limiter.rb`

A standalone class `Dispatch::Adapter::RateLimiter` that encapsulates all rate limiting logic. The Copilot adapter delegates to it.

```ruby
class RateLimiter
  def initialize(rate_limit_path:, min_request_interval:, rate_limit:)
    # ...
  end

  def wait!
    # Acquire lock, read state, compute wait, sleep, record, release.
  end
end
```

#### Public API

- `#wait!` — Blocks until the rate limit allows a request, then records the request timestamp. Called by the adapter before every API call.

#### Private Methods

- `#read_state(file)` — Parse JSON from the locked file. Returns default state on missing/corrupt file.
- `#write_state(file, state)` — Write JSON state back to the file.
- `#compute_wait(state, now)` — Returns the number of seconds to sleep (Float, 0.0 if no wait needed).
- `#prune_log(log, now, period)` — Remove timestamps older than `now - period`.
- `#record_request(state, now)` — Append `now` to log, update `last_request_at`, prune old entries.

### Changes to `Dispatch::Adapter::Copilot`

1. Add constructor parameters `min_request_interval:` and `rate_limit:`.
2. In `initialize`, create a `RateLimiter` instance.
3. Call `@rate_limiter.wait!` at the start of `chat_non_streaming`, `chat_streaming`, and `list_models` — after `ensure_authenticated!` (authentication should not be rate-limited) but before the HTTP request.
4. Validate `rate_limit:` hash structure in the constructor.

### Changes to `Dispatch::Adapter::Base`

No changes. Rate limiting is an implementation concern of the Copilot adapter, not part of the abstract interface. Other adapters may have different rate limiting strategies or none at all.

---

## Edge Cases

| Scenario | Behaviour |
|---|---|
| Rate limit file does not exist | Treat as no previous requests. Create on first write. |
| Rate limit file contains invalid JSON | Treat as no previous requests. Overwrite on next write. |
| Rate limit file directory does not exist | Create it (same as `persist_token` does for the token file). |
| `min_request_interval: nil` or `0` | Per-request cooldown disabled. |
| `rate_limit: nil` | Sliding window disabled. Only cooldown applies. |
| Both disabled | `wait!` is a no-op (returns immediately). |
| `rate_limit:` missing `requests` or `period` key | Raises `ArgumentError` in constructor. |
| `rate_limit: { requests: 0, ... }` or negative | Raises `ArgumentError` in constructor. |
| Clock skew between processes | Handled — we use monotonic-ish `Time.now.to_f`. Minor skew (sub-second) is acceptable. Major skew (NTP jump) could cause one extra wait or one early request, which is acceptable. |
| Process killed while holding lock | `flock` is automatically released by the OS when the file descriptor is closed (including process termination). No stale locks. |
| Very long `request_log` after sustained use | Pruned on every write. Maximum size = `rate_limit[:requests]` entries. |

---

## Validation Rules

In the constructor:

- `min_request_interval` must be `nil`, or a `Numeric` >= 0. Raise `ArgumentError` otherwise.
- `rate_limit` must be `nil` or a `Hash` with:
  - `:requests` — positive `Integer`
  - `:period` — positive `Integer` or `Float`
  - No extra keys required; extra keys are ignored.
- Raise `ArgumentError` with a descriptive message on invalid config.