# RML compositing — architecture direction (gated by a spike)

> **Status: DIRECTION CHOSEN, gated by a GO/NO-GO spike.** This reopens the
> compositing half of plan.md §2 row 51 (wlr_scene for compositing). Nothing is
> committed until the spike (Phase 0) passes on the real CF-AX3. Naming below is
> PROPOSED — needs user sign-off before it lands in GLOSSARY.md.

## Thesis
The kernel's UI substrate (RMLUi) becomes the **content compositor**: application
toplevels, layer-shell clients (wallpaper, panels), and the existing UI chrome
are all RMLUi elements backed by **live, shared GL textures** — so window
**layout, animation, and 3D effects are expressed in RCSS**, with no per-frame
texture copies. Tiling, stage-manager, effects, etc. become RCSS + extension
policy on top of this.

## Why this is viable (the de-risking already done)
- **Shared EGLDisplay already exists** (plan.md §2 row 51): the wlr renderer and
  RMLUi's GLES 3.2 context share an EGLDisplay. Slice 3 + the stage dock already
  move client/scene pixels into RMLUi as textures via dmabuf/EGLImage on this
  exact Haswell/crocus HW, so the texture handoff is proven; the spike's job is
  to make it **live + zero-copy + shared-handle** rather than a snapshot.
- **RmlUi does transform-aware hit-testing for us.** `Context::ProcessMouse*` /
  `ProcessTouch*` pick the element under a point THROUGH `transform`/3D
  `perspective` and dispatch DOM-style events + `:hover` (auto-updated in
  `Context::Update`). So the hard geometry of routing input through a 3D-tilted
  window is handled upstream — we only translate "element X at local (lx,ly)" to
  `wl_seat` (surface-local coords + implicit grab), which the kernel already
  models.
- **RmlUi GL3 renderer supports** transforms, clip masks, **filters (blur,
  drop-shadow)**, **shaders**, and **render-to-texture** — and caches compiled
  geometry + effect passes (an unchanged blur isn't recomputed). Our custom
  RenderInterface must implement these hooks (the "custom is fine" work).
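
The translation step in the second bullet can be sketched as below. This is a
minimal illustration, not the kernel's actual routing code: every type and
function name here (`Point`, `Size`, `element_to_surface`, `PointerRouter`) is
hypothetical, and it assumes picking has already produced a point in the
element's own untransformed local space. C++17 is assumed for `std::optional`.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical types: none of these names come from RmlUi or wlroots.
struct Size  { double w, h; };
struct Point { double x, y; };

// Map a point in element-local coordinates (assumed to be what picking hands
// us, with any transform already undone) to surface-local coordinates, which
// is what wl_seat expects. The element box may be scaled relative to the
// client's surface size, so scale each axis independently.
inline Point element_to_surface(Point local, Size element, Size surface) {
    const double sx = element.w > 0 ? surface.w / element.w : 1.0;
    const double sy = element.h > 0 ? surface.h / element.h : 1.0;
    return { local.x * sx, local.y * sy };
}

// Implicit grab: from the first button press until the last release, pointer
// events keep going to the surface that received the press, even if the
// pointer is now over a different element.
class PointerRouter {
public:
    using SurfaceId = std::uint32_t;

    // `picked` is the surface under the pointer, if any.
    // Returns the surface that should receive the event.
    std::optional<SurfaceId> target(std::optional<SurfaceId> picked) const {
        return buttons_down_ > 0 ? grab_ : picked;
    }
    void button_down(std::optional<SurfaceId> picked) {
        if (buttons_down_++ == 0) grab_ = picked;  // first press starts the grab
    }
    void button_up() {
        if (buttons_down_ > 0 && --buttons_down_ == 0) grab_.reset();
    }

private:
    int buttons_down_ = 0;
    std::optional<SurfaceId> grab_;
};
```

Coordinates are deliberately not clamped: during a grab the pointer can leave
the surface, and the grabbed client still needs the out-of-bounds position.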

## Division of labour (the end state)
- **RMLUi = content compositor.** ALL on-screen content: toplevels, layer-shell
  clients (incl. wallpaper — user decision), UI chrome. Layout/animation/effects
  in RCSS.
- **wlroots = foundation + plane manager** (NOT optional; RMLUi can't talk to
  DRM):
  - backend/DRM, output management, modeset, vblank/frame scheduling, GLES
    renderer, buffer import (dmabuf→texture), allocation, `wl_seat`/input;
  - the **hardware cursor plane** (drawing the cursor in RMLUi would force a full
    recomposite on every move — keep it a wlr plane);
  - the **fullscreen-video scanout bypass** (deferred optimization — see below).
- This is NOT "hybrid compositing": in steady state RMLUi composites everything
  and `wlr_scene` is reduced to a thin presenter for **the RMLUi buffer + cursor
  plane + scanout bypass**. It may stay in that role (it can even scan out the
  RMLUi buffer itself on the primary plane).

## Performance posture (replacing wlr_scene's damage/scanout)
What we give up by moving content off `wlr_scene`: per-surface damage, occlusion
culling, and direct scanout. Mitigations:
- **Dirty-gated rendering (OUR mechanism — NOT a RMLUi built-in).** RMLUi tracks
  internal dirty flags (so `Update()` is cheap at idle) and knows when animations
  are active, but it does NOT provide screen damage or a "skip this frame"
  signal — and the most important dirty source here (a client buffer updating) is
  OUR shared texture changing OUTSIDE RMLUi, which RMLUi can't see anyway. So WE
  gate: only schedule + `Render()` a frame when a signal we already own fires — a
  client buffer commit (wlroots), an active RCSS animation (RMLUi tells us), or an
  input-driven state change (hover/focus/drag). This keeps a static desktop at
  ~zero GPU — the big battery/thermal win on a 15 W fanless ultrabook — WITHOUT
  per-surface damage. (`request_frames` already stops scheduling at rest; we gate
  on dirtiness on top.) The spike must CONFIRM idle ≈ no work in practice.
- **Fullscreen-video scanout bypass (deferred).** When exactly one window is
  fullscreen with nothing composited on top, pull that surface OUT of the RMLUi
  composite and hand it to `wlr_scene`/scanout directly (RMLUi draws nothing that
  frame). Trigger = the fullscreen STATE (e.g. VLC's fullscreen button), not the
  click. Impact of NOT having it: more battery/heat during long fullscreen video,
  not breakage — so it's a later optimization, sized by a spike measurement.
- What we lose and accept for now: partial-region redraw when one small thing
  changes (expected to be minor: the HD4400 should repaint 1080p of simple quads
  within the frame budget, which spike criterion 6 measures).
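
The two mitigations above reduce to a small amount of state. The sketch below
is illustrative only: `FrameGate`, `DirtySource`, `OutputState`, and
`can_bypass_composite` are hypothetical names, and how the substrate learns
that an animation is active or that something is composited on top is left
out as an assumption.

```cpp
#include <cstdint>

// Hypothetical edge-triggered dirty sources (signals we already own).
enum DirtySource : std::uint32_t {
    kClientCommit = 1u << 0,  // a client buffer was committed (from wlroots)
    kInputState   = 1u << 1,  // hover/focus/drag changed something visible
};

class FrameGate {
public:
    // Record a one-shot reason to draw the next frame.
    void mark(DirtySource s) { pending_ |= s; }

    // Animations are a level, not an edge: we stay dirty while one is active.
    void set_animating(bool active) { animating_ = active; }

    // Only schedule + render when at least one source is dirty.
    bool should_render() const { return pending_ != 0 || animating_; }

    // Call after a frame was rendered; clears the one-shot sources only.
    void frame_done() { pending_ = 0; }

private:
    std::uint32_t pending_ = 0;
    bool animating_ = false;
};

// Hypothetical per-output summary used to decide on the scanout bypass.
struct OutputState {
    int  fullscreen_surfaces;  // toplevels in the fullscreen STATE
    bool overlay_visible;      // anything composited on top of them
};

// Bypass only when exactly one window is fullscreen with nothing above it.
inline bool can_bypass_composite(const OutputState& s) {
    return s.fullscreen_surfaces == 1 && !s.overlay_visible;
}
```

With this shape, a static desktop never passes `should_render()`, which is the
idle behaviour the spike has to confirm on real hardware.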

## Phase 0 — THE SPIKE (kernel/substrate; throwaway; one GO/NO-GO)
Acceptance criteria, measured on the real CF-AX3:
1. A **live** toplevel buffer sampled by RmlUi via a shared GL context — **zero
   per-frame copy** — drawn as an element.
2. An **RCSS 3D transform + transition** applied to it (visual proof of payoff).
3. **Pointer + touch + keyboard routed back** to that client through RmlUi
   picking → `wl_seat`, correct under a transform.
4. A toplevel **with a popup + subsurface** composited correctly → decides the
   surface-tree question: **per-subsurface elements** vs **per-window
   render-to-texture**.
5. A **layer-shell client (wallpaper)** also rendered as an element (proves the
   "wallpaper through RMLUi" decision; same mechanism as #1).
6. **Perf**: ~4 windows @1080p incl. one continuously updating (terminal/video);
   measure frame time AND confirm **idle = ~no work** with dirty-gating on; also
   measure the cost of pushing fullscreen video through RMLUi (to size the
   scanout bypass).
7. **Present path**: reuse the existing RMLUi-FBO → `wlr_scene_buffer` bridge
   (fastest to truth); cursor stays a wlr plane.

Output: a report with the GO/NO-GO decision, the chosen answers to criterion 4
(surface tree) and criterion 7 (present path), and the perf numbers.
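
For the perf numbers, a small summary helper along these lines may be enough.
It is a sketch under assumptions: the names (`FrameStats`, `summarise`,
`render_ratio`) are hypothetical, the frame budget is a parameter rather than
a value taken from this document (about 16.7 ms would correspond to a 60 Hz
panel), and how frame times are captured on the CF-AX3 is not shown.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct FrameStats {
    double      mean_ms;
    double      p99_ms;
    std::size_t over_budget;  // frames slower than the budget
};

// Summarise per-frame render times (milliseconds) against a frame budget.
inline FrameStats summarise(std::vector<double> frame_ms, double budget_ms) {
    FrameStats s{0.0, 0.0, 0};
    if (frame_ms.empty()) return s;

    std::sort(frame_ms.begin(), frame_ms.end());
    double sum = 0.0;
    for (double t : frame_ms) {
        sum += t;
        if (t > budget_ms) ++s.over_budget;
    }
    s.mean_ms = sum / static_cast<double>(frame_ms.size());

    // Nearest-rank 99th percentile: rank = ceil(0.99 * n), 1-based.
    const std::size_t rank = (frame_ms.size() * 99 + 99) / 100;
    s.p99_ms = frame_ms[rank - 1];
    return s;
}

// Idle check: the fraction of vblanks in which a frame was actually rendered.
// With dirty-gating on and a static desktop this should stay close to zero.
inline double render_ratio(std::size_t frames_rendered, std::size_t vblanks) {
    return vblanks == 0
        ? 0.0
        : static_cast<double>(frames_rendered) / static_cast<double>(vblanks);
}
```

Reporting the 99th percentile alongside the mean keeps occasional long frames
visible; a mean alone can hide them.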

## Phase 1 — Architecture (if GO): a design doc settling
- The **surface-element model** + surface-tree handling (from spike #4).
- The **shared-texture handoff** API in the substrate.
- **Unified input**: fold the kernel's pointer/touch routing into RmlUi picking
  (dovetails with the existing "substrate gets input first" implicit-grab path).
- The **extension-facing contract**: how a future tiling/effects/stage-dock
  extension places & animates windows — RCSS docs + data bindings (existing
  substrate contract) vs. a new typed window-layout service.
- Update plan.md §2 row 51 to the new compositing model.

## Phase 2 — Implementation (phased, behind a flag; session stays usable)
1. **substrate**: surface-element + shared-texture handoff.
2. **kernel**: route toplevels + layer-shell into the substrate instead of
   `wlr_scene`; switch input.
3. Port focus/move/resize/fullscreen.
4. Re-express stage-dock minimize as RCSS.
5. Revisit tiling (now trivial: RCSS layout over surface elements).
6. Effects.

## Proposed naming (NEEDS SIGN-OFF before GLOSSARY.md)
- **RML compositing** — the approach: RMLUi composites all on-screen content.
- **surface element** — an RML element backed by a live client surface's shared
  texture (a toplevel OR a layer surface presented inside RMLUi). Uses the
  canonical "surface"; avoids the "window" alias.

## Open questions the spike resolves
- per-subsurface elements vs per-window render-to-texture (the biggest unknown;
  spike criterion 4);
- present path: reuse FBO→scene_buffer vs eventually render direct to output;
- real perf headroom on the HD4400 (frame time + idle + video).
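
Both answers to the first question start from the same walk over a window's
surface tree; they differ in what consumes the result (one element per placed
surface, or one texture the placed surfaces are drawn into). A minimal sketch
of that walk, with hypothetical types (`SurfaceNode`, `PlacedSurface`,
`flatten`) and the simplifying assumption that every child stacks above its
parent:

```cpp
#include <vector>

// Hypothetical model of a client's surface tree; not a wlroots type.
struct SurfaceNode {
    int id;                             // stands in for a client surface handle
    int dx, dy;                         // offset relative to the parent surface
    std::vector<SurfaceNode> children;  // subsurfaces/popups, bottom to top
};

// One surface with its position resolved relative to the window origin.
struct PlacedSurface {
    int id;
    int x, y;
};

// Depth-first walk that accumulates parent offsets. The output order is the
// paint order under the stated assumption: parent first, then its children.
inline void flatten(const SurfaceNode& node, int origin_x, int origin_y,
                    std::vector<PlacedSurface>& out) {
    const int x = origin_x + node.dx;
    const int y = origin_y + node.dy;
    out.push_back(PlacedSurface{node.id, x, y});
    for (const SurfaceNode& child : node.children) {
        flatten(child, x, y, out);
    }
}
```

Under the per-subsurface option each `PlacedSurface` would become its own
surface element; under the per-window option the list would be rendered into a
single texture backing one element. Which one holds up is what the spike has
to decide.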

## Relationship to other work
- **Tiling is deferred** and becomes much smaller on top of this (RCSS layout
  over surface elements; the pure layout core in `notes/tiling-spec.md` still
  applies — it's renderer-agnostic).
- The **stage dock** already prototypes the texture-import half (frozen
  previews); its minimize/restore becomes RCSS on the new path.
- Status bar / home screen (slices 11–12) are RMLUi chrome already — they fit
  natively.