X11 needs to calibrate to get the best possible latency, as such it
needs the scene to render so that the render time of the scene can be
accounted for in the delay calculation.
This replaces the scaled `destRect` with a version that uses doubles
correcting the rounding error that is causing a failure to properly
clear the black bar areas.
This mesh will later be used to render only damaged portions of the desktop.
We also moved the coordinate transformation for damage overlay into a matrix
and computed by the shader.
XPresent doesn't give us the time before presentation, but the time just
after. This code calculates and calibrates a delay to sleep for before
signaling the wait event for render when using jitRender
After the damage queue PR, EGL damage count 0 means no change, and -1 means
invalidate the entire window. However, several other places have different
semantics, and we are not handling them correctly:
1. KVMFR uses 0 to signal invalidating the entire frame, so if we receive 0
rectangles in egl_on_frame, we should set damage count to -1.
2. The damage overlay treated 0 as full damage, which is now incorrect. This
is fixed, and now it treats 0 as no update, and -1 as full damage.
The way things were handled in EGLTexture is not only very hard to
follow, but broken. This change set breaks up EGLTexture into a modular
design making it easier to implement the various versions.
Note that DMABUF is currently broken and needs to be re-implemented.
There used to be a possible race when a bunch of rectangle is appended, but
the total count is not updated before it's read. Using a lock eliminates
all such races.
Without configuring Wayland compositors to send frame callbacks as late as
possible, JIT rendering can increase latency by more than one frame.
For example, by default, sway asks applications to render right after a
vblank, and does its own composition right after a vblank, resulting in
~2 frame's worth of latency. If max_render_time is set on the output,
it composes that many milliseconds before the vblank, losing ~1 frame's
worth of latency. If max_render_time is set on the window also, the frame
callback is sent that many milliseconds before composition, and we achieve
perfectly low latency.
Therefore, out of the box, JIT rendering should not be enabled, as manual
compositor configuration is required for optimal results.
For reference, the following sway settings results in the best latency:
output <insert output name> max_render_time 1
for_window [app_id="looking-glass-client"] max_render_time 1
This reverts commit 3baed05728.
If we invalidate the window, we used to not update this->cursorLast, and
this causes us to lose track of the cursor. Now we update this->cursorLast
unconditionally, and this fixes the issue.
Version 3 does not send xdg_output.done events, instead guaranteeing that
all xdg_output.* events are sent before wl_output.done. This saves us from
doing the work twice.
The method used is not guaranteed to work on all Wayland compositors,
so offer a way out. We need to support it anyways in case xdg_output
or wp_viewporter protocols are not available.
Currently, we scale the desktop up to the next largest integer, and rely on
the wayland compositor to scale it back down to the correct size.
This is obviously undesirable.
In this commit, we attempt to detect the actual fractional scaling by finding
the current active mode in wl_output, and dividing it by the logical screen
size reported by xdg_output, taking into consideration screen rotation.
We then use wp_viewporter to set the exact buffer and viewport sizes if
fractional scaling is needed.
When requested, JIT render mode will be used if the display server supports it.
Otherwise, a warning is generated instead.
This essentially uses the signalNextFrame logic for imgui, but for everything.
We automatically enable this mode when overlay is on.
Currently, this exposes some damage tracking bugs in the EGL renderer.
This prevents damage from being overwritten when frames are received
faster than could be rendered.
This implementation cycles between two queues, removing all need for
memory allocation.