DXGI doesn't take into account the SDRWhiteLevel that has already been
applied to the monitor when it converts to HDR10 which results in clipping.
This change set implements a HLSL shader to reverse this while at the same
time converting to HDR10.
This is still not perfect but far better then doing nothing.
There is no performance penalty to using ARGB10 as it's still a 32-bit copy per
pixel and the nvidia driver internally handles any conversions required to make
this work for both SDR and HDR.
@quantum has observed nvfbc under rare circumstances fail to initialize,
this adds a retry to the init with a short delay to hopefully recover
from this situation.
The prior commit to expose the FrameBuffer internals makes use of an
atomic from `stdatomic.h`. Unfortunatly C++ has no notion of _Atomic and
as such `stdatomic.h` is not compatible. To work around this we instead
just forward declare the type here.
This is an experimental & incomplete feature for those using
supersampling. Anything > 1200p will be downsampled by 50% before
copying out of the GPU to save on memory bandwidth.
Unfinished! Has issues with damage tracking and currently can not
be configured. Only dx11 has been tested at this point, everything
else will likely have problems/crash.
The nsleep() call lets d3d12 sleep for a more precise amount of
time while maintaining the current millisecond-scale sleep
interface in the configuration file.
To avoid client showing "Using : NVFBC (NVidia Frame Buffer Capt".
This happens because the string is truncated to 31 characters to fit
in the char capture[32]; member of KVMFRRecord_VMInfo.
For Windows 10, it so happens that the major.minor is 10.0. This is not
usually a given, e.g. on Windows 7 where it would read 6.1, on
Windows 8 it would read 6.2, and on Windows 8.1 it would read 6.3.
This is obviously undesirable, so we should just read the ProductName
from registry if possible. This results in something like:
OS Name: Windows 10 Pro for Workstations (Build: 19043)
This will cache up to 10 handles, in practice I have never seen DXGI
return anything but the same resource each time but we allow for more
anyway should MS change something in the future.
Should the cache get over filled it is disabled entirely and we revert
to the original behaviour.
The buffer input sizes to the `IDXGIOutputDuplication` methods are measured
in bytes. This dramatically increases the number of dirty/move rects that
can be handled.
This new feature while helps on some systems, others using freesync or
higher refresh rates where the capture can't keep up will limit to
fractions of the refresh rate. Better to disable and allow users to
opt-in.
This change reduces the host GPU and CPU load by a large margin
improving guest system performance along with removing latency spikes
when moving the mouse. This is default enabled but can be disabled with
the new option `dxgi:dwmFlush=no` as it limits the capture rate to the
refresh rate of the guests output which may not be desireable.
This change allows the host to provide information to the client about
how the VM is configured, information such as the UUID, CPU
configuration and capture method both for informational display in the
client as well as debugging in the client's logs.
The format of the records allows this to be extended later with new
record types without needing to bump the KVMFR version.
If there is LGMP corruption the LGMP thread will set the state to
REINIT which if this happens early enough will get overwritten if the
inital app state is set too late. Instead set the application initial
state early to avoid this.
If a new client connects but there has not been a capture timeout the
new client count wont be zeroed on a valid capture. If there is a
timeout later the host will still think there is a new client and
re-send a frame when it should not. This fixes this by reading the value
on a valid frame which zeros the new subs count.
If the wait times out, we used to simply restart the loop, which causes
it to check this->stop and exit if set to true. However, nvfbc_stop
already calls lgSignalEvent, which would wake up the pointer thread to
perform the check, so there is no need to set a timeout on the wait.
The host is a 64-bit application, so requiring 64-bit Windows isn't an
issue. However, not using 32-bit installers has an advantage: we don't
need to require WoW64.
It works, with the following limitations:
1. user is forced to select the monitor through platform-specific mechanisms
every time the client starts.
2. cursor is composed onto the screen, and no position can be reported.
This commit makes the crash handler show relative paths instead of absolute
ones, which makes the stack traces generated easier to read.
On the other hand, absolute paths makes sense on Linux, since the user is
expected to build the binaries themselves, and gdb will be able to find the
source code.
This commit adds damage tracking to the DXGI textures, and only copies the
damaged areas to the textures with ID3D11DeviceContext::CopySubresourceRegion.
The sleep logic in waitFrame makes it difficult for this to reduce the
latency, but removing it shows significant improvements (6-7 ms to ~3 ms)
when a tiny portion of the screen is damaged, while showing no difference on
full screen damage.
This implementation uses a line sweep algorithm to copy the precisely the
intersection of all accumulated damage rectangles, ensuring that every
pixel is copied exactly once, and no pixel is ever copied multiple times.
Furthermore, once a row has been swept, we update the framebuffer write
pointer immediately.
Certain drivers do not support pitches that are not multiples of 128 bytes,
and instead just does some kind of rounding internally. On DXGI, this is not
a problem because the API rounds pixel pitch, but NvFBC does not. This causes
certain resolutions to simply not work with dmabuf, most notably 3440x1440,
which is 1440p ultrawide.
Since we are copying pixels with the CPU anyways, we might as well round the
pitch up to 128 bytes (32 pixels).
This commit adds a new host configuration option, nvfbc:diffRes, which
specifies the dimensions of every block in the diff map. This defaults to
128, meaning the default 128x128 block size.
Since block sizes other than 128x128 is not guaranteed to be supported by
NvFBC, the function NvFBCGetDiffMapBlockSize was introduced to query the
support and output the actual block size used.