This commit adds damage tracking to the DXGI textures, and only copies the
damaged areas to the textures with ID3D11DeviceContext::CopySubresourceRegion.
The sleep logic in waitFrame makes it difficult for this change to reduce
latency, but with that logic removed, latency drops significantly (from
6-7 ms to ~3 ms) when only a tiny portion of the screen is damaged, while
full-screen damage shows no difference.
This implementation uses a line sweep algorithm to copy precisely the
union of all accumulated damage rectangles, ensuring that every damaged
pixel is copied exactly once.
Furthermore, once a row has been swept, we update the framebuffer write
pointer immediately.
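A minimal sketch of copying the union of damage rectangles, assuming a 32-bit-per-pixel frame, half-open rectangles, and a hypothetical `Rect` type and `copyDamage` helper (not the host's actual code). For simplicity it advances the sweep one row at a time and rescans every rectangle per row, which is slower than an event-driven sweep, but it shows the key property: overlapping spans on a row are merged before copying, so each damaged pixel is copied exactly once.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical damage rectangle, half-open: [x1, x2) x [y1, y2). */
typedef struct { int x1, y1, x2, y2; } Rect;

typedef struct { int start, end; } Span;

static int cmpSpan(const void *a, const void *b)
{
  const Span *sa = a, *sb = b;
  return (sa->start > sb->start) - (sa->start < sb->start);
}

/* Copy exactly the union of `rects` from src to dst (4 bytes per pixel).
 * Returns the number of pixels copied, or -1 on allocation failure. */
static long copyDamage(unsigned char *dst, const unsigned char *src,
                       int pitch, int height, const Rect *rects, int count)
{
  Span *spans = malloc(sizeof(Span) * (size_t)(count > 0 ? count : 1));
  if (!spans)
    return -1;

  long copied = 0;
  for (int y = 0; y < height; ++y)
  {
    /* collect the spans of every rectangle that intersects this row */
    int n = 0;
    for (int i = 0; i < count; ++i)
      if (rects[i].y1 <= y && y < rects[i].y2 && rects[i].x1 < rects[i].x2)
        spans[n++] = (Span){ rects[i].x1, rects[i].x2 };
    if (n == 0)
      continue;

    /* sort by start, then merge overlapping or touching spans */
    qsort(spans, (size_t)n, sizeof(Span), cmpSpan);
    int curStart = spans[0].start, curEnd = spans[0].end;
    for (int i = 1; i <= n; ++i)
    {
      if (i < n && spans[i].start <= curEnd)
      {
        if (spans[i].end > curEnd)
          curEnd = spans[i].end;
        continue;
      }
      /* emit the merged span once */
      size_t off = (size_t)y * (size_t)pitch + (size_t)curStart * 4;
      memcpy(dst + off, src + off, (size_t)(curEnd - curStart) * 4);
      copied += curEnd - curStart;
      if (i < n)
      {
        curStart = spans[i].start;
        curEnd   = spans[i].end;
      }
    }
    /* row y is complete here; a real implementation could advance the
       framebuffer write pointer past it at this point */
  }

  free(spans);
  return copied;
}
```

For example, two 4x2 rectangles offset by (2, 1) overlap in 2 pixels, so the function reports 14 pixels copied rather than 16.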
Certain drivers do not support pitches that are not multiples of 128 bytes,
and instead just do some kind of rounding internally. On DXGI, this is not
a problem because the API rounds the pitch, but NvFBC does not. This causes
certain resolutions, most notably 3440x1440 (1440p ultrawide), to simply
not work with dmabuf.
Since we are copying pixels with the CPU anyway, we might as well round the
pitch up to a multiple of 128 bytes (32 pixels at 4 bytes per pixel).
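A sketch of the rounding, assuming 4 bytes per pixel; `alignUp` and `paddedPitch` are illustrative names rather than the host's. For 3440x1440 the unpadded pitch is 3440 * 4 = 13760 bytes, which is 107.5 * 128, so it rounds up to 13824 bytes (3456 pixels).

```c
#include <stddef.h>

/* Round `value` up to the next multiple of `align`.
 * `align` must be a power of two for the bit mask to be correct. */
static inline size_t alignUp(size_t value, size_t align)
{
  return (value + align - 1) & ~(align - 1);
}

/* Pitch in bytes for a frame of `width` pixels, padded to 128 bytes.
 * Assumes 32-bit pixels, so 128 bytes is 32 pixels. */
static inline size_t paddedPitch(unsigned width)
{
  const size_t bytesPerPixel = 4;
  return alignUp((size_t)width * bytesPerPixel, 128);
}
```

Widths that are already multiples of 32 pixels, such as 1920 or 2560, are left unchanged.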
This commit adds a new host configuration option, nvfbc:diffRes, which
specifies the dimensions of every block in the diff map. It defaults to
128, i.e. 128x128 blocks.
Since block sizes other than 128x128 are not guaranteed to be supported by
NvFBC, the function NvFBCGetDiffMapBlockSize was introduced to query
support and output the actual block size used.
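To illustrate how the block size determines the diff map's shape, here is a sketch assuming one byte per block, stored row-major, with non-zero meaning "changed". That layout and both helper names are assumptions for illustration. Partial blocks at the right and bottom edges still get an entry, so each dimension is a ceiling division; at 3440x1440 with 128x128 blocks the map is 27x12.

```c
#include <stdbool.h>
#include <stddef.h>

/* Blocks along one axis: ceil(pixels / blockSize). */
static inline unsigned diffMapDim(unsigned pixels, unsigned blockSize)
{
  return (pixels + blockSize - 1) / blockSize;
}

/* Hypothetical helper: is pixel (x, y) inside a block marked as changed?
 * Assumes one byte per block, row-major, non-zero meaning "changed". */
static inline bool pixelChanged(const unsigned char *diffMap, unsigned width,
                                unsigned blockSize, unsigned x, unsigned y)
{
  const unsigned cols = diffMapDim(width, blockSize);
  return diffMap[(size_t)(y / blockSize) * cols + x / blockSize] != 0;
}
```

Halving the block size to 64 roughly quadruples the map (54x23 at 3440x1440), trading a larger map for finer-grained change detection.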
When our window is destroyed, its timers are destroyed along with it, so
our later attempt to destroy them fails. Instead, set MessageHWND to NULL in
the WM_DESTROY handler and don't try destroying the timers if the window is gone.
DestroyWindow can only be invoked on the thread that created the window.
All other threads must send WM_CLOSE or another message to tell the
window to destroy itself.
MinGW seems to decide at random whether it wants to use memcpy from
msvcrt.dll or ntdll.dll. Currently, on Debian buster, ntdll.dll is chosen,
while on sid, msvcrt.dll is chosen.
This commit declares a new .def file for ntdll containing only the
functions we want to link from ntdll.dll, and generates ntdll.a from it
with dlltool. This way, MinGW will never be tempted to link functions
like memcpy from ntdll.dll.
This function is sometimes flaky and may fail for no apparent reason,
see https://stackoverflow.com/q/3945003. This has also been experienced
during the development of #610.
This commit adds logging so we may see if it ever fails for no reason
and work out some way to fix it.
We were using an auto-reset event to signal the mousehook exit. This was
fine when there was only one thread, but with the addition of the update
thread, only one of the two waiting threads is released, so the other
waits forever.
The fix is to switch to a manual-reset event and call ResetEvent after
the threads have exited.
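The difference between the two event types can be illustrated portably. A Win32 auto-reset event releases a single waiter and then resets itself, while a manual-reset event stays signaled, releasing every waiter, until ResetEvent is called. The sketch below models a manual-reset event with POSIX threads; `ManualEvent` and its functions are hypothetical stand-ins, not the host's Windows code.

```c
#include <pthread.h>
#include <stdbool.h>

/* Minimal manual-reset event: once set, it stays signaled and releases
 * every waiter until it is explicitly reset. */
typedef struct
{
  pthread_mutex_t lock;
  pthread_cond_t  cond;
  bool            signaled;
}
ManualEvent;

static void eventInit(ManualEvent *e)
{
  pthread_mutex_init(&e->lock, NULL);
  pthread_cond_init(&e->cond, NULL);
  e->signaled = false;
}

static void eventSet(ManualEvent *e)
{
  pthread_mutex_lock(&e->lock);
  e->signaled = true;
  /* broadcast rather than signal: every waiting thread must wake */
  pthread_cond_broadcast(&e->cond);
  pthread_mutex_unlock(&e->lock);
}

/* Called only after all waiters have exited, as in the fix above. */
static void eventReset(ManualEvent *e)
{
  pthread_mutex_lock(&e->lock);
  e->signaled = false;
  pthread_mutex_unlock(&e->lock);
}

static void eventWait(ManualEvent *e)
{
  pthread_mutex_lock(&e->lock);
  while (!e->signaled)   /* loop guards against spurious wakeups */
    pthread_cond_wait(&e->cond, &e->lock);
  pthread_mutex_unlock(&e->lock);
}

/* Hypothetical worker standing in for the mouse hook and update threads. */
static void *worker(void *arg)
{
  eventWait((ManualEvent *)arg);
  return NULL;
}
```

With two workers waiting, a single eventSet lets both return so both joins complete. An auto-reset model, which wakes one waiter and clears the flag, would leave the second join blocked forever.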
The type of the QuadPart member of the LARGE_INTEGER union is actually
LONGLONG, so we should cast to LONGLONG instead of int.
This avoids truncation should (ms * 10000.0f) exceed 2^31-1.
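The conversion is from milliseconds to 100-nanosecond units (1 ms = 10,000 units), so the limit is reached at (2^31-1) / 10000, about 214748 ms, or roughly 3.6 minutes. Note also that in C, converting an out-of-range floating-point value to int is undefined behavior, not merely truncation. A portable sketch, where `LONGLONG` is redefined as `int64_t` (Windows defines it as a signed 64-bit integer) and `msTo100ns` is a hypothetical helper:

```c
#include <stdint.h>

/* Portable stand-in for the Windows LONGLONG type (signed 64-bit). */
typedef int64_t LONGLONG;

/* Convert milliseconds to 100 ns units, casting to the 64-bit type of
 * LARGE_INTEGER::QuadPart rather than int so large waits do not overflow. */
static inline LONGLONG msTo100ns(float ms)
{
  return (LONGLONG)(ms * 10000.0f);
}
```

A 300000 ms (5 minute) wait yields 3,000,000,000 units, which fits in a LONGLONG but exceeds 2^31-1.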
It used to be the case that when updating app.manifest, the resource file
it not automatically rebuilt. This made it a headache to update the manifest.
We set OBJECT_DEPENDS so that cmake knows to make the res file depend on
app.manifest and icon.ico.
This commit is based on PR #579 and should be rebased on it after it's merged.
GCC 11 will support x86-64 micro-architecture feature levels.
What we really want to support is Nehalem or newer, which corresponds to
x86-64-v2, and specifying this level instead of nehalem means that we are
not tuning for Nehalem specifically.
This function is available since Windows Vista and can therefore be used
directly without going through GetProcAddress. Unfortunately, MinGW does
not have d3dkmthk.h, but we can declare the prototype ourselves and link
against gdi32.dll.
There is no need to LoadLibrary and GetProcAddress to get pointers to
NtDelayExecution or NtSetTimerResolution. These functions don't have
prototypes in any SDK header, but they are exported in ntdll.dll and
we can simply declare the prototype and link ntdll.
There is also no chance that the functions do not exist: I checked an
old install of Windows NT 4.0 and both of these functions exist.
Also used NtSetTimerResolution instead of ZwSetTimerResolution for
consistency (they are the same).
Also changed system timer resolution log message units to μs with
one decimal digit for readability. This is the actual amount of
precision available to us.
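As an illustration, here is a sketch assuming the resolution is reported in 100 ns units (as NtQueryTimerResolution does). One decimal digit in microseconds is exactly that precision, since 100 ns = 0.1 μs, so integer division avoids any floating-point rounding. `formatResolution` is a hypothetical helper, not the host's logging code.

```c
#include <stdio.h>

/* Format a timer resolution given in 100 ns units as microseconds with
 * one decimal digit (each unit is exactly 0.1 us). */
static int formatResolution(char *buf, size_t size, unsigned long units100ns)
{
  return snprintf(buf, size, "%lu.%lu μs", units100ns / 10, units100ns % 10);
}
```

The common default resolution of 15.625 ms (156250 units) is printed as "15625.0 μs", and 9766 units as "976.6 μs".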
According to MSDN documentation for CreateEnvironmentBlock, "[i]f the
environment block is passed to CreateProcessAsUser, you must also
specify the CREATE_UNICODE_ENVIRONMENT flag."
Also pass DETACHED_PROCESS because the host is a GUI application and
doesn't use the console.
Since the service already runs as SYSTEM, we don't need to use
dupeSystemProcessToken to get the token for SYSTEM. This removes
the need for having SeDebugPrivilege, SeTcbPrivilege, and
SeAssignPrimaryTokenPrivilege, or otherwise doing sketchy things.
Furthermore, we now only open the token with the privileges we actually
need.
This allows the process to be terminated without resorting to
TerminateProcess. With some fixes, this allows the notification icon to be
removed when the service is restarted.
Furthermore, instead of sending WM_DESTROY to fool the window into believing
it's being destroyed, we actually call DestroyWindow now.
For adjacent changed regions, we actually use the bounding box for the
entire polygon. This may result in more area being damaged than strictly
necessary, but is nevertheless desirable since it reduces the number of
rectangles.
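Here is a sketch of that merging on a grid of changed blocks, assuming "adjacent" means 4-connected (sharing an edge). The `BlockRect` type and `boundingBoxes` function are hypothetical. Each connected group is flood-filled with an explicit stack and reported as one inclusive bounding box. An L-shaped group therefore also covers the unchanged block in its corner, which is the over-damage the commit accepts in exchange for fewer rectangles.

```c
#include <stdlib.h>

/* Hypothetical rectangle in block coordinates, inclusive bounds. */
typedef struct { int x1, y1, x2, y2; } BlockRect;

/* Group 4-connected changed blocks (non-zero entries of `grid`, row-major,
 * cols x rows) and write one bounding box per group to `out`.
 * `grid` is consumed: visited cells are cleared. Returns the rect count. */
static int boundingBoxes(unsigned char *grid, int cols, int rows,
                         BlockRect *out, int maxOut)
{
  int *stack = malloc(sizeof(int) * (size_t)cols * (size_t)rows);
  if (!stack)
    return 0;

  int count = 0;
  for (int start = 0; start < cols * rows && count < maxOut; ++start)
  {
    if (!grid[start])
      continue;

    BlockRect r = { start % cols, start / cols, start % cols, start / cols };
    int top = 0;
    stack[top++] = start;
    grid[start] = 0;            /* clear on push so each cell is pushed once */

    while (top > 0)
    {
      const int i = stack[--top], x = i % cols, y = i / cols;
      if (x < r.x1) r.x1 = x;
      if (x > r.x2) r.x2 = x;
      if (y < r.y1) r.y1 = y;
      if (y > r.y2) r.y2 = y;

      /* visit left, right, up and down neighbours */
      const int nx[4] = { x - 1, x + 1, x, x };
      const int ny[4] = { y, y, y - 1, y + 1 };
      for (int k = 0; k < 4; ++k)
      {
        if (nx[k] < 0 || nx[k] >= cols || ny[k] < 0 || ny[k] >= rows)
          continue;
        const int j = ny[k] * cols + nx[k];
        if (grid[j])
        {
          grid[j] = 0;
          stack[top++] = j;
        }
      }
    }
    out[count++] = r;
  }

  free(stack);
  return count;
}
```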
The Windows hook WH_MOUSE_LL is called in such a way that any delay in
processing causes a system-wide stall. This change spawns an extra
thread that waits on an event set by the hook and then calls the
callback, with an artificial rate limit of 1000 Hz.
Before we try and perhaps fail to init DXGI, we should print out what
the device is so that when there is an error report we can immediately
see if the user still has the QXL device attached.
While an asynchronous waitFrame model is correct for DXGI, it is not
correct for other capture interfaces such as NvFBC. This change allows
each capture interface to specify which model suits it, and moves the
waitFrame/post into the main thread if async is not desired.
This changes the host to use a separate pool of LGMP memory for cursor
position updates without shape information, which helps prevent
corruption of the shape entries if they are still pending. While this is
not a perfect solution, it resolves the issue without making major
changes to LGMP during the RC phase we are currently in.
Previously, we only broke out of the current row when a change was
detected, and all subsequent rows were still scanned. Now we break out of
the entire loop. This should make change detection ever so slightly faster.
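A sketch of the early exit, assuming the comparison walks each row in fixed-size chunks; `frameChanged` and its parameters are hypothetical. Returning from a helper function is one idiomatic way to leave both loops at once in C, since `break` only exits the innermost loop.

```c
#include <stdbool.h>
#include <string.h>

/* Return true as soon as any chunk differs between frames `a` and `b`.
 * The return leaves both the per-row and per-chunk loops immediately,
 * so no further rows are scanned once a change is found. */
static bool frameChanged(const unsigned char *a, const unsigned char *b,
                         int pitch, int height, int chunkBytes)
{
  for (int y = 0; y < height; ++y)
  {
    const unsigned char *ra = a + (size_t)y * (size_t)pitch;
    const unsigned char *rb = b + (size_t)y * (size_t)pitch;
    for (int x = 0; x < pitch; x += chunkBytes)
    {
      const int len = pitch - x < chunkBytes ? pitch - x : chunkBytes;
      if (memcmp(ra + x, rb + x, (size_t)len) != 0)
        return true;   /* exits the entire scan, not just this row */
    }
  }
  return false;
}
```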
Testing shows that `D3DKMTSetProcessSchedulingPriorityClass` has a
positive performance impact for NvFBC as well as DXGI, so we now always
try to boost the priority for the Windows host.
This so-called "enhanced" event logic is completely flawed and can never
work correctly; better to strip it out and put our faith in Windows to
handle the events for us.
And yes, I am fully aware I wrote the utter trash in the first place :)
People often miss the warnings about invalid arguments in their command
line; this last-minute patch attempts to address this by making
warnings, errors, FIXMEs and fatal errors stand out if stdout is a TTY.
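Here is a sketch of the TTY check, assuming POSIX `isatty` and ANSI SGR escape sequences. The level names and colour choices are hypothetical, not the client's actual logger. Colour codes are emitted only when stdout is a terminal, so redirected output stays free of escape sequences.

```c
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical log levels; the real logger's names may differ. */
typedef enum { LVL_INFO, LVL_WARN, LVL_ERROR, LVL_FIXME, LVL_FATAL } LogLevel;

/* ANSI SGR colour prefix for a level, or "" when not writing to a TTY. */
static const char *levelColour(LogLevel level, bool tty)
{
  if (!tty)
    return "";
  switch (level)
  {
    case LVL_WARN:  return "\033[0;33m";   /* yellow   */
    case LVL_ERROR: return "\033[0;31m";   /* red      */
    case LVL_FIXME: return "\033[0;35m";   /* magenta  */
    case LVL_FATAL: return "\033[1;31m";   /* bold red */
    default:        return "";             /* info stays uncoloured */
  }
}

static void logLine(LogLevel level, const char *msg)
{
  const bool tty = isatty(STDOUT_FILENO);
  const char *colour = levelColour(level, tty);
  /* emit a reset sequence only if we actually changed the colour */
  printf("%s%s%s\n", colour, msg, *colour ? "\033[0m" : "");
}
```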
If the guest VM is not showing a cursor when it starts, such as on the
Windows login screen, the client never gets the current position of the
cursor, which prevents the client from attempting to send mouse
movements. This change ensures the client gets the mouse location on
startup.
We should only advance the pointerIndex if the buffer was not swapped
out for storage. This is to ensure that we do not overwrite cursor
memory that the client(s) may still be using.
This reverts commit d82f2e510d.
While the proposed change is more correct, it breaks the generation of
the file due to failure to locate the resource files, such as
`resources/icon.ico`.
When a new cursor shape is provided by the capture interface we need to
retain a copy of it in case a new client connects, which will not yet have
the cursor shape. The logic here was flawed causing the wrong shape to
be sent to a new client in some instances.
This change adds an averaging function to time how long it takes the GPU
to copy and map the texture, and then sleeps for 80% of this average,
lowering CPU usage and potentially decreasing lock contention.
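The commit does not say which kind of average is used. The sketch below assumes a fixed-window moving average over the last 16 samples in nanoseconds; the window size, type and function names are all hypothetical. Sleeping for 80% of the average wakes the thread shortly before the copy is expected to finish, leaving only a short final wait.

```c
#include <stdint.h>

/* Window size is an assumption for illustration. */
#define AVG_WINDOW 16

/* Fixed-window moving average of copy/map durations in nanoseconds. */
typedef struct
{
  uint64_t samples[AVG_WINDOW];
  unsigned count;   /* number of valid samples, up to AVG_WINDOW */
  unsigned next;    /* ring-buffer write position */
  uint64_t sum;
}
RunningAvg;

static void avgAdd(RunningAvg *a, uint64_t sample)
{
  if (a->count == AVG_WINDOW)
    a->sum -= a->samples[a->next];   /* drop the oldest sample */
  else
    ++a->count;
  a->samples[a->next] = sample;
  a->sum += sample;
  a->next = (a->next + 1) % AVG_WINDOW;
}

static uint64_t avgGet(const RunningAvg *a)
{
  return a->count ? a->sum / a->count : 0;
}

/* Sleep for 80% of the average copy time before polling for completion. */
static uint64_t sleepTimeNs(const RunningAvg *a)
{
  return avgGet(a) * 8 / 10;
}
```

A windowed average adapts when the copy time changes, for example after a resolution switch, while smoothing out single slow frames.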
It has been determined that a failure to init NvFBC causes a 20-30%
performance penalty when using DXGI on hardware without NvFBC support
(GeForce), so reverse the order and default to using DXGI as our first
option.
If NvFBC is still desired, PR #500 added the option `app:capture`, which
can be used to force NvFBC.
One of the most common issues reported in the support channels is the
IVSHMEM size being too small. This change adds a calculation to
determine an optimal size and uses the new `os_showMessage` platform
method to display a message box to the user with the error.
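The exact calculation is not given here, so the following is only a sketch under stated assumptions: room for two frames of 32-bit pixels, plus a fixed headroom passed in by the caller, rounded up to whole MiB and then to a power of two (PCI BAR sizes are powers of two). The helper name, frame count and headroom are hypothetical, not necessarily the host's formula.

```c
#include <stdint.h>

/* Estimate an IVSHMEM size in MiB for a given resolution.
 * Assumptions: 2 frames, 4 bytes per pixel, caller-supplied headroom,
 * result rounded up to a power of two. */
static unsigned ivshmemSizeMiB(unsigned width, unsigned height,
                               unsigned headroomMiB)
{
  const uint64_t MiB        = 1024 * 1024;
  const uint64_t frameBytes = (uint64_t)width * height * 4;
  const uint64_t total      = frameBytes * 2 + (uint64_t)headroomMiB * MiB;

  /* round up to whole MiB, then to the next power of two */
  const uint64_t mib = (total + MiB - 1) / MiB;
  unsigned size = 1;
  while (size < mib)
    size <<= 1;
  return size;
}
```

With 10 MiB of headroom, 1920x1080 needs about 25.8 MiB and rounds to 32, 2560x1440 about 38.1 MiB and rounds to 64, and 3840x2160 about 73.3 MiB and rounds to 128.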
Since we now let the mouse hook linger until the process is killed, the
cursor event that the hook signals may now be null, as the capture could
have stopped. If the hook fires during this time, a crash occurs.