The target latency is now based upon the device maximum period size
(which may be configured by setting the `PIPEWIRE_LATENCY` environment
variable if using PipeWire), with some allowance for timing jitter from
Spice and the audio device.
PipeWire can change the period size dynamically at any time which must be
taken into account when selecting the target latency to avoid underruns
when the period size is increased. This is explained in detail within the
commit body.
This change allows the audiodevs to return the minimum period frames
needed to start playback instead of having to rely on a pull to obtain
these details.
Additionally we are using this information to select an initial start
latency as well as to train the desired latency in order to keep it as
low as possible.
This change is based on the techniques described in [1] and [2].
The input audio stream from Spice is not synchronised to the audio playback
device. While the input and output may be both nominally running at 48 kHz,
when compared against each other, they will differ by a tiny fraction of a
percent. Given enough time (typically on the order of a few hours), this
will result in the ring buffer becoming completely full or completely
empty. It will stay in this state permanently, periodically resulting in
glitches as the buffer repeatedly underruns or overruns.
To address this, adjust the speed of the received data to match the rate at
which it is being consumed by the audio device. This will result in a
slight pitch shift, but the changes should be small and smooth enough that
this is unnoticeable to the user.
The process works roughly as follows:
1. Every time audio data is received from Spice, or consumed by the audio
device, sample the current time. These are fed into a pair of delay
locked loops to produce smoothed approximations of the two clocks.
2. Compute the difference between the two clocks and compare this against
the target latency to produce an error value. This error value will be
quite stable during normal operation, but can change quite rapidly due
to external factors, particularly at the start of playback. To smooth
out any sudden changes in playback speed, which would be noticeable to
the user, this value is also filtered through another delay locked loop.
3. Feed this error value into a PI controller to produce a ratio value.
This is the target playback speed in order to bring the error value
towards zero.
4. Resample the input audio using the computed ratio to apply the speed
change. The output of the resampler is what is ultimately inserted into
the ring buffer for consumption by the audio device.
Since this process targets a specific latency value, rather than simply
trying to rate match the input and output, it also has the effect of
'correcting' latency issues. If a high latency application (such as a media
player) is already running, the time between requesting the start of
playback and the audio device actually starting to consume samples can be
very high, easily in the hundreds of milliseconds. The changes here will
automatically adjust the playback speed over the course of a few minutes to
bring the latency back down to the target value.
[1] https://kokkinizita.linuxaudio.org/papers/adapt-resamp.pdf
[2] https://kokkinizita.linuxaudio.org/papers/usingdll.pdf