ares-openbsd/scripts
jcm b5cd509a77
ruby: Add Metal VRR support, various Metal driver fixes (#1455)
Adds to the Metal backend from #1431, extending host VSync functionality
to variable refresh rate (VRR) displays, enabling the ability to sync to
guest and host refresh rates simultaneously, along with an assortment of
minor changes and code cleanups:

- Add a threaded renderer option (technically a GCD queue) to the Metal
backend. This is essential for VRR at lower audio latencies; at higher
audio latencies, the synchronous rendering option may work better.
- Make debugging easier, including a shell script that compiles a debug
`.metallib` shader library, which ares loads at runtime when it is built
in debug mode. The debug `.metallib` enables GPU frame capture.
- Implement `clear()` for the Metal backend; the last displayed frame
will no longer stay on the screen if a core is explicitly unloaded.
- Implement fullscreen mode. Uses a borderless window that covers the
entire screen, rather than idiomatic fullscreen, primarily in order to
render around the camera housing. Normal macOS fullscreen behavior is
available via the main window title bar controls.
- Remove unused custom window code.
- Remove redundant copy of `Shaders.metal`.
- Miscellaneous small fixes and code cleanups to resolve compiler
warnings.


https://github.com/ares-emulator/ares/assets/6864788/1e2d594a-c84f-4b95-a792-cacdde2a09b0

## ares vs. VRR
As discussed in #1431, there are some challenges in implementing smooth
VRR support for ares on Metal. Briefly:

- ares lacked a mechanism for telling the display backend the core's
desired refresh rate, so the backend had no target to sync to.
- Because ares monopolizes the main thread on macOS, it lacked an
effective mechanism for receiving callbacks when frames were presented,
which is needed to track whether we are running fast or slow.

The first issue has been resolved by a3c57b4 and ec0b625. Cores now tell
the `Screen` instance how often they want to present, and the `Screen`
instance passes that rate on to the display backend if the backend
implements the `refreshRateHint` function. This lets the display backend
respond quickly to changes in the guest refresh rate, even during
runtime.
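A minimal C++ sketch of that hint path, assuming an optional virtual hook with a no-op default. `DisplayBackend`, `RecordingBackend`, `Screen::setRefreshRate`, and the `refreshRateHint(double)` signature are illustrative assumptions, not the actual ares `Screen` or ruby interfaces.

```cpp
#include <cassert>
#include <vector>

// Hypothetical interface: the name refreshRateHint comes from the PR text,
// but this signature and class layout are assumptions for illustration.
struct DisplayBackend {
  virtual ~DisplayBackend() = default;
  // No-op by default, so backends that ignore guest timing need not override it.
  virtual void refreshRateHint(double hz) { (void)hz; }
};

// Example backend that just records the hints it receives.
struct RecordingBackend : DisplayBackend {
  std::vector<double> hints;
  void refreshRateHint(double hz) override { hints.push_back(hz); }
};

class Screen {
public:
  explicit Screen(DisplayBackend& backend) : backend_(backend) {}

  // Called by a core whenever its desired refresh rate is set or changes.
  void setRefreshRate(double hz) {
    assert(hz > 0.0);
    if (hz == lastHz_) return;  // forward only actual changes
    lastHz_ = hz;
    backend_.refreshRateHint(hz);
  }

private:
  DisplayBackend& backend_;
  double lastHz_ = 0.0;  // 0 means "no rate reported yet"
};
```

Forwarding only actual changes keeps the backend from re-tuning on every call while still letting it react when a core changes its rate mid-run.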

The second issue is a bit trickier. As discussed, `MTKViewDelegate` and
`CAMetalDisplayLink` are both effectively unavailable until ares moves
its primary emulation work off of the main thread. There is, however,
one Metal callback mechanism that ares can still use:
[[MTLDrawable
addPresentedHandler:]](https://developer.apple.com/documentation/metal/mtldrawable/2806858-addpresentedhandler?language=objc),
which calls back on a dedicated serial queue
(`com.apple.coreanimation.CAMachPortUtilReplyQueue`) rather than the
main thread.

Using this presented handler, ares can read `drawable.presentedTime` to
keep a running average of how long each frame stays onscreen, determine
whether we are running ahead or behind, and adjust subsequent present
intervals accordingly. This initial VRR implementation does exactly
that.
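The bookkeeping described above might look roughly like the following C++ sketch. It assumes the presented handler forwards each drawable's `presentedTime` (in seconds, with 0 meaning the drawable was never shown) to a hypothetical `PresentTracker`. Using an exponentially weighted moving average as the "weighted rolling average", and its default weight of 0.1, are assumptions rather than details taken from ares.

```cpp
#include <cassert>

// Hypothetical tracker fed from the presented handler. The EMA weighting
// and its default value are assumptions, not values taken from ares.
class PresentTracker {
public:
  explicit PresentTracker(double weight = 0.1) : weight_(weight) {
    assert(weight > 0.0 && weight <= 1.0);
  }

  // Call once per drawable, in presentation order, with drawable.presentedTime.
  void onPresented(double presentedTime) {
    if (presentedTime <= 0.0) return;  // never shown: nothing to measure
    if (last_ > 0.0) {
      double interval = presentedTime - last_;  // how long the previous frame stayed onscreen
      average_ = hasAverage_ ? average_ + weight_ * (interval - average_) : interval;
      hasAverage_ = true;
    }
    last_ = presentedTime;
  }

  bool hasAverage() const { return hasAverage_; }
  double averageInterval() const { return average_; }

  // Positive when frames stay onscreen longer than the target (running behind),
  // negative when they are replaced sooner (running ahead).
  double error(double targetInterval) const { return average_ - targetInterval; }

private:
  double weight_;
  double last_ = 0.0;
  double average_ = 0.0;
  bool hasAverage_ = false;
};
```

Because the presented handler runs on its own serial queue, a real backend would also need to synchronize access to this state with the thread that schedules presents; the sketch omits that.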

## macOS VRR

> [!NOTE]
> The following applies to "ProMotion" displays, as those are the ones
most tightly integrated with macOS and most likely to be used by Mac
users on VRR displays.

Unfortunately, VRR sync is not as simple as picking a present interval
and telling macOS to present each frame for the guest's requested
interval. macOS only nominally offers this capability, and only in
exclusive fullscreen mode; even then, present intervals cannot be
completely arbitrary. In practice, we seem to be able to get consistent
present intervals for some integer refresh rates between 40 Hz and
120 Hz, and for certain non-integer rates (59.97 Hz and 23.976 Hz among
them). The set of achievable consistent present intervals seems
arbitrary enough that there is no sense in targeting them specifically.
And even when we can theoretically present at the exact interval the
guest wants, we inevitably end up falling behind due to transient load
conditions.

> [!NOTE]
> For background, the landscape of guest refresh rates in ares is quite
wide. Many systems present near 60 Hz: 59.97, 60.01, 59.73, 59.92,
59.82, etc. Many of these systems have PAL modes that present near 50 Hz
with similar variations. The WonderSwan presents at up to ~75.47 Hz,
with per-game variations. The Atari 2600 can present at completely
arbitrary intervals during runtime owing to its CRT- and
processor-centric presentation strategies. In short, it is difficult to
narrow the range of expected guest present intervals, even if we wanted
to simplify the problem that way.

Rather, we have to pursue a more holistic strategy. The strategy ares
uses with this PR is as follows:

1. The output function sends frames into a FIFO dispatch queue that
presents asynchronously, so that we are not forced to present
immediately when doing so would risk blocking the main thread.
2. If the weighted rolling average is more than 0.5% off the targeted
present interval, start "nudging" the system to present at earlier
intervals, calculated from the difference between the rolling average
and the target multiplied by a constant, `kVRRCorrectiveForce`.
   - This is done because we can often prod the system into presenting
     at rational intervals close to the target interval, which gives the
     user smoother presentation overall than correcting more forcefully.
3. If more than `kVRRImmediatePresentThreshold = 3` frames are queued
for presentation, start telling the system to present immediately
instead of "nudging."
   - This is a more forceful correction toward the target present
     interval, and the threshold must be kept low for systems that want
     to present at intervals far from any achievable consistent present
     interval.
4. If the queue grows deeper still, reaching
`kMaxSourceBuffersInFlight = 6`, start dropping frames.
   - This is necessary when syncing to neither audio nor video, and in
     rare cases where the GPU is overloaded, such as with a shader that
     the system cannot render in time every frame.
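The four steps above can be condensed into a single per-frame decision. This C++ sketch uses the thresholds quoted in the list; `kVRRCorrectiveForce` has no stated value, so it is taken as a parameter, and the `choosePresent` function, the enum, and the `kVRRTolerance` name are hypothetical. Applying the nudge formula symmetrically when running ahead is also an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Thresholds quoted in the PR description.
constexpr std::size_t kVRRImmediatePresentThreshold = 3;
constexpr std::size_t kMaxSourceBuffersInFlight = 6;
// Hypothetical name for the 0.5% tolerance.
constexpr double kVRRTolerance = 0.005;

enum class PresentAction { DropFrame, PresentImmediately, PresentAfterInterval };

struct PresentDecision {
  PresentAction action;
  double interval;  // seconds; only meaningful for PresentAfterInterval
};

// framesQueued: frames waiting in the FIFO; averageInterval: weighted rolling
// average of measured present intervals; targetInterval: 1 / guest refresh rate.
PresentDecision choosePresent(std::size_t framesQueued, double averageInterval,
                              double targetInterval, double correctiveForce) {
  assert(targetInterval > 0.0);
  // Step 4: the queue is far too deep, so give up on this frame.
  if (framesQueued >= kMaxSourceBuffersInFlight)
    return {PresentAction::DropFrame, 0.0};
  // Step 3: moderately deep queue, so stop nudging and present right away.
  if (framesQueued > kVRRImmediatePresentThreshold)
    return {PresentAction::PresentImmediately, 0.0};
  // Step 2: nudge when the rolling average is more than 0.5% off target.
  double error = averageInterval - targetInterval;
  double relative = error / targetInterval;
  if (relative > kVRRTolerance || relative < -kVRRTolerance) {
    // Running behind (error > 0) yields an earlier present; using the same
    // formula when running ahead is an assumption of this sketch.
    double nudged = targetInterval - error * correctiveForce;
    if (nudged < 0.0) nudged = 0.0;
    return {PresentAction::PresentAfterInterval, nudged};
  }
  // On target: present at the guest's own interval.
  return {PresentAction::PresentAfterInterval, targetInterval};
}
```

A backend could then map `PresentAfterInterval` onto something like `-[MTLDrawable presentAfterMinimumDuration:]`; whether ares uses exactly that call is not stated here. Checking the drop and immediate-present thresholds before the nudge mirrors the escalation order of the list, from gentlest to most forceful correction.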

This approach achieves decent results across most emulated systems in my
testing. I considered exposing these constants as sliders, but was wary
of presenting too many esoteric options to the user. If exposing them
turns out to be worthwhile, that can be done in future work (if an
entirely superior VRR presentation strategy has not been discovered by
then).

## Miscellany

- The threaded renderer is necessary so that we never block the main
thread under worst-case present interval conditions. However, it only
works well at relatively low audio latencies. At higher audio latencies
(as used on lower-specced systems), the queue overflows easily and
rendering synchronously works better.
- The threaded renderer can also have unintuitive results when
exclusively syncing to host VSync or GPU sync, so for users that expect
particular behavior from that functionality, it is better left disabled.
- For PAL refresh intervals, players will generally get better results
in exclusive fullscreen. macOS prefers to render at 60 Hz and will
struggle to render near 50 Hz with other elements onscreen.
- To view the present interval graph as in these debug shots, input
`defaults write -g MetalForceHudEnabled -bool YES` at the command line
before launching ares.
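To illustrate why queue depth matters for the threaded renderer, here is a rough C++ analog of a serial queue: one worker thread draining a FIFO of render jobs. `SerialRenderQueue` is a hypothetical stand-in written with `std::thread`, not ares's GCD-based implementation.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for a serial dispatch queue: one worker thread runs
// submitted jobs strictly in FIFO order. Not ares's GCD-based code.
class SerialRenderQueue {
public:
  SerialRenderQueue() : worker_([this] { run(); }) {}

  // Remaining jobs still run before the worker exits.
  ~SerialRenderQueue() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    wake_.notify_all();
    worker_.join();
  }

  // Called from the emulation thread; returns without waiting for the present.
  void submit(std::function<void()> job) {
    assert(job);
    {
      std::lock_guard<std::mutex> lock(mutex_);
      jobs_.push_back(std::move(job));
    }
    wake_.notify_all();
  }

  // Jobs waiting to run; grows whenever frames arrive faster than they present.
  std::size_t depth() {
    std::lock_guard<std::mutex> lock(mutex_);
    return jobs_.size();
  }

  // Block until every submitted job has finished.
  void drain() {
    std::unique_lock<std::mutex> lock(mutex_);
    idle_.wait(lock, [this] { return jobs_.empty() && !busy_; });
  }

private:
  void run() {
    std::unique_lock<std::mutex> lock(mutex_);
    for (;;) {
      wake_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
      if (jobs_.empty()) return;  // stopping, and nothing left to run
      std::function<void()> job = std::move(jobs_.front());
      jobs_.pop_front();
      busy_ = true;
      lock.unlock();
      job();  // render and present outside the lock
      lock.lock();
      busy_ = false;
      if (jobs_.empty()) idle_.notify_all();
    }
  }

  std::mutex mutex_;
  std::condition_variable wake_;
  std::condition_variable idle_;
  std::deque<std::function<void()>> jobs_;
  bool busy_ = false;
  bool stopping_ = false;
  std::thread worker_;  // declared last so it starts after the members above exist
};
```

`submit` returns immediately, so the emulation thread is never blocked by a slow present. The cost is that `depth()` grows whenever frames are submitted faster than they can be presented, which is the kind of overflow described above for higher audio latencies.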

## Future Work

We could possibly achieve better VRR results utilizing something like
`CAMetalDisplayLink` after freeing up the macOS main thread. In my
testing it seems to have the same limitations in terms of achievable
consistent present intervals, but it may offer a more precise picture of
host vs. guest present timing.

Regardless of the viability of other strategies, it would still be
valuable to free up the main thread on macOS, both for the sake of other
system APIs that may be used in the future and so that the UI stays
consistent and smooth while emulation is running.

N64 and PS1 also do not fully implement the `refreshRateHint` API, and
the API may be buggy for some platforms. Metal refresh rate hints
currently appear in stdout, so if you are seeing an issue with VRR sync,
check the console to see the guest's requested present interval first.

## Gallery 


https://github.com/ares-emulator/ares/assets/6864788/1022c24b-fd83-4ede-b1ac-3f0c3a207972


https://github.com/ares-emulator/ares/assets/6864788/7e36bd4a-7acd-4d37-a572-7550ba1ad2b2

(PAL Super Mario World)


https://github.com/ares-emulator/ares/assets/6864788/9655fcbe-28de-48b2-88b7-7fb864991ef0

Co-authored-by: jcm <butt@butts.com>
2024-04-21 19:27:16 +01:00
macos-make-universal.sh ci: add macOS notarization 2022-09-29 22:52:27 +01:00
macos-metal-debug.sh ruby: Add Metal VRR support, various Metal driver fixes (#1455) 2024-04-21 19:27:16 +01:00
push-subtrees.sh scripts: add script to split nall/hiro/ruby/libco to their upstream repos 2023-01-20 13:42:13 +00:00
update-arcade-rom-db.sh arcade: use mame machine names for convenience 2024-04-03 06:57:51 +01:00
update-subtrees.sh