ares-openbsd/scripts/macos-metal-debug.sh

8 lines
211 B
Bash

ruby: Add Metal VRR support, various Metal driver fixes (#1455)

Adds to the Metal backend from #1431, extending host VSync functionality to variable refresh rate (VRR) displays, enabling the ability to sync to guest and host refresh rates simultaneously, along with an assortment of minor changes and code cleanups:

- Adds a threaded renderer option (technically a GCD queue) to the Metal backend; essential for VRR at lower audio latencies. At higher audio latencies, the synchronous rendering option may function better.
- Add easier debug capabilities, including a shell script to compile a debug `.metallib` shader library which will be used at runtime if ares is compiled in debug mode. The debug `.metallib` enables GPU frame capture.
- Implement `clear()` for the Metal backend; the last displayed frame will no longer stay on the screen if a core is explicitly unloaded.
- Implement fullscreen mode. Uses a borderless window that covers the entire screen, rather than idiomatic fullscreen, primarily in order to render around the camera housing. Normal macOS fullscreen behavior is available via the main window title bar controls.
- Remove unused custom window code.
- Remove redundant copy of `Shaders.metal`.
- Miscellaneous small fixes and code cleanups to resolve compiler warnings.

https://github.com/ares-emulator/ares/assets/6864788/1e2d594a-c84f-4b95-a792-cacdde2a09b0

## ares vs. VRR

As discussed in #1431, there are some challenges in implementing smooth VRR support for ares on Metal. Briefly:

- ares lacked a mechanism to tell the display backend the core's desired refresh rate, in order to sync to it.
- ares on macOS lacked an effective mechanism for receiving callbacks when frames were presented, in order to keep track of whether we were running fast or slow, because of the way ares monopolizes the main thread on macOS.

The first issue has been resolved by a3c57b4 and ec0b625. Cores now tell the `Screen` instance how often they want to present, and the screen instance will tell the display backend if it implements the `refreshRateHint` function. This enables the display backend to quickly respond to changing guest refresh rates even during runtime.

The second issue is a bit trickier. As discussed, `MTKViewDelegate` and `CAMetalDisplayLink` are both effectively unavailable until ares moves its primary emulation work off of the main thread. There is, however, one Metal callback mechanism that can still be used by ares: [[MTLDrawable addPresentedHandler:]](https://developer.apple.com/documentation/metal/mtldrawable/2806858-addpresentedhandler?language=objc), because it calls back on a dedicated serial queue (`com.apple.coreanimation.CAMachPortUtilReplyQueue`) rather than the main thread.

Using this presented handler, ares can use `drawable.presentedTime` to keep a running average of how long our frames are being presented for, determine if we are running ahead or behind, and modify our subsequent present intervals accordingly. That is what we do for this initial VRR implementation.

## macOS VRR

> [!NOTE]
> The following applies to "ProMotion" displays, as those are the ones most tightly integrated with macOS and most likely to be used by Mac users on VRR displays.

Unfortunately, VRR sync is not as simple as just picking a present interval and telling macOS to present each frame for the guest's requested interval. macOS only pretends to offer this capability in exclusive fullscreen mode, and even then, present intervals cannot be completely arbitrary. Realistically, we seem to be able to get consistent present intervals for some integer refresh rates between 40 Hz and 120 Hz, and certain rational present intervals (59.97, 23.976 among them). The spread of achievable consistent present intervals seems to be arbitrary enough that there isn't any sense in targeting them specifically. Even if we can theoretically present at the exact interval the guest wants, we inevitably end up falling behind due to transient load conditions.

> [!NOTE]
> For background, the landscape of guest refresh rates in ares is quite wide. Many systems present near 60 Hz; 59.97, 60.01, 59.73, 59.92, 59.82, etc. Many of these systems have PAL modes that present near 50 Hz with similar variations. The WonderSwan presents at a maximum of ~75.47 Hz, with per-game variations. The Atari 2600 can present at completely arbitrary intervals during runtime owing to its CRT- and processor-centric presentation strategies.

In short, it is difficult to narrow the range of expected guest present intervals if we wanted to simplify this problem. Rather, we have to pursue a more holistic strategy. The strategy ares uses with this PR is as follows:

1. The output function sends frames into a FIFO dispatch queue that can present asynchronously, to reduce the burden of needing to present immediately if we are in danger of blocking the main thread.
2. If we are more than 0.5% off of the targeted present interval as determined by a weighted rolling average, start "nudging" the system to present at earlier intervals calculated according to the difference between the rolling average and the target, multiplied by a constant `kVRRCorrectiveForce`.
   - This is done because we can often prod the system into presenting at rational intervals close to the target interval, resulting in an overall smoother presentation to the user than if we were to correct more forcefully.
3. If more than `kVRRImmediatePresentThreshold = 3` frames are in the queue to be presented, start telling the system to present immediately instead of "nudging."
   - This is a more forceful correction toward the target present interval, and must be kept low for systems that want to present at intervals that are far away from any achievable consistent present interval.
4. If the queue gets much deeper, `kMaxSourceBuffersInFlight = 6`, start dropping frames.
   - This is necessary for synchronizing to neither audio nor video, or else in rare cases where the GPU is overloaded, such as for a shader that the system is not capable of rendering in time each frame.

This system achieves decent results across most systems in my testing. I considered allowing these constants to be twiddled with sliders, but was wary of presenting too many esoteric options to the user. If it's determined to be worth exposing these constants, that can be done in future work (if a superior VRR presentation strategy entirely is not discovered by that point).

## Miscellany

- The threaded renderer is necessary so that we do not ever block the main thread in the worst case present interval conditions. However, it only works well at relatively low audio latencies. At higher audio latencies (as for users on lower-specced systems), the queue overflows easily and it works better to render synchronously.
- The threaded renderer can also have unintuitive results when exclusively syncing to host VSync or GPU sync, so for users that expect particular behavior from that functionality, it is better left disabled.
- For PAL refresh intervals, players will get generally better results in exclusive fullscreen. macOS likes to render things at 60 Hz and will struggle to render near 50 Hz with other elements onscreen.
- To view the present interval graph as in these debug shots, input `defaults write -g MetalForceHudEnabled -bool YES` at the command line before launching ares.

## Future Work

We could possibly achieve better VRR results utilizing something like `CAMetalDisplayLink` after freeing up the macOS main thread. In my testing it seems to have the same limitations in terms of achievable consistent present intervals, but it may offer a more precise picture of host vs. guest present timing. Regardless of the viability of other strategies, it would still be valuable to free up the main thread on macOS for the sake of other system APIs that may be used in the future, plus the benefit of being able to use the UI consistently and smoothly concurrently with emulation.

N64 and PS1 also do not fully implement the `refreshRateHint` API, and the API may be buggy for some platforms. Metal refresh rate hints currently appear in stdout, so if you are seeing an issue with VRR sync, check the console to see the guest's requested present interval first.

## Gallery

https://github.com/ares-emulator/ares/assets/6864788/1022c24b-fd83-4ede-b1ac-3f0c3a207972

https://github.com/ares-emulator/ares/assets/6864788/7e36bd4a-7acd-4d37-a572-7550ba1ad2b2

(PAL Super Mario World)

https://github.com/ares-emulator/ares/assets/6864788/9655fcbe-28de-48b2-88b7-7fb864991ef0

Co-authored-by: jcm <butt@butts.com>
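
The four-step presentation strategy described in the commit message can be summarized as a small decision function. The following is only a sketch in shell, not the Metal driver's actual code: the function name `vrr_action`, the nanosecond units, the order in which the checks are applied, and the exact comparison operators are all assumptions made for illustration. Only the two threshold values (3 and 6) and the 0.5% tolerance come from the text. The size of the "nudge" (scaled by `kVRRCorrectiveForce`) is not modeled, since its value is not stated.

```bash
#!/bin/bash
# Thresholds quoted in the commit message; the shell variable names are illustrative.
VRR_IMMEDIATE_PRESENT_THRESHOLD=3   # kVRRImmediatePresentThreshold
MAX_SOURCE_BUFFERS_IN_FLIGHT=6      # kMaxSourceBuffersInFlight
TOLERANCE_PER_MILLE=5               # 0.5% expressed in thousandths (assumed encoding)

# vrr_action QUEUE_DEPTH AVG_NS TARGET_NS  (hypothetical helper)
# QUEUE_DEPTH: frames waiting to be presented
# AVG_NS:      rolling average of the measured present interval, in nanoseconds
# TARGET_NS:   present interval requested by the guest, in nanoseconds
# Prints one of: drop, present-now, nudge, on-target.
vrr_action() {
  local depth=$1 avg_ns=$2 target_ns=$3
  local diff=$(( avg_ns - target_ns ))
  if (( diff < 0 )); then
    diff=$(( -diff ))
  fi

  if (( depth >= MAX_SOURCE_BUFFERS_IN_FLIGHT )); then
    echo drop          # step 4: queue is full, drop frames (">=" is a guess)
  elif (( depth > VRR_IMMEDIATE_PRESENT_THRESHOLD )); then
    echo present-now   # step 3: more than 3 queued, present immediately
  elif (( diff * 1000 > target_ns * TOLERANCE_PER_MILLE )); then
    echo nudge         # step 2: average is more than 0.5% off the target
  else
    echo on-target     # within tolerance, keep the current present interval
  fi
}
```

For example, `vrr_action 1 16800000 16683333` classifies a rolling average of 16.8 ms against a 59.94 Hz target (about 16.683 ms) as `nudge`, because the difference of roughly 0.7% exceeds the 0.5% tolerance.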
2024-04-22 03:27:16 +09:00
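#!/bin/bash
# Build a debug shaders.metallib (with embedded sources) for GPU frame capture.
set -euo pipefail
# Resolve the shader directory relative to this script, not the caller's cwd.
pushd "$(dirname "${BASH_SOURCE[0]}")/../ruby/video/metal"
# Compile Shaders.metal to an intermediate file with line tables and sources.
xcrun -sdk macosx metal -o shaders.ir -c -gline-tables-only -frecord-sources Shaders.metal
# Link the intermediate file into the .metallib.
xcrun -sdk macosx metallib -o shaders.metallib shaders.ir
popd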
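
The script above depends on `xcrun` being available, so a small preflight check can give a clearer error than a failed `xcrun` call. This is a minimal sketch, not part of the ares repository: `require_tools` is a hypothetical helper name, and the commented invocation assumes it is run from the repository root with the script living in `scripts/`.

```bash
#!/bin/bash
# require_tools TOOL...  (hypothetical helper)
# Returns 0 if every named tool is found on PATH; otherwise reports the
# first missing tool on stderr and returns 1.
require_tools() {
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      return 1
    fi
  done
}

# Example (macOS only, from the repository root):
# require_tools xcrun && (cd scripts && bash macos-metal-debug.sh)
```

Running the script from inside `scripts/` keeps the relative `../ruby/video/metal` path valid regardless of how the script resolves it.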