ares-openbsd/.gitignore

19 lines
198 B
Plaintext

2023-03-23 16:09:35 +09:00
*.moc
*.user
ruby: Add Metal display backend (#1431)

This PR adds a Metal display backend to ares.

<img width="1482" alt="Screenshot 2024-03-31 at 7 02 40 PM" src="https://github.com/ares-emulator/ares/assets/6864788/48718f19-9916-491b-8301-c16b95f95c19">

*ares N64 core on a 60Hz P3 display. Shader: "crt-maximus-royale".*

## Context

With ares's recent introduction of librashader, ares users have enjoyed access to the sizable library of shaders in the slang format, such as those used by the RetroArch frontend. Simultaneously, up to now, the CGL OpenGL backend shipped by ares for Mac users has offered enough functionality to do the (relatively) simple job of an ares display backend; display software-rendered frame buffers and pace them appropriately.

librashader's advanced postprocessing, however, has asked a bit more of the display backend, and in the case of OpenGL, has laid bare various deficiencies in macOS's OpenGL driver. These deficiencies can kneecap ares's librashader support on macOS with compiler errors and broken shader behavior. Rather than chase down these errors and try to work around what is fundamentally a broken driver shipped by Apple, we take advantage of librashader's native Metal support by adding a Metal backend to ares. This greatly increases baseline shader compatibility, with the added benefit of documentation and greater platform debugging information when issues arise. The Metal backend will also generally future-proof ares on macOS, with OpenGL's uncertain future on macOS.

## Basics

The first iteration of this Metal driver mostly offers feature parity with the OpenGL driver. More advanced features, particularly in the realm of frame pacing on ProMotion displays, will arrive in future iterations. The priority with this PR is to start getting the driver in the hands of users with basic features and greater librashader compatibility.
Host refresh rate sync is still a work in progress and in this iteration will only be enabled for users on macOS versions above 10.15.4 with non-VRR displays (discussed more below).

Explicit sRGB color matching is offered as a new option for those on wide gamut displays (by default, the Metal driver will map to the native color space, conforming to OpenGL driver behavior). This option is only exposed on macOS, since other operating systems lack per-surface color matching.

<img width="400" alt="Screenshot 2024-03-31 at 6 37 38 PM" src="https://github.com/ares-emulator/ares/assets/6864788/e177a256-bba2-4a02-ad5b-9ad84cb09740"> <img width="400" alt="Screenshot 2024-03-31 at 6 37 46 PM" src="https://github.com/ares-emulator/ares/assets/6864788/dd517394-e97e-40f6-98de-4562d62bda98">

*Left: ares presenting in Display P3. Right: ares presenting more accurately in sRGB with "Force sRGB" enabled. Shader: crt-hyllian.*

A simple vertex and fragment shader are compiled at runtime, so we avoid the need to add another compiler toolchain to the ares build process. There is unfortunately some code duplicated at present; `metal.cpp` and `Shaders.metal` both include types defined in `ShaderTypes.h`, but we also need to place `Shaders.metal` inside the .app bundle for runtime compilation, making for awkward `#include`s. Presently, we just bundle a copy of `Shaders.metal` appended with `ShaderTypes.h`, inside `desktop-ui/resource`. This will be cleaned up in future work.

The driver's draw implementation itself is fairly simple: one render pass renders to an offscreen texture, librashader performs its work on that offscreen texture, and a second ares Metal render pass composites the finished texture inside ares's viewport. Since we do not use `MTKViewDelegate`, our output function finishes with a `[view draw]` call and the system presents at the earliest opportunity.

## Details

When it came to the details of implementing this driver, there were some nontrivial issues encountered.
Some of these will need solving in separate PRs before this driver is feature-complete.

### ares vs. VRR

Users on fixed refresh rate displays should enjoy good frame pacing with this driver. Unfortunately, users of more recent Mac machines with "ProMotion" refresh rates will not have an ideal experience in terms of pacing. To understand why, we need to take a brief detour into how ares works and then discuss some current limitations with ares's macOS integration.

In ares "synchronize to audio" mode, ares creates and delivers video frames as audio frames are created. This means that the video frame timing is completely dependent on when exactly the audio driver processes audio frames. For display modes with a refresh rate at or close to the core refresh rate, this is mostly no problem; the system seems to naturally present frames in a FIFO-esque fashion, and every once in a while the system will just drop or duplicate a frame if two or no draw calls fall within one refresh interval.

For more recent, advanced Mac displays with an "up to 120Hz" refresh rate, the story is more complicated. We have to explicitly tell the system when we want it to draw the frame once available. It is tempting to answer "now"; after all, if our audio timings are correct, then video frames should be generated precisely when they need to be shown. Unfortunately, in higher latency modes of OpenAL or with SDL audio in general on macOS, audio frames are processed in large batches.
That means that we end up emitting several video frames in quick succession at 8ms intervals on a 120Hz display, then waiting as long as 75ms for a new batch of audio (and thus video) frames:

<img width="265" alt="Screenshot 2024-03-31 at 4 55 25 PM" src="https://github.com/ares-emulator/ares/assets/6864788/69f887a0-8ded-4f47-ac4c-4cf2b58283a7"> <img width="265" alt="Screenshot 2024-03-31 at 4 55 35 PM" src="https://github.com/ares-emulator/ares/assets/6864788/2f7a417a-0378-44b5-b0c0-f894542c0371"> <img width="265" alt="Screenshot 2024-03-31 at 4 55 56 PM" src="https://github.com/ares-emulator/ares/assets/6864788/13b471d9-3e77-45cd-baf4-0d786ae83e22">

*VRR macOS frame pacing across audio driver settings in v0.1 of the Metal driver. The graph in blue shows frame present intervals over time; the values in red show the minimum and maximum present intervals over the graph duration.*

If we do not answer "now," we have to decide when to present. Unfortunately, there is currently no satisfying way to answer that question. Core refresh rates vary somewhat widely, sometimes during runtime, and there is no mechanism in ares by which to inform the graphics driver of a core's desired refresh rate. We could elect to just duplicate the behavior for a fixed display refresh rate and pick, e.g., 60 Hz, but unfortunately even that option is not available, because we currently have no way of receiving callbacks when a frame is actually presented. Why not `CAMetalDisplayLink` or even `MTKViewDelegate`, you ask? Well...

### ares vs. macOS

For most of its cores, ares performs much of its work on a single dedicated main thread that blocks for audio and video presentation to drive hardware-accurate timing. Unfortunately, all of this work occurs on the macOS main thread, with lots of blocking and CPU-intensive activity. This interferes with the macOS application run loop's ability to perform its callbacks and call out to observers.
In practice, this means that if we try to employ delegates that interface with macOS, which could send a callback when a frame is presented or tell ares the exact moment a frame needs to be presented, these system delegates cannot actually make these calls in time in between ares's main thread activity; upwards of 50% of `MTKViewDelegate` callbacks are lost, for example. This means that tools like `MTKViewDelegate` or `CAMetalDisplayLink` that would help us solve the frame pacing problem are, unfortunately, useless to us. We cannot leverage these tools as ares is currently architected.

To get around these issues, we will need one of: less main thread blocking, so delegates can interface with ares on the main thread, or an audio driver with a processing tolerance that falls within the display's minimum refresh interval. Our best bet for now is to emit frames to the system within ares's main thread work as they become available, let the system draw them as it will, and hope that our audio driver is doing a good job pacing them. In practice, Metal driver users on VRR displays that cannot be set to a fixed rate should use the OpenAL driver and set the latency to the lowest value possible.

## Future Work

The future for the Metal driver in ares takes us down a few different paths. The main issue at present is making macOS system delegates work well with ares, which is the ideal path forward. Ideally, we could move all of the emulation-intensive work off of the main thread in macOS and into a high priority dedicated thread, reserving the main thread for actual UI and rendering, giving the system plenty of overhead with which to communicate.

For the future of VRR in ares, it would be good to create a mechanism to tell the graphics driver what refresh rate the core wants to present at. This would be one way to pace draw calls appropriately in the absence of reliable feedback from the system about the state of the display.
It has gone unmentioned so far due to the other issues, but long term, it would also be good for ares or librashader to have some way of utilizing the entire viewport for shaders; currently, shaders are limited to the output width and height area rather than the entire window view size. This is limiting for "bezel"-style shaders that want to use the entire screen in fullscreen, for example.

Co-authored-by: jcm <butt@butts.com>
2024-04-01 22:31:28 +09:00
*.xcuserdata
.vs/
.vscode/
.idea/
cmake-*/
obj/
out/
obj-amd64/
obj-arm64/
out-amd64/
out-arm64/
thirdparty/SDL/SDL
thirdparty/SDL/libSDL2-2.0.0.dylib
macos-xcode/
.swiftpm
ruby: Add Metal VRR support, various Metal driver fixes (#1455)

Adds to the Metal backend from #1431, extending host VSync functionality to variable refresh rate (VRR) displays, enabling the ability to sync to guest and host refresh rates simultaneously, along with an assortment of minor changes and code cleanups:

- Adds a threaded renderer option (technically a GCD queue) to the Metal backend; essential for VRR at lower audio latencies. At higher audio latencies, the synchronous rendering option may function better.
- Add easier debug capabilities, including a shell script to compile a debug `.metallib` shader library which will be used at runtime if ares is compiled in debug mode. The debug `.metallib` enables GPU frame capture.
- Implement `clear()` for the Metal backend; the last displayed frame will no longer stay on the screen if a core is explicitly unloaded.
- Implement fullscreen mode. Uses a borderless window that covers the entire screen, rather than idiomatic fullscreen, primarily in order to render around the camera housing. Normal macOS fullscreen behavior is available via the main window title bar controls.
- Remove unused custom window code.
- Remove redundant copy of `Shaders.metal`.
- Miscellaneous small fixes and code cleanups to resolve compiler warnings.

https://github.com/ares-emulator/ares/assets/6864788/1e2d594a-c84f-4b95-a792-cacdde2a09b0

## ares vs. VRR

As discussed in #1431, there are some challenges in implementing smooth VRR support for ares on Metal. Briefly:

- ares lacked a mechanism to tell the display backend the core's desired refresh rate, in order to sync to it.
- ares on macOS lacked an effective mechanism for receiving callbacks when frames were presented, in order to keep track of whether we were running fast or slow, because of the way ares monopolizes the main thread on macOS.

The first issue has been resolved by a3c57b4 and ec0b625.
Cores now tell the `Screen` instance how often they want to present, and the screen instance will tell the display backend if it implements the `refreshRateHint` function. This enables the display backend to quickly respond to changing guest refresh rates even during runtime.

The second issue is a bit trickier. As discussed, `MTKViewDelegate` and `CAMetalDisplayLink` are both effectively unavailable until ares moves its primary emulation work off of the main thread. There is, however, one Metal callback mechanism that can still be used by ares; [[MTLDrawable addPresentedHandler:]](https://developer.apple.com/documentation/metal/mtldrawable/2806858-addpresentedhandler?language=objc), because it calls back on a dedicated serial queue (`com.apple.coreanimation.CAMachPortUtilReplyQueue`) rather than the main thread.

Using this presented handler, ares can use `drawable.presentedTime` to keep a running average of how long our frames are being presented for, determine if we are running ahead or behind, and modify our subsequent present intervals accordingly. That is what we do for this initial VRR implementation.

## macOS VRR

> [!NOTE]
> The following applies to "ProMotion" displays, as those are the ones most tightly integrated with macOS and most likely to be used by Mac users on VRR displays.

Unfortunately, VRR sync is not as simple as just picking a present interval and telling macOS to present each frame for the guest's requested interval. macOS only pretends to offer this capability in exclusive fullscreen mode, and even then, present intervals cannot be completely arbitrary. Realistically, we seem to be able to get consistent present intervals for some integer refresh rates between 40 Hz and 120 Hz, and certain rational present intervals (59.97, 23.976 among them). The spread of achievable consistent present intervals seems to be arbitrary enough that there isn't any sense in targeting them specifically.
Even if we can theoretically present at the exact interval the guest wants, we inevitably end up falling behind due to transient load conditions.

> [!NOTE]
> For background, the landscape of guest refresh rates in ares is quite wide. Many systems present near 60 Hz; 59.97, 60.01, 59.73, 59.92, 59.82, etc. Many of these systems have PAL modes that present near 50 Hz with similar variations. The WonderSwan presents at a maximum of ~75.47 Hz, with per-game variations. The Atari 2600 can present at completely arbitrary intervals during runtime owing to its CRT- and processor-centric presentation strategies.

In short, it is difficult to narrow the range of expected guest present intervals if we wanted to simplify this problem. Rather, we have to pursue a more holistic strategy. The strategy ares uses with this PR is as follows:

1. The output function sends frames into a FIFO dispatch queue that can present asynchronously, to reduce the burden of needing to present immediately if we are in danger of blocking the main thread.
2. If we are more than 0.5% off of the targeted present interval as determined by a weighted rolling average, start "nudging" the system to present at earlier intervals calculated according to the difference between the rolling average and the target, multiplied by a constant `kVRRCorrectiveForce`.
   - This is done because we can often prod the system into presenting at rational intervals close to the target interval, resulting in an overall smoother presentation to the user than if we were to correct more forcefully.
3. If more than `kVRRImmediatePresentThreshold = 3` frames are in the queue to be presented, start telling the system to present immediately instead of "nudging."
   - This is a more forceful correction toward the target present interval, and must be kept low for systems that want to present at intervals that are far away from any achievable consistent present interval.
4. If the queue gets much deeper (`kMaxSourceBuffersInFlight = 6`), start dropping frames.
   - This is necessary when synchronizing to neither audio nor video, or else in rare cases where the GPU is overloaded, such as for a shader that the system is not capable of rendering in time each frame.

This system achieves decent results across most systems in my testing. I considered allowing these constants to be twiddled with sliders, but was wary of presenting too many esoteric options to the user. If it's determined to be worth exposing these constants, that can be done in future work (if an entirely superior VRR presentation strategy is not discovered by that point).

## Miscellany

- The threaded renderer is necessary so that we do not ever block the main thread in the worst case present interval conditions. However, it only works well at relatively low audio latencies. At higher audio latencies (as for users on lower-specced systems), the queue overflows easily and it works better to render synchronously.
- The threaded renderer can also have unintuitive results when exclusively syncing to host VSync or GPU sync, so for users that expect particular behavior from that functionality, it is better left disabled.
- For PAL refresh intervals, players will get generally better results in exclusive fullscreen. macOS likes to render things at 60Hz and will struggle to render near 50 Hz with other elements onscreen.
- To view the present interval graph as in these debug shots, input `defaults write -g MetalForceHudEnabled -bool YES` at the command line before launching ares.

## Future Work

We could possibly achieve better VRR results utilizing something like `CAMetalDisplayLink` after freeing up the macOS main thread. In my testing it seems to have the same limitations in terms of achievable consistent present intervals, but it may offer a more precise picture of host vs. guest present timing.
Regardless of the viability of other strategies, it would still be valuable to free up the main thread on macOS for the sake of other system APIs that may be used in the future, plus benefits of being able to use the UI consistently and smoothly concurrently with emulation.

N64 and PS1 also do not fully implement the `refreshRateHint` API, and the API may be buggy for some platforms. Metal refresh rate hints currently appear in stdout, so if you are seeing an issue with VRR sync, check the console to see the guest's requested present interval first.

## Gallery

https://github.com/ares-emulator/ares/assets/6864788/1022c24b-fd83-4ede-b1ac-3f0c3a207972

https://github.com/ares-emulator/ares/assets/6864788/7e36bd4a-7acd-4d37-a572-7550ba1ad2b2

(PAL Super Mario World)

https://github.com/ares-emulator/ares/assets/6864788/9655fcbe-28de-48b2-88b7-7fb864991ef0

Co-authored-by: jcm <butt@butts.com>
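
The four-step VRR pacing strategy described in this commit message can be sketched roughly as below. This is an illustrative sketch only, not the code from the ares Metal driver: the `VRRPacer`, `Decision`, and `Action` names are hypothetical, the values of `kVRRCorrectiveForce` and the rolling-average weight are placeholders (the message does not state them), and the sketch assumes that frames are dropped once the queue depth reaches `kMaxSourceBuffersInFlight` and that the nudge is applied symmetrically in both directions.

```cpp
#include <cmath>
#include <cstddef>

// Thresholds quoted in the commit message.
constexpr std::size_t kVRRImmediatePresentThreshold = 3;
constexpr std::size_t kMaxSourceBuffersInFlight = 6;
constexpr double kTolerance = 0.005;        // "more than 0.5% off" the target

// Placeholder values: the commit message does not state these.
constexpr double kVRRCorrectiveForce = 0.5; // assumed
constexpr double kAverageWeight = 0.1;      // assumed smoothing weight

enum class Action { Drop, PresentImmediately, PresentAfter };

struct Decision {
  Action action;
  double interval;  // seconds; only meaningful for Action::PresentAfter
};

// Hypothetical helper that tracks measured present intervals and decides
// how the next queued frame should be presented.
struct VRRPacer {
  double target;   // guest's requested present interval, in seconds
  double average;  // weighted rolling average of measured intervals

  explicit VRRPacer(double targetInterval)
      : target(targetInterval), average(targetInterval) {}

  // Fold one measured present-to-present interval into the rolling average
  // (for example, from differences between successive presented times).
  void recordPresent(double measured) {
    average += kAverageWeight * (measured - average);
  }

  Decision decide(std::size_t queuedFrames) const {
    // Step 4: the queue is too deep, so drop the frame.
    if (queuedFrames >= kMaxSourceBuffersInFlight) {
      return {Action::Drop, 0.0};
    }
    // Step 3: more than the threshold are queued, so present immediately.
    if (queuedFrames > kVRRImmediatePresentThreshold) {
      return {Action::PresentImmediately, 0.0};
    }
    // Step 2: outside the tolerance, nudge the interval against the error.
    const double error = average - target;
    if (std::fabs(error) > kTolerance * target) {
      const double nudged = target - error * kVRRCorrectiveForce;
      return {Action::PresentAfter, nudged > 0.0 ? nudged : 0.0};
    }
    // Otherwise present at the guest's requested interval.
    return {Action::PresentAfter, target};
  }
};
```

In this sketch, a pacer that has been running behind (rolling average longer than the target) returns an interval shorter than the target, which corresponds to the "present at earlier intervals" nudge; the queue-depth checks take priority because they are the more forceful corrections.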
2024-04-22 03:27:16 +09:00
*.xcodeproj