ruby: Add Metal VRR support, various Metal driver fixes (#1455)

Adds to the Metal backend from #1431, extending host VSync functionality
to variable refresh rate (VRR) displays, enabling the ability to sync to
guest and host refresh rates simultaneously, along with an assortment of
minor changes and code cleanups:

- Adds a threaded renderer option (technically a GCD queue) to the Metal
backend; essential for VRR at lower audio latencies. At higher audio
latencies, the synchronous rendering option may function better.
- Add easier debug capabilities, including a shell script to compile a
debug `.metallib` shader library which will be used at runtime if ares
is compiled in debug mode. The debug `.metallib` enables GPU frame
capture.
- Implement `clear()` for the Metal backend; the last displayed frame
will no longer stay on the screen if a core is explicitly unloaded.
- Implement fullscreen mode. Uses a borderless window that covers the
entire screen, rather than idiomatic fullscreen, primarily in order to
render around the camera housing. Normal macOS fullscreen behavior is
available via the main window title bar controls.
- Remove unused custom window code.
- Remove redundant copy of `Shaders.metal`.
- Miscellaneous small fixes and code cleanups to resolve compiler
warnings.


https://github.com/ares-emulator/ares/assets/6864788/1e2d594a-c84f-4b95-a792-cacdde2a09b0

## ares vs. VRR
As discussed in #1431, there are some challenges in implementing smooth
VRR support for ares on Metal. Briefly:

- ares lacked a mechanism to tell the display backend the core's desired
refresh rate, in order to sync to it.
- ares on macOS lacked an effective mechanism for receiving callbacks
when frames were presented, in order to keep track of whether we were
running fast or slow, because of the way ares monopolizes the main
thread on macOS.

The first issue has been resolved by a3c57b4 and ec0b625. Cores now tell
the `Screen` instance how often they want to present, and the screen
instance will tell the display backend if it implements the
`refreshRateHint` function. This enables the display backend to quickly
respond to changing guest refresh rates even during runtime.

The second issue is a bit trickier. As discussed, `MTKViewDelegate` and
`CAMetalDisplayLink` are both effectively unavailable until ares moves
its primary emulation work off of the main thread. There is, however,
one Metal callback mechanism that can still be used by ares;
[[MTLDrawable
addPresentedHandler:]](https://developer.apple.com/documentation/metal/mtldrawable/2806858-addpresentedhandler?language=objc),
because it calls back on a dedicated serial queue
(`com.apple.coreanimation.CAMachPortUtilReplyQueue`) rather than the
main thread.

Using this presented handler, ares can use `drawable.presentedTime` to
keep a running average of how long our frames are being presented for,
determine if we are running ahead or behind, and modify our subsequent
present intervals accordingly. That is what we do for this initial VRR
implementation.

## macOS VRR

> [!NOTE]
> The following applies to "ProMotion" displays, as those are the ones
most tightly integrated with macOS and most likely to be used by Mac
users on VRR displays.

Unfortunately, VRR sync is not as simple as just picking a present
interval and telling macOS to present each frame for the guest's
requested interval. macOS only pretends to offer this capability in
exclusive fullscreen mode, and even then, present intervals cannot be
completely arbitrary. Realistically, we seem to be able to get
consistent present intervals for some integer refresh rates between 40
Hz and 120 Hz, and certain rational present intervals (59.97, 23.976
among them). The spread of achievable consistent present intervals seems
to be arbitrary enough that there isn't any sense in targeting them
specifically. Even if we can theoretically present at the exact interval
the guest wants, we inevitably end up falling behind due to transient
load conditions.

> [!NOTE]
> For background, the landscape of guest refresh rates in ares is quite
wide. Many systems present near 60 Hz; 59.97, 60.01, 59.73, 59.92,
59.82, etc. Many of these systems have PAL modes that present near 50
with similar variations. The WonderSwan presents at a maximum of ~75.47
Hz, with per-game variations. The Atari 2600 can present at completely
arbitrary intervals during runtime owing to its CRT- and
processor-centric presentation strategies. In short, It is difficult to
narrow the range of expected guest present intervals if we wanted to
simplify this problem.

Rather, we have to pursue a more holistic strategy. The strategy ares
uses with this PR is as follows:

1. The output function sends frames into a FIFO dispatch queue that can
present asynchronously, to reduce the burden of needing to present
immediately if we are in danger of blocking the main thread.
2. If we are more than 0.5% off of the targeted present interval as
determined by a weighted rolling average, start "nudging" the system to
present at earlier intervals calculated according to the difference
between the rolling average and the target, multiplied by a constant
`kVRRCorrectiveForce`.
- This is done because we can often prod the system into presenting at
rational intervals close to the target interval, resulting in an overall
smoother presentation to the user than if we were to correct more
forcefully.
3. If more than `kVRRImmediatePresentThreshold = 3` frames are in the
queue to be presented, start telling the system to present immediately
instead of "nudging."
- This is a more forceful correction toward the target present interval,
and must be kept low for systems that want to present at intervals that
are far away from any achievable consistent present interval.
4. If the queue gets much deeper, `kMaxSourceBuffersInFlight = 6`, start
dropping frames.
- This is necessary for synchronizing to neither audio nor video, or
else in rare cases where the GPU is overloaded, such as for a shader
that the system is not capable of rendering in time each frame.

This system achieves decent results across most systems in my testing. I
considered allowing these constants to be twiddled with sliders, but was
wary of presenting too many esoteric options to the user. If it's
determined to be worth exposing these constants, that can be done in
future work (if a superior VRR presentation strategy entirely is not
discovered by that point).

## Miscellany

- The threaded renderer is necessary so that we do not ever block the
main thread in the worst case present interval conditions. However, it
only works well at relatively low audio latencies. At higher audio
latencies (as for users on lower-specced systems), the queue overflows
easily and it works better to render synchronously.
- The threaded renderer can also have unintuitive results when
exclusively syncing to host VSync or GPU sync, so for users that expect
particular behavior from that functionality, it is better left disabled.
- For PAL refresh intervals, players will get generally better results
in exclusive fullscreen. macOS likes to render things at 60Hz and will
struggle to render near 50 Hz with other elements onscreen.
- To view the present interval graph as in these debug shots, input
`defaults write -g MetalForceHudEnabled -bool YES` at the command line
before launching ares.

## Future Work

We could possibly achieve better VRR results utilizing something like
`CAMetalDisplayLink` after freeing up the macOS main thread. In my
testing it seems to have the same limitations in terms of achievable
consistent present intervals, but it may offer a more precise picture of
host vs. guest present timing.

Regardless of the viability of other strategies, it would still be
valuable to free up the main thread on macOS for the sake of other
system APIs that may be used in the future, plus benefits of being able
to use the UI consistently and smoothly concurrently with emulation.

N64 and PS1 also do not fully implement the `refreshRateHint` API, and
the API may be buggy for some platforms. Metal refresh rate hints
currently appear in stdout, so if you are seeing an issue with VRR sync,
check the console to see the guest's requested present interval first.

## Gallery 


https://github.com/ares-emulator/ares/assets/6864788/1022c24b-fd83-4ede-b1ac-3f0c3a207972


https://github.com/ares-emulator/ares/assets/6864788/7e36bd4a-7acd-4d37-a572-7550ba1ad2b2

(PAL Super Mario World)


https://github.com/ares-emulator/ares/assets/6864788/9655fcbe-28de-48b2-88b7-7fb864991ef0

Co-authored-by: jcm <butt@butts.com>
このコミットが含まれているのは:
jcm 2024-04-21 13:27:16 -05:00 committed by GitHub
コミット b5cd509a77
この署名に対応する既知のキーがデータベースに存在しません
GPGキーID: B5690EEEBB952194
11個のファイルの変更330行の追加250行の削除

2
.gitignore vendored
ファイルの表示

@ -15,4 +15,4 @@ thirdparty/SDL/SDL
thirdparty/SDL/libSDL2-2.0.0.dylib
macos-xcode/
.swiftpm
*.xcodeproj
*.xcodeproj

ファイルの表示

@ -129,6 +129,10 @@ endif
cp resource/$(name).plist $(output.path)/$(name).app/Contents/Info.plist
$(call mkdir,$(output.path)/$(name).app/Contents/Resources/Shaders/)
$(call mkdir,$(output.path)/$(name).app/Contents/Resources/Database/)
ifeq ($(build),debug)
# Try to copy the debug .metallib into the .app bundle; if we don't succeed, user will see runtime log message
-cp ../ruby/video/metal/shaders.metallib $(output.path)/$(name).app/Contents/Resources/Shaders/shaders.metallib
endif
cp ../ruby/video/metal/Shaders.metal $(output.path)/$(name).app/Contents/Resources/Shaders/Shaders.metal
$(call rcopy,$(thirdparty.path)/slang-shaders/*,$(output.path)/$(name).app/Contents/Resources/Shaders/)
$(call rcopy,$(mia.path)/Database/*,$(output.path)/$(name).app/Contents/Resources/Database/)

ファイルの表示

@ -1,82 +0,0 @@
//
// Shaders.metal
// ares
//
// Created by jcm on 2024-03-07.
//
#include <metal_stdlib>
#include "ShaderTypes.h"
using namespace metal;
// Vertex shader outputs and fragment shader inputs
struct RasterizerData
{
// The [[position]] attribute of this member indicates that this value
// is the clip space position of the vertex when this structure is
// returned from the vertex function.
float4 position [[position]];
// Since this member does not have a special attribute, the rasterizer
// interpolates its value with the values of the other triangle vertices
// and then passes the interpolated value to the fragment shader for each
// fragment in the triangle.
float2 textureCoordinate;
};
vertex RasterizerData
vertexShader(uint vertexID [[vertex_id]],
constant AAPLVertex *vertices [[buffer(AAPLVertexInputIndexVertices)]],
constant vector_uint2 *viewportSizePointer [[buffer(AAPLVertexInputIndexViewportSize)]])
{
RasterizerData out;
// Index into the array of positions to get the current vertex.
// The positions are specified in pixel dimensions (i.e. a value of 100
// is 100 pixels from the origin).
float2 pixelSpacePosition = vertices[vertexID].position.xy;
// Get the viewport size and cast to float.
vector_float2 viewportSize = vector_float2(*viewportSizePointer);
// To convert from positions in pixel space to positions in clip-space,
// divide the pixel coordinates by half the size of the viewport.
out.position = vector_float4(0.0, 0.0, 0.0, 1.0);
out.position.xy = pixelSpacePosition / (viewportSize / 2.0);
// Pass the input color directly to the rasterizer.
out.textureCoordinate = vertices[vertexID].textureCoordinate;
return out;
}
fragment float4
samplingShader(RasterizerData in [[stage_in]],
texture2d<half> colorTexture [[ texture(AAPLTextureIndexBaseColor) ]])
{
constexpr sampler textureSampler (mag_filter::nearest,
min_filter::nearest);
// Sample the texture to obtain a color
const half4 colorSample = colorTexture.sample(textureSampler, in.textureCoordinate);
// return the color of the texture
return float4(colorSample);
}
fragment float4
drawableSamplingShader(RasterizerData in [[stage_in]],
texture2d<half> colorTexture [[ texture(AAPLTextureIndexBaseColor) ]])
{
constexpr sampler textureSampler (mag_filter::linear,
min_filter::linear);
// Sample the texture to obtain a color
const half4 colorSample = colorTexture.sample(textureSampler, in.textureCoordinate);
// return the color of the texture
return float4(colorSample);
}

ファイルの表示

@ -44,6 +44,10 @@ auto DriverSettings::construct() -> void {
settings.video.forceSRGB = videoColorSpaceToggle.checked();
ruby::video.setForceSRGB(settings.video.forceSRGB);
});
videoThreadedRendererToggle.setText("Threaded").onToggle([&] {
settings.video.threadedRenderer = videoThreadedRendererToggle.checked();
ruby::video.setThreadedRenderer(settings.video.threadedRenderer);
});
#endif
audioLabel.setText("Audio").setFont(Font().setBold());
@ -155,6 +159,7 @@ auto DriverSettings::videoRefresh() -> void {
videoBlockingToggle.setChecked(ruby::video.blocking()).setEnabled(ruby::video.hasBlocking());
#if defined(PLATFORM_MACOS)
videoColorSpaceToggle.setChecked(ruby::video.forceSRGB()).setEnabled(ruby::video.hasForceSRGB());
videoThreadedRendererToggle.setChecked(ruby::video.threadedRenderer()).setEnabled(ruby::video.hasThreadedRenderer());
#endif
videoFlushToggle.setChecked(ruby::video.flush()).setEnabled(ruby::video.hasFlush());
VerticalLayout::resize();

ファイルの表示

@ -57,6 +57,7 @@ auto Settings::process(bool load) -> void {
bind(boolean, "Video/Exclusive", video.exclusive);
bind(boolean, "Video/Blocking", video.blocking);
bind(boolean, "Video/PresentSRGB", video.forceSRGB);
bind(boolean, "Video/ThreadedRenderer", video.threadedRenderer);
bind(boolean, "Video/Flush", video.flush);
bind(string, "Video/Shader", video.shader);
bind(natural, "Video/Multiplier", video.multiplier);

ファイルの表示

@ -12,6 +12,7 @@ struct Settings : Markup::Node {
bool exclusive = false;
bool blocking = false;
bool forceSRGB = false;
bool threadedRenderer = true;
bool flush = false;
string shader = "None";
u32 multiplier = 2;
@ -337,6 +338,7 @@ struct DriverSettings : VerticalLayout {
CheckLabel videoFlushToggle{&videoToggleLayout, Size{0, 0}};
#if defined(PLATFORM_MACOS)
CheckLabel videoColorSpaceToggle{&videoToggleLayout, Size{0, 0}};
CheckLabel videoThreadedRendererToggle{&videoToggleLayout, Size{0, 0}};
#endif
//
Label audioLabel{this, Size{~0, 0}, 5};

ファイルの表示

@ -18,15 +18,6 @@ struct VideoMetal;
-(BOOL) acceptsFirstResponder;
@end
@interface RubyWindowMetal : NSWindow <NSWindowDelegate> {
@public
VideoMetal* video;
}
-(id) initWith:(VideoMetal*)video;
-(BOOL) canBecomeKeyWindow;
-(BOOL) canBecomeMainWindow;
@end
struct VideoMetal : VideoDriver, Metal {
VideoMetal& self = *this;
VideoMetal(Video& super) : VideoDriver(super) {}
@ -39,21 +30,43 @@ struct VideoMetal : VideoDriver, Metal {
auto driver() -> string override { return "Metal"; }
auto ready() -> bool override { return _ready; }
auto hasFullScreen() -> bool override { return false; }
auto hasFullScreen() -> bool override { return true; }
auto hasContext() -> bool override { return true; }
auto hasBlocking() -> bool override {
if (@available(macOS 10.15.4, *)) {
return !isVRRSupported();
return true;
} else {
return false;
}
}
auto hasForceSRGB() -> bool override { return true; }
auto hasThreadedRenderer() -> bool override { return true; }
auto hasFlush() -> bool override { return true; }
auto hasShader() -> bool override { return true; }
auto setFullScreen(bool fullScreen) -> bool override {
return initialize();
/// This function implements non-idiomatic macOS fullscreen behavior that sets the window frame equal to the display's
/// frame size and hides the cursor. Idiomatic fullscreen is still available via the normal stoplight window controls. This
/// version of fullscreen is desirable because it allows us to render around the camera housing on newer Macs
/// (important for bezel-style shaders), has snappier entrance/exit and tabbing behavior, and functions better with
/// recording and capture software such as OBS and screen recorders. Hiding the mouse cursor is also essential to
/// rendering with appropriate frame pacing in Metal's 'direct' presentation mode.
// todo: unify with cursor auto-hide in hiro, ideally ares-wide fullscreen mode option
if (fullScreen) {
frameBeforeFullScreen = view.window.frame;
[NSApp setPresentationOptions:(NSApplicationPresentationAutoHideDock | NSApplicationPresentationAutoHideMenuBar)];
[view.window setStyleMask:NSWindowStyleMaskBorderless];
[view.window setFrame:view.window.screen.frame display:YES];
[NSCursor setHiddenUntilMouseMoves:YES];
} else {
[NSApp setPresentationOptions:NSApplicationPresentationDefault];
[view.window setStyleMask:NSWindowStyleMaskTitled];
[view.window setFrame:frameBeforeFullScreen display:YES];
}
[view.window makeFirstResponder:view];
return true;
}
auto setContext(uintptr context) -> bool override {
@ -72,6 +85,12 @@ struct VideoMetal : VideoDriver, Metal {
} else {
view.colorspace = view.window.screen.colorSpace.CGColorSpace;
}
return true;
}
auto setThreadedRenderer(bool threadedRenderer) -> bool override {
_threaded = threadedRenderer;
return true;
}
auto setFlush(bool flush) -> bool override {
@ -79,7 +98,8 @@ struct VideoMetal : VideoDriver, Metal {
return true;
}
auto refreshRateHint(double refreshRate) -> void {
auto refreshRateHint(double refreshRate) -> void override {
if (refreshRate == _refreshRateHint) return;
_refreshRateHint = refreshRate;
updatePresentInterval();
}
@ -88,7 +108,8 @@ struct VideoMetal : VideoDriver, Metal {
if (@available(macOS 12.0, *)) {
NSTimeInterval minInterval = view.window.screen.minimumRefreshInterval;
NSTimeInterval maxInterval = view.window.screen.maximumRefreshInterval;
return minInterval != maxInterval;
_vrrIsSupported = minInterval != maxInterval;
return _vrrIsSupported;
} else {
return false;
}
@ -105,6 +126,8 @@ struct VideoMetal : VideoDriver, Metal {
CFTimeInterval minimumInterval = view.window.screen.minimumRefreshInterval;
if (_refreshRateHint != 0) {
_presentInterval = (1.0 / _refreshRateHint);
NSLog(@"Refresh rate hint changed to %lf", _refreshRateHint);
averagePresentDuration = _presentInterval;
} else {
_presentInterval = minimumInterval;
}
@ -116,20 +139,24 @@ struct VideoMetal : VideoDriver, Metal {
if (_filterChain != NULL) {
_libra.mtl_filter_chain_free(&_filterChain);
}
if (_preset != NULL) {
_libra.preset_free(&_preset);
}
if (_libra.preset_create(pathname.data(), &_preset) != NULL) {
print(string{"Metal: Failed to load shader: ", pathname, "\n"});
if(file::exists(pathname)) {
if (_libra.preset_create(pathname.data(), &_preset) != NULL) {
print(string{"Metal: Failed to load shader: ", pathname, "\n"});
return false;
}
if (_libra.mtl_filter_chain_create(&_preset, _commandQueue, nil, &_filterChain) != NULL) {
print(string{"Metal: Failed to create filter chain for: ", pathname, "\n"});
return false;
};
} else {
return false;
}
if (_libra.mtl_filter_chain_create(&_preset, _commandQueue, nil, &_filterChain) != NULL) {
print(string{"Metal: Failed to create filter chain for: ", pathname, "\n"});
return false;
};
return true;
}
@ -137,7 +164,10 @@ struct VideoMetal : VideoDriver, Metal {
return true;
}
auto clear() -> void override {}
auto clear() -> void override {
//force a resize of the framebuffer to clear it, then output one pixel
output(1, 1);
}
auto size(u32& width, u32& height) -> void override {
if ((_viewWidth == width && _viewHeight == height) && (_viewWidth != 0 && _viewHeight != 0)) { return; }
@ -168,16 +198,22 @@ struct VideoMetal : VideoDriver, Metal {
buffer = new u32[width * height]();
MTLTextureDescriptor *textureDescriptor = [MTLTextureDescriptor new];
textureDescriptor.pixelFormat = MTLPixelFormatBGRA8Unorm;
textureDescriptor.width = sourceWidth;
textureDescriptor.height = sourceHeight;
textureDescriptor.usage = MTLTextureUsageRenderTarget|MTLTextureUsageShaderRead;
_sourceTexture = [_device newTextureWithDescriptor:textureDescriptor];
bytesPerRow = sourceWidth * sizeof(u32);
if (bytesPerRow < 16) bytesPerRow = 16;
for (int i = 0; i < kMaxSourceBuffersInFlight; i++) {
if (sourceWidth < 1 || sourceHeight < 1) {
_sourceTextures[i] = nullptr;
continue;
}
MTLTextureDescriptor *textureDescriptor = [MTLTextureDescriptor new];
textureDescriptor.pixelFormat = MTLPixelFormatBGRA8Unorm;
textureDescriptor.width = sourceWidth;
textureDescriptor.height = sourceHeight;
textureDescriptor.usage = MTLTextureUsageRenderTarget|MTLTextureUsageShaderRead;
_sourceTextures[i] = [_device newTextureWithDescriptor:textureDescriptor];
}
}
pitch = sourceWidth * sizeof(u32);
@ -187,6 +223,7 @@ struct VideoMetal : VideoDriver, Metal {
auto release() -> void override {}
auto resizeOutputBuffers(u32 width, u32 height) {
NSLog(@"Resizing output buffers to %i, %i", width, height);
outputWidth = width;
outputHeight = height;
@ -229,6 +266,52 @@ struct VideoMetal : VideoDriver, Metal {
}
auto output(u32 width, u32 height) -> void override {
/// Synchronously copy the current framebuffer to a Metal texture, then call into the render dispatch queue
/// either synchronously or asynchronously depending on whether blocking is on and VRR is supported.
if (depth >= kMaxSourceBuffersInFlight) {
//if we are running very behind, drop this frame
return;
}
//can we do this outside of the output function?
//currently no, because in theory framebuffer size can change during runtime
if (width != outputWidth || height != outputHeight) {
resizeOutputBuffers(width, height);
}
@autoreleasepool {
frameCount++;
auto index = frameCount % kMaxSourceBuffersInFlight;
auto sourceTexture = _sourceTextures[index];
[sourceTexture replaceRegion:MTLRegionMake2D(0, 0, sourceWidth, sourceHeight) mipmapLevel:0 withBytes:buffer bytesPerRow:bytesPerRow];
if (@available(macOS 10.15.4, *)) {
depth++;
}
/// Only block with `dispatch_sync` if blocking enabled and VRR not supported, or if the threaded renderer
/// is explicitly disabled. if VRR is supported, we should try to not _literally_ block, because we'll be making a best
/// effort to synchronize to the guest and host refresh rate at the same time. It's easier to do that if we have
/// assurances that we won't block the emulation thread in the worst case system conditions.
if ((_blocking && !_vrrIsSupported) || !_threaded) {
dispatch_sync(_renderQueue, ^{
outputHelper(width, height, sourceTexture);
});
} else {
dispatch_async(_renderQueue, ^{
outputHelper(width, height, sourceTexture);
});
}
}
}
private:
auto outputHelper(u32 width, u32 height, id<MTLTexture> sourceTexture) -> void {
/// Uses two render passes (plus librashader's render passes). The first render pass samples the source texture,
/// consisting of the pixel buffer from the emulator, onto a texture the same size as our eventual output,
/// `_renderTargetTexture`. Then it calls into librashader, which performs postprocessing onto the same
@ -236,37 +319,59 @@ struct VideoMetal : VideoDriver, Metal {
/// We need this last pass because librashader expects the viewport to be the same size as the output texture,
/// which is not the case for ares.
//can we do this outside of the output function?
if (width != outputWidth || height != outputHeight) {
resizeOutputBuffers(width, height);
}
dispatch_semaphore_wait(_semaphore, DISPATCH_TIME_FOREVER);
id<MTLCommandBuffer> commandBuffer = [_commandQueue commandBuffer];
@autoreleasepool {
if (commandBuffer != nil) {
__block dispatch_semaphore_t block_sema = _semaphore;
dispatch_semaphore_wait(_semaphore, DISPATCH_TIME_FOREVER);
[commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> buffer) {
dispatch_semaphore_signal(block_sema);
}];
id<MTLCommandBuffer> commandBuffer = [_commandQueue commandBuffer];
_renderToTextureRenderPassDescriptor.colorAttachments[0].texture = _renderTargetTexture;
if (commandBuffer != nil) {
__block dispatch_semaphore_t block_sema = _semaphore;
if (_renderToTextureRenderPassDescriptor != nil) {
[commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> buffer) {
dispatch_semaphore_signal(block_sema);
}];
id<MTLRenderCommandEncoder> renderEncoder = [commandBuffer renderCommandEncoderWithDescriptor:_renderToTextureRenderPassDescriptor];
_renderToTextureRenderPassDescriptor.colorAttachments[0].texture = _renderTargetTexture;
_renderToTextureRenderPassDescriptor.colorAttachments[0].storeAction = MTLStoreActionStore;
if (_renderToTextureRenderPassDescriptor != nil) {
[renderEncoder setRenderPipelineState:_renderToTextureRenderPipeline];
[renderEncoder setViewport:(MTLViewport){0, 0, (double)width, (double)height, -1.0, 1.0}];
[renderEncoder setVertexBuffer:_vertexBuffer
offset:0
atIndex:0];
[renderEncoder setVertexBytes:&_viewportSize
length:sizeof(_viewportSize)
atIndex:MetalVertexInputIndexViewportSize];
[renderEncoder setFragmentTexture:sourceTexture atIndex:0];
[renderEncoder drawPrimitives:MTLPrimitiveTypeTriangle vertexStart:0 vertexCount:6];
[renderEncoder endEncoding];
if (_filterChain) {
_libra.mtl_filter_chain_frame(&_filterChain, commandBuffer, frameCount, sourceTexture, _libraViewport, _renderTargetTexture, nil, nil);
}
//this call will block the current thread/queue if a drawable is not yet available
MTLRenderPassDescriptor *drawableRenderPassDescriptor = view.currentRenderPassDescriptor;
drawableRenderPassDescriptor.colorAttachments[0].loadAction = MTLLoadActionClear;
if (drawableRenderPassDescriptor != nil) {
id<MTLRenderCommandEncoder> renderEncoder = [commandBuffer renderCommandEncoderWithDescriptor:_renderToTextureRenderPassDescriptor];
id<MTLRenderCommandEncoder> renderEncoder = [commandBuffer renderCommandEncoderWithDescriptor:drawableRenderPassDescriptor];
_renderToTextureRenderPassDescriptor.colorAttachments[0].storeAction = MTLStoreActionStore;
[renderEncoder setRenderPipelineState:_drawableRenderPipeline];
[_sourceTexture replaceRegion:MTLRegionMake2D(0, 0, sourceWidth, sourceHeight) mipmapLevel:0 withBytes:buffer bytesPerRow:bytesPerRow];
[renderEncoder setRenderPipelineState:_renderToTextureRenderPipeline];
[renderEncoder setViewport:(MTLViewport){0, 0, (double)width, (double)height, -1.0, 1.0}];
[renderEncoder setViewport:(MTLViewport){_outputX, _outputY, (double)width, (double)height, -1.0, 1.0}];
[renderEncoder setVertexBuffer:_vertexBuffer
offset:0
@ -276,85 +381,98 @@ struct VideoMetal : VideoDriver, Metal {
length:sizeof(_viewportSize)
atIndex:MetalVertexInputIndexViewportSize];
[renderEncoder setFragmentTexture:_sourceTexture atIndex:0];
[renderEncoder setFragmentTexture:_renderTargetTexture atIndex:0];
[renderEncoder drawPrimitives:MTLPrimitiveTypeTriangle vertexStart:0 vertexCount:6];
[renderEncoder endEncoding];
if (_filterChain) {
_libra.mtl_filter_chain_frame(&_filterChain, commandBuffer, frameCount++, _sourceTexture, _libraViewport, _renderTargetTexture, nil, nil);
id<CAMetalDrawable> drawable = view.currentDrawable;
if (@available(macOS 10.15.4, *)) {
[drawable addPresentedHandler:^(id<MTLDrawable> drawable) {
self.drawableWasPresented(drawable);
depth--;
}];
}
MTLRenderPassDescriptor *drawableRenderPassDescriptor = view.currentRenderPassDescriptor;
auto targetPresentDuration = determineNextPresentDuration();
drawableRenderPassDescriptor.colorAttachments[0].loadAction = MTLLoadActionClear;
if (drawableRenderPassDescriptor != nil) {
id<MTLRenderCommandEncoder> renderEncoder = [commandBuffer renderCommandEncoderWithDescriptor:drawableRenderPassDescriptor];
[renderEncoder setRenderPipelineState:_drawableRenderPipeline];
[renderEncoder setViewport:(MTLViewport){_outputX, _outputY, (double)width, (double)height, -1.0, 1.0}];
[renderEncoder setVertexBuffer:_vertexBuffer
offset:0
atIndex:0];
[renderEncoder setVertexBytes:&_viewportSize
length:sizeof(_viewportSize)
atIndex:MetalVertexInputIndexViewportSize];
[renderEncoder setFragmentTexture:_renderTargetTexture atIndex:0];
[renderEncoder drawPrimitives:MTLPrimitiveTypeTriangle vertexStart:0 vertexCount:6];
[renderEncoder endEncoding];
id<CAMetalDrawable> drawable = view.currentDrawable;
if (drawable != nil) {
if (_blocking) {
[commandBuffer presentDrawable:drawable afterMinimumDuration:_presentInterval];
} else {
[commandBuffer presentDrawable:drawable];
}
[view draw];
if (drawable != nil) {
if (_blocking) {
//_blocking is not enabled unless 10.15.4 is available, so ignore availability warnings here
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wunguarded-availability-new"
[commandBuffer presentDrawable:drawable afterMinimumDuration:targetPresentDuration];
#pragma clang diagnostic pop
} else {
[commandBuffer presentDrawable:drawable];
}
[view draw];
}
[commandBuffer commit];
if (_flush) {
[commandBuffer waitUntilCompleted];
}
}
[commandBuffer commit];
if (_flush) {
[commandBuffer waitUntilCompleted];
}
}
}
}
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wunguarded-availability-new"
auto drawableWasPresented(id<MTLDrawable> drawable) -> void {
if (drawable.presentedTime == 0) { return; }
if (previousPresentedTime <= 0) {
previousPresentedTime = drawable.presentedTime;
}
CFTimeInterval presentationDuration = drawable.presentedTime - previousPresentedTime;
const double alpha = kPresentIntervalRollingAverageWeight;
private:
averagePresentDuration = (presentationDuration * alpha) + (averagePresentDuration * (1.0 - alpha));
previousPresentedTime = drawable.presentedTime;
}
#pragma clang diagnostic pop
auto determineNextPresentDuration() -> CFTimeInterval {
/// We use a rolling average of the last few seconds worth of frames to determine if we are running fast or slow. If
/// we are running ahead, we do nothing special; it's sufficient to present at the prescribed interval and eventually
/// we will fall behind. When we fall behind, we need to present earlier than the target present interval. The way VRR
/// works on macOS, we can request an earlier present interval, but we don't always get it. So what we do in this
/// function is "nudge" the system to display our frame early, but not immediately, in an attempt to correct for running
/// behind. If in this process we get more than 3 frames behind, we start requesting immediate presents.
CFTimeInterval targetPresentDuration = _presentInterval;
CFTimeInterval differenceFromTarget = _presentInterval - averagePresentDuration;
if (-differenceFromTarget >= (_presentInterval * kVRRCorrectiveTolerance)) {
targetPresentDuration = _presentInterval + (differenceFromTarget * kVRRCorrectiveForce);
}
if (depth > kVRRImmediatePresentThreshold) {
return 0;
} else {
return targetPresentDuration;
}
}
auto initialize() -> bool {
terminate();
if (!self.fullScreen && !self.context) return false;
if (!self.context) return false;
auto context = self.fullScreen ? [window contentView] : (__bridge NSView*)(void *)self.context;
auto context = (__bridge NSView*)(void *)self.context;
auto size = [context frame].size;
NSError *error = nil;
//Put renderer on a separate queue so we can choose whether or not to block the main thread (audio) waiting on a drawable.
dispatch_queue_attr_t queueAttributes = dispatch_queue_attr_make_with_qos_class(DISPATCH_QUEUE_SERIAL, QOS_CLASS_USER_INTERACTIVE, -1);
_renderQueue = dispatch_queue_create("com.ares.metal-renderer", queueAttributes);
_device = MTLCreateSystemDefaultDevice();
_commandQueue = [_device newCommandQueue];
_semaphore = dispatch_semaphore_create(kMaxBuffersInFlight);
_semaphore = dispatch_semaphore_create(kMaxOutputBuffersInFlight);
_renderToTextureRenderPassDescriptor = [MTLRenderPassDescriptor new];
@ -365,27 +483,40 @@ private:
///We compile shaders at runtime so we do not need to add the `xcrun` Metal compiler toolchain to the ares build process.
///Metal frame capture does not get along with runtime-compiled shaders in my testing, however. If you are debugging ares
///and need GPU captures, you should compile shaders with debug symbols offline, then instantiate the shader library by
///directly referencing a compiled `.metallib` file, rather than the following instantiation flow. You will also need to alter
///the `desktop-ui` Makefile such that it copies the `.metallib` into the bundle, rather than only the `Shaders.metal`
///source.
///and need GPU captures, run `scripts/macos-metal-debug.sh` and then compile ares in debug mode.
NSString *bundleResourcePath = [NSBundle mainBundle].resourcePath;
const string& fileComponent = "/Shaders/Shaders.metal";
NSString *shaderFilePath = [bundleResourcePath stringByAppendingString: [[NSString new] initWithUTF8String:fileComponent]];
bool libraryCreated = false;
NSString *shaderLibrarySource = [NSString stringWithContentsOfFile:shaderFilePath encoding:NSUTF8StringEncoding error: &error];
if (shaderLibrarySource == nil) {
NSLog(@"%@",error);
return false;
#if defined(BUILD_DEBUG)
if (@available(macOS 10.13, *)) {
NSURL *shaderLibURL = [NSURL fileURLWithPath:@"ares.app/Contents/Resources/Shaders/shaders.metallib"];
_library = [_device newLibraryWithURL: shaderLibURL error:&error];
}
_library = [_device newLibraryWithSource: shaderLibrarySource options: [MTLCompileOptions alloc] error:&error];
if (_library == nil) {
NSLog(@"%@",error);
return false;
if (_library != nil) {
libraryCreated = true;
} else {
NSLog(@"Compiled in debug mode, but debug .metallib not found. If you require Metal debugging, ensure you are on macOS 10.13+ and compile debug Metal shaders with scripts/macos-metal-debug.sh.");
}
#endif
if (!libraryCreated) {
NSString *bundleResourcePath = [NSBundle mainBundle].resourcePath;
const string& fileComponent = "/Shaders/Shaders.metal";
NSString *shaderFilePath = [bundleResourcePath stringByAppendingString: [[NSString new] initWithUTF8String:fileComponent]];
NSString *shaderLibrarySource = [NSString stringWithContentsOfFile:shaderFilePath encoding:NSUTF8StringEncoding error: &error];
if (shaderLibrarySource == nil) {
NSLog(@"%@",error);
return false;
}
_library = [_device newLibraryWithSource: shaderLibrarySource options: [MTLCompileOptions alloc] error:&error];
if (_library == nil) {
NSLog(@"%@",error);
return false;
}
}
MTLRenderPipelineDescriptor *pipelineStateDescriptor = [MTLRenderPipelineDescriptor new];
@ -423,16 +554,13 @@ private:
bool forceSRGB = self.forceSRGB;
self.setForceSRGB(forceSRGB);
view.autoresizingMask = NSViewWidthSizable|NSViewHeightSizable;
_commandQueue = [_device newCommandQueue];
_threaded = self.threadedRenderer;
_libra = librashader_load_instance();
if (!_libra.instance_loaded) {
print("Metal: Failed to load librashader: shaders will be disabled\n");
}
setShader(self.shader);
_blocking = self.blocking;
initialized = true;
@ -446,7 +574,9 @@ private:
_library = nullptr;
_vertexBuffer = nullptr;
_sourceTexture = nullptr;
for (int i = 0; i < kMaxSourceBuffersInFlight; i++) {
_sourceTextures[i] = nullptr;
}
_mtlVertexDescriptor = nullptr;
_renderToTextureRenderPassDescriptor = nullptr;
@ -464,17 +594,9 @@ private:
[view removeFromSuperview];
view = nil;
}
if (window) {
[window toggleFullScreen:nil];
[window setCollectionBehavior:NSWindowCollectionBehaviorDefault];
[window close];
window = nil;
}
}
RubyVideoMetal* view = nullptr;
RubyWindowMetal* window = nullptr;
bool _ready = false;
std::recursive_mutex mutex;
@ -518,28 +640,3 @@ private:
}
@end
@implementation RubyWindowMetal : NSWindow
-(id) initWith:(VideoMetal*)videoPointer {
auto primaryRect = [[[NSScreen screens] objectAtIndex:0] frame];
if (self = [super initWithContentRect:primaryRect styleMask:0 backing:NSBackingStoreBuffered defer:YES]) {
video = videoPointer;
[self setDelegate:self];
[self setReleasedWhenClosed:NO];
[self setAcceptsMouseMovedEvents:YES];
[self setTitle:@""];
[self makeKeyAndOrderFront:nil];
}
return self;
}
-(BOOL) canBecomeKeyWindow {
return YES;
}
-(BOOL) canBecomeMainWindow {
return YES;
}
@end

ファイルの表示

@ -13,7 +13,30 @@
struct Metal;
static const NSUInteger kMaxBuffersInFlight = 3;
//macOS likes triple buffering, so only use three output buffers.
static const NSUInteger kMaxOutputBuffersInFlight = 3;
//If the frame queue gets above this depth, start discarding frames.
static const u32 kMaxSourceBuffersInFlight = 6;
/// MARK: VRR constants
/// These constants were picked through trial and error testing; not any objective principles. It may be possible to
/// achieve slight improvements, especially if optimizing for a particular system. These were found to generally perform
/// well across all of ares's systems that were tested. Despite thorough testing, the immediate present mode together
/// with the lowest possible audio latency may still be a better choice for some systems, particularly WonderSwan.
//How far back the rolling average of present intervals extends.
static const double kPresentIntervalRollingAverageWeight = 0.05;
//The amount that we must be off of the target refresh rate before we start 'nudging' it toward earlier presents; here, 0.5%.
static const double kVRRCorrectiveTolerance = .005;
//Amount that we "nudge" the present interval when we are behind the target refresh rate.
//In this way, we can sometimes achieve presents between 8ms and 16ms, leading to smoother pacing.
static const double kVRRCorrectiveForce = 5.0;
//If the frame queue gets above this depth, start scheduling immediate (0ms) presents.
static const u32 kVRRImmediatePresentThreshold = 3;
struct Metal {
auto setShader(const string& pathname) -> void;
@ -37,17 +60,26 @@ struct Metal {
u32 outputHeight = 0;
double _outputX = 0;
double _outputY = 0;
u32 depth = 0;
dispatch_queue_t _renderQueue = nullptr;
CGFloat _viewWidth = 0;
CGFloat _viewHeight = 0;
vector_uint2 _viewportSize;
double _presentInterval = .016;
CFTimeInterval _presentInterval = .016;
CFTimeInterval averagePresentDuration = .016;
CFTimeInterval previousPresentedTime = 0;
u32 frameCount = 0;
double _refreshRateHint = 60;
CFTimeInterval _refreshRateHint = 60;
bool _blocking = false;
bool _flush = false;
bool _vrrIsSupported = false;
bool _threaded = true;
NSRect frameBeforeFullScreen = NSMakeRect(0,0,0,0);
id<MTLDevice> _device;
id<MTLCommandQueue> _commandQueue;
@ -55,7 +87,7 @@ struct Metal {
dispatch_semaphore_t _semaphore;
id<MTLBuffer> _vertexBuffer;
id<MTLTexture> _sourceTexture;
id<MTLTexture> _sourceTextures[kMaxSourceBuffersInFlight];
MTLVertexDescriptor *_mtlVertexDescriptor;
MTLRenderPassDescriptor *_renderToTextureRenderPassDescriptor;

ファイルの表示

@ -68,6 +68,14 @@ auto Video::setForceSRGB(bool forceSRGB) -> bool {
return true;
}
auto Video::setThreadedRenderer(bool threadedRenderer) -> bool {
lock_guard<recursive_mutex> lock(mutex);
if(instance->threadedRenderer == threadedRenderer) return true;
if(!instance->hasThreadedRenderer()) return false;
if(!instance->setThreadedRenderer(instance->threadedRenderer = threadedRenderer)) return false;
return true;
}
auto Video::setFlush(bool flush) -> bool {
lock_guard<recursive_mutex> lock(mutex);
if(instance->flush == flush) return true;

ファイルの表示

@ -14,6 +14,7 @@ struct VideoDriver {
virtual auto hasContext() -> bool { return false; }
virtual auto hasBlocking() -> bool { return false; }
virtual auto hasForceSRGB() -> bool { return false; }
virtual auto hasThreadedRenderer() -> bool { return false; }
virtual auto hasFlush() -> bool { return false; }
virtual auto hasFormats() -> vector<string> { return {"ARGB24"}; }
virtual auto hasShader() -> bool { return false; }
@ -26,6 +27,7 @@ struct VideoDriver {
virtual auto setContext(uintptr context) -> bool { return true; }
virtual auto setBlocking(bool blocking) -> bool { return true; }
virtual auto setForceSRGB(bool forceSRGB) -> bool { return true; }
virtual auto setThreadedRenderer(bool threadedRenderer) -> bool { return true; }
virtual auto setFlush(bool flush) -> bool { return true; }
virtual auto setFormat(string format) -> bool { return true; }
virtual auto setShader(string shader) -> bool { return true; }
@ -49,6 +51,7 @@ protected:
uintptr context = 0;
bool blocking = false;
bool forceSRGB = false;
bool threadedRenderer = true;
bool flush = false;
string format = "ARGB24";
string shader = "None";
@ -90,6 +93,7 @@ struct Video {
auto hasContext() -> bool { return instance->hasContext(); }
auto hasBlocking() -> bool { return instance->hasBlocking(); }
auto hasForceSRGB() -> bool { return instance->hasForceSRGB(); }
auto hasThreadedRenderer() -> bool { return instance->hasThreadedRenderer(); }
auto hasFlush() -> bool { return instance->hasFlush(); }
auto hasFormats() -> vector<string> { return instance->hasFormats(); }
auto hasShader() -> bool { return instance->hasShader(); }
@ -102,6 +106,7 @@ struct Video {
auto context() -> uintptr { return instance->context; }
auto blocking() -> bool { return instance->blocking; }
auto forceSRGB() -> bool { return instance->forceSRGB; }
auto threadedRenderer() -> bool { return instance->threadedRenderer; }
auto flush() -> bool { return instance->flush; }
auto format() -> string { return instance->format; }
auto shader() -> string { return instance->shader; }
@ -112,6 +117,7 @@ struct Video {
auto setContext(uintptr context) -> bool;
auto setBlocking(bool blocking) -> bool;
auto setForceSRGB(bool forceSRGB) -> bool;
auto setThreadedRenderer(bool threadedRenderer) -> bool;
auto setFlush(bool flush) -> bool;
auto setFormat(string format) -> bool;
auto setShader(string shader) -> bool;

7
scripts/macos-metal-debug.sh 実行可能ファイル
ファイルの表示

@ -0,0 +1,7 @@
#!/bin/bash
set -euo pipefail
pushd ../ruby/video/metal
xcrun -sdk macosx metal -o shaders.ir -c -gline-tables-only -frecord-sources Shaders.metal
xcrun -sdk macosx metallib -o shaders.metallib shaders.ir
popd