sljit performs a single (potentially) oversized executable memory
allocation before generating code. Instead of reserving the requested
amount of space, we can reserve only the used amount and eliminate some
dead space between code blocks.
In some cases sljit may write extra data beyond the generated code.
Because of this, it is no longer safe to assume the remainder of the
code cache is zero initialized. In practice, this assumption only
affected pools, which can easily be zeroed on demand. This also saves time
on code cache flushes, as we no longer zero the entire cache at once.
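
A minimal sketch of the idea, not the actual ares allocator: CodeCache and
its member names are hypothetical, and a std::vector filled with a non-zero
byte stands in for executable memory. It shows committing only the bytes
that were used, zeroing a pool on demand, and flushing without clearing
anything.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical bump allocator over a fixed-size code cache.
struct CodeCache {
  std::vector<uint8_t> memory;  // stand-in for executable memory
  size_t offset = 0;            // start of the free region

  // Fill with a non-zero byte: free space is not assumed to be zeroed.
  explicit CodeCache(size_t capacity) : memory(capacity, 0xcc) {}

  // Hands out the free region without committing any of it.
  uint8_t* acquire(size_t requested) {
    if (requested > memory.size() - offset) return nullptr;
    return memory.data() + offset;
  }

  // Commits only the bytes that were actually used.
  void reserve(size_t used) {
    assert(used <= memory.size() - offset);
    offset += used;
  }

  // Pools are the only consumers that need zeroed memory,
  // so they are cleared here, on demand.
  uint8_t* allocatePool(size_t size) {
    uint8_t* pool = acquire(size);
    if (!pool) return nullptr;
    std::memset(pool, 0, size);
    reserve(size);
    return pool;
  }

  // A flush only rewinds the cursor; nothing is zeroed.
  void flush() { offset = 0; }
};
```

A caller would acquire() the worst-case size, generate code, then reserve()
the size that was really emitted, so the next block starts right after it.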
Using the host carry flag is a bit tricky because on ARM64 (and most
other non-x86 architectures) the meaning of the flag is inverted for
subtraction. There were some helpers in the recompiler to deal with this,
but they performed explicit architecture checks, which ran counter to the
goals of a generic recompiler.
Reading back the carry flag is now handled directly by sljit. This
feature had been removed from sljit but was recently added back. It
internally tracks whether the setting operation was an add or a subtract
so that x86 semantics are observed on all architectures.
Loading the carry flag is now handled with an explicit add or subtract
that matches the subsequent usage. This results in expected behavior on
all architectures.
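
To illustrate the inversion, here is a small model in plain C++. It is not
sljit's API: HostFlags, arm64Add, arm64Sub, readCarryX86 and the two
loadCarry helpers are hypothetical names. The raw bit follows the ARM64
convention (after a subtract, carry is set when there is no borrow), and
the reader normalizes it to the x86 convention by remembering which
operation set it. The load helpers show one possible way to seed the flag
with an add or a subtract that matches the following use.

```cpp
#include <cstdint>

enum class FlagOp { Add, Sub };

// Raw host carry bit plus the kind of operation that produced it.
struct HostFlags {
  bool carry;
  FlagOp op;
};

// ARM64-style add: carry is set on unsigned overflow.
inline HostFlags arm64Add(uint32_t a, uint32_t b) {
  return {uint32_t(a + b) < a, FlagOp::Add};
}

// ARM64-style subtract: carry is set when there is NO borrow.
inline HostFlags arm64Sub(uint32_t a, uint32_t b) {
  return {a >= b, FlagOp::Sub};
}

// Read the flag back with x86 meaning (carry after add, borrow after sub).
inline bool readCarryX86(HostFlags f) {
  return f.op == FlagOp::Sub ? !f.carry : f.carry;
}

// Seed the flag before an add-with-carry: c + 0xffffffff overflows iff c.
inline HostFlags loadCarryForAdd(bool c) {
  return arm64Add(c ? 1u : 0u, 0xffffffffu);
}

// Seed the flag before a subtract-with-borrow: 0 - c borrows iff c.
inline HostFlags loadCarryForSub(bool c) {
  return arm64Sub(0u, c ? 1u : 0u);
}
```

With this model, readCarryX86(arm64Sub(1, 2)) reports a borrow even though
the raw ARM64 bit is clear, which is the behavior a guest expecting x86
semantics would want.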
- The primary lookup table is direct addressed
- The secondary lookup table is indexed by hash of code block contents
- IMEM invalidation is tracked at 64-byte granularity and used to
evict primary lookup table entries
- The secondary lookup table is only pruned on a full code cache flush
- Only executed instructions are hashed, so the code cache no longer
experiences unbounded growth
- Code blocks can now wrap around the end of IMEM
- generic::recompiler now supports aborting recompilation (required by
an earlier version of this change)
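
The list above can be sketched roughly as follows. This is a simplified
model under stated assumptions, not the real implementation: BlockCache,
Entry and Block are hypothetical names, IMEM is assumed to be 4 KiB, the
hash is FNV-1a, and the block length is passed in by the caller rather
than discovered by decoding executed instructions.

```cpp
#include <array>
#include <cstdint>
#include <unordered_map>

struct Block { uint64_t hash; };  // stand-in for a compiled code block

struct BlockCache {
  static constexpr uint32_t ImemSize = 4096;  // assumed IMEM size
  static constexpr uint32_t LineSize = 64;    // invalidation granularity

  struct Entry {
    Block* block = nullptr;
    uint64_t lines = 0;  // 64-byte lines the block covers
  };

  std::array<uint8_t, ImemSize> imem{};
  std::array<Entry, ImemSize / 4> primary{};       // one slot per instruction
  std::unordered_map<uint64_t, Block> secondary;  // keyed by content hash
  uint64_t dirty = 0;                              // one bit per line

  void write(uint32_t address, uint8_t value) {
    address %= ImemSize;
    imem[address] = value;
    dirty |= 1ull << (address / LineSize);
  }

  // Lines touched by a block; the modulo lets a block wrap around IMEM.
  static uint64_t linesCovered(uint32_t address, uint32_t length) {
    uint64_t mask = 0;
    for (uint32_t i = 0; i < length; i++)
      mask |= 1ull << ((address + i) % ImemSize / LineSize);
    return mask;
  }

  // FNV-1a over the block's bytes.
  uint64_t hashRange(uint32_t address, uint32_t length) const {
    uint64_t h = 0xcbf29ce484222325ull;
    for (uint32_t i = 0; i < length; i++) {
      h ^= imem[(address + i) % ImemSize];
      h *= 0x100000001b3ull;
    }
    return h;
  }

  // Evict primary entries overlapping a written line; secondary is kept.
  void evictDirty() {
    for (Entry& e : primary)
      if (e.lines & dirty) e = Entry{};
    dirty = 0;
  }

  Block* lookup(uint32_t address, uint32_t length) {
    address %= ImemSize;
    if (dirty) evictDirty();
    Entry& e = primary[address / 4];
    if (e.block) return e.block;  // fast path: direct-addressed hit
    uint64_t h = hashRange(address, length);
    auto result = secondary.emplace(h, Block{h});  // "compiles" on a miss
    e.block = &result.first->second;
    e.lines = linesCovered(address, length);
    return e.block;
  }

  // Full code cache flush: the only point where secondary is pruned.
  void flush() {
    secondary.clear();
    primary.fill(Entry{});
    dirty = 0;
  }
};
```

In this model, rewriting IMEM with code that was seen before evicts the
primary entry but still finds the existing block through its hash, so
nothing is recompiled.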
Previously, the recompilers in ares would assume the System V calling
convention, then emit extra moves on Windows to handle the differences
in register usage. Now, they directly populate the registers used by the
target system for passing parameters to functions. This is achieved
with register aliases set for the appropriate ABI at compile time.
Furthermore, registers that are nonvolatile only on Windows - rsi and
rdi - are no longer used as temporaries, removing the need to save and
restore them in recompiled code blocks.
This accounts for roughly a 10% reduction in recompiled code size on
Windows while making functionally no difference to other systems.
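
A rough sketch of what compile-time register aliases might look like. The
names Reg, ra0..ra3 and movImm64 are hypothetical and not taken from the
ares source; the register assignments themselves follow the Microsoft x64
and System V AMD64 calling conventions.

```cpp
#include <cstdint>
#include <vector>

// x86-64 general purpose registers in hardware encoding order.
enum Reg : uint8_t {
  rax, rcx, rdx, rbx, rsp, rbp, rsi, rdi,
  r8, r9, r10, r11, r12, r13, r14, r15
};

#if defined(_WIN32)
// Microsoft x64: the first four integer arguments.
constexpr Reg ra0 = rcx, ra1 = rdx, ra2 = r8, ra3 = r9;
#else
// System V AMD64: the first four integer arguments.
constexpr Reg ra0 = rdi, ra1 = rsi, ra2 = rdx, ra3 = rcx;
#endif

// Emits "mov reg, imm64" (REX.W + B8+rd) into a byte buffer.
inline void movImm64(std::vector<uint8_t>& out, Reg reg, uint64_t imm) {
  out.push_back(uint8_t(0x48 | (reg >> 3)));  // REX.W, plus REX.B for r8-r15
  out.push_back(uint8_t(0xb8 | (reg & 7)));
  for (int i = 0; i < 8; i++) out.push_back(uint8_t(imm >> (i * 8)));
}
```

An emitter that writes its first argument with movImm64(code, ra0, value)
targets the right register on either system, with no fix-up moves before
the call.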
- Mega Drive: VDP scanline renderer compiles once more
- Nintendo 64: improved dynamic recompiler
- PlayStation: began adapting CPU cached interpreter into a dynamic
recompiler
- lucia: allow mapping analog axes separately (allows mapping sticks to
the keyboard)
All that's left now is fixing the call() function in the N64/PS1 cores to
support more than three (i.e. four) arguments on Windows, and then we can
release v120.
- Nintendo 64: began adapting CPU cached interpreter into a dynamic
recompiler
- Nintendo 64: began adapting RSP cached interpreter into a dynamic
recompiler
- Nintendo 64: added 64-bit addressing and TLB support
- Nintendo 64: added endian support to [LS][WD][LR] instructions
I could really use a hand with my CPU dynarec, n64/cpu/recompiler.cpp line 1335.
When calling an FPU instruction without SCC coprocessor 1 enabled, it throws an
exception.
If I call the exception function, the emulator crashes hard and I don't
understand why at all.
The stack is aligned, but if I modify rbx/rbp/r13, which works just fine with
the RSP+SH2, it seems to die. Yet even if I make that code restore
rbx/rbp/r13 before calling the coprocessor exception function, it still dies.
Really at a loss >_>
Also, this will only work on Linux/BSD at the moment. The RSP vector functions
pass four parameters, and my Windows call wrapper can only handle 3 arguments
because after that the argument has to go on the stack. I'll fix that later.
I'm releasing ares v119 today, which includes very preliminary Sega 32X
emulation support. Compatibility is currently at around 50% with the dynamic
recompiler, probably closer to 75% with the cached interpreter, and since the
system requirements were a bit too steep (on account of my Mega Drive core being
cycle/dot-accurate), I downclocked the SH2s a bit until I can speed up the
emulator more. You'll currently need about a Ryzen 5 2600 to hit 60fps reliably.
With the underclock, my Ryzen 7 5800X hits 120fps, and without it, 85fps.
Although the compatibility for the new 32X core is a bit low, this is mostly due
to pesky CPU bugs. The accuracy is quite high and I've emulated as much of the
32X and SH7604 peripheral functions as possible. I pass all 161 tests from the
Mars Check Program. It should not take much more work to reach 98% compatibility
in the future. Sega CD 32X emulation support is present, but is likely not
functional, as I haven't tested it yet.
Other major features in this release include Sega SVP support for Virtua Racing,
plus improved Nintendo 64, Mega Drive, and Mega CD emulation compatibility.
I'm going to start working on Sega Saturn emulation now, while Luke Usher works
on Neo Geo emulation. No promises as to when or if these cores will become
playable; I'm just giving you all a heads up since you'll see the skeletons for
these systems in the source code now.
Changelog:
- Nintendo 64: improved Expansion Pak detection
- Nintendo 64: fixed swapped L and R buttons [simer]
- Nintendo 64: emulated the RST bit for gamepads
- ruby: added library detection support for the Linux and BSD targets
- lucia: gained support for game paks (custom file dialog only)
- Super Famicom: added support for Super Game Boy, Sufami Turbo, and BS
Memory packs
- hiro: made SourceEdit optional for gtk2 and gtk3 targets (disabled
for ares)
- Nintendo 64: emulated 2048-byte EEPROM identifier and transfer protocol
- Super Famicom: corrected ppu-performance widths table indexing to consider
overscan
- Mega CD: fixed a bug in register $ff8003; allows Popful Mail to boot
[TascoDLX]
- Mega Drive: fixed a bug with 256-width video modes
- Mega Drive: improved VDP DMA emulation which should fix many games
- Mega Drive: added SVP emulation
- Mega Drive: added 32X emulation
- Mega CD: fixed a bit-masking bug in register $ff8004; fixes Radical Rex
[TascoDLX]
- Nintendo 64: fixed C implementation of RSP VMACU instruction
- Nintendo 64: use correct NTSC and PAL PIF ROM images based on the region
- Mega Drive: fixed reset logic, Z80 interrupt timing + prefix timing +
bus control [TascoDLX]
- Mega CD: fixed word RAM access and CDC DMA word RAM transfers [TascoDLX]
- Nintendo 64: improved RSP VMOV emulation [Rasky]
- Nintendo 64: fixed RSP DMEM DMA alignment (&~7, not &~3) [Rasky]
- Nintendo 64: upgraded to the latest version of ParaLLEl-RDP
- nall: greatly expanded recompiler/amd64's supported intrinsics
- mia: substantial refactoring around a new virtual filesystem implementation
from nall