Admin warning: this resource replaces the DLL files contained within your VaM Install. Users accept all risks associated with using this resource
As requested in this thread https://hub.virtamate.com/threads/benchmark-result-discussion.13131/page-37 here is a release of the CPU performance patch.
FAQ at bottom
Only VaM 1.22.0.3 is supported!
Please share before and after benchmarks with your settings using this scene: https://hub.virtamate.com/resources/benchmark.11336/
Baseline 3 is the one that should improve most (or just any scene with more than one character).
Also share in the discussion thread if you encounter any bugs, like skin flopping around or other plugins breaking!
If you see odd results, like the "Simpler physics" benchmark now being slower than "Baseline 3", please enable the profiler, rerun the benchmarks and share the resulting ThreadProfile.csv in the discussion thread along with your setup.
When reviewing, please at least name your CPU/GPU.
Summary:
A native C implementation of CPU-intensive functions, offloaded to a DLL that is called from a modified Assembly-CSharp.dll. Skin meshing is now multithreaded, although not all CPUs benefit from it. The CPU part of the collider computation is now multithreaded too, and all CPUs benefit from it directly. The patch can also automatically limit its threads to a single CCD, which is important for AMD CPUs, since the fast CPU cache is only shared between cores of the same CCD.
Installation:
Make a backup of \VaM_Data\Managed\Assembly-CSharp.dll first.
Extract into main directory, so you get:
\VaM.exe (not included in patch, just so you know you put it in the right folder)
\PerformancePatches\SkinMeshPartDLL.dll
\PerformancePatches\SkinMeshPartDLL.ini
\VaM_Data\Managed\Assembly-CSharp.dll
Pictured guide by @kretos
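If you prefer to script the backup step, here is a minimal Python sketch. The relative paths are the ones listed above; the ".bak" suffix and the function names are my own assumptions, not part of the patch.

```python
import shutil
from pathlib import Path

# Paths relative to the VaM main directory, as listed in the installation steps.
DLL_REL = Path("VaM_Data") / "Managed" / "Assembly-CSharp.dll"
PATCH_FILES = [
    Path("PerformancePatches") / "SkinMeshPartDLL.dll",
    Path("PerformancePatches") / "SkinMeshPartDLL.ini",
    DLL_REL,
]


def backup_assembly(vam_root):
    """Copy Assembly-CSharp.dll next to itself with a .bak suffix (assumed name)."""
    src = Path(vam_root) / DLL_REL
    dst = src.with_name(src.name + ".bak")
    if not dst.exists():  # never overwrite an existing backup
        shutil.copy2(src, dst)
    return dst


def missing_files(vam_root):
    """Return the expected files that are not present after extracting."""
    root = Path(vam_root)
    return [str(p) for p in PATCH_FILES if not (root / p).is_file()]
```

Run backup_assembly on your VaM folder before extracting the patch, then missing_files afterwards; an empty list means the three files ended up where they should.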
Version13beta1:
Now has an engineAffinity parameter. Cores listed there will not be used by the patch's threads, so that the Unity engine can use them exclusively. You want to put your two fastest cores in there; AMD users can use Ryzen Master to find them. If you don't want to use this feature, just delete the parameter or comment it out.
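To picture what engineAffinity does, here is a tiny Python sketch of the idea. It only models the description above (reserved cores are taken away from the patch's threads); the function name and the example core numbers are hypothetical, and the real logic lives in the native DLL.

```python
def patch_worker_cores(affinity, engine_affinity):
    """Cores left for the patch's threads once the engine's cores are reserved."""
    reserved = set(engine_affinity)
    return [core for core in affinity if core not in reserved]


# Hypothetical example: eight allowed cores, two of them reserved for Unity.
workers = patch_worker_cores([1, 3, 5, 7, 9, 11, 13, 15], [1, 3])
print(workers)
```

Keep in mind that every core you reserve is one core less for the patch's threads.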
Version12:
Now supports the MaxPerChar setting, so that the patch uses fewer threads per character when there are many characters in the scene and all threads for that character if there is only one. That way you get the most out of both scenarios, although your CPU cores will run at 100% when you have 1 character and 300 fps.
Tune SkinMeshPartDLL.ini:
When running, your Task Manager should look like this:
This means the Unity engine is totally maxed out and you are only limited by Unity's physics engine. If your fastest core is not running at 100%, you can increase MaxPerChar a little closer to your core count.
In benchmarks, if you see WaitTime being 0.00, it means the next frame is not waiting for us and we are running as fast as we can. If it is bigger than 0.00, there is still some room for improvement (unless you already hit all cores).
Some CPUs:
AMD Ryzen 7950x(3d)
(we are using the first CCD only, which has the 3d cache)
AMD Ryzen 7800x(3d)
(has only one CCD)
AMD Ryzen 5950x
(we are using the first CCD only)
AMD Ryzen 5900x
(we are using the first CCD only)
AMD Ryzen 5800x(3d)
(has only one CCD)
AMD Ryzen 5600x(3d)
(has only one CCD)
AMD Ryzen 3800x
AMD Ryzen 3600x
Intel 14900K/14800K/14700K
(we are using the performance cores only)
Intel 14600K/14500/14400
(we are using the performance cores only)
Intel 13900K/13800K/13700K
(we are using the performance cores only)
Intel 13600K/13500/13400
(we are using the performance cores only)
Intel 12900K/12800K/12700K
(we are using the performance cores only)
Intel 12600K/12500/12400
(we are using the performance cores only)
You catch the drift: if it's Intel, use only the performance cores and skip the HTs (hyperthreads). If it's AMD, use only cores from the first CCD and skip the HTs.
I don't have an Intel CPU, so please experiment and give me feedback.
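The rule of thumb above can be turned into a core list with a few lines of Python. This is only a sketch under assumptions: it assumes the two hardware threads of a physical core have adjacent logical numbers (0/1, 2/3, ...) and that the first CCD or the performance cores come first in the numbering. The function and its parameters are made up for illustration; check your own topology before copying the result into the ini.

```python
def one_thread_per_core(core_count, threads_per_core=2, sibling=1, first_logical=0):
    """Pick one logical CPU per physical core.

    core_count       -- physical cores to use (first CCD or performance cores)
    threads_per_core -- 2 with SMT/HT enabled, 1 without
    sibling          -- which hardware thread of each core to pick (0 or 1)
    first_logical    -- logical index where the chosen block of cores starts
    """
    if not 0 <= sibling < threads_per_core:
        raise ValueError("sibling must be smaller than threads_per_core")
    return [first_logical + core * threads_per_core + sibling
            for core in range(core_count)]


# Eight cores of one CCD with SMT on, picking the second thread of each core.
print(",".join(str(c) for c in one_thread_per_core(8)))
```

For eight cores this produces 1,3,5,7,9,11,13,15, the same list as in the v10 settings for the 5950x in the FAQ. With SMT/HT disabled, pass threads_per_core=1 and sibling=0 so that no index is skipped.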
Improvements on 5950x: a crisp 103% fps increase for FREE, without any downsides. The benchmark settings are listed at the end of the FAQ.
Profiler note: after rerunning the benchmarks with the profiler enabled, zip the ThreadProfile.csv file and share it in the discussion thread along with your setup. The csv only contains thread timings and uses random numbers for the characters in the scene. Afterwards disable the profiler again, since it eats 1-2 fps.
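If you want to script the zipping, a small Python sketch is below. Where ThreadProfile.csv ends up is not stated here, so the path is a parameter you have to fill in yourself; the function name is hypothetical.

```python
import zipfile
from pathlib import Path


def zip_profile(csv_path, zip_path=None):
    """Compress the profile csv into a zip next to it and return the zip path."""
    csv_path = Path(csv_path)
    zip_path = Path(zip_path) if zip_path else csv_path.with_suffix(".zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Store only the file name, not your local folder structure.
        zf.write(csv_path, arcname=csv_path.name)
    return zip_path
```

Calling zip_profile with the path to your ThreadProfile.csv writes ThreadProfile.zip into the same folder.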
A further setting enables multithreaded batch processing of rendering commands before they are submitted to the GPU. Update your GPU driver before enabling it.
FAQ
Don't skip every second core, so instead of
write
You might have fallen into the motion smoothing hole or hit the VSync wall. Explanation:
If you run your Quest 3 at 90 Hz with motion smoothing on and render between 45 fps and 89 fps, you will get 45 fps, no matter what. When motion smoothing is enabled, the Oculus driver forces the frames to wait until it settles at 45 fps.
Every HMD has VSync enforced, so you can never have more FPS than your HMD refresh rate! When motion smoothing is enabled, your HMD driver temporarily or permanently switches VSync to half the refresh rate when you don't quite reach 90 fps, forcing you to 45 fps. The effect is the same as the enforced VSync, just at half the rate.
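The capping described above can be written down as a small model. This is a simplified Python sketch of the explanation, not how any HMD driver is actually implemented: the function name is made up, adaptive motion smoothing is ignored, and below half the refresh rate it simply passes the render rate through.

```python
def displayed_fps(render_fps, refresh_hz=90, motion_smoothing=False):
    """Simplified model of the FPS you see in the headset."""
    if render_fps >= refresh_hz:
        return float(refresh_hz)  # VSync wall: never more than the refresh rate
    if motion_smoothing:
        # Missed the full rate: the driver falls back to half the refresh rate.
        return float(min(render_fps, refresh_hz / 2))
    return float(render_fps)


for fps in (200, 74, 47):
    print(fps, displayed_fps(fps), displayed_fps(fps, motion_smoothing=True))
```

In this model both 74 fps and 47 fps end up at 45 fps with motion smoothing on, which is why an improvement can be invisible in the headset even though the patch works.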
If this is what happened for you, you should clearly see it in the benchmarks:
^Here I'm reaching perfect gameplay and I'm capped at my HMD refresh rate (90 fps without motion smoothing; it would be 45 fps with motion smoothing on, or it would constantly jump between 45 and 90 if I had adaptive motion smoothing). But you can calculate your theoretical FPS:
My average TotalTime was 5 ms, which translates to 1000/5 = 200 fps, so my game would render at 200 fps if my HMD driver didn't force it to wait. This means I could run this benchmark at 200 fps(!) in VR with a faster refreshing HMD. (Ignore the max1% values, these are rounding errors.)
^Here I am truly not reaching more than 74 fps. TotalTime was 12.22 ms, which translates to 1000/12.22 ≈ 81.8 fps, which is in the ballpark of 74 fps; probably a few frames were lost due to reprojection by SteamVR. Before the patch I had far worse FPS, at about 47 fps:
If I had motion smoothing on, I would have seen constant 45fps before and after the patch!
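The theoretical FPS calculation from above as a few lines of Python, in case you want to run it over your own benchmark numbers. The function name is just for illustration; the input is the average TotalTime in milliseconds from the benchmark.

```python
def theoretical_fps(total_time_ms):
    """Frames per second the game could render if nothing forced it to wait."""
    if total_time_ms <= 0:
        raise ValueError("TotalTime must be positive")
    return 1000.0 / total_time_ms


print(theoretical_fps(5))                # 200.0
print(round(theoretical_fps(12.22), 1))  # 81.8
```

Compare the result with your HMD refresh rate: if it is higher, you are capped by VSync and not by the game.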
Then you also renamed the VaM_Data folder to sotr_Data.
Instead of
\VaM_Data\Managed\Assembly-CSharp.dll
put the Assembly-CSharp.dll into
\sotr_Data\Managed\Assembly-CSharp.dll
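Open VaM and wait until you see the main menu. Then open PowerShell and execute
Code:
Get-Process VaM -ea 0 | select -expand Modules -ea 0 | where {$_.ModuleName -like 'skinmeshpartdll.dll'}
If the patch is running, the command lists the loaded SkinMeshPartDLL.dll module; no output means it is not loaded.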
VaM relies heavily on physics calculations that are not included in Unity, and those need a lot of RAM reads/writes and lots of floating-point calculations. Knowledge of native code development and bare-metal optimization is usually only available to AA+ game development studios. Writing C code when you develop in C# is a very difficult task, and such programmers are expensive. The newest Unity engine might have optimizations for that.
In fact VaM already has clever optimizations, but they are still limited to C#.
Also, he has an Intel CPU, so I guess he didn't really notice the issues on AMD, especially the quirk that AMD CPUs shouldn't use SMT when running Unity games.
Intel still has an advantage in RAM latency: the data gets read and written a lot faster, and VaM does a lot of reads/writes for a game. If the data could be read, processed and written in one go, that wouldn't matter, but that's not possible in Unity; only Unreal can compact the data into a single block.
Improvements on 5950x
Clean install vanilla:
Clean install vanilla, process affinity manually set to one CCD using task manager:
Clean install with patch v9
Settings:
[threads]
computeColliders=12
skinmeshPart=1
CCD=1
IterateCCD=0
v10 Patch
[threads]
computeColliders=6
skinmeshPart=1
affinity=1,3,5,7,9,11,13,15
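If you want to sanity-check your settings before starting VaM, the [threads] block can be read with Python's standard configparser. This is only a sketch: it uses the three v10 keys shown above and nothing else, and it is my own helper, not how the DLL reads the file.

```python
import configparser

# The v10 settings from above, used here as sample input.
SAMPLE_INI = """\
[threads]
computeColliders=6
skinmeshPart=1
affinity=1,3,5,7,9,11,13,15
"""


def parse_threads(ini_text):
    """Parse the [threads] section into numbers and a list of core indices."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the key names case-sensitive
    parser.read_string(ini_text)
    section = parser["threads"]
    affinity = section.get("affinity", fallback="")
    return {
        "computeColliders": section.getint("computeColliders"),
        "skinmeshPart": section.getint("skinmeshPart"),
        "affinity": [int(c) for c in affinity.split(",") if c.strip()],
    }


settings = parse_threads(SAMPLE_INI)
print(settings["affinity"])
```

A typo such as a trailing comma or a letter in the affinity list then shows up as a clean list or a ValueError instead of silently wrong behaviour.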