CPU Performance Patch (Up to 30% faster physics, up to 60% more FPS)

  Admin warning: this resource replaces the DLL files contained within your VaM Install. Users accept all risks associated with using this resource
As requested in this thread https://hub.virtamate.com/threads/benchmark-result-discussion.13131/page-37 here is a release of the cpu performance patch.

FAQ at bottom

Only VaM 1.22.0.3 is supported!

Please share before and after benchmarks with your settings using this scene: https://hub.virtamate.com/resources/benchmark.11336/

Baseline 3 is the one that should improve most (or just any scene with more than one character).

Also report any bugs you encounter, such as skin flopping around or other plugins breaking, in the discussion thread!

If you see odd results, such as the "Simpler physics" benchmark now being slower than "Baseline3", please set
[profiler]
enabled=1
then rerun the benchmarks, zip the ThreadProfile.csv file and share it in the discussion thread along with your setup. The CSV only contains thread timings and uses random numbers for characters in the scene. Afterwards disable the profiler again, since it costs 1-2fps.

When reviewing, please at least name your CPU/GPU.

Summary:
A native C implementation of CPU-intensive functions, offloaded to a DLL, which gets called from a modified Assembly-CSharp.dll. The skin meshing is now multithreaded, although not all CPUs benefit from it. The CPU part of the collider computation is now multithreaded, and all CPUs directly benefit from it. The patch can also automatically limit the threads to a single CCD, which is important for AMD CPUs, since the fast CPU cache is only shared between cores of the same CCD.

Installation:
Make a backup of \VaM_Data\Managed\Assembly-CSharp.dll first.
Extract into main directory, so you get:

\VaM.exe (not included in patch, just so you know you put it in the right folder)
\PerformancePatches\SkinMeshPartDLL.dll
\PerformancePatches\SkinMeshPartDLL.ini
\VaM_Data\Managed\Assembly-CSharp.dll
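
A quick way to double-check the layout above is a small script that looks for the four files. This is only a sketch: the function name `missing_patch_files` and the example install path are made up for illustration, and it only checks that the files exist, not that they are the patched versions.

```python
from pathlib import Path

# Relative paths taken from the folder layout listed above.
EXPECTED_FILES = [
    "VaM.exe",
    "PerformancePatches/SkinMeshPartDLL.dll",
    "PerformancePatches/SkinMeshPartDLL.ini",
    "VaM_Data/Managed/Assembly-CSharp.dll",
]


def missing_patch_files(vam_root):
    """Return the expected files that are not present under vam_root."""
    root = Path(vam_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Hypothetical install location; replace it with your own VaM folder.
    for rel in missing_patch_files(r"C:\Games\vam"):
        print("missing:", rel)
```

If nothing is printed, all four files are where the patch expects them.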

Pictured guide by @kretos
1708797406398.png

Version13beta1:
Now has an engineAffinity parameter. Cores listed there will not be used by the patch's threads, so that the Unity engine can use them exclusively. You want to put your two fastest cores in there; AMD users can use Ryzen Master to find them. If you don't want to use this feature, just delete the parameter or comment it out.

Version12 :
Now supports MaxPerChar settings, so that it uses fewer threads per character when there are many characters in the scene and all threads for a character if there is only one. That way you get the most in both scenarios, although your CPU cores will run at 100% when you have 1 character and 300fps.
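
To get a feeling for how the MaxPerChar cap interacts with the number of characters, here is a rough model. The DLL's real scheduling is not documented here, so the even split below is an assumption, and `threads_per_character` is a made-up helper name.

```python
def threads_per_character(total_threads, max_per_char, num_characters):
    """Assumed split: share the pool evenly, capped by MaxPerChar, at least 1.

    This is an illustration only; the patch itself may distribute differently.
    """
    if num_characters < 1:
        return 0
    even_share = total_threads // num_characters
    return max(1, min(max_per_char, even_share))


# 8 threads in the pool, MaxPerChar=8, with 1, 2 and 4 characters in the scene.
for chars in (1, 2, 4):
    print(chars, "character(s):", threads_per_character(8, 8, chars), "threads each")
```

Under this model a lower MaxPerChar only matters when there are few characters, which matches the description that a single character otherwise gets every thread.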

Tune SkinMeshPartDLL.ini:
[threads]
# first core = 1
# the cores VaM should use; for AMD it's best to select the non-SMT cores of your CPU
# if you have SMT/HT enabled in BIOS, skip every second core (those are the HT/SMT threads), like 1,3,5,..., otherwise use all cores like 1,2,3,...
# if you have an AMD Ryzen CPU with multiple CCDs (the ones with 16 cores), select only the cores from the first CCD, they are the fastest
# e.g. a 5950x with 16 cores, 32 threads across 2 CCDs means 1,3,5,7,9,11,13,15 will be the first CCD and 17,19,21,23,25,27,29,31 will be the second CCD
# if you really don't care, just use all HT cores too and you will get the most FPS in a static scene without any movement, but physics and collisions will tank performance
affinity=1,3,5,7,9,11,13,15

# set these to the number of cores you listed above, more is better
# if you need other applications to work or if your CPU thermally throttles, you can reduce these
computeColliders=8
skinmeshPart=8
applyMorphs=8

# this patch will try to use all threads, no matter if it's worth it in order to maximize FPS
# it notices how many characters are in the scene and distributes the cores across the characters
# but if for example you only have 1 character in the scene, it will throw all cores at it, even if the gains are nonexistent. You heat up your CPU for a few more FPS
# 8 skinmeshParts per character are barely faster than 4 but if you absolutely must get max FPS, set them to your core count
# skinmeshPartMaxPerChar=2 is usually enough
# same goes for the applyMorph
# applyMorphMaxPerChar=4 is usually enough, unless you have 100 pages of active morphs
skinmeshPartMaxPerChar=8
applyMorphMaxPerChar=8

# exactly the same as above, but will be used when launched in VR
# SteamVR might work better if you do not use all cores
[threadsVR]
....
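
If you don't want to count cores by hand, the affinity numbering described in the comments above can be generated. This sketch assumes logical cores are numbered starting at 1 with each SMT sibling directly after its physical core, as in the 5950x example; `affinity_for_ccd` and `as_ini_value` are made-up helper names, not part of the patch.

```python
def affinity_for_ccd(cores_per_ccd, ccd_index=0, smt_enabled=True):
    """List the 1-based logical core numbers for one CCD, skipping SMT siblings."""
    step = 2 if smt_enabled else 1          # with SMT on, every second entry is a sibling
    logical_per_ccd = cores_per_ccd * step  # logical cores occupied by one CCD
    start = ccd_index * logical_per_ccd + 1
    return list(range(start, start + logical_per_ccd, step))


def as_ini_value(cores):
    """Format a core list the way the affinity= line expects it."""
    return ",".join(str(c) for c in cores)


# 5950x: 8 cores per CCD, SMT on, first CCD.
print("affinity=" + as_ini_value(affinity_for_ccd(8)))
# Same CPU with SMT disabled in BIOS.
print("affinity=" + as_ini_value(affinity_for_ccd(8, smt_enabled=False)))
```

Pass `ccd_index=1` to get the second CCD's cores instead, for example to see which numbers to avoid.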
When running, your Task Manager should look like this:
1707589520519.png

This means the Unity engine is totally maxed out and you are only limited by Unity's physics engine. If your fastest core is not running at 100%, you can increase MaxPerChar a little closer to your core count.
In benchmarks, if you see WaitTime being 0.00, it means the next frame is not waiting for us and we are running as fast as we can. If it is bigger than 0.00, it means there is still some room for improvement (unless you are already using all cores).

Additional switches in \VaM_Data\boot.config
Enables multithreaded batch processing of rendering commands before they are submitted to the GPU
wait-for-native-debugger=0
gfx-enable-gfx-jobs=1
gfx-enable-native-gfx-jobs=1
Update your GPU driver before enabling that.
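
If you prefer not to edit the file by hand, the three switches above could be merged into an existing boot.config with a few lines of Python. This is a sketch that assumes boot.config is plain `key=value` text with one entry per line; `merge_boot_config` is a made-up helper, and you should back up the file first.

```python
# The three switches listed above.
SWITCHES = {
    "wait-for-native-debugger": "0",
    "gfx-enable-gfx-jobs": "1",
    "gfx-enable-native-gfx-jobs": "1",
}


def merge_boot_config(text, switches):
    """Return boot.config text with each switch set, keeping all other lines."""
    remaining = dict(switches)
    out = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key in remaining:
            # Overwrite an existing entry in place.
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    # Append the switches that were not in the file yet.
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


# Example with made-up file content; some-other-key is a placeholder entry.
print(merge_boot_config("gfx-enable-gfx-jobs=0\nsome-other-key=1\n", SWITCHES))
```

To apply it for real, read `\VaM_Data\boot.config`, pass its content through the function and write the result back.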

I just want to run it, what settings does my CPU need?

Some CPUs:

AMD Ryzen 7950x(3d)
(we are using the first CCD only, which has the 3d cache)
computeColliders=8
skinmeshPart=8
applyMorphs=8
skinmeshPartMaxPerChar=8
applyMorphMaxPerChar=8
affinity=1,3,5,7,9,11,13,15
AMD Ryzen 7800x(3d)
(has only one CCD)
computeColliders=8
skinmeshPart=8
applyMorphs=8
skinmeshPartMaxPerChar=8
applyMorphMaxPerChar=8
affinity=1,3,5,7,9,11,13,15
AMD Ryzen 5950x
(we are using the first CCD only)
computeColliders=8
skinmeshPart=8
applyMorphs=8
skinmeshPartMaxPerChar=8
applyMorphMaxPerChar=8
affinity=1,3,5,7,9,11,13,15
AMD Ryzen 5900x
(we are using the first CCD only)
computeColliders=6
skinmeshPart=6
applyMorphs=6
skinmeshPartMaxPerChar=6
applyMorphMaxPerChar=6
affinity=1,3,5,7,9,11
AMD Ryzen 5800x(3d)
(has only one CCD)
computeColliders=8
skinmeshPart=8
applyMorphs=8
skinmeshPartMaxPerChar=8
applyMorphMaxPerChar=8
affinity=1,3,5,7,9,11,13,15
AMD Ryzen 5600x(3d)
(has only one CCD)
computeColliders=6
skinmeshPart=6
applyMorphs=6
skinmeshPartMaxPerChar=6
applyMorphMaxPerChar=6
affinity=1,3,5,7,9,11
AMD Ryzen 3800x
computeColliders=8
skinmeshPart=8
applyMorphs=8
skinmeshPartMaxPerChar=8
applyMorphMaxPerChar=8
affinity=1,3,5,7,9,11,13,15
AMD Ryzen 3600x
computeColliders=6
skinmeshPart=6
applyMorphs=6
skinmeshPartMaxPerChar=6
applyMorphMaxPerChar=6
affinity=1,3,5,7,9,11
Intel 14900K/14800K/14700K
(we are using the performance cores only)
computeColliders=8
skinmeshPart=8
applyMorphs=8
skinmeshPartMaxPerChar=8
applyMorphMaxPerChar=8
affinity=1,3,5,7,9,11,13,15
Intel 14600K/14500/14400
(we are using the performance cores only)
computeColliders=6
skinmeshPart=6
applyMorphs=6
skinmeshPartMaxPerChar=6
applyMorphMaxPerChar=6
affinity=1,3,5,7,9,11
Intel 13900K/13800K/13700K
(we are using the performance cores only)
computeColliders=8
skinmeshPart=8
applyMorphs=8
skinmeshPartMaxPerChar=8
applyMorphMaxPerChar=8
affinity=1,3,5,7,9,11,13,15
Intel 13600K/13500/13400
(we are using the performance cores only)
computeColliders=6
skinmeshPart=6
applyMorphs=6
skinmeshPartMaxPerChar=6
applyMorphMaxPerChar=6
affinity=1,3,5,7,9,11
Intel 12900K/12800K/12700K
(we are using the performance cores only)
computeColliders=8
skinmeshPart=8
applyMorphs=8
skinmeshPartMaxPerChar=8
applyMorphMaxPerChar=8
affinity=1,3,5,7,9,11,13,15
Intel 12600K/12500/12400
(we are using the performance cores only)
computeColliders=6
skinmeshPart=6
applyMorphs=6
skinmeshPartMaxPerChar=6
applyMorphMaxPerChar=6
affinity=1,3,5,7,9,11

You catch the drift: if it's Intel, use only the performance cores and skip the HTs. If it's AMD, use only cores from the first CCD and skip the HTs.
I don't have an Intel CPU, so please experiment and give me feedback.
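
All presets above follow the same pattern: every thread setting equals the number of cores in the affinity list. For a CPU that is not listed, a block in that pattern can be generated as sketched here. `threads_section` is a made-up helper name, and the output is only a starting point to tune, not a tested preset.

```python
def threads_section(affinity, section="threads"):
    """Build an ini block where every thread count equals the number of listed cores."""
    n = len(affinity)
    lines = [
        f"[{section}]",
        "affinity=" + ",".join(str(core) for core in affinity),
        f"computeColliders={n}",
        f"skinmeshPart={n}",
        f"applyMorphs={n}",
        f"skinmeshPartMaxPerChar={n}",
        f"applyMorphMaxPerChar={n}",
    ]
    return "\n".join(lines)


# Example: 6 usable cores with SMT/HT enabled, so every second logical core is skipped.
print(threads_section([1, 3, 5, 7, 9, 11]))
```

Call it with `section="threadsVR"` to produce the VR block in the same way.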

FAQ

I have disabled SMT/HT in my BIOS, I am using "Gaming Mode" in Ryzen Master
Don't skip every second core, so instead of
affinity=1,3,5,7,9,11,13,15
write
affinity=1,2,3,4,5,6,7,8

My VR performance hasn't increased, it's still at the same 45/60/120fps!
You might have fallen into the Motion Smoothing hole or hit the VSync wall. Explanation:
If you run your Quest 3 at 90Hz, have motion smoothing on and reach between 45fps and 89fps, you will get 45fps, no matter what. When motion smoothing is enabled, the Oculus driver forces the frames to wait until it settles at 45fps.
Every HMD has VSync enforced, so you can never have more FPS than your HMD refresh rate! When you have motion smoothing enabled, your HMD driver temporarily or permanently switches VSync to half the refresh rate when you don't quite reach 90fps, forcing you to 45fps. The effect is the same as the enforced VSync, just at half the rate.
If this is what happened for you, you should clearly see it in the benchmarks:
1707860893133.png

^Here I'm reaching perfect gameplay and I'm capped at my HMD refresh rate (90fps without motion smoothing, would be 45fps with motion smoothing on or it would constantly jump between 45 and 90 if I had adaptive motion smoothing). But you can calculate your theoretical FPS:
My average TotalTime was 5ms, which translates to 1000/5 = 200fps, so my game would render at 200fps if my HMD driver didn't force it to wait. This means I could run this benchmark at 200fps(!) in VR with a faster refreshing HMD. (Ignore the max1% values, these are rounding errors.)
1707861007186.png

^Here I am truly not reaching more than 74fps. TotalTime was 12.22ms, which translates to 1000/12.22 = about 82fps, which is in the ballpark of 74fps, with probably a few frames lost due to reprojection by SteamVR. Before the patch I had far worse FPS at about 47fps:
1707862729976.png


If I had motion smoothing on, I would have seen constant 45fps before and after the patch!
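
The calculation above can be written down as a small helper. This is a simplified model, not how any HMD driver actually decides: it assumes motion smoothing is always on (not adaptive) and ignores reprojection losses. `theoretical_fps` and `displayed_fps` are made-up names.

```python
def theoretical_fps(total_time_ms):
    """FPS the game could render if nothing forced it to wait: 1000 / TotalTime."""
    return 1000.0 / total_time_ms


def displayed_fps(total_time_ms, refresh_hz, motion_smoothing=False):
    """Rough model of what you end up seeing in the HMD."""
    fps = theoretical_fps(total_time_ms)
    if fps >= refresh_hz:
        # VSync wall: never more frames than the HMD refresh rate.
        return float(refresh_hz)
    if motion_smoothing:
        # Motion smoothing hole: anything below the refresh rate is held at half of it.
        return min(fps, refresh_hz / 2)
    return fps


print(theoretical_fps(5.0))                                  # first screenshot
print(displayed_fps(5.0, 90))                                # capped by the 90Hz HMD
print(displayed_fps(12.22, 90))                              # second screenshot, no smoothing
print(displayed_fps(12.22, 90, motion_smoothing=True))       # same scene with smoothing on
```

So to judge whether the patch helped in VR, compare TotalTime before and after rather than the FPS number.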

I renamed VaM.exe to sotr.exe to use some fancy Nvidia features...
Then you also renamed the VaM_Data folder to sotr_Data.
Instead of
\VaM_Data\Managed\Assembly-CSharp.dll
put the Assembly-CSharp.dll into
\sotr_Data\Managed\Assembly-CSharp.dll

How do I check if the patch is running?
Open VaM and wait until you see the main menu. Then open PowerShell and execute
Code:
Get-Process | select -expand Modules -ea 0 | where {$_.ModuleName -like 'skinmeshpartdll.dll'}
If it's running, it should output
Size(K) ModuleName FileName
------- ---------- --------
1260 SkinMeshPartDLL.dll C:\Games\vam\PerformancePatches\SkinMeshPartDLL.dll

How did meshedvr not notice those issues?
VaM relies heavily on physics calculations that were not included in Unity, and those need a lot of RAM reads/writes and lots of floating-point calculations. Knowledge of native code development and bare-metal optimizations is usually only available to AA+ game development studios. Writing C code when you develop in C# is a very difficult task, and such programmers are expensive. The newest Unity engine might have optimizations for that.
In fact VaM already has clever optimizations, but they are still limited to C#.
Also he has an Intel CPU, so I guess he didn't really notice the issues on AMD, especially the quirk that AMD CPUs shouldn't use SMT when running Unity games.

Why is AMD still so much slower than Intel in VaM?
Intel still has an advantage in RAM latency, so the data gets read and written a lot faster. VaM does a lot of reads/writes for a game. If the data could be read, processed and written in one go, that wouldn't matter, but it's not possible in Unity; only Unreal can compact the data into a single block.

I am not seeing any improvements, why is that?

1.JPG

2.JPG

3.JPG

4.JPG

5.JPG

6.JPG

7.JPG

8.JPG

9.JPG

Improvements on 5950x

Clean install vanilla:
vanilla.png


Clean install vanilla, process affinity manually set to one CCD using task manager:
vanilla_procaffinity.png


Clean install with patch v9
Settings:
[threads]
computeColliders=12
skinmeshPart=1
CCD=1
IterateCCD=0
patched9.png


v10 Patch
[threads]
computeColliders=6
skinmeshPart=1
affinity=1,3,5,7,9,11,13,15
Benchmark-20240207-211808.png
A crispy 103% fps increase for FREE without any downsides.
Author: turtlebackgoofy
Downloads: 11,565
Views: 64,002
Version: 13b1
Rating: 4.96 star(s), 70 ratings

Latest updates

  1. less stutters

    - Generate less garbage per frame - Delay garbage collection until we wait for the GPU to render...
  2. Bugfixes, faster morphs and easier thread settings

    - Doesnt use AVX anymore, it didnt bring any benefit turns out - Fixes the bug when a default...
  3. Default Morphs fix and AVX1 and SSE DLLs

    - Fixes the bug of the default built-in morphs (the ones from vanilla vam) not loading on scene...

Latest reviews

Holy Sheep Shit Batman! This gained me 30 fps on the ave and like 50 on the max on my 3700x... It was so much faster when I was benching it that I really didn't believe it...

I dont even think I have the settings really fine tuned.,.. Crazy stuff!
Upvote 0
Excellent, and gives a performance boost just as great as advertised. I've giving it 4 instead of five stars only because it also introduced instability into my particular setup.

My testing suggests that the normal VaM problem with the Unity garbage collector crashing when there are too many vars/morphs, is actually worse with this patch in place.

For anyone with less than 5k vars, you may never notice anything, but in my setup with 10k+ vars, loading without this patch is slow as molasses, but it works 90% of the time. With this patch, most things will load faster, but every scene carries about a 60% chance of triggering the GC crash, with a higher chance for mocap or more complex scenes.
Upvote 1
Oh my days, I just installed this and ran for a few zen scenes and the wait times have gone from like 5 minutes to like 45 seconds for me personally. Unfortunately I can't run a benchmark 4 tests at the moment but I will once I find the problem. This man needs to be on MVR team immediately. Thanks for your effort with this. I would give more stars if I could.
Upvote 0
How to set 7950x
Upvote 0
Raised Baseline 3 significantly
Upvote 0
Must have for VaM. Big thanks for the author.
Upvote 0
Confirmed: this is not hype. Baseline 3 of Benchmark went from AVG 62 / MIN 56 FPS to 103 / 69 FPS.
Upvote 0
Amazing job.
I gained about 60% more FPS on my 10900k/3090 system with this patch.
Shout out to Vr_Vammer for suggesting using ChatGPT.
Upvote 1
i used this but my cpu still remains below 30percent....what should i do?........i have ryzen1800x....any help would be greatly appreciated
Upvote 0
In physics-heavy scenes, with both soft physics and high quality physics enabled, and all other settings maxed out (except for resolution scale... no need for higher than 1.0, imo), this plugin made an absolutely massive difference. Nearly unplayable before, between 15 and 30 FPS generally, I'm getting between 60-100 FPS consistently. Amazing addition, and MeshedVR and all other VAM team members really should consider this being added to VAM 1x by default (assuming turtlebackgoofy were okay with it - otherwise, at least encouraging everyone to download it). Amazing work!
Upvote 0