CPU Performance Patch (Up to 30% faster physics, up to 60% more FPS)

does the script need modifying for my 5800X3D?

Since your CPU doesn't have a second CCD, you need to reserve about 2 cores for everything else:

in the script edit this:
Code:
# binary: 1111000000000000
# means set all processes to logical processors 13-16 (bits 12-15 of the mask)
$affinity = 61440

and the ini file:
Code:
[threads]
computeColliders=6
skinmeshPart=6
applyMorphs=6
skinmeshPartMaxPerChar=6
applyMorphMaxPerChar=6
affinity=1,3,5,7,9,11

[threadsVR]
computeColliders=6
skinmeshPart=6
applyMorphs=6
skinmeshPartMaxPerChar=6
applyMorphMaxPerChar=6
affinity=1,3,5,7,9,11
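
The decimal value 61440 in the script is the bit mask 1111000000000000 written as a number. A minimal Python sketch of that conversion, assuming logical processors are numbered from 0 (so "13-16" in the script comment corresponds to bits 12-15). The helper names are hypothetical and not part of the patch:

```python
def affinity_mask(logical_cpus):
    """Build an affinity bit mask from 0-based logical CPU indices."""
    mask = 0
    for cpu in logical_cpus:
        mask |= 1 << cpu
    return mask


def cpus_in_mask(mask):
    """Return the 0-based logical CPU indices whose bit is set in the mask."""
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


# Bits 12-15: the last four logical processors of a 16-thread CPU.
print(affinity_mask([12, 13, 14, 15]))  # 61440
print(cpus_in_mask(61440))              # [12, 13, 14, 15]
```

For a 12-thread CPU, reserving the last four logical processors would be affinity_mask([8, 9, 10, 11]), which gives 3840.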
 
I gave this a shot: started VaM, loaded my scene, and performance was already lower than before. Then I ran the script in PowerShell, got the prompt to allow elevation, hit yes, and the window closed. Back in VR there was no improvement over before running the script, and it was still worse than before modifying the ini. Maybe it benefits your CPU more than mine, hard to say.

This is what I ran just to be sure:

Code:
# elevate privileges if we are not running as Administrator, so we can set affinity of Windows owned processes
# source: http://superuser.com/questions/108207/how-to-run-a-powershell-script-as-administrator

param([switch]$Elevated)

function Test-Admin {
    $currentUser = New-Object Security.Principal.WindowsPrincipal $([Security.Principal.WindowsIdentity]::GetCurrent())
    $currentUser.IsInRole([Security.Principal.WindowsBuiltinRole]::Administrator)
}

if ((Test-Admin) -eq $false)  {
    if ($elevated) {
        'tried to elevate to full privileges, did not work, aborting'
    } else {
        'running my self again with full privileges'
        Start-Process powershell.exe -Verb RunAs -ArgumentList ('-executionpolicy bypass -noprofile -file "{0}" -elevated' -f ($myinvocation.MyCommand.Definition))
    }
    exit
}
'running with full privileges'

# binary: 1111000000000000
# means set all processes to logical processors 13-16 (bits 12-15 of the mask)

$affinity = 61440

'setting all processes to affinity: '+$affinity
'processes unable to set affinity of: '

$allProcesses = Get-Process | Where {$_.ProcessName -NotLike "vam"}
foreach ($process in $allProcesses) {
    try {
        'process: ' + $process
        $process.ProcessorAffinity = $affinity
    }
    catch {
        $process
    }
}
 
Right-click cmd in your Start menu, run it as admin, type "powershell" and hit Enter, then paste the script in there and hit Enter.
It worked for me with Pimax and SteamVR, but maybe it fixed an issue that only exists on AMD CPUs with multiple CCDs?
 
Easier and more automated is to use Process Lasso with the following rules:
CPU affinity rule for vam:
1707694386067.png

Default affinity rule (when vam is not running):
1707694445599.png

Auto profile switcher rule (so when vam starts it will apply the affinity as above, when you close it then it will rollback changes):
1707694424982.png
 
There is a setting called Physics Update Cap. This setting specifies the maximum number of Physics Updates that can run per rendered frame. If your FPS is low enough that this cap gets hit, you could experience time slowing down, which will make everything look slo-mo. This is just an unfortunate side effect of physics running at a fixed frame rate and being independent of the rendered frames. Try setting to 3 to minimize that possibility, but note that it could lower your FPS because each physics update takes a lot of time and up to 3 could happen per rendered frame. This could happen if you have physics rate set very high and/or your rendered FPS is very low.

I appreciate everything you are doing here, and I'm glad to see you discovered the PhysX single thread bottleneck (which is a bit better in newer Unity versions but it still sucks - I'll be doing custom skin physics in VaM2 for this reason). I'm also glad you posted those charts of how the character threads timing works. The goal for VaM1 was to hit 90FPS for VR usage. I realize with heavy physics it often does not hit that unless you have a very good CPU/GPU and are not running a scene with lots of models all with soft-body physics on. It was a challenge. The character threads code was optimized just enough to try to hit this 90FPS so as not to be the bottleneck. The skinning system is very inefficient, as is the morph code. It is great you addressed both of those. When I developed this I didn't see much of a point trying to hit a max FPS above 90 since I was targeting VR and that was the sweet spot. The ultimate bottleneck is PhysX on physics heavy scenes and there isn't any way around that. That all said, I am still very glad you looked into this and found some amazing fixes just sitting there for many/most users. I would love to roll this back into the official release at some point if you are willing and I can find some time in between VaM2 work. That might be tough. I would definitely compensate you if we manage to go this route. If you are interested in making some $$$, I could even give you access to the VaM github repos so we can add this in an easier-to-review manner. I would want to review everything just to make sure nothing is getting broken. DM me if you want to go this route.

Thank you!

I agree with almost everything you said, except regarding my multithreading of GPUCollidersManager::ComputeColliders(). I realized that while multithreaded writing to PhysX is a no-go, you can multithread read access to the PhysX engine without problems. In my patch I fill the GPSphere[] and the GPLineSphere[] that go to the shader kernel multithreaded. This call happens in FixedUpdate() and it cut the physics time a little, which is what we yearn for most.
All other optimizations are purely for dick measuring in benchmarks lol.
The other thread scheduling workarounds apply to pretty much every game and there was nothing wrong with VaM in that regard.

You are also correct about the physics thing. Here is a simpler explanation as far as I understood:
If your physics rate is 90 Hz and the physics update cap is 3, the optimal boobs update is 270 times per second, so if you manage to hit 270 physics updates per second the boobs get updated correctly and there is no flopping around. If you lower the physics update cap to 1 there is more flopping because the velocity is kept constant for a longer time. The same happens if your CPU is too slow, because you also skip physics updates.

What many however don't realize is that the average FPS you see is not the physics updates per second! Your PhysicsRate*PhysicsUpdateCap needs to match not only your FPS, but your min FPS, the points where physics happens! My patch actually makes this harder to realize because it boosts the "physics free" frames while not really touching the min FPS.
Users install my patch, see their avg FPS go from 180 to 270, update their physics rate from 60*3 to 90*3 and wonder why boobs start flopping around, not realizing that their min FPS mostly stayed the same.
"Physics Time" also isn't really physics time, because the Performance Monitor in VaM also counts physics free frames into the average as "0.00ms". That took me a while to realize, and I added a "Physics Time (only fixedUpdate frames)" metric for myself for testing. Maybe add a "physics fps" metric in the perfmon? It would show how many FixedUpdate() calls were run per second, so users can adjust their settings to perfectly match that.

I also contemplated whether it's even worth doing improvements to the physics free frames, but I guess for scenes where there isn't that much physics or when it's mostly morphs doing the content, it still counts as an improvement if the physics rate is dialed down.

TLDR: boob physics didn't improve that much, keep the physics rate and soft body setting you had before or they start flopping around.
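
The point about the "Physics Time" average can be illustrated with a small Python sketch. The frame times are made-up numbers and the function names are hypothetical. This is not how VaM's Performance Monitor is implemented, only the arithmetic of averaging with and without the physics free frames:

```python
def mean_physics_time_all_frames(physics_ms_per_frame):
    """Average over every rendered frame, counting physics free frames as 0.0 ms."""
    return sum(physics_ms_per_frame) / len(physics_ms_per_frame)


def mean_physics_time_fixed_update_frames(physics_ms_per_frame):
    """Average over only the frames in which physics work was recorded."""
    busy = [ms for ms in physics_ms_per_frame if ms > 0.0]
    return sum(busy) / len(busy) if busy else 0.0


# Hypothetical sample: two of every three frames run no physics at all.
samples = [6.0, 0.0, 0.0, 6.0, 0.0, 0.0]
print(mean_physics_time_all_frames(samples))           # 2.0
print(mean_physics_time_fixed_update_frames(samples))  # 6.0
```

In this sample the all-frames average reports a third of what a physics frame really costs.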
 
Correct. Remember you have to rerun it every time you restart SteamVR, because that restarts the processes too.
Yeah, the problem is you have no cores to spare for the SteamVR stuff, so you need to cut the cores that would be used for VaM instead. We are basically trying to make that one Unity physics thread run as well as possible.
On my CPU it had a bigger impact because SteamVR processes kept jumping CCDs and running into my VaM threads, causing delays. A thread running on CCD2 and then moving to CCD1 is especially bad because its CPU cache contents have to move with it to the other CCD, which is extra slow.
 
Hi. I have definitely seen a performance increase using this, however I don't think I'm maxing out my primary core. Here are my screenshots. My CPU is an AMD Ryzen 5 2600X six-core. Any recommendations for settings? I don't want to run it too hot, but I'm guessing I may have room to push it more. Or is my GPU bottlenecking it all?

CPU.png
GPU.png
Settings.png
 
I went ahead and ran it via admin cmd like you said, and didn't see any noticeable increase, but I also bypass SteamVR and the Oculus runtime, which usually helps performance by around 10-15% for me. Either way, I'm still seeing gains over vanilla, so we're still winning. Thanks so much for putting so much energy into working on performance. It's the main pain point imo, especially in VR. Love my 5800X3D, but it's hard not to consider a high-clocked Intel for my next build.
 
AMD R5 5600
ps1:
$affinity = 3840 # 1111 0000 0000
ini:
affinity=1,3,5,7
all other values set to 4
With OC: blue screen, CLOCK_WATCHDOG_TIMEOUT
Without OC: no increase in VR or Desktop mode
Maybe it only has an effect on CPUs with more than one CCD
 
I have a fairly clean, basically unused Windows 11 install on my system for testing. I daily Windows 10. I know Windows 11 is supposed to have better scheduling, so I decided to test my VR scene with patch 12 there, and I am getting about a 10% avg FPS increase, my lows are closer to 25% better I believe, and I am seeing far fewer physics time spikes during heavy bouncing. My Windows 10 install is probably two years old and full of junk. Nothing like testing local LLMs and other generative AI tools with like 10 different Python version requirements, plus model porting/ripping tools, to make for a messy OS. Hard to say if it's Windows 11 or just a clean OS that's giving me the additional gains.
 
Awesome patch! I tried it out, and things appear to be far smoother in my own custom scenes, however I'm not really seeing any FPS increase. It's more FPS or physics stability. I ran the benchmark 3 times with various options as marked below, and oddly enough my FPS actually seems to be going down. Yet physics smoothness/responsiveness in my existing scenes is much, much better. So I dunno what exactly is occurring lol. After applying the patch, my CPU utilization definitely changed, however my core usage never exceeds 65% across the cores in use.

So Vanilla, with no patch looked like this.
1707712822242.png

Base patch applied with the settings for 5900x looked like this:
1707712969141.png

And then I did another one with an edited .ini with MaxPerChar increased from 6 to 8 and got:
1707713160392.png

For the last benchmark I edited the ini because the instructions say that if CPU usage isn't maxed out on the cores being used, you should raise the MaxPerChar settings closer to the core count. I assumed that means both of the options with MaxPerChar in the name, so I set them both from 6 to 8. Even after raising MaxPerChar to 8, the usage was still only around 65% at most. Now, my CPU is overclocked from base 3.7GHz to base 4.55GHz. Not sure if that's what is altering the max usage or not. No crashes or anything like that. Like I said, in general things appear much smoother, and where slowdowns (or rather spikes in FPS) would occur in my own existing scenes, they no longer occur. Instead it just stays far more consistent.

[threads]
computeColliders=6
skinmeshPart=6
applyMorphs=6
skinmeshPartMaxPerChar=8
applyMorphMaxPerChar=8
affinity=1,3,5,7,9,11
 
There is no reason to have skinmeshPartMaxPerChar > skinmeshPart; it will still only run at most skinmeshPart threads per character. In the baseline3 scene there are 3 characters, so 6/3=2 is the number of threads that will be run per character in that regard. You could also try

[threads]
computeColliders=6
skinmeshPart=6
applyMorphs=6
skinmeshPartMaxPerChar=1
applyMorphMaxPerChar=1
affinity=1,3,5,7,9,11

and see what happens. It could improve performance if the 3 CPU cores then clock higher with fewer threads running. It really depends on your memory clock and latency. If you have fast memory, fewer threads are sometimes faster.
Also check if you have other processes running in the background like Chrome, Steam or Discord. That baseline3 is really sensitive to other processes stealing CPU time.
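
The per-character thread count described above can be sketched as follows. This is only a model of the explanation in this post (the total thread count split evenly across characters, then limited by the MaxPerChar value). The function name is hypothetical, the floor of 1 thread is an assumption, and the patch's actual scheduling may differ:

```python
def threads_per_character(total_threads, max_per_char, characters):
    """Threads each character gets: an even integer share, capped by max_per_char."""
    share = max(1, total_threads // characters)  # assumed floor of 1 thread
    return min(share, max_per_char)


# baseline3 has 3 characters and skinmeshPart=6.
print(threads_per_character(6, 8, 3))  # 2, so raising MaxPerChar to 8 changes nothing
print(threads_per_character(6, 6, 3))  # 2
print(threads_per_character(6, 1, 3))  # 1, the experiment suggested above
```

Under this model MaxPerChar only matters when it is smaller than the even share.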

Your max1% times in baseline3 are pretty much exactly the same as mine on my 5950x.

1707730878899.png

This means that your physics happens as fast as mine. As the scene has 3 characters, my CPU runs 8/3=2 threads per character and yours runs 6/3=2 threads per character, so we should have the exact same benchmark results... You can see that for me 1000/9.00 = 111fps, which is spot on with my avg FPS, while in your benchmark

1707731125448.png


1000/10.20 should equate to 98 FPS. Something is stealing away the CPU time after VaM stopped recording the time it needed for work and waited for the next frame to start.
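
The conversion used here is FPS = 1000 / frame time in milliseconds. A short Python check of the two figures quoted above (the function name is illustrative):

```python
def fps_from_frame_ms(frame_ms):
    """Convert an average frame time in milliseconds to frames per second."""
    return 1000.0 / frame_ms


print(round(fps_from_frame_ms(9.00)))   # 111
print(round(fps_from_frame_ms(10.20)))  # 98
```

If the measured average FPS is clearly below this value, the difference was spent outside the work that VaM timed.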
 
1707731646560.png


This is the time when your CPU waits for the GPU to finish and your GPU is at 99% at the same time, so you have perfected the settings and the GPU is bottlenecking. When you see 1 or 2 cores running at absolutely flat 100% it means your GPU waits for the CPU and it is bottlenecking.
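
The rule of thumb above can be written down as a tiny Python helper. The thresholds (97% for a busy GPU, 99% for a pinned core) are illustrative assumptions and not values from VaM or the patch. The function name is hypothetical:

```python
def likely_bottleneck(gpu_util_percent, per_core_util_percent,
                      gpu_busy_threshold=97.0, core_pinned_threshold=99.0):
    """Rough classification from utilization readings; thresholds are assumptions."""
    if gpu_util_percent >= gpu_busy_threshold:
        return "GPU"  # the CPU is waiting for the GPU to finish
    if any(core >= core_pinned_threshold for core in per_core_util_percent):
        return "CPU"  # one or two cores are pinned, the GPU is waiting
    return "unclear"


print(likely_bottleneck(99.0, [60.0, 55.0, 70.0, 40.0]))  # GPU
print(likely_bottleneck(70.0, [100.0, 45.0, 30.0]))       # CPU
```

Readings should come from a sustained stretch of the scene, since short spikes can point either way.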
 
Thanks! That's what I figured. Any thoughts on my VR settings? I use VR mostly. Can I up those or should I leave at what they are? I got these current settings from someone who helped me get started with this.
 
I would suggest using exactly the same settings and bypassing SteamVR somehow; there is a guide on this forum somewhere.
 
Sorry....to clarify....exactly the same settings as I already have or exactly the same between Desktop and VR?
Thanks again for your help!
 
I think I found the cause of the regular garbage collection stutter: VaM creates 2926 strings per second, which then need to be garbage collected. A fix is easy.
Edit: done. I had 1-2 stutters in the 70 seconds of benchmark baseline3, now I have 0 lol. Will be released in the next version.
 
Can you even bypass SteamVR with an Index or a Vive? I've only successfully done it with an Oculus Quest.
That's the weird thing. For years I've had VAM added into my Oculus software as an app. I don't think it even runs Steam to use it. I did try this trick and didn't really see much change, if any.
 
I agree with almost everything you said, except regarding my multithreading of GPUCollidersManager::ComputeColliders(). I realized that while multithreaded writing to PhysX is a no-go, you can multithread read access to the PhysX engine without problems. In my patch I fill the GPSphere[] and the GPLineSphere[] that go to the shader kernel multithreaded. This call happens in FixedUpdate() and it cut the physics time a little, which is what we yearn for most.
All other optimizations are purely for dick measuring in benchmarks lol.
The other thread scheduling workarounds apply to pretty much every game and there was nothing wrong with VaM in that regard.

That is interesting. I didn't realize that it was possible to get data from Physx on a thread. Yes - anything you can take out of FixedUpdate will directly improve performance! Awesome! If I could officially roll in 3 changes, it would be this, the zip read fix, and the memory/runtime optimization you did for the morph code that runs in a thread. I do think killing the cache in the morph processing could easily hurt main thread performance. I hadn't considered that before, so I'm glad you pointed that aspect out.

You are also correct about the physics thing. Here is a simpler explanation as far as I understood:
If your physics rate is 90 Hz and the physics update cap is 3, the optimal boobs update is 270 times per second, so if you manage to hit 270 physics updates per second the boobs get updated correctly and there is no flopping around. If you lower the physics update cap to 1 there is more flopping because the velocity is kept constant for a longer time. The same happens if your CPU is too slow, because you also skip physics updates.

That isn't quite right. If you have physics rate at 90FPS, that is what physics runs at. But let's say your actual FPS is 45FPS. In that case, 2 FixedUpdates run for every Update and rendered frame. Let's now say your actual FPS is 30FPS. In that case, 3 FixedUpdates run for every Update. At some point, this gets ridiculous. The cap was put in to limit the number of FixedUpdates that can occur for every Update and rendered frame. If you set this to 1, there will be exactly 1 FixedUpdate per Update if your FPS is below 90FPS. If your FPS is above 90FPS, you will have more Updates than FixedUpdates. If your FPS is below 90FPS, physics now runs at your rendered frame rate. This results in a slow-mo effect because physics is not able to hit the desired 90 Hz updates. The ideal situation is that your rendered FPS is always above the physics rate, so the cap plays no part.
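
The relationship between physics rate, rendered FPS and the cap can be modelled in a few lines of Python. This is a simplified steady-state sketch of the behaviour described above, not Unity's or VaM's actual scheduler, and the function names are hypothetical:

```python
def fixed_updates_per_frame(physics_hz, render_fps, cap):
    """Average number of FixedUpdates per rendered frame, limited by the cap."""
    needed = physics_hz / render_fps
    return min(needed, cap)


def simulation_speed(physics_hz, render_fps, cap):
    """Fraction of real time that physics keeps up with (1.0 means no slow-mo)."""
    per_frame = fixed_updates_per_frame(physics_hz, render_fps, cap)
    return per_frame * render_fps / physics_hz


print(fixed_updates_per_frame(90, 45, 3))  # 2.0
print(fixed_updates_per_frame(90, 30, 3))  # 3.0
print(simulation_speed(90, 45, 1))         # 0.5, physics runs at half speed
print(simulation_speed(90, 180, 1))        # 1.0, more Updates than FixedUpdates
```

In this model the cap only changes the result once the rendered FPS drops below the physics rate.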
 
The man himself! Hope you are doing fine with the VaM 2 project after Unity's new bullshit fee policy. That messed up a lot of devs :(
 
This results in a slow-mo effect because physics is not able to hit the desired 90 Hz updates. The ideal situation is that your rendered FPS is always above the physics rate, so the cap plays no part.
Yes, I was mistaken. When FPS matches the physics rate, every physics update happens and there is no slowdown and no flopping. Once it dips below that, adjustments are missed and the boobs can end up in out of bounds trajectories. This slowdown is just masked a little by AtomTimeline because it seems to still set the positions of bones at fixed times, so only the boobs are in slow motion.
 
Looking through VaM code I wonder if there is even a need for unity lol. Unity seems to just be for running mono, displaying UI, keeping a list of game objects, forwarding positions to physx and starting rendering. Everything else was written by @meshedvr from scratch.
 
Interesting, that's news to me. Mesh really did a lot on his own, didn't he? He is a quality over quantity person, which shows in VaM and within its dedicated community.
 
Thread  Alloc count  Alloc size  Klass name
0       175535       8425680     ConfigurableJoint[]
61      67398        5712342     String
71      61572        4859688     String
0       175535       4212840     KeyCollection
50      46592        4011298     String

I just need to find a way to safely prevent those excessive ConfigurableJoint[] allocations in SyncMorphBoneRotations(), and then garbage collection stutters should be a lot less frequent and a lot faster. Those strings in ApplyBoneMorphs() were 23% of all allocations, and those ConfigurableJoints are 46%(!), in every frame. Stutters should be basically nonexistent after I fix this.
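
To see which classes dominate, the profiler rows listed above can be summed per class. A small Python sketch using only the five rows shown, so the totals describe this excerpt and not all allocations in the process:

```python
from collections import defaultdict

# (thread, alloc count, alloc size, class name), copied from the rows above
rows = [
    (0, 175535, 8425680, "ConfigurableJoint[]"),
    (61, 67398, 5712342, "String"),
    (71, 61572, 4859688, "String"),
    (0, 175535, 4212840, "KeyCollection"),
    (50, 46592, 4011298, "String"),
]

totals = defaultdict(lambda: [0, 0])  # class name -> [count, size]
for _thread, count, size, klass in rows:
    totals[klass][0] += count
    totals[klass][1] += size

# Print the classes from largest to smallest total size.
for klass, (count, size) in sorted(totals.items(), key=lambda kv: -kv[1][1]):
    print(f"{klass:20} count={count:7d} size={size:9d}")
```

Grouping by class rather than by thread makes it easier to spot which allocation site to fix first.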
 