Performance Patch (up to 30% more physics speed)

Seraphim · Feb 4, 2024

Is this the updated, fixed one you posted in the benchmark thread, or the original one?

turtlebackgoofy · Feb 4, 2024

Seraphim said:
Is this the updated, fixed one you posted in the benchmark thread, or the original one?

when the number goes up, it's a new version

turtlebackgoofy · Feb 4, 2024

hijku said:
Maybe even simply changing the condition could help a lot (without going into two lists straight away)?
from
if(!morph.disable)
to
if(morph.active)

added a patch for it, please test

trety · Feb 4, 2024

Just a question, since I haven't tested v9 yet [the last version I tested was v7 from the benchmark thread]... I see you started messing with morphs, and it makes me wonder: will plugins like Naturalis, Shake it etc. still be able to use morphs that weren't loaded at scene launch?
I just noticed it also makes my local morphs disappear. Over the past few days I worked on a model, creating a few versions of her morph, and when I start VaM it shows only the active morph with her name. I need to rescan morphs to see the other versions. And no, they're named differently, like 'character name morph', then 'character_m_v2', then 'character_final' etc., so it's not VaM 'versioning'.

All of the morphs have been installed since the beginning, of course.
Maybe it is on purpose...?

turtlebackgoofy · Feb 4, 2024

What version did you use? My patch doesn't remove the morphs, it just prevents the engine from rendering the ones that have a value of 0.0. Does it only happen to local morphs? Can you still see morphs from vars?

Naturalis works as it should. Once a morph has been changed from 0.0 in the scene, it gets rendered until the scene is reloaded. All installed morphs are available in the UI; they are just not applied to the model until they have been changed from 0.0 at least once.
Try it with the newest patched9_morph_clutter_fix2.zip from the first post.

trety · Feb 4, 2024

Okay, according to the file dates I actually installed v8 yesterday. Sorry, I thought I was still on v7.
It hides pretty much all of my local morphs, and for vars I have no idea what criteria it uses.
My install is very custom: I manually edited and created two versions of my 'morphs' packages and switch between them for 'play' and 'work' sessions, one with preload turned off and the second with preload on.
Even in my 'play session' I have 950 pages of female morphs [mostly because of expressions, which are lighter than fully rigged morphs], but with the patch I see only 237 pages of totally random morphs.

Well, as long as plugins can 'hot-load' the missing morphs during scene playback, it doesn't bother me much.

/ Edit /
Installed v9_fixed

and it's the same. I can see only some of my morphs from vars and nothing from local files. The reload button in the morphs tab loads the local ones, but I still can't see the rest of the morphs from vars; I guess the purpose is to disable them when they aren't needed. It might be a problem for scene creators if they can't see expression morphs etc.
Like I said, it's not a problem for me if it stays that way. It seems I already have 100+ pages of local morphs [mostly models I made/ported lol], so it might be better if their morphs stay disabled until I actually need them. And since I can quickly load them via the in-game reload button, for example when starting a new model with brand new morphs, it's all fine.

// Edit 2 //
Just tried to roll back to v7 and it seems to be the same: only part of the var morphs show up and no local ones at all.
So... in its current state it's a super useful patch for people who just want to play scenes, but not edit stuff on their own.

How useful?
I also tried the newest version, tuned it to my CPU, and ran another benchmark.
Got another 0.02 reduction in physics time in the baseline 3 test compared to v7.

/ Edit 3 /
I'm blind, the issue was only for local morphs, which is fixed now.

giallone · Feb 4, 2024

turtlebackgoofy said:
What version did you use? My patch doesn't remove the morphs, it just prevents the engine from rendering the ones that have a value of 0.0. Does it only happen to local morphs? Can you still see morphs from vars?

Naturalis works as it should. Once a morph has been changed from 0.0 in the scene, it gets rendered until the scene is reloaded. All installed morphs are available in the UI; they are just not applied to the model until they have been changed from 0.0 at least once.
Try it with the newest patched9_morph_clutter_fix2.zip from the first post.

I can confirm this too: unless they are used in the scene (i.e. set to a value other than 0), local morphs don't appear in the morphs list at all. A "reload custom morphs" makes them appear as usual.

turtlebackgoofy · Feb 4, 2024

how do I create local morphs so I can test the bug?

giallone · Feb 4, 2024

turtlebackgoofy said:
how do I create local morphs so I can test the bug?

You can just take a morph from a .var and copy it into the appropriate folder in \Custom\Atom\Person\Morphs\Female (or the other folders if it's a genital or male morph)...
It will appear in your morphs list with a blue background after a "reload custom morphs" or restarting VaM.

turtlebackgoofy · Feb 4, 2024

giallone said:
You can just take a morph from a .var and copy it into the appropriate folder in \Custom\Atom\Person\Morphs\Female (or the other folders if it's a genital or male morph)...
It will appear in your morphs list with a blue background after a "reload custom morphs" or restarting VaM.

Oops, I made all morphs load on demand in version 5 for testing, although it didn't give any performance benefit, and I accidentally released the change. I updated the first post, so update to patched9_morph_clutter_fix3 https://pixeldrain.com/u/XPexV2Rf and it's fixed again.

trety · Feb 4, 2024

So... uhm, I'm sorry, but it seems I was blind lol
Like I said, my install is pretty well tuned 'for play' without the morph bloat effect and... I have only 350 pages, not 950 as I said before.

I'm using a 4K TV and started VaM in windowed 1080p for testing, and it seems I can't even read properly lol. Sorry.
So the 'issue' was only with local morphs, which is now fixed with v9_fix3.

And now it makes me wonder... would it be possible to make it toggleable in the ini? I know you just reverted it, but having local morphs 'on demand' might actually benefit people like me who have many of them. I thought it affected morphs in vars too, and that's why I even started this; if I had known it was really just the local ones, I wouldn't have said anything about it QQ

turtlebackgoofy · Feb 4, 2024

Making morphs load on demand has no benefit; it was a red herring when I tested it. All known morphs except local morphs were loaded at scene load anyway, so the only thing it did was introduce the bug lol.
However, what DOES help is not RENDERING morphs that are loaded but haven't been changed yet. This behaviour is now included in the newest version you just ran.

I also noticed that hyperthreading benefits 2D VaM heavily but somehow tanks performance in VR. Disabling HT for VR gets me ironman numbers (half the physics time, +50% fps) in VR lol. I will release a new version with more options tomorrow. I always wondered why I get half the fps in VR although I have a 4090 and the render time is 1 ms in 2D vs 2 ms in VR.

trety · Feb 4, 2024

I'm not a tech guy, I just brute-forced VaM to run as well as possible through trial and error.
So,

turtlebackgoofy said:
However, what DOES help is not RENDERING morphs that are loaded but haven't been changed yet. This behaviour is now included in the newest version you just ran.

means that local morphs are visible, but not really 'loaded' onto the model until they are changed? That's super cool. [reverting back from v9fix2 to v9fix3]

I also don't really get the HT talk. In my testing, I got better results with Intel's HT disabled in the BIOS than enabled, even with your patched DLLs.
I still have the cores/threads, they just aren't using Intel's HT technology.
I can't give exact numbers for how much it improved my VR experience, but it definitely helps a lot. My 'quick VR session scene' (CUA map, male, female, a bunch of plugins, 4 lights, all soft body physics on) went up from 40ish to 55-60 fps in male POV cowgirl poses on a G2 HMD. That's freakin awesome.

beme · Feb 5, 2024

What are you guys setting in the INI file for an i9-13900K?

Based on the raw thread count, I set mine like this:

[threads]
computeColliders=24
skinmeshPart=1
CCD=0
IterateCCD=0

Not seeing any gains on my system...

turtlebackgoofy · Feb 5, 2024

Maybe try setting it lower? If your CPU is very fast and computeColliders finishes very quickly, waiting for the threads takes longer than the work itself. Also try

[threads]
computeColliders=6
skinmeshPart=1
CCD=0
IterateCCD=0

and then set the process affinity via Task Manager: go to Details, right-click vam.exe, choose "Set affinity" and select your first 8 cores (I assume those are the performance cores).
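For anyone who would rather set this from code than click through Task Manager on every launch, here is a minimal C# sketch (AffinityHelper and its methods are hypothetical names, not part of the patch). It builds the bit mask for the first N logical processors and applies it through .NET's Process.ProcessorAffinity. Keep in mind that Task Manager lists logical processors, so with HT enabled the first 8 entries may correspond to only 4 physical cores.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: pins the current process to logical processors 0..N-1,
// the same thing ticking "CPU 0" to "CPU N-1" in Task Manager's "Set affinity" does.
public static class AffinityHelper
{
    // Bit i of the mask stands for logical processor i.
    public static long FirstLogicalProcessorsMask(int count)
    {
        if (count < 1 || count > 63)
            throw new ArgumentOutOfRangeException(nameof(count));
        return (1L << count) - 1; // e.g. 8 -> 0xFF
    }

    public static void PinCurrentProcess(int count)
    {
        long mask = FirstLogicalProcessorsMask(count);
        Process.GetCurrentProcess().ProcessorAffinity = new IntPtr(mask);
    }
}
```

Calling AffinityHelper.PinCurrentProcess(8) from inside the process matches selecting the first 8 entries in Task Manager; an external launcher would set ProcessorAffinity on the Process object it started instead.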

FapperGOD · Feb 5, 2024

Can I set CCD to 0 in skinmeshpartdll.ini if I have an AMD 5600X with two CCDs?
One is disabled and only CCD2 is functional.
Also what skinmeshpart value should I use?

turtlebackgoofy · Feb 5, 2024

FapperGOD said:
Can I set CCD to 0 in skinmeshpartdll.ini if I have an AMD 5600X with two CCDs?
One is disabled and only CCD2 is functional.
Also what skinmeshpart value should I use?

If it's disabled in the BIOS or Ryzen Master:
if you have HT enabled, set CCD=0, computeColliders=4, skinmeshPart=1
if HT is disabled too, set CCD=0, computeColliders=2, skinmeshPart=1

meshedvr · Feb 5, 2024

Thanks for doing this!

I mentioned this in the other thread as well, before I realized you had a dedicated one. We reached out to you to ask you to move this to Resources, where we can flag it as providing a replacement DLL.

I considered rewriting the morph iteration code you mention, but unfortunately there were some complexities in doing so that I couldn't easily address with the time I had. VaM2 has already fixed this by having morphs register changes to their values on demand (event driven), so the iteration is only done on morphs that changed that frame. VaM2 also uses Unity's new Burst and Jobs system with better memory structs for the most demanding code. VaM2 also doesn't support morph-controlled-morphs or formula morphs, which cause some of the extra possible iterations in the morph code.

I appreciate your effort to make the original VaM more performant. In the other thread I also mentioned that you are sometimes fixing things that run on a side thread and would typically complete before the main thread needs the data. But I suppose on some CPUs it can't finish in time, so the main thread waits for that side thread to complete, which lowers overall FPS. I spent most of my time trying to optimize main thread code, as that is generally where the bottleneck is.

turtlebackgoofy · Feb 5, 2024

Wow, the main dev noticed. If you want, I can send you cleaned-up source code so you can integrate it into the main code; it's really not that hard to integrate into the main development. I moved it to "Other" and removed the external links.

I disagree on the "side thread" part, because every DAZCharacterRun thread takes about 6 ms and the main thread has to wait for the slowest DAZCharacterRun thread to finish. Also, every thread that finishes quickly and no longer pollutes the CPU cache or hogs the memory bus speeds up the main thread (including the Unity physics engine). The only thing a side thread does not influence is GPU rendering, which on current GPUs already takes the least time.

All performance patches summed up:
1) (side threads) doing floating-point math like in SkinMeshPart is a lot faster if you compile the same operations natively with SSE2/AVX2 instead of running them as C# IL
2) disabling your threaded skinmeshing and instead letting the .dll do the thread management natively is faster.
3) if skinmeshing is already very fast, you can skip the threading altogether. It also fixes the bug where skinmeshes flop around for a few frames because you processed the vertexes in the wrong operation order. You already tried to mitigate it by splitting the threads not by bones but by vertex IDs, but some vertexes are shared between bones and those can spazz out.
4) (main thread 0) if you call a lot of Unity engine methods like getPosition or getTransform, it's faster to call them native->native using their native names internalcall_XXXX instead of C#->native, because every C#->native transition costs a lot of CPU. ComputeColliders did this 100k times per frame; this was the biggest performance gain.
5) (all threads) setting the core affinity for the process so the code stays on one CCD and all threads share the same high-speed CPU cache. This is basically free performance on top and should be configurable for advanced users.
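Point 5 boils down to a bit mask. Below is a minimal C# sketch of just that arithmetic, not the patch's native implementation; CcdAffinity and CcdMask are hypothetical names, and it assumes the logical processors of each CCD are numbered contiguously (CCD0 first, then CCD1), which is common but worth checking on your own system.

```csharp
using System;

// Hypothetical helper: affinity mask that keeps a process on a single CCD.
// Assumption: CCD k owns logical processors [k * logicalPerCcd, (k + 1) * logicalPerCcd).
public static class CcdAffinity
{
    public static long CcdMask(int ccdIndex, int logicalPerCcd)
    {
        if (ccdIndex < 0 || logicalPerCcd < 1 || (ccdIndex + 1) * logicalPerCcd > 63)
            throw new ArgumentOutOfRangeException();
        long block = (1L << logicalPerCcd) - 1;      // logicalPerCcd low bits set
        return block << (ccdIndex * logicalPerCcd);  // moved to this CCD's range
    }
}
```

For example, with 16 logical processors per CCD, CcdMask(0, 16) is 0xFFFF and CcdMask(1, 16) is 0xFFFF0000; the value can be passed to Process.ProcessorAffinity as an IntPtr.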

Additional things that really should be considered bugs:
In DAZCharacterSelector.cs, in the method SetActiveClothingItem(), please add a bool skipSyncAnatomy parameter so that when it gets called by ResetClothing() it skips the SyncAnatomy() at the end, and ResetClothing() instead calls SyncAnatomy() once. Otherwise you iterate over all clothingItems once per item, i.e. quadratically!
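The fix can be sketched like this. It is a hypothetical, heavily simplified C# stand-in, not VaM's actual DAZCharacterSelector code: only the method names SetActiveClothingItem, ResetClothing and SyncAnatomy come from the post, and SyncAnatomy is assumed to walk every clothing item.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for DAZCharacterSelector; bodies are illustrative only.
public class ClothingSelectorSketch
{
    public readonly List<string> ClothingItems = new List<string>();
    public int ItemsVisitedBySync; // counts the work SyncAnatomy does

    public void SetActiveClothingItem(string item, bool active, bool skipSyncAnatomy = false)
    {
        // ... toggle the item here ...
        if (!skipSyncAnatomy)
            SyncAnatomy(); // per-item sync, still used for single changes
    }

    public void ResetClothing()
    {
        foreach (string item in ClothingItems)
            SetActiveClothingItem(item, false, skipSyncAnatomy: true);
        SyncAnatomy(); // once, after every item has been changed
    }

    // Assumed to visit every clothing item, which is why calling it per item is O(n^2).
    private void SyncAnatomy()
    {
        foreach (string item in ClothingItems)
            ItemsVisitedBySync++;
    }
}
```

With 10 clothing items, ResetClothing() makes SyncAnatomy visit 10 items in total, instead of 10 × 10 = 100 when every SetActiveClothingItem call syncs on its own.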

I know it will be fixed in VaM2, but it only took half an hour to patch it using dnSpy:
Another performance patch was to mark DAZMorphs as "touched" once appliedValue or morphValue was set via their setters and to add them to a separate list in their corresponding morph bank, so that on every frame in ApplyMorphsThreadedFast() only those in the "touched list" get iterated. Otherwise you iterate over all installed morphs, even though they are at 0.0, and while that looks harmless in terms of wasted operations, it pollutes the CPU cache heavily.
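A minimal single-threaded C# sketch of the "touched list" idea. MorphSketch and MorphBankSketch are hypothetical stand-ins for DAZMorph and its morph bank; the real ApplyMorphsThreadedFast() runs on a side thread and applies vertex deltas, which here is reduced to copying morphValue into appliedValue. The skip condition is the one quoted from ApplyMorphsThreadedFast() later in the thread.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for a morph bank holding all morphs plus a "touched" subset.
public class MorphBankSketch
{
    public readonly List<MorphSketch> AllMorphs = new List<MorphSketch>();
    public readonly List<MorphSketch> TouchedMorphs = new List<MorphSketch>();

    public MorphSketch Add(string name)
    {
        var m = new MorphSketch(this, name);
        AllMorphs.Add(m);
        return m;
    }

    // Per-frame pass: only morphs whose setters were ever called are read.
    public int ApplyTouched()
    {
        int applied = 0;
        foreach (MorphSketch m in TouchedMorphs)
        {
            if (m.MorphValue != 0f || m.AppliedValue != m.MorphValue)
            {
                m.AppliedValue = m.MorphValue; // stand-in for applying the deltas
                applied++;
            }
        }
        return applied;
    }
}

// Hypothetical stand-in for DAZMorph: both setters mark the morph as touched.
public class MorphSketch
{
    private readonly MorphBankSketch _bank;
    private float _morphValue, _appliedValue;
    private bool _touched;
    public readonly string Name;

    public MorphSketch(MorphBankSketch bank, string name) { _bank = bank; Name = name; }

    public float MorphValue
    {
        get { return _morphValue; }
        set { _morphValue = value; MarkTouched(); }
    }

    public float AppliedValue
    {
        get { return _appliedValue; }
        set { _appliedValue = value; MarkTouched(); }
    }

    private void MarkTouched()
    {
        if (_touched) return;          // added at most once, so the list never changes mid-pass
        _touched = true;
        _bank.TouchedMorphs.Add(this); // same object reference, not a copy
    }
}
```

A morph reset to 0.0 is applied once more (so its appliedValue returns to 0) and then skipped by the condition, while morphs that were never set are not read at all.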

meshedvr · Feb 5, 2024

I would consider rolling this back into the main release if you are willing to allow me to do that. I would have to review it in case you missed corner cases.

It may depend on your CPU, but generally my wait time on the side threads is 0. I won't complain about optimizations to the side thread though. If it helps in certain or many cases, that is great. The main thread is doing other things while the side thread is running in an async manner (1 frame delay). So the wait can be 0 depending on which thread wins the race.

I do 100% agree with the issue with memory and cache invalidation. We are minimizing that in VaM2 with the new structs and jobs system as best possible. The jobs system outputs optimized assembly so it is very efficient. The rest of VaM2 is going to be primarily c# though.

1) (side threads) doing floating-point math like in SkinMeshPart is a lot faster if you compile the same operations natively with SSE2/AVX2 instead of running them as C# IL

Yeah I can imagine. This is essentially what VaM2 does with new skinning and morph engine (using Unity Jobs system). I would have loved to do that for VaM but I had to move on to VaM2.

2) disabling your threaded skinmeshing and instead letting the .dll do the thread management natively is faster.

Yeah, I could see that. The built-in code uses a fixed assumption about how many threads to run, because there isn't an overall coordinator for all Person atoms: each Person atom runs the async side thread, which then dispatches a fixed number of subthreads for the actual skinning part. Glad you found a better way to handle that.

3) if skinmeshing is already very fast, you can skip the threading altogether. It also fixes the bug where skinmeshes flop around for a few frames because you processed the vertexes in the wrong operation order. You already tried to mitigate it by splitting the threads not by bones but by vertex IDs, but some vertexes are shared between bones and those can spazz out.

I'm not quite sure what you mean there. The subthreads of the skinning all write to a consistent vertex array, and that array is not used until all subthreads are complete. Something doesn't sound right here.

4) (main thread 0) if you call a lot of Unity engine methods like getPosition or getTransform, it's faster to call them native->native using their native names internalcall_XXXX instead of C#->native, because every C#->native transition costs a lot of CPU. ComputeColliders did this 100k times per frame; this was the biggest performance gain.

Yeah I optimized as much as possible while staying in c#. I am glad you took the time to figure this out with native calls. This is again solved (mostly) in VaM2 because of the jobs system bypassing all the c# layers for the code that needs to be performant.

5) (all threads) setting the core affinity for the process so the code stays on one CCD and all threads share the same high-speed CPU cache. This is basically free performance on top and should be configurable for advanced users.

That's cool!

I know it will be fixed in VaM2, but it only took half an hour to patch it using dnSpy

I'm curious what you used for profiling. VaM was only optimized using Unity's profiler on several different testcases, and without moving to native calls there was limited amount of things that could be done. Unity doesn't allow accessing native methods without making a C-based plugin. It is a bit of a pain. I guess you did that though.

Another performance patch was to mark DAZMorphs as "touched" once appliedValue or morphValue was set via their setters and to add them to a separate list in their corresponding morph bank, so that on every frame in ApplyMorphsThreadedFast() only those in the "touched list" get iterated. Otherwise you iterate over all installed morphs, even though they are at 0.0, and while that looks harmless in terms of wasted operations, it pollutes the CPU cache heavily.

That is how VaM2 works, but on VaM we have morphs that can modify other morphs values and I believe it doesn't work correctly with what you describe. I was considering doing a fix like this for VaM, but something held me back from doing it, and I think the reason was the morph-controlled-morph scenario and formula morphs. I didn't want to break functionality for something that is running on a sidethread, but I do see your point about cache invalidation due to massive size of morph class data and random access.

turtlebackgoofy · Feb 5, 2024

Oh, and in DAZMorph.LoadDeltasFromBinaryFile() you need to load all deltas in one go instead of reading them float by float, like this:

Code:

public void LoadDeltasFromBinaryFile(string path)
    {
        try
        {
            using (FileEntryStream fileEntryStream = FileManager.OpenStream(path, true))
            {
                using (BinaryReader binaryReader = new BinaryReader(fileEntryStream.Stream))
                {
                    this.numDeltas = binaryReader.ReadInt32();
                    this.deltas = new DAZMorphVertex[this.numDeltas];
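                    int num = 16; // bytes per delta: Int32 vertex index + 3 x float32 (x, y, z)
                    // read the whole delta block in one call, then parse it from memory
                    using (MemoryStream memoryStream = new MemoryStream(binaryReader.ReadBytes(this.numDeltas * num)))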
                    {
                        using (BinaryReader binaryReader2 = new BinaryReader(memoryStream))
                        {
                            for (int i = 0; i < this.numDeltas; i++)
                            {
                                DAZMorphVertex dazmorphVertex = new DAZMorphVertex();
                                dazmorphVertex.vertex = binaryReader2.ReadInt32();
                                Vector3 vector;
                                vector.x = binaryReader2.ReadSingle();
                                vector.y = binaryReader2.ReadSingle();
                                vector.z = binaryReader2.ReadSingle();
                                dazmorphVertex.delta = vector;
                                this.deltas[i] = dazmorphVertex;
                            }
                        }
                    }
                }
            }
        }
        catch (Exception ex)
        {
            Debug.LogError(string.Concat(new object[] { "Error while loading binary delta file ", path, " ", ex }));
        }
    }

Otherwise you call into the zip library 4 times per delta (one ReadInt32 and three ReadSingle calls), which results in 100k calls and 100k small reads from the disk. This removes lag in scenes with prerecorded animations.
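To try the bulk-read idea outside of VaM, here is a self-contained C# sketch. MorphDelta and DeltaReader are hypothetical names; the byte layout (an Int32 count, then 16 bytes per delta: an Int32 vertex index plus three float32 components) is taken from the loader code above.

```csharp
using System.IO;

// Hypothetical stand-in for DAZMorphVertex: a vertex index plus an x/y/z offset.
public struct MorphDelta
{
    public int Vertex;
    public float X, Y, Z;
}

public static class DeltaReader
{
    // Layout per delta: Int32 vertex index + 3 x float32 = 16 bytes.
    private const int BytesPerDelta = 16;

    public static MorphDelta[] ReadDeltas(Stream source)
    {
        using (var reader = new BinaryReader(source))
        {
            int count = reader.ReadInt32();
            int size = count * BytesPerDelta;

            // One large read from the source stream (e.g. a zip entry stream)...
            byte[] block = reader.ReadBytes(size);
            if (block.Length != size)
                throw new EndOfStreamException("Delta block is truncated.");

            // ...then parse from memory, where each small read is cheap.
            var deltas = new MorphDelta[count];
            using (var mem = new BinaryReader(new MemoryStream(block)))
            {
                for (int i = 0; i < count; i++)
                {
                    deltas[i].Vertex = mem.ReadInt32();
                    deltas[i].X = mem.ReadSingle();
                    deltas[i].Y = mem.ReadSingle();
                    deltas[i].Z = mem.ReadSingle();
                }
            }
            return deltas;
        }
    }
}
```

Only two reads hit the source stream (the count and the whole block); all per-field reads run against the in-memory copy.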

meshedvr · Feb 5, 2024

Would you like to join my team?!?

hijku · Feb 5, 2024

I ran a few tests on one of my scenes, and here are my results on a 4090 + i9-13900K. VaM set to use 8 P-cores and:
[threads]
computeColliders=6
skinmeshPart=1
CCD=0
IterateCCD=0

Settings: soft body & high quality physics on, 90 Hz + physics cap 2
MSAA 8x, but the GPU is not fully utilized anyway, so it's a CPU bottleneck

Clean VaM
Without patch: 115 fps avg when idle, 75 fps during animation
With patch9_morph_fix3: 118 fps avg when idle, 77 fps during animation

VaM with 5k+ vars
Without patch: 102 fps avg when idle, 50 fps during animation
With patch8: 105 fps avg when idle, 54 fps during animation
With patch9_morph_fix3: 108 fps avg when idle, 57 fps during animation

So while patch8 and patch9 definitely improve performance, something still tanks the FPS in my beefy VaM.
I'm nowhere near the 77 fps I get on clean VaM.
What are you using for profiling? I've never done it for Unity, and dotTrace shows me nothing.

/edit: just to be clear, I can reproduce the performance hit on clean VaM by simply copying Saves, Custom and AddonPackages from the beefy VaM.

meshedvr · Feb 5, 2024

turtlebackgoofy said:
Otherwise you call into the zip library 4 times per delta (one ReadInt32 and three ReadSingle calls), which results in 100k calls and 100k small reads from the disk. This removes lag in scenes with prerecorded animations.

Wait, you are telling me ReadInt32() and ReadSingle() in the original code are actually opening the file every single time? That's insane; I had no idea the zip library would work like that.

C#:

        public void LoadDeltasFromBinaryFile(string path) {
                //Debug.Log("Loading deltas for morph " + morphName + " from "+path);
                try {
                        using (FileEntryStream fes = FileManager.OpenStream(path, true)) {
                                using (BinaryReader binReader = new BinaryReader(fes.Stream)) {
                                        numDeltas = binReader.ReadInt32();
                                        deltas = new DAZMorphVertex[numDeltas];
                                        for (int ind = 0; ind < numDeltas; ind++) {
                                                DAZMorphVertex dmv = new DAZMorphVertex();
                                                dmv.vertex = binReader.ReadInt32();
                                                Vector3 v;
                                                v.x = binReader.ReadSingle();
                                                v.y = binReader.ReadSingle();
                                                v.z = binReader.ReadSingle();
                                                dmv.delta = v;
                                                deltas[ind] = dmv;
                                        }
                                }
                        }
                }
                catch (System.Exception e) {
                        Debug.LogError("Error while loading binary delta file " + path + " " + e);
                }
        }

turtlebackgoofy · Feb 5, 2024

Yeah I can imagine. This is essentially what VaM2 does with new skinning and morph engine (using Unity Jobs system). I would have loved to do that for VaM but I had to move on to VaM2.

I actually tried the Job System first (without the Burst compiler) and it made things even worse, because there was a lot of additional memory copying, and in the end the Unity jobs were just glorified C# threads like the ones you already use in skinmeshing. The newest version of Unity jobs lets you access transforms without the C#->native transition. I think if you also use IL2CPP for those special jobs (is that even possible without making the whole game IL2CPP and breaking all third-party scripts?) you might see some benefit, but it will not match a .dll compiled for AVX2 and optimized with clang.

I'm not quite sure what you mean there. The subthreads of the skinning all write to a consistent vertex array, and that array is not used until all subthreads are complete. Something doesn't sound right here.

Yeah, gotta look it up again, I think I mixed something up. Please correct me if I didn't understand it correctly: when you process skinmeshing, you iterate over all bones and apply the weights (I'm not an expert on game engines), and as a result you get morphed vertexes that get rendered with the skin. What I learned was that you need to morph the vertexes in the order of the bones, otherwise you get a wrong result and the skin is all over the place. Basically you need to process every DAZSkinV2VertexWeights of every bone one by one and, in theory, can't multithread it. You, however, did a trick where each thread processes every bone in the same order, but each thread has a range of final vertexes it is allowed to touch.
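A minimal C# sketch of the partitioning trick described above, under simplifying assumptions: a 1D weighted sum with hypothetical boneOffset values stands in for the real per-bone transform, and Parallel.For stands in for VaM's own skinning subthreads. Each worker walks all bones in the same order but only writes vertices in its own index range, so every vertex gets exactly the same sequence of updates as in a single-threaded pass, even if the real per-bone step is order-dependent.

```csharp
using System;
using System.Threading.Tasks;

public static class RangeSkinningSketch
{
    // boneVertices[b] / boneWeights[b]: vertices influenced by bone b and their weights.
    // boneOffset[b]: hypothetical 1D stand-in for bone b's transform.
    public static float[] Skin(int vertexCount, int[][] boneVertices, float[][] boneWeights,
                               float[] boneOffset, int workers)
    {
        if (workers < 1) throw new ArgumentOutOfRangeException("workers");
        var result = new float[vertexCount];
        int chunk = (vertexCount + workers - 1) / workers; // vertices per worker, rounded up

        Parallel.For(0, workers, w =>
        {
            int start = w * chunk;
            int end = Math.Min(start + chunk, vertexCount);
            // Every worker walks the bones in the same order...
            for (int b = 0; b < boneVertices.Length; b++)
            {
                int[] verts = boneVertices[b];
                float[] weights = boneWeights[b];
                for (int k = 0; k < verts.Length; k++)
                {
                    int v = verts[k];
                    // ...but only writes vertices inside its own range.
                    if (v < start || v >= end) continue;
                    result[v] += weights[k] * boneOffset[b];
                }
            }
        });
        return result;
    }
}
```

Because the vertex ranges never overlap, no locking is needed, and running with 1 worker or 4 workers produces identical arrays.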

I'm curious what you used for profiling. VaM was only optimized using Unity's profiler on several different testcases, and without moving to native calls there was limited amount of things that could be done. Unity doesn't allow accessing native methods without making a C-based plugin. It is a bit of a pain. I guess you did that though.

BepInEx.Debug/src/SimpleProfiler at master · BepInEx/BepInEx.Debug (github.com): tools for debugging and developing BepInEx plugins

I changed the code a little so it prints the last 3 methods of the call stack to the CSV. I basically used it to find methods that are called many, many times per frame or take an enormous amount of time. Take the numbers with a grain of salt, though: if a method is fast and gets called a lot, the profiling overhead inflates its measured time, so a method you suspect of costing 3 ms every frame ends up being 0.2 ms. It also obviously inflates the time of the calling method.

That is how VaM2 works, but on VaM we have morphs that can modify other morphs values and I believe it doesn't work correctly with what you describe.

In ApplyMorphsThreadedFast() you iterate over all morphs and only continue with a morph if (dazmorph.morphValue != 0.0 || dazmorph.appliedValue != dazmorph.morphValue). My solution caches the morphs for which this condition can even be true in a _touchedMorphs list. The condition can only ever be true if appliedValue or morphValue has been changed from its default 0.0 value at some point. Once one of those two setters is called, the morph is marked as "touched" and put into the _touchedMorphs list. Both lists of course just hold references to the same objects, since morphs aren't structs. So I don't see any case where that might break things.
Also, be careful when Unity says "faster with an event system" and you have as many objects as VaM has morphs. When working with huge amounts of data, you need to be really careful about when they are actually read and what Unity additionally reads for each of them, since, again, CPU cache and so on.
