I really love this idea. I was trying to do this before using an external Python connection directly to VaM via a plugin. It didn't work as I expected, but it was fun to experiment with. You should really check out BoneReceiver: it's pretty cool and surprisingly simple.
This is an experimental project to drive Virt-A-Mate (VaM) skeleton animations using a Python-based CVAE Machine Learning engine.
hub.virtamate.com
This isn't "LLM for scenes" or some universal model trained "entirely on VaM." It's a PyTorch model for skeletal motion generation, specifically a GRU-based CVAE (the class is called PoseGRU_CVAE in the code). It takes a history of poses plus condition flags and generates the next pose; the Python engine then sends the result to VaM via UDP, and a C# plugin applies it to the controllers.
What is this model, exactly?
Architecture — CVAE + GRU;
Input — a sequence of length 20 (SEQ_LEN = 20);
Latent space — 24 (LATENT_DIM = 24);
Hidden size — 1024;
Controls only 9 bones/controllers: hip, chest, head, both arms, both legs, both knees.
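From the constants quoted above, here's a minimal PyTorch sketch of what a PoseGRU_CVAE of this shape could look like. This is my reconstruction, not the repo's actual code: the layer layout, the decoder head, and the condition dimension (11, counting the flags listed further down) are all assumptions.

```python
import torch
import torch.nn as nn

SEQ_LEN = 20      # pose history length (constant quoted in the post)
LATENT_DIM = 24   # latent space size
HIDDEN = 1024     # hidden size
POSE_DIM = 9 * 9  # 9 controllers x (3 position + 6D rotation) = 81
COND_DIM = 11     # assumed: one slot per condition flag in flag.csv

class PoseGRU_CVAE(nn.Module):
    """Hypothetical sketch: GRU encoder over pose history -> latent z -> MLP decoder."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.GRU(POSE_DIM + COND_DIM, HIDDEN, batch_first=True)
        self.to_mu = nn.Linear(HIDDEN, LATENT_DIM)
        self.to_logvar = nn.Linear(HIDDEN, LATENT_DIM)
        self.decoder = nn.Sequential(
            nn.Linear(LATENT_DIM + HIDDEN + COND_DIM, HIDDEN),
            nn.ReLU(),
            nn.Linear(HIDDEN, POSE_DIM),
        )

    def forward(self, history, cond):
        # history: (B, SEQ_LEN, POSE_DIM), cond: (B, COND_DIM)
        cond_seq = cond.unsqueeze(1).expand(-1, history.size(1), -1)
        _, h = self.encoder(torch.cat([history, cond_seq], dim=-1))
        h = h[-1]  # final GRU state, (B, HIDDEN)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        next_pose = self.decoder(torch.cat([z, h, cond], dim=-1))
        return next_pose, mu, logvar

model = PoseGRU_CVAE()
pose, mu, logvar = model(torch.zeros(1, SEQ_LEN, POSE_DIM), torch.zeros(1, COND_DIM))
```

At inference time only the decoder path matters: sample z, decode the next 81-dim pose vector, feed it back into the history.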
How the pose format is structured:
Each target receives 9 numbers;
the first 3 are the position;
the remaining 6 are the rotation in 6D representation;
this is then converted to a quaternion and sent to VaM.
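The 6D-to-quaternion step is a standard technique: the two 3-vectors are orthonormalized via Gram-Schmidt into a rotation matrix, which is then converted to a quaternion. A sketch, assuming the common column-vector convention from Zhou et al.'s continuity paper; the engine's exact convention isn't shown in the post:

```python
import numpy as np

def sixd_to_quaternion(sixd):
    """Convert a 6D rotation (two stacked 3-vectors) to a quaternion (w, x, y, z)."""
    a1, a2 = np.asarray(sixd[:3], float), np.asarray(sixd[3:6], float)
    # Gram-Schmidt: orthonormalize the two vectors, cross for the third axis
    b1 = a1 / np.linalg.norm(a1)
    a2 = a2 - np.dot(b1, a2) * b1
    b2 = a2 / np.linalg.norm(a2)
    b3 = np.cross(b1, b2)
    R = np.stack([b1, b2, b3], axis=1)  # assumed: vectors are matrix columns

    # Rotation matrix -> quaternion, branching on the largest diagonal term
    t = np.trace(R)
    if t > 0:
        s = np.sqrt(t + 1.0) * 2
        w, x = 0.25 * s, (R[2, 1] - R[1, 2]) / s
        y, z = (R[0, 2] - R[2, 0]) / s, (R[1, 0] - R[0, 1]) / s
    elif R[0, 0] > R[1, 1] and R[0, 0] > R[2, 2]:
        s = np.sqrt(1.0 + R[0, 0] - R[1, 1] - R[2, 2]) * 2
        w, x = (R[2, 1] - R[1, 2]) / s, 0.25 * s
        y, z = (R[0, 1] + R[1, 0]) / s, (R[0, 2] + R[2, 0]) / s
    elif R[1, 1] > R[2, 2]:
        s = np.sqrt(1.0 + R[1, 1] - R[0, 0] - R[2, 2]) * 2
        w, x = (R[0, 2] - R[2, 0]) / s, (R[0, 1] + R[1, 0]) / s
        y, z = 0.25 * s, (R[1, 2] + R[2, 1]) / s
    else:
        s = np.sqrt(1.0 + R[2, 2] - R[0, 0] - R[1, 1]) * 2
        w, x = (R[1, 0] - R[0, 1]) / s, (R[0, 2] + R[2, 0]) / s
        y, z = (R[1, 2] + R[2, 1]) / s, 0.25 * s
    return np.array([w, x, y, z])
```

The 6D form exists because it's continuous (no gimbal-lock-style discontinuities), which makes it much friendlier for a neural network to regress than quaternions or Euler angles.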
How it works at runtime:
Python reads flag.csv as a list of condition names;
maintains a GUI with sliders for these conditions;
loads the latest .pth checkpoint from the current folder;
generates the pose;
sends a packet via UDP to the IP/port from target_ip.csv;
The VaM plugin BoneRemoteReceiver listens on UDP port 9998 by default and applies coordinates/quaternions to the same 9 controllers.
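The Python-to-VaM leg of this is just a UDP datagram per frame. A minimal sketch of the sender side; the actual wire format of BoneRemoteReceiver isn't documented in the post, so the flat little-endian float32 layout here (and the helper names) are assumptions:

```python
import socket
import struct

UDP_IP, UDP_PORT = "127.0.0.1", 9998  # 9998 is the plugin's default port; IP assumed local

def pack_pose(pose_per_bone):
    """Pack 9 bones x (x, y, z, qw, qx, qy, qz) into one datagram.

    Assumed layout: flat little-endian float32, bones in a fixed order.
    """
    flat = [v for bone in pose_per_bone for v in bone]
    return struct.pack(f"<{len(flat)}f", *flat)

def send_pose(sock, pose_per_bone):
    sock.sendto(pack_pose(pose_per_bone), (UDP_IP, UDP_PORT))

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
idle_pose = [(0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0)] * 9  # identity quaternions
send_pose(sock, idle_pose)
```

UDP is a sensible choice here: it's fire-and-forget per frame, so a dropped pose just means the next one overwrites it, with no head-of-line blocking inside VaM.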
So yes, it's implemented in Python, but with an important caveat:
the repo contains mostly inference/runtime code, not the full training pipeline. The published files are CVAE_GEN.py, BoneReceiver.cs, START_ENGINE.bat, flag.csv, target_ip.csv, a README, and a var package; there's no separate training script, dataset, or DataLoader/optimizer/loss loop in the repo. CVAE_GEN.py itself loads a finished .pth checkpoint rather than training the model.
Is it trained "on scenes"?
It's highly likely that it was trained on movements/poses extracted from VaM scenes and community assets, not "on scenes" in the sense of some full-fledged simulator. The README explicitly states that the model was trained using community assets like Voxta Demo, Beta Scene, dance/mocap, and other free/CC scenes/assets. However, the author didn't publish the exact dataset extraction process in the repo, so I won't pretend to know his pipeline down to the last detail.
flag.csv also reveals this: it doesn't contain abstract tokens, but behavioral labels like idle, listen, think, speak, wave, dance, love, sit, missionary, sleep, and cowgirl. These look more like conventionally labeled movement modes than free-form scene understanding.
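Turning those labels plus the GUI sliders into the model's condition input is likely just a fixed-order vector of weights. A sketch under that assumption; the one-name-per-row flag.csv layout and the helper names are guesses (the file contents are inlined here so the snippet runs standalone):

```python
import csv
import io

# Inlined stand-in for flag.csv, using the labels quoted from the repo.
FLAG_CSV = "idle\nlisten\nthink\nspeak\nwave\ndance\nlove\nsit\nmissionary\nsleep\ncowgirl\n"

def load_flags(text):
    """Read condition names, one per row, preserving file order."""
    return [row[0] for row in csv.reader(io.StringIO(text)) if row]

def condition_vector(flags, weights):
    """Map GUI slider values {name: 0..1} onto a fixed-order condition vector."""
    return [float(weights.get(name, 0.0)) for name in flags]

flags = load_flags(FLAG_CSV)
cond = condition_vector(flags, {"idle": 0.3, "dance": 0.7})  # a 30/70 blend
```

Because the conditions are continuous weights rather than a single categorical choice, the sliders can blend modes (e.g. mostly dancing with a bit of idle sway), which matches how the motion looks in the video.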
What does this mean in simple terms?
This isn't a "smart agent" but a motion-prior generator:
You specify a mixture of states/flags;
the model wanders in latent space;
decodes smooth motion;
VaM receives commands for nine controllers.
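The four steps above amount to an autoregressive loop: sample in latent space, decode a pose, push it back into the history window, repeat. A runnable sketch with a stand-in decoder (the real one lives in the .pth checkpoint; the noise-drift stub and constants below are just there to make the control flow concrete):

```python
import random
from collections import deque

SEQ_LEN, POSE_DIM, LATENT_DIM = 20, 81, 24  # constants quoted in the post

def decode_next_pose(history, cond):
    """Stand-in for the CVAE decoder: wander around the last pose.

    The real model conditions on `cond` and decodes a sampled z; here we
    just add small latent-driven drift so the loop runs without a checkpoint.
    """
    z = [random.gauss(0, 1) for _ in range(LATENT_DIM)]  # latent sample
    drift = sum(z) / LATENT_DIM * 0.01
    return [p + drift for p in history[-1]]

# Fixed-length history window; each generated pose becomes context for the next.
history = deque([[0.0] * POSE_DIM for _ in range(SEQ_LEN)], maxlen=SEQ_LEN)
cond = {"idle": 1.0}
for _ in range(5):
    pose = decode_next_pose(history, cond)
    history.append(pose)
    # ...here the 6D rotations would become quaternions and go out over UDP
```

This loop structure is also why the motion stays smooth: every frame is generated relative to the last 20, so the model can't teleport between unrelated poses.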
That's why it looks "simple but alive" in the video: it doesn't think; it generates believable movement from a learned motion distribution.