Hello Creators,
I’m excited to share a project I’ve been working on: a real-time skeleton-driving framework for VaM using a Variational Autoencoder (VAE).
Unlike traditional timeline animations or recorded loops, this system generates per-bone pose data in real time, driven by the emotional context of the conversation.
How it works:
Emotion Inference: Voxta acts as the command center, inferring emotional states (e.g. Excited, Bored) from the conversation.
Generative Motion: A custom Python hook server receives these states and generates motion data via a VAE model trained on animation data.
Hybrid Control: While the body and limbs are AI-driven, eye gaze, blinking, and basic head motions remain rule-based for stability.
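To make the pipeline above concrete, here's a minimal sketch of the generative step: an emotion label is one-hot encoded, concatenated with a fresh latent sample, and decoded into per-bone rotations. Everything here (names, dimensions, the single-layer "decoder") is an illustrative assumption, not the project's actual code — a real setup would load trained VAE weights instead of random placeholders.

```python
import numpy as np

# Hypothetical sketch only: names, shapes, and the placeholder "decoder"
# are assumptions, not the project's actual implementation.

EMOTIONS = ["neutral", "excited", "bored"]
LATENT_DIM = 8
NUM_BONES = 20             # body/limb bones; eyes and head stay rule-based
POSE_DIM = NUM_BONES * 3   # one (x, y, z) Euler rotation per bone

rng = np.random.default_rng(0)
# Placeholder weights standing in for a trained decoder network.
W = rng.standard_normal((LATENT_DIM + len(EMOTIONS), POSE_DIM)) * 0.1

def one_hot(emotion: str) -> np.ndarray:
    vec = np.zeros(len(EMOTIONS))
    vec[EMOTIONS.index(emotion)] = 1.0
    return vec

def generate_pose(emotion: str) -> np.ndarray:
    """Sample the latent prior and decode to per-bone rotations (radians)."""
    z = rng.standard_normal(LATENT_DIM)      # fresh latent sample each frame
    x = np.concatenate([z, one_hot(emotion)])
    return np.tanh(x @ W)                    # bounded joint rotations

pose = generate_pose("excited")
print(pose.shape)  # (60,) -> 3 rotation values for each of 20 bones
```

Because the latent vector is resampled (or, in practice, smoothly varied) per request, two calls with the same emotion produce similar but non-identical poses.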
Showcase Video:
Why this approach?
By moving away from pre-recorded timelines, characters gain a "living" presence, with natural swaying and emotional transitions that never play out exactly the same way twice.
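One way this "never exactly repeating" quality can fall out of a VAE (a sketch, and an assumption about the approach rather than the project's actual code) is to drift the latent vector smoothly between frames instead of resampling it from scratch: a small pull back toward the prior mean keeps poses plausible, while continuous noise keeps the trajectory from ever cycling.

```python
import numpy as np

# Illustrative sketch: parameter names and values are assumptions.
rng = np.random.default_rng(1)
LATENT_DIM = 8

def latent_walk(z: np.ndarray, pull: float = 0.05, noise: float = 0.02) -> np.ndarray:
    """One step of a mean-reverting random walk in latent space.

    The pull toward the prior mean (0) keeps decoded poses plausible;
    the noise term keeps the trajectory from repeating exactly.
    """
    return z - pull * z + noise * rng.standard_normal(z.shape)

z = rng.standard_normal(LATENT_DIM)
trajectory = []
for _ in range(100):
    z = latent_walk(z)
    trajectory.append(z.copy())

# Consecutive latents stay close (smooth motion) but are never identical.
steps = np.linalg.norm(np.diff(trajectory, axis=0), axis=1)
print(steps.max(), steps.min())
```

Feeding each step of such a walk through the decoder yields continuous swaying rather than jump cuts, which is presumably why the motion feels "alive" compared to a looped clip.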
I’d love to hear your thoughts or feedback on this generative approach!