VamX 1.8 Voice Control is a real game changer!!!

Conner7

I encourage everyone to get VamX 1.8, which includes an amazing new voice control system. It's a real game changer because you no longer have to stop gameplay to reach for buttons on the computer. It's one HUGE step towards making this game more realistic.
 
*cough*

Trigger arbitrary things, not just what someone predefined:


(Edit: And actually this was floating around for years before in a simpler form as the "VoiceTrigger" plugin, but it broke with some VaM update, so I was asked to make it work again. I used the opportunity for a proper integration, including GrammarRecognition, not just Keywords)

(Edit2: Just to clarify, as this could be misunderstood: vamX did not try to claim the credit here or use my code, and he did actually contact me. We both have our own implementations; simply using the Unity/Microsoft API for speech is all they have in common. His version is integrated into vamX, mine is more generic to work in arbitrary scenes)
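(For reference, the shared foundation both versions build on looks roughly like this. A minimal sketch using UnityEngine.Windows.Speech, not actual plugin code from either of us:)

```csharp
using UnityEngine;
using UnityEngine.Windows.Speech; // the Windows 10 speech API as exposed by Unity

public class VoiceTriggerSketch : MonoBehaviour
{
    private KeywordRecognizer recognizer;

    void Start()
    {
        // Keyword mode: match against a fixed list of phrases.
        recognizer = new KeywordRecognizer(new string[] { "come over here", "sit down" });
        recognizer.OnPhraseRecognized += OnPhrase;
        recognizer.Start();

        // Grammar mode would instead load an SRGS XML file:
        // var grammar = new GrammarRecognizer("VoiceCommands.xml");
    }

    private void OnPhrase(PhraseRecognizedEventArgs args)
    {
        // args.text is the matched phrase; here you would fire whatever
        // trigger the scene creator hooked up to it.
        Debug.Log("Recognized: " + args.text + " (" + args.confidence + ")");
    }

    void OnDestroy()
    {
        if (recognizer != null)
        {
            if (recognizer.IsRunning) recognizer.Stop();
            recognizer.Dispose();
        }
    }
}
```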
 
How complex and interactive can this be?
For instance "Come over here" could trigger the balance plugin/walk to make the model walk to where the player is standing. That would be awesome. More complex behaviour like "sit down" making them sit in the nearest chair etc.
It would be awesome to send generic conversation to an AI chatbot and have the reply converted via text-to-speech and pushed out with a lipsync plugin :) (rough sketch of what I mean at the end of this post)
I am quite sad about the lack of AI integration with VaM considering how many amazing open source and "free" implementations there are for various things.
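Roughly, the pipeline I'm imagining would be something like this. Every interface below is made up purely for illustration; nothing like it exists in VaM today:

```csharp
// Hypothetical pipeline: speech-to-text -> chatbot -> text-to-speech -> lipsync.
// All interfaces are imaginary placeholders, not a real VaM or plugin API.
public interface ISpeechToText { string Listen(); }
public interface IChatbot { string Reply(string userText); }
public interface ITextToSpeech { byte[] Synthesize(string text); }
public interface ILipSync { void Play(byte[] audio); }

public class ConversationLoop
{
    private readonly ISpeechToText stt;
    private readonly IChatbot bot;
    private readonly ITextToSpeech tts;
    private readonly ILipSync lips;

    public ConversationLoop(ISpeechToText stt, IChatbot bot, ITextToSpeech tts, ILipSync lips)
    {
        this.stt = stt; this.bot = bot; this.tts = tts; this.lips = lips;
    }

    public void Step()
    {
        string heard = stt.Listen();           // what the player said
        string answer = bot.Reply(heard);      // free-form chatbot response
        byte[] audio = tts.Synthesize(answer); // spoken reply
        lips.Play(audio);                      // drive the model's mouth
    }
}
```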
 
How complex and interactive can this be?
Well, it's Windows 10 speech recognition. You can teach it quite complex grammar and alternatives. Even slang words.
In the end it provides a trigger signal, you can hook it up to anything in your scene that accepts said signal.
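To give an idea of the grammar side: the Windows recognizer consumes SRGS XML. A hypothetical grammar file that accepts several alternative phrasings (including slang) for the same trigger could look like this:

```xml
<?xml version="1.0" encoding="utf-8"?>
<grammar version="1.0" xml:lang="en-US" root="commands"
         xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="commands" scope="public">
    <one-of>
      <!-- alternative phrasings, all producing the same trigger signal -->
      <item>come <one-of><item>here</item><item>over here</item><item>to me</item></one-of></item>
      <item>sit down</item>
      <item>take a seat</item>
    </one-of>
  </rule>
</grammar>
```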

I am quite sad about the lack of AI integration
That's mostly due to a lack of understanding that this is NOT an AI. It does not understand what you say, it just matches your speech to the language structure defined by the creator who built the scene and then triggers something. It can't do anything the scene creator did not account for. (Yes, internally there is probably some deep learning algorithm going on, but that doesn't change what you get on the outside.)

By the way, all this fancy new deep learning and neural net AI is not very useful for use in games. These things only work IF you can train them automatically, which requires implementing code that evaluates what is "good gameplay" automatically... and if you can do that, you could just implement the good gameplay directly and save a lot of time and resources. Which is why nobody is using deep learning for gameplay AI. In the end "AI" is just math and statistics, not magic miracles.

E.g. with self-driving cars you can "just" record data from real drivers and have the AI learn from what the sensors saw and what the real driver did in response. That doesn't work for games, as you don't have the budget to have millions of players play as some unimportant NPC in your unfinished game. And even if you could get millions of players... many of them would just do random shit you don't want your AI to do. So instead, you go with old-fashioned behavior trees with thousands of rules and conditions. Much more reliable ;)
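(For the curious, a behavior tree is nothing magical either. Stripped to the bone it's just hand-written conditions and actions evaluated in a fixed priority order; a purely illustrative sketch:)

```csharp
// Stripped-down behavior tree: no learning involved, just hand-written rules.
public enum Status { Success, Failure, Running }

public abstract class Node
{
    public abstract Status Tick();
}

// Selector: try children in priority order until one succeeds (an "or").
public class Selector : Node
{
    private readonly Node[] children;
    public Selector(params Node[] children) { this.children = children; }
    public override Status Tick()
    {
        foreach (Node child in children)
        {
            Status s = child.Tick();
            if (s != Status.Failure) return s;
        }
        return Status.Failure;
    }
}

// Leaf wrapping one hand-written rule, e.g. "is the player nearby?".
public class Condition : Node
{
    private readonly System.Func<bool> check;
    public Condition(System.Func<bool> check) { this.check = check; }
    public override Status Tick()
    {
        return check() ? Status.Success : Status.Failure;
    }
}

// Leaf performing one action, e.g. starting a "sit down" animation.
public class Action : Node
{
    private readonly System.Action run;
    public Action(System.Action run) { this.run = run; }
    public override Status Tick() { run(); return Status.Success; }
}
```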
 
I think you misunderstood what I was asking. I was not referring to the "magic AI will solve everything" idea that the media portray, but shit like this,

Can you imagine a fully animated model that can walk around your scene based only on telling them where to go?
Then take a look at this one,

They train the character to be able to walk to and then sit in 'ANY' chair. The flexibility that this kind of natural, non-pre-programmed movement would give you in VaM scenes is amazing.

These videos are years old. The training is done, this stuff is "common" now.
 
Hi again J,

I more or less know what you want. This is how you achieve it:

Either

1. A system (state machine, behavior tree or similar) which triggers dialog, behavior and action at different stages of the game.

Or

2. A dialog-driven system which triggers behavior and action.

These two approaches start at opposite ends and, if you like, work the other way around from each other.

The behavior tree drives the dialog and other actions.

The dialog moves the character on from one set of actions or scene to the next.
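To make the difference concrete, here is a bare-bones sketch of approach 1: a state machine where entering a state fires a dialog line and an action. All names are illustrative only, this is not an existing plugin:

```csharp
using System;
using System.Collections.Generic;

// Approach 1 sketch: the state machine drives the dialog, not the other way round.
public class SceneStateMachine
{
    private class State
    {
        public string DialogLine;   // line spoken on entering this state
        public Action OnEnter;      // e.g. trigger an animation
        public Dictionary<string, string> Transitions = new Dictionary<string, string>();
    }

    private readonly Dictionary<string, State> states = new Dictionary<string, State>();
    private State current;

    public void AddState(string name, string dialogLine, Action onEnter)
    {
        states[name] = new State { DialogLine = dialogLine, OnEnter = onEnter };
    }

    public void AddTransition(string from, string trigger, string to)
    {
        states[from].Transitions[trigger] = to;
    }

    // Events from the scene (a timer, a voice command, ...) fire triggers.
    public void Fire(string trigger)
    {
        string next;
        if (current != null && current.Transitions.TryGetValue(trigger, out next))
            Enter(next);
    }

    public void Enter(string name)
    {
        current = states[name];
        Console.WriteLine(current.DialogLine); // in a real scene: speech bubble or TTS
        if (current.OnEnter != null) current.OnEnter();
    }
}
```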

1. As MacGruber wrote, you can use behavior trees. There are loads of them for sale in the Unity Asset Store. Assets like Playmaker incorporate AI behaviors. I've played with one of them which guides a model around a course while avoiding walking into walls. You can set it up in minutes. However, for me there aren't enough instructions to do much else except have your model walk around. You can use Playmaker's dialog designer, but it's not very good. I tried to integrate the excellent Dialog System for Unity, but it won't work in the same scene as Playmaker. And so it goes on with anything you try to do in Unity: it either won't work, or, worse still, you end up with play stopped and 999 red warnings filling up your console. Which is one reason I moved over to VAM.

One of the big complaints with Unity assets is that some devs spend years perfecting an asset but pay scant regard to explaining to their customers how to use it; it's as if they are not bothered what happens after someone's bought it. A good example is Synbot, which I refused to even test for the devs for free because of its lack of any comprehensible documentation.

The other issue with anything like this in Unity is that the assets and tutorials are mostly geared towards melee: sword fighting, evasion, combat etc., not small-scale, everyday human interactions. So, you are now in the game of adapting the macro to the micro of one-to-one interpersonal actions and behaviors.

This asset for Unreal Engine does look quite promising: VisAI - Companion - Modern AI Framework in Blueprints - UE Marketplace (unrealengine.com).
However, at $240.00 I'd want a detailed user guide, not a brief generic explanatory video.

So, the idea with behavior trees is to use them to trigger dialog. In this way you can not only mimic a chatbot, but, when a dialog response is triggered, also simultaneously trigger a full range of movement and body language. No doubt you could, for example, pay to use Deep Motion to mo-cap the full body animations.

All of the above is very, very complicated stuff.



OR...

2. You use a dialog-driven approach. In this approach you are creating a similar plugin to MacGruber's Speech Recognition. What this (as yet imaginary) plugin does is develop things a stage further by using more complex XML, and you'd utilize mo-capped animations triggered by keywords and grammar using MG's Timeline. It would give you the ability to have (weighted) yes-no branching narratives, instead of simple commands that force behavioral compliance from models. In this way, you could hold an almost natural conversation with your character, and she or he would also physically respond to that conversation with body language, movement and behavior, as if it's happening in real life.
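As a very rough sketch of what the core of such an imaginary plugin could be (all names made up, nothing here is an existing plugin API): a table mapping recognized phrases to weighted response branches, each bundling a spoken line with a Timeline clip:

```csharp
using System;
using System.Collections.Generic;

// Approach 2 sketch: a recognized phrase picks a weighted branch that bundles
// a spoken reply with an animation. Purely illustrative, not a real plugin.
public class DialogBranch
{
    public float Weight;        // relative chance of picking this branch
    public string ReplyLine;    // what the character says back
    public string TimelineClip; // animation to trigger, e.g. via MG's Timeline
}

public class DialogTable
{
    private readonly Dictionary<string, List<DialogBranch>> table =
        new Dictionary<string, List<DialogBranch>>();
    private readonly Random rng = new Random();

    public void Add(string phrase, DialogBranch branch)
    {
        List<DialogBranch> branches;
        if (!table.TryGetValue(phrase, out branches))
            table[phrase] = branches = new List<DialogBranch>();
        branches.Add(branch);
    }

    // Weighted random pick, so a "yes" response can be likely but not guaranteed.
    public DialogBranch Respond(string recognizedPhrase)
    {
        List<DialogBranch> branches;
        if (!table.TryGetValue(recognizedPhrase, out branches)) return null;
        float total = 0f;
        foreach (DialogBranch b in branches) total += b.Weight;
        float roll = (float)rng.NextDouble() * total;
        foreach (DialogBranch b in branches)
        {
            roll -= b.Weight;
            if (roll <= 0f) return b;
        }
        return branches[branches.Count - 1];
    }
}
```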



A final point here: this is a product of research from about six years ago: https://webspace.science.uu.nl/~yumak001/UUVHC/index.html, and here's a hard-to-find video:

Note even back then, the team managed to integrate visual/object recognition.

The speech animation asset Rogo Digital Pro has been deprecated because the stupid thing crashed the game after an hour of setting it up, every damn time you tried to use it. Aggh!


Note the animations are clunky; this is what happens when you use a handful of Mixamo animations and don't spend even more time experimenting with blending them in the Animator. But all this was state of the art back then, before affordable mo-cap, Final IK baker etc., and nobody much cared.

Nightmare stuff to integrate.



2a. For me the second approach is by far the simplest, most stress-free and most relevant to small-scale, intimate VAM scenes. I've just played with MG's and VRifter's speech recognition. They are far, far easier to use than anything in Unity. What would take me months of trial and error in Unity, due to insufficient asset user instructions, can be done in an afternoon in VAM. And I actually enjoy doing it, rather than getting stressed out with Unity.

The well-documented assets that are brilliant are SALSA, Dialog System for Unity, RTVoice and Chatbot for Unity. I would avoid everything else for making a 'talking' model. But it's not an easy ride to put them together.

Sometimes we need a specific word to encompass the complexity of our ideas. What is needed is a simultaneity of speech, movement, gesture and body language.
 
Hmm, why do I write so much? Well, that's because I've not written anything at all, not a word, for about two years. But I am enjoying talking to you guys at VAM.
 
Hmm, why do I write so much? Well, that's because I've not written anything at all, not a word, for about two years. But I am enjoying talking to you guys at VAM.

You are welcome, some of us have been stuck in a boring home office for nearly 1.5 years, lol.
 