Chatbots for VAM

Thanks, I have considered Azure, but it involves a hell of a lot of coding. There are far better and simpler codeless ways of building a conversational AI. However, those bots require complex coding to integrate into Unity, or they are simply incompatible; SDKs rarely work out of the box. Even if you manage to get them working in Unity, you can set up lipsync, but driving animations from the dialog is near impossible. You have to resort to all kinds of trickery. The only satisfactory way that I have found of embodying a responsive bot voice, so far, is using MacGruber's speech recognition system in VAM, plus the plugins.

It looks promising, especially if I can adapt Logic Bricks to provide the functionality. What I actually need is for the bot to be able to give a default answer: if the player's intent, i.e. their question to the bot, isn't recognized, it falls into an 'anything else' category, and the bot replies, "I don't understand your question."
Actually, have you looked at Power Virtual Agents? It's Azure Bot Service's no-code conversational AI. - Power Virtual Agents on Azure | Microsoft Azure
 

Thanks! This is new since I last looked at Microsoft bots.

The PVA raises a few basic questions:

How would it integrate with VAM?
Can it return audio files? (We don't have text-to-speech in VAM.)
Would we be able to lipsync from those audio files?

The other problem to solve would be how to trigger the model's animations and behaviours from the audio file. It took me a long while to realize that it isn't the chatbot's responses that can be used to trigger its behaviour, as if it were thinking about what to do and how to act and then doing those things; it's actually the player's input, the recognized words, that triggers the response, i.e. the returned audio file, behaviour and animations.

Also, we would have a chain of communication out to the cloud-hosted bot and back again, with different stages and processes. That makes response times slow, and the chain can be fragile. It would be brilliant if there were some way we could get PVA's type of visual dialog design and functionality onto our desktops.
 

I am 100% positive it could be integrated into VAM. That said, I'm not a dev :) and I'm sure it would require some coding via a plugin. Azure Bot Service does do text-to-speech, and I'm pretty sure the responses could be returned as audio files. If you are loading audio files and playing them via the Person atom, then LipSync would work. In my experience, even though these bots are cloud based, they are very fast and responsive, so the delay would be negligible if you have a decent internet connection (latency-wise, not bandwidth; it doesn't require much pipe at all). Since voice recognition determines which commands get sent to Azure Bot Service, you would already know which animations or actions you want to trigger, so yes, it's all doable.
 

I might have a play with it. That said, I'd have to find someone in VAM who could do the coding to integrate it.

If we are going down that route, there is another way to consider. It might be possible to use the web browser in VAM to directly interact with the PVA (or any other bot embedded in a webpage). To achieve this we would need:

1. To be able to input the player's speech into the webpage in a VAM scene.

2. Be able to open up the webpage in the developer's section of whatever cloud service we are using. At the moment in VAM I can't log in to any bot developer's section like I can in Unity.

3. And this is probably the most difficult: convert the audio emitted from the webpage into the same format normally used by audio files, and then link it to lipsync. This must be possible, because we can record MP3 audio files from audio in webpages, and in doing so the audio is being converted. Also, SALSA lipsync can be made to work directly from IBM Watson voice responses without audio files, just 'live' voice. However, I have not yet found a way to put an AudioSource on an active webpage/cloud connection and lipsync from it, without using a ready-made IBM SDK designed for this purpose.

3a. And, unfortunately, the problems don't end there. SALSA lipsync is easy to configure with a Reallusion model, but configuring Daz models to the same quality is a bitch: you have to manually configure the visemes, and after an hour of fiddling with a complex array of settings you can still end up with a gorgeous DAZ model looking like she wants to chew off your face while she's talking.

If we could connect to a webpage/cloud, then we would have live communication with the bot, any bot, for example ReplicaAI, which would be so incredibly cool, wouldn't it!

As I have previously written, if anyone can find a way of integrating a bot with VAM, I will build the most lifelike and complicated conversational flow for all of us.
 
Hi,

I haven't read the whole thread, but some stuff popped out as overkill, so here's how I would do it, if it helps anybody here get there faster and simpler.



requirements
- install freeware xampp or any other local web server. web browsers in vam will point to something like 127.0.0.1/vambot/*
- install freeware balabolka for TTS; it has a very useful balcon.exe tool that reads texts from the command line with tons of options (pitch, voice, rate, etc.).


v0.1: web bare-bones (estimate 1-2h)
- create a php page like vambot/talk.php that reads a text input from a form, e.g. 127.0.0.1/vambot/talk.php
- in the script, define a simple function like processMessage($message){ return "Received the following message: " . $message; }
- in the script, run an exec("balcon.exe -t " . processMessage($input)); command (probably wrapped so the page doesn't wait for balabolka to finish, e.g. exec("nohup [command here] > /dev/null 2>&1 &"); on Linux, or start /B on Windows; see the sketch below)
goal: if you go in a normal browser to 127.0.0.1/vambot/talk.php and type abc, you should hear "Received the following message: abc", etc.
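
Here's a minimal sketch of what the whole talk.php could look like, assuming balcon.exe is reachable on the web server's PATH (everything else follows the steps above):

<?php
// talk.php -- minimal sketch: echo the message back and speak it with balcon
function processMessage($message) {
    return "Received the following message: " . $message;
}

if (!empty($_POST['message'])) {
    $reply = processMessage($_POST['message']);
    // escapeshellarg() keeps quotes from breaking the command; "start /B"
    // returns immediately so the page doesn't block while balcon speaks
    pclose(popen('start /B balcon.exe -t ' . escapeshellarg($reply), 'r'));
}
?>
<form method="post">
    <input type="text" name="message" autofocus>
    <button type="submit">Send</button>
</form>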


v0.2: vam bare-bones commands (estimate 1-2h)
- add a few buttons on your page like "Pick a random number", "Say a joke", etc. that submit the form with that same input filled in based on the button, e.g. "pick a number" (see the sketch after this list)
- in processMessage($message) add stuff like if($message=="pick a number") return "I pick:".rand(0,100);
- in vam add a web panel that goes to 127.0.0.1/vambot/talk.php
- when you click a button you should hear the audio you set for that message
goal: simple vam command interaction/mini-game scene
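
A sketch of how v0.2 might sit on top of the v0.1 page; the joke text is just a placeholder:

<?php
// v0.2 sketch: a few canned commands on top of the plain echo fallback
function processMessage($message) {
    switch (strtolower(trim($message))) {
        case "pick a number":
            return "I pick: " . rand(0, 100);
        case "say a joke":
            return "I would tell you a UDP joke, but you might not get it.";
        default:
            return "Received the following message: " . $message;
    }
}
?>
<!-- each button submits the form with its own preset message -->
<form method="post">
    <button type="submit" name="message" value="pick a number">Pick a random number</button>
    <button type="submit" name="message" value="say a joke">Say a joke</button>
</form>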


v1: vam chat (1h-infinity)
- the bot will work by typing the text into the page's input field. To dictate, at first you basically need to be focused on the input in the browser.
- you can format the page with css/js to display a message like "Listening..." when focused
- I haven't played with speech recognition, but all dictation software should work, even Microsoft's default, e.g. the "Start listening" command
- once you have this set up, in php you can update processMessage() to do anything you can imagine. I would do it from the ground up; all the bots suck anyway imo. You can start with how most text games did it: "[[Hmm... | Ok. | Sure. | No problem. |That's easy.| Numbers...| ]] [[The number I | I | What I]] [[choose|pick|select|want|like]] is $number", and process it in php to find strings encapsulated by [[ and ]], split them by "|", and pick a random value (a sketch follows below). Very quick and simple to do, and you get lots of variety: 105 different messages just from that one template, which imo is more than enough for VAM and even for assistant uses. But you can also easily integrate an actual chatbot with AIML (e.g. https://github.com/Program-O/Program-O) or external services, though to me that's overkill.
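
One possible implementation of that [[...|...]] expansion in php, under the same assumptions as the earlier sketches:

<?php
// find each [[...]] group, split it on "|", and substitute a random option
function expandVariants($template) {
    return preg_replace_callback('/\[\[(.*?)\]\]/', function ($m) {
        $options = explode('|', $m[1]);
        return trim($options[array_rand($options)]);
    }, $template);
}

$number = rand(0, 100);
echo expandVariants(
    "[[Hmm...|Ok.|Sure.|No problem.|That's easy.|Numbers...|]] " .
    "[[The number I|I|What I]] [[choose|pick|select|want|like]] is $number"
);
// 7 x 3 x 5 = 105 possible sentences from this single template
?>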

v2: sexy lipsync magic (4-10h?)
- use this plugin https://hub.virtamate.com/resources/realtime-lipsync.1286/
- update your 127.0.0.1/vambot/talk.php script to export the text to a file instead of playing it (http://www.cross-plus-a.com/bconsole.htm)
- save the file always as the same name in a vam folder e.g. "/Audio/vambot/latest_response.mp3"
- add a vam button to the scene, "Talk!", that when pressed loads that audio and plays it. This might be a bit tricky but can be done (the export side is sketched below)
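
The export side might look like this; -w is balcon's save-to-file switch (see the bconsole page linked above), and $vamFolder is a made-up config value standing in for your VAM install path. Note this writes a .wav; producing an .mp3 would need an extra encoding step:

<?php
// v2 sketch: export the reply as audio instead of speaking it out loud
$vamFolder = 'C:/VAM/Audio/vambot';   // hypothetical; adjust to your setup
$reply = processMessage($_POST['message']);
$cmd = 'balcon.exe -t ' . escapeshellarg($reply)
     . ' -w ' . escapeshellarg($vamFolder . '/latest_response.wav');
pclose(popen('start /B ' . $cmd, 'r'));
?>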

v3: improve flow
- to hide the talk button and automatically read the response file in vam, you could do it through a vam script that checks for a flag (e.g. a timestamp file) and, when the flag changes, automatically does what the button did (the php side of that is sketched below)
- a vam button/UI like "TALK" that, when clicked, automatically focuses on a web panel through a script
- maybe use https://hub.virtamate.com/resources/speechrecognition.6865/ to get the focus, like a "Hey VAM" command using that script to switch the focus to the webpanel; from there it's dictation mode
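
The php half of that flag idea could be as small as this, written right after the audio export in the v2 sketch:

<?php
// bump a flag file after exporting the audio; a VAM-side script can poll
// its contents (or modification time) and auto-play the new clip
file_put_contents($vamFolder . '/response.flag', (string) time());
?>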
 
went ahead and did a proof of concept. demo files here: https://file.io/R7lzWhgwgHgI

- requires xampp & balabolka (with balcon http://www.cross-plus-a.com/bconsole.htm)
- requires this vam plugin for lipsync https://hub.virtamate.com/resources/realtime-lipsync.1286/
- files in [place in your local web server] go in the webserver folder (/htdocs if using xampp) and http://127.0.0.1/vambot/talk.php should work by itself in a browser
- there's a config.php where you set your vam folder and balcon location
- in the vam folder there's a scene vambot demo.json

process:
- you write something and hit submit; "time", "random" return some custom messages, for the rest it just echoes it back
- you have to press the vam buttons: "clear cache" + "answerz" x2. Don't ask me why, I have no clue; trial and error. The problem was that VAM was caching the response sound file, so what I did was force it to play the newer one. There are probably better ways to do it, but I did the fastest at hand. I'm sure it can be improved, maybe with a custom plugin or https://hub.virtamate.com/resources/logicbricks.1975/ (one possible workaround is sketched below).
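
One possible workaround for the caching (a sketch, not what the demo actually does): write each reply under a unique filename so VAM can never serve a stale clip, and record the current name in the flag file. This builds on $reply and $vamFolder from the earlier sketches:

<?php
// give every reply a unique name to sidestep VAM's audio caching
$latest = 'response_' . time() . '.wav';
$cmd = 'balcon.exe -t ' . escapeshellarg($reply)
     . ' -w ' . escapeshellarg($vamFolder . '/' . $latest);
pclose(popen('start /B ' . $cmd, 'r'));
// the flag file now tells the VAM side which file to load and play
file_put_contents($vamFolder . '/response.flag', $latest);
?>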

 
Hi Vamsnow,

That all looks amazingly well figured out!

I'm a writer, not a coder. I have a master's degree in creative writing, and I trained as a chatbot copywriter with Soul Machines.

I have spent the last two years trying to figure out how to build an embodied conversational AI. I've integrated IBM Watson with a 3D model in Unity, but it's too slow on response times, fragile, and it frequently breaks.

However, I'm 90% there in VAM.

What I need to hit 98% is an advanced desktop speech recognition system which integrates with MacGruber's Speech Recognition plugin. There's even a course which shows how this level of functionality can be achieved: Using the Speech Recognition and Synthesis .NET APIs | Pluralsight. I'd happily pay you to watch the video and integrate this level of functionality into MacGruber's plugin.

This might sound crazy, but I don't mind what you do with the plugin after it's built; you can sell it if you like, no worries. Even though I will be paying you, I actually have a very sensible reason for letting you do this, which I can explain in more detail if you are interested.

Please give me an estimate of the hours involved, and the cost, for building desktop speech recognition in a way which radically enhances the performance of MacGruber's plugin.



Best wishes,

Hedgepig.
 
If you can give me your email address, I'll send you a couple of videos of what I'm doing. I don't want to make them public. You might be surprised by how far I've got with this project. I'm actually building a fully embodied conversational AI for a six-year-old girl who suffers from multiple disabilities. She has a speech impairment and mainly communicates with sign language. I need to match voice input to the correct hand sign. I simply can't do this in Unity.
 
Sorry bro, I'm not watching no Pluralsight courses. As for therapeutic uses for disabled people, I'm personally against having experimental technology used like that, as it might alienate them from society even more imho. I don't think vam is the right tool for that anyway.

But what I wrote works if you follow the steps. You don't need to build speech recognition software; you don't even need the vam plugin for SR imho. If you are focused on a web field (click on an input on a page inside vam/unity) and you type on your keyboard, the text goes in the field, right? Mostly all existing SR software has a dictation mode which basically emulates key presses to send what you say to any other software.

As for chatbots, what I wrote works; you don't need to reinvent the wheel or pay for external services. Bots have been around for decades now, and there are sets available and tools to train them.

proof of concept #2:
- added an AIML bot with program-o to the previous demo. There's a better php one, but I did the fastest one for me to implement
- uploaded to it a few random AIML sets from alicebot and mitsuku https://github.com/hosford42/AIML_Sets
- disabled the lipsync to avoid having to click those buttons for a quicker chat

will add some more stuff and post the files for this demo too, probably next week


 
My dear friend, I'm a writer, not a coder; you lost me at 'xampp' and 'php'. I have no idea what any of what you wrote means. You might as well be talking me through building a warp-drive engine. It's how my brain is wired up, and totally not your fault. Sorry, but this is one of the problems with creatives talking to coders: you often don't understand why on Earth we can't understand you. But thanks for considering how to make a text-based chatbot. I can see it works very well, and I'm sure more than a few people here will love it.

As for the disabled girl, there simply aren't enough human input hours in the day to reinforce teaching her sign language; that's why we're using AI. I need speech recognition so she can say the words and get the correct sign-language mocap response. She will never be able to read or write or hear a spoken voice, so that's why the plugin has to recognize her voice input. The same goes for millions of other kids throughout the world with the same level of disabilities. Crazy, isn't it, that you have to adapt and use a sex simulator to be able to make something as useful as this?

I'm not going to sell the finished plugin; I'm giving it away to families and schools who need it. Whoever I pay to build this plugin is free to do whatever they want with it. They can make as much money as they want or need from it, and I honestly won't care in the slightest. I can't take it with me, and I want it to do some good. And hell yeah, if you use it in VAM and enjoy it, then that is so good too!
 
Hi Jiraiya

In V-a-M, we have every plugin to make this happen. All that's required is 'something' to coherently activate the triggers.

Have you seen the amazing game 'Detroit'? The game designers/devs chose non-verbal UI over voice, and what they made is awesome. There's a huge argument to be had over non-verbal UI versus voice. Just to widen the debate.

There's also a really deep debate about what constitutes 'will', or volition/autonomy. Do we organic humans actually possess free will ourselves, or is it illusory?

But yeah, the end goal, for me, is to design and create an 'autonomous' AI that can move, act and behave as if it were possessed by what we understand as 'free will'.

This is my end goal as well with Virt-a-mate.

I will help in any way I can.
 
It's free at the moment, and it's pretty easy to get access, though they might charge for it in the future. They only have Unity integrations currently, but for custom integration packages they ask people to email support@inworld.ai.

I haven't seen the other thread, no. I'll check it out.
 
The cost of an online service would be super prohibitive anyway. I'm talking $100.00 per month per instance of a bot, which, in any case, wouldn't respond fast enough in a game engine. You are exactly right: it all has to run independently on every user's PC. Online bots can be 'fragile'; there are so many links in the chain.

As I wrote, once I've experimented and figured everything out, I'll upload it to the hub so everyone can try it for free. Why free? Simple: it will glitch and, at times, break. But sometimes it will work well. That is the nature of conversational AI. Sam, the bot/avatar in the first video, was created by two teams of 'bot builders' with hundreds of engineers and software writers at billion-dollar companies. I've managed to break Sam through intentional testing, which might have been down to low bandwidth or simply using a crappy microphone. The point is, a consumer will never imagine for a moment that glitching out a billion-dollar bot is their fault. That's human nature, in a nutshell. Then your bot will get trashed on social media.

So yeah, it will be absolutely free. Let someone else try to commercially exploit it. They can have the curse of broken bots and social-media mobs of irate consumers. There are far better ways to make a living that don't result in you having a stroke before you are fifty years old. If they really want the stress, they can be my guest.

I'm very interested in your work with this. I'm mostly interested in a conversational AI, and it does not have to be a part of what is usually the focus of VAM. And that it breaks from time to time I think is just perfect, because people in real life break as well; things don't turn out as you wanted, or miscommunication gets in the way. It would be neat if there could be some voiced answers when it breaks, with some different responses like "I don't feel like talking anymore", etc. (connected to the error).

I love the military simulation "Arma", and this game is known for weird bugs where a tank just flies 50 meters up in the air or weird stuff happens because the coding is not working, but me and my friend just see it as the tank's engine breaking and make it part of the simulation. Almost no games have errors that you can see as accidents. Anyhow, I don't mind some crashes from time to time, and I'm very curious how your local conversational AI chatbot will turn out. It's just more fun if it does not have all the knowledge of the world and instead can really be its own entity.

Will it be able to have a memory bank so you can go back to topics or things you discussed before?

Problems I've had with conversational AI before, related to GPT and such, are that it's too easy for it to break out of character, and then all the feeling of a sentient being is destroyed.

Do you have a Patreon?
 