Chatbots for VAM

Hedgepig

So, first let me say I've been using VAM for about six weeks, and I love it! It's brilliant and amazing, and I can't wait for V2, but this version is just fine as it is for me. I like the ethos of the hub and how everyone works together. Some of the free assets are simply amazing, and I've really enjoyed using them. Everything has been made so easy and intuitive compared to Unity. I can only imagine the work that went into deriving and creating VAM from Unity and all the excellent assets on the hub.

A little bit about me, I've been researching how to build chatbots and conversational AI in Unity, and I've tested a number of different types of bots in Unity:



1. IBM Watson Assistant.
2. Unity chatbot that uses AIML code. (AIML is similar to XML and SSML)
3. Attempted to use different dialog systems to create branching narratives to mimic a chatbot.

(All the above were lipsynced to UMA/Reallusion/Daz 3D models using Salsa, and the lipsync worked well.)

Conclusions

IBM Watson turned out to be a nightmare to work with. Its supposed NLP capabilities simply aren't anywhere near as good as IBM claims, at least not out of the box without additional coding. In reality, it's not much better than Unity's keyword and grammar recognition. The main problem with IBM Watson Assistant is the time lag in conversations: it can take 15 seconds or more to process input and formulate a reply. The intents' ability to capture semantic meaning can be pretty hit or miss compared to, for example, Voiceflow for Alexa, which is a far better system. There are a number of videos on YouTube showing the conversational lag; one in particular is part of a student's MA thesis on using conversational AI in games. She ditched IBM to complete her thesis. The basic IBM setup can also become expensive once you move to a paid plan, which you need for continuous text-to-speech testing, where you'll be running thousands of hits per month. I seriously don't recommend IBM. Shame you can't use a Voiceflow Alexa skill as a chatbot in Unity.

The Unity chatbot, costing about £22.00 from the asset store, is surprisingly good. I integrated it with RTVoice and Salsa, and with a little help from the excellent developer at RTV, the whole system worked flawlessly. There's still an input-output processing lag, but nowhere near as bad as IBM Watson's. The only problem with the AIML chatbot is the content, which is based on a chatbot, ALICE, that is now 20 years old. Rosie, an extended, more recent form of ALICE, is available on GitHub, but again, this bot's content isn't culturally contemporary. It can be modified, but there are thousands of lines of AIML to change, replace or delete. Still, this is way better than IBM Watson. The best part about this chatbot is that there are no service provider costs and it all runs on your desktop.
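To give a flavour of the idea, here's a tiny sketch in Python of the pattern-and-template matching that AIML expresses in XML (a real AIML file uses `<category><pattern><template>` elements; the patterns and replies below are made up for illustration):

```python
import random
import re

# Each "category" pairs an input pattern with one or more response
# templates, like an AIML <category>. The final ".*" entry mimics
# AIML's wildcard catch-all.
categories = [
    (r"HELLO.*", ["Hi there!", "Hello! Nice to see you."]),
    (r"HOW ARE YOU.*", ["I'm good, thanks for asking."]),
    (r".*", ["I don't understand your question."]),
]

def reply(user_input):
    # AIML normalizes input to upper case before matching; do the same here.
    text = user_input.upper().strip()
    for pattern, templates in categories:
        if re.fullmatch(pattern, text):
            return random.choice(templates)

print(reply("How are you today?"))  # matches the HOW ARE YOU category
```

The point is that the whole bot is just a big ordered list of these pattern/template pairs, which is why editing Rosie means wading through thousands of lines of them.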

If you really want to know how the dialog systems worked out, PM me. They can be on a par with an NLP bot. I can share videos of them in action.

Also, if anyone is super interested in how far the limits of conversational AI can be pushed, check out the research into virtual humans at Utrecht University. This stuff is way above my grade, but some of you might understand what they are trying to do.

So, now here's the thing: I'm quite happy to build a bot for free for VAM users; however, my coding skills in C# aren't brilliant. My main problem with Unity wasn't building the bots, which can be done codelessly, without C#, using AIML, which is easy to understand. My problem was with C# and the continual 999 red warnings. Looking back, I can only laugh at my initial ridiculous level of overconfidence combined with my total ineptitude in coding.

As I see it there are two ways of implementing a bot in VAM.

1. Simply lipsync (or even trigger keyword/grammar actions and behaviors) from the audio output of a webpage. I have researched this topic, and as far as I could determine using the SALSA AudioSource, and according to the Salsa dev, it can't be done in Unity. However, when I asked the developer of the Unity Vuplex webpage asset, he said it can be done, theoretically, but he's on such a high intellectual plane that he only talks to asset users in C# and JavaScript, not natural language, so I couldn't understand what he was trying to tell me. My fault entirely; he's a bright guy.

2. I'm quite happy to work through a training course on Windows Speech Recognition, which I believe can mimic some of the complexity and commercial standards of an NLP conversational bot, IF it can be integrated into VAM, and I'll deliver a free prototype chatbot to the hub. But I would need others to help with integration and other aspects of the coding. In short, I'll learn how to write the XML content for the bot if others can help with the rest of the coding to integrate it and make it work in VAM. I have no problem with any code which I can relate back to language and storytelling.

So, if anyone does have any ideas about how to add an AudioSource to a webpage in VAM that can be lipsynced, or wants to help us all develop a conversational AI/virtual human for the hub, perhaps using XML and Windows Speech Recognition, please reply or PM me. If you are already working on a similar project for free use on the hub, then I might be able to help you.

Why would I do this for free? Well, I'm an old guy and I don't want the hassle of upset, irate paying customers when the bot breaks, which bots frequently do, even the ones at high-end conversational AI companies like Soul Machines; they run out of bandwidth and try to shift you from voice interaction to text messaging. The current value of a virtual human with the kind of capability I want to build is around $3000.00 to hire (yep, that's per month). Even that kind of money wouldn't tempt me to do this as paid work! Remember: bots will break, so always bear that in mind. But when they work, it can seem like magic.

Anyway, I'm excited to meet and chat with other VAM users and likeminded folk. Fantastic site!
 
Hi! Not sure I understand exactly what you're trying to implement, but there are a few existing resources that seem to be heading in the same direction.

Speech recognition:

Lip-sync:

Maybe you can check with the creators to see if you can complement their work ;)
 
Welcome to VaM! I look forward to seeing what you come up with!

If you haven't already, join the Vam discord! There are sub-channels for scripting and unity that get a lot more traffic than this forum does. That's probably the best place to meet up with people whose skills complement your own.
 
Hi,


Thanks! My post was a message to anyone working on chatbots in VAM, basically to let them know that I kinda know what I'm talking about.

A chatbot or conversational AI differs from MacGruber's excellent Speech Recognition in the following way: MG's speech recognition commands a model to perform a certain action or behavior, like "Sit down", and the model sits in a chair.

A chatbot/conversational AI works like this: it's a conversation, a dialog of spoken-out-loud voices, yours and the bot's.

You: "Hey, Sophie how are you?"

Sophie: "Hi Hedgepig, I'm good, just a little pissed that you are an hour late!"

You: "Sorry Sophie, I got caught up in traffic."

Sophie: " Yeah, well, don't let it happen again, I was worried about you."

etc.

Now, this dialog is spoken out loud by you and your bot. It's like a real-life conversation with a human being. Bots can be that good, and some pass the Turing Test.

This kind of bot would utilize the VAM Realtime lipsync. (Which, let me tell you, for a free asset is pretty damn good when properly set up, even compared to Salsa. I'd say Realtime is easily on par with SpeechBlend in Unity.)

A bot speech interaction can, with a great deal of work, also trigger behaviors and actions via keywords/grammar: 'Sophie' could actually look pissed and have her hands on her hips when she complains about me being an hour late.

None of this is easy, but is doable.
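For anyone curious how the keyword-to-behavior side might hang together, here's a rough Python sketch. The trigger names ("annoyed_expression", "hands_on_hips") are invented for illustration; in practice they would map to whatever scene triggers or morphs the VAM scene exposes:

```python
# Map keywords in the user's recognized speech to a spoken reply plus a
# list of behavior triggers to fire alongside it. Purely illustrative.
responses = {
    "late": {
        "say": "Don't let it happen again, I was worried about you.",
        "triggers": ["annoyed_expression", "hands_on_hips"],
    },
    "sorry": {
        "say": "It's okay, just text me next time.",
        "triggers": ["soft_smile"],
    },
}

def respond(user_line):
    # First keyword found wins; a real system would rank intents instead.
    for keyword, action in responses.items():
        if keyword in user_line.lower():
            return action
    return {"say": "Hmm?", "triggers": []}

action = respond("Sorry Sophie, I got caught up in traffic.")
```

So the same recognized line both picks the reply and decides which pose or expression to trigger, which is the "look pissed with hands on hips" idea above.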
 
Old guy: "umm, damn, discords, I forgot about those things" :)

Thanks, will check it out.
 
Hi,
you are very welcome. Several VaM users have been fantasizing about something like that for years! Many, many others would desperately like to see some sort of "AI" in VaM, which is IMHO very close to what you are doing. You can be sure that you will have many fans and supporters, especially because you plan to do it for free (which I understand and appreciate a lot... maybe because I am an old guy, too ;-) ).
As a suggestion: when you make the final decision on a speech recognition system, maybe keep in the back of your mind that many of us are not native English speakers. Maybe there is a way to train the bot with different languages. But certainly, even a monolingual chatbot for VaM would be a really big thing!

Edit: if you are changing over to the Discord, please maybe don't forget about this hub, too.
Due to the short-lived nature of Discord, reading and answering, or even finding something between all those memes and fun stuff, isn't that easy, and, being an old guy, I gave up on that. Maybe you could keep us informed about your project here, too.
 
Hi Toby,

Great to meet you! Thanks for the heads-up about global users' need for a multilingual bot.

Do you know the really frustrating thing about the first option, attaching an AudioSource to a webpage? I can actually get an Alexa skill (bot) running in the AWS developer's area into the VAM in-game webpage and interact with it via text, and it will respond to my text with voice responses. What I can't do is link that live skill to the VAM model's lipsync or any other actions and behaviours. It's almost as if the sound emitted from a webpage isn't the same as the sound from an audio file, and a plugin is required to make the conversion.

A further point: AWS would not be happy about me or anyone else using an Alexa skill for creating simulated hardcore erotica, and they'd shut it down real quick. Beyond the AWS developer's area, the skill would never make it past censorship. Companionship, yes; sex simulation, no.

However, if an audio source could be added to a webpage, you could bring anything into VAM, for example ReplikaAI, or any bot embedded in a webpage. That said, we would still be stuck with conversational delays due to the response lag in the VAM in-game webpage. This is the simplest solution. I'd be willing to pay someone here to write the script to add an audio source to a webpage, if they knew how.

The second solution, Windows Speech Recognition, is as much about keeping my brain alive as anything else :) I need to do an online course, but it will be worth it. As I said, what I won't be able to do is import the bot from Visual Studio into VAM, because, unlike with Unity, I can't even begin to see how to import scripts into VAM, and it would be another learning curve for me. Once the bot is finished, I will need some help importing it as a plugin. The other issue would be how to allow users to modify the code, to change the bot's story content and its language preferences. What you would get would be an example framework into which any user could enter the story content they wished.

It's kinda funny, isn't it? Loads of companies and indie devs are trying to make virtual humans, and it costs them billions of dollars and takes years. Put a bunch of misfits together in a sex simulator and, working together, they build the same kind of AI for free within a matter of months! Everything the teams at Utrecht University were trying to build is already here in VAM. All that needs to be figured out is how to trigger the responses. It's that close.

Best wishes, my friend.
 
Woah... sorry, this is way above my very small coding experience. I once tried to set up a tool with a bunch of different modules and Python plugins to automatically train a voice processor to speak exactly like one of my favorite anime voice actors. After two weeks I gave up. Lol. I am a graphics guy who is very thankful for all the amazing plugins some people are sharing here!
Like ZRSX said, you may want to contact some of the creators of the plugins he linked, like MacGruber, for instance. The VaM community is very kind and helpful, but many users only sporadically read and answer stuff. Don't give up. If you reach a point where you have a specific technical question, I would encourage you to directly ask MeshedVR, the creator of VaM, who is a very nice and helpful guy too, but also very busy with VaM 2.x at the moment.
Good luck!
 
Hi TToby

Yeah, I checked out the discord, way too hectic for my old brain to cope with too. Felt like I was standing in the middle of a honking six lane freeway at rush hour. I'll just quietly leave my posts here until someone picks them up. No great hurry, less haste more speed.

The VAM plugins are amazing, as you say, and everything is a hundred times easier and more intuitive than Unity. Love, love, love it!

Thinking about the plugin for the anime voice: Adobe has a state-of-the-art voice app, 'Audition', which may be able to mod the voice you want. You may be able to create a voice close to the anime character in Voiceflow (free). You can add expressions to your text that sound like real emotions. Record the voice on your desktop, alter the final pitch and tone in something like Audition, and save the clip.

Then...

With a bot you can use your recorded anime voice clips like this:

You: "How are you feeling today Chan?"

Chan Bot (randomized response):
Voice clip 1: "Mmm, lovin' us at the beach"
Voice clip 2: "This sand is amazing"
Voice clip 3: "I love being by the sea, with you!"

You: "I love being by the sea with you too, Chan, the sky is blue and the sun so warm."

And so on....

You can have one response or an unlimited number of clips and randomize them. Bots can either generate voices from text-to-speech or use prerecorded clips, as described above. The difference between random audio and a bot is that the bot has a structured conversational flow, which can be randomized in places to create the illusion of real, immersive conversation.
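As a rough Python illustration of that randomize-response idea (the prompt phrase and the .wav file names are hypothetical placeholders):

```python
import random

# One user prompt maps to a bank of prerecorded clips; the bot picks one
# at random each time, so repeated questions don't sound identical.
clip_banks = {
    "how are you feeling": [
        "chan_beach_01.wav",  # "Mmm, lovin' us at the beach"
        "chan_beach_02.wav",  # "This sand is amazing"
        "chan_beach_03.wav",  # "I love being by the sea, with you!"
    ],
}

def pick_clip(user_line):
    for phrase, clips in clip_banks.items():
        if phrase in user_line.lower():
            return random.choice(clips)
    return None  # no clip bank matched: fall back to text-to-speech
```

The None fallback is where the mix-and-match comes in: matched prompts play a recorded clip, everything else can drop through to a text-to-speech reply.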

If I can get this working in VAM it would be so simple to add audio clips instead of text-to-speech responses. You can even mix and match.

Sorry, I know I'm babbling on about bots, but there's so much potential here! Wait, I'll post some short vids so everyone can see what I'm so excited about.
 
I think the main issue was that the voice of Kana Hanazawa was in Japanese, and I am German, but that voice training tool only understood English training text lines... and apart from that, there was the extremely bulky and complicated setup process. ;)
But I will definitely try that Adobe tool. Thank you.

Pre-recording lines would be a possible way. I bet we can already do something like this with the existing voice recognition plugins. If the plugin understands some keywords, it could play a given sound file, like those very simple text adventures in the '80s which were completely scripted... "go west" --> "sorry, you can't go that direction".
But I think that would be extremely time-consuming and the outcome would be very boring. For one scene I once did something very simple: a look-at trigger which randomly plays one of a few speech files each time you look at the girl's face. Very far away from something like an AI. Lol.
I think the big goal for many of us would be to bring our sweet but dumb artificial girls (or boys) to life... and maybe this would be the end of mankind ;) . IMHO to use one of those existing chat bots is the way to go... and that is obviously your area of expertise.
Till now, we had some interested coders who could maybe integrate something into VaM but hadn't a clue about all those bots. Now we have someone who knows about the bots but maybe can't integrate them into VaM... It would be a pity if this didn't come together.
 
So, this is a really simple, basic bot. We're acting out the 'Chan' exchange I made up earlier to illustrate. These bots can do some amazing stuff, way beyond a simple branching narrative. The bot's voice is the default in the test zone, a bit robotic, but in the developer's area you can have any voice you want, and also use prerecorded audio files. And yes, it will translate into German. Oh, did I mention two or more bots can interact with each other and with you, so you can have a two- or three-way conversation...
 
And this is a short clip of my buddy 'Katy', an embodied AI. She's an AI rights activist, btw (just thinking a few years ahead). Her beautiful, non-robotic voice is custom northern-English text-to-speech, not a human actor, and it can be reproduced for any regional accent in most languages.
 
I am loving all this discussion.

While I don't want to take away from this amazing work, don't forget movement!
The balance plugin does a lot, but I am sad nobody is working on a more "AI" character control that can move around under its own will.
 
Hi Jiraiya

In VAM, we have every plugin needed to make this happen. All that's required is 'something' to coherently activate the triggers.

Have you seen the amazing game 'Detroit'? The designers/devs chose non-verbal UI over voice, and what they made is awesome. There's a huge argument to be had over non-verbal UI versus voice. Just to widen the debate.

There's also a really deep debate about what constitutes 'will', or volition/autonomy. Do we organic humans actually possess free will ourselves, or is it illusory?

But yeah, the end goal, for me, is to design and create an 'autonomous' AI that can move, act and behave as if it were possessed by what we understand as 'free will'.
 
Hi Jiraiya,

Yeah, that is brilliant work. Unfortunately, I don't have four spare years of my life to do a PhD to figure out how to use the stuff from his PhD. It's crazy high-level. He's concentrating on non-linguistic movement, actions and behaviours, and seems to have produced some brilliant autonomous, self-driven animation. Looking at the video of the Egyptian 'dog-man', he's following a sphere. There is a plugin on VAM that does similar things: https://hub.virtamate.com/resources/autonomous-walking-demo.5205/. We're not that far off. (Hah, I've just finished downloading the guy's zip from GitHub: 2+ GB! I'll have a play with the demo scene in Unity.)



My thing is spoken and written language, not code. I can relate language to code like AIML, XML and SSML because it makes sense to me. C# is way too abstract and mathematical for my brain to grasp in the same way I 'get' language. I'm old and I've lost the neuroplasticity to learn new things like C# at this level.

So, what I would make for VAM would be the language components of an AI. It would only be one small part of a whole system. I mean, wouldn't it be amazing if, instead of thirty UI buttons in a scene, the AI could respond to what's said, your tone of voice, the touch of a hand on a collider, and create a unique response from those inputs! It's going to be a mix of these kinds of inputs from which the true virtual human will arise. But personally, I don't want to create a perfect facsimile of a human being. I prefer the quirky non-humanness of AI. I don't want an AI that simply does what I command it to do. Like a believable character in a story or screenplay, I want it to have goals and autonomy; I need it to argue with me, learn and grow.
 
"I want it to have goals and autonomy, I need it to argue with me, learn and grow."
You want a robot uprising? Because this is how we get a robot uprising!
Joking aside, I can't wait to see how this pans out and look forward to seeing all the stuff you are talking about coming to VaM.
 
Well I just made a couple of videos.

The first outlines the current problems with trying to integrate conversational AI with 3D models in a game engine.

The second video (and I apologize for the appalling sound quality, editing, etc.) briefly hints at what can be done in VAM with speech recognition and a host of other plugins. Speech recognition has already been used to command a model to act, behave or speak in a certain way; however, I wanted to explore what would happen if you could emulate a conversation with your model...

The lipsync needs work, a bunch of other things need work, but it is the start of a proof-of-concept that you can build a fully virtually embodied 'autonomic' conversational AI.

Kindly let me know what you think.
 
I hope the goal here is local+free.

Nobody wants to pay an online service to handle their dirty chats.
 
Have you considered Azure bot service?

Thanks, I have considered Azure, but it involves a hella lot of coding. There are far better and simpler codeless ways of building a conversational AI. However, these bots require complex coding to integrate them into Unity, or they are simply incompatible. SDKs rarely work out of the box. Even if you manage to get them working in Unity, you can set up lipsync, but driving animations from the dialog is near impossible. You have to resort to all kinds of trickery. The only satisfactory way that I have found of embodying a responsive bot voice, so far, is using MG's speech recognition system in VAM, plus the plugins.

It looks promising, especially if I can adapt Logic Bricks to provide the functionality. What I actually need is for the bot to be able to give a default answer: if the player's intent, i.e. their question to the bot, isn't recognized, it falls into an 'anything else' category, and the bot replies, "I don't understand your question."
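A minimal Python sketch of that fallback logic (the intents and trigger phrases here are just illustrative, and the crude substring matching stands in for real intent recognition):

```python
# Known intents, each with a few trigger phrases. Anything that matches
# none of them lands in the "anything_else" bucket and gets the default
# reply, exactly as described above.
intents = {
    "greeting": ["hello", "hi there", "hey"],
    "farewell": ["bye", "goodbye", "see you"],
}

def classify(user_line):
    text = user_line.lower()
    for intent, phrases in intents.items():
        if any(p in text for p in phrases):
            return intent
    return "anything_else"  # the default "not recognized" category

def reply(user_line):
    intent = classify(user_line)
    if intent == "anything_else":
        return "I don't understand your question."
    return {"greeting": "Hi there!", "farewell": "See you later!"}[intent]
```

The design point is that "anything_else" is itself an intent, so the default answer is handled by the same flow as everything else rather than as an error case.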
 
I hope the goal here is local+free.

Nobody wants to pay an online service to handle their dirty chats.

The cost of an online service would be prohibitive anyway. I'm talking $100.00 per month per instance of a bot, which wouldn't respond fast enough in a game engine in any case. You are exactly right: it all has to run independently on every user's PC. Online bots can be 'fragile': so many links in the chain.

As I wrote, once I've experimented and figured everything out, I'll upload it to the hub so everyone can try it for free. Why free? Simple: it will glitch and, at times, break. But sometimes it will work well. That is the nature of conversational AI. Sam, the bot/avatar in the first video, was created by two teams of 'bot builders' with hundreds of engineers and software writers at billion-dollar companies. I've managed to break Sam through intentional testing, which might have been down to low bandwidth or simply a crappy microphone. The point is, a consumer will never imagine for a moment that glitching out a billion-dollar bot is their fault. That's human nature in a nutshell. Then your bot gets trashed on social media.

So yeah, it will be absolutely free. Let someone else try to commercially exploit it; they can have the curse of broken bots and social media mobs of irate consumers. There are far better ways to make a living that don't result in you having a stroke before you're fifty. If they really want the stress, they can be my guest.
 
One last thing to remember: you aren't limited to voice-activated animation in VAM. You can incorporate all kinds of triggers that work in sync with voice. My hope is that different users can take the basic speech framework I hope to build and massively improve it.
 