Question AUDIO Simulation Experiment - A VERY Challenging Idea I would love to try!


Hi All,

I hope you'll forgive my bad English,
hopefully you will understand most of the experimental idea I'm sharing here:

1️⃣ Let's start with some facts:
- I'm still very new to the VAM universe and learning as I experiment, and I still have much more to learn.
- I had this "out of the box" idea that I would like to try, the problem is... I'm not sure if it's possible with the available plugins (since I'm not a programmer).

2️⃣ The Experiment:
The main idea is to make some kind of VOICE INTERACTION between 2 CHARACTERS (or more).
The idea is to create a FAKE yet amusing conversation based on a HUGE original audio library of many different options I'm about to record from scratch (Female + Male)
The tricky part is that I'm still a fresh newbie, so I don't know exactly how I should even approach this experiment, but I'm really interested in trying.

The audio will be split into Sub-Directories so each Sub-Directory will be related to a "SUBJECT" to make more sense of a fake-chat.
In general this could work with ANY LANGUAGE since it will be based on Audio + Lip-Sync and not TTS.

3️⃣ Scene Example:
For the sake of simple TESTING I think starting with 2 UI Buttons could help:
- Button 1 = START CONVERSATION (Play Random, I'll explain the random in the next section)
- Button 2 = STOP

- START: Activate the first RANDOM audio of person 1 or 2 (it doesn't matter who starts the chat)
- Person #1: Start playing 1 RANDOM AUDIO file out of many categorized audio files (split into Sub-Directories to keep the audio options organized so they "make sense")
- Person #1: Current audio is finished, 0-2 seconds delay before the next action:
- Person #2: REACT to Person #1, based on which Sub-Directory (subject) they played, i.e. what they said (I'll explain my HUGE audio directory idea next)
- Person #1: React to Person #2 ...and so on until we press "STOP"
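
The turn-taking steps above can be sketched in plain Python to check the logic before building it in VAM. This is only a rough model under stated assumptions: it does not play audio or talk to any VAM plugin, and the `REACTIONS` table, the subject names, and the clip file names are all made up for illustration.

```python
import random

# Hypothetical rule table: which subject folders may answer which.
REACTIONS = {
    "start_chat": ["questions", "answers"],
    "questions": ["answers"],
    "answers": ["questions", "start_chat"],
}

def run_conversation(library, turns, rng=random):
    """Return one (speaker, subject, clip, delay) tuple per turn."""
    speaker = rng.choice([1, 2])        # it doesn't matter who starts
    subject = "start_chat"
    log = []
    for _ in range(turns):              # stands in for "until STOP is pressed"
        clip = rng.choice(library[speaker][subject])
        delay = rng.uniform(0.0, 2.0)   # 0-2 second pause after the clip ends
        log.append((speaker, subject, clip, delay))
        subject = rng.choice(REACTIONS[subject])  # the other person reacts to this subject
        speaker = 2 if speaker == 1 else 1        # hand the turn over
    return log

# Hypothetical clip names, one small library per person.
LIBRARY = {
    1: {"start_chat": ["f_hello.wav"], "questions": ["f_q1.wav", "f_q2.wav"], "answers": ["f_a1.wav"]},
    2: {"start_chat": ["m_hello.wav"], "questions": ["m_q1.wav"], "answers": ["m_a1.wav", "m_a2.wav"]},
}

for speaker, subject, clip, delay in run_conversation(LIBRARY, 6):
    print(f"Person #{speaker} [{subject}] plays {clip}, then waits {delay:.1f}s")
```

Adding more clips to a list, or more subjects to `REACTIONS`, changes the variety without touching the loop itself, which is the "expandable" property the idea depends on.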

The thing is that EVERY TIME, based on a HUGE AUDIO LIBRARY with many MANY options, the conversation will lead to other situations (different animations can join different AUDIO plays later). As you can imagine, this could be so interesting and weird at the same time because of the non-linear, unexpected situations!

If this works for 2 characters, it could work for a larger number of characters as long as it is based on TURNS, so no two characters talk at the same time (unless we choose to allow it, which could be interesting, but I'm trying to start as simple as possible for the sake of experimental tests).

4️⃣ The Audio Library Structure:
- The ROOT of all ORIGINAL RECORDED AUDIO FILES will be created PER-CHARACTER (or voice if you like: Female / Male / different voices)
- So each ROOT directory will be 1 Unique Voice, and each ROOT directory will contain Sub-Directories, so you can save it ALL and share it with others.
- Sub-Directories (this is a lot of work, but I'm not too worried about it) are split into the MAIN types of attitude, such as:
- Start Chat (some standard chat-starting sentences or questions), Questions, Answers... and in the future more options (Joke, Laugh, Cry, Scream, Moans, etc.)
Again, I would like to start simple, so it will probably begin with just a few Sub-Directories to SEE if it works fine.
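
As a rough illustration of the layout described above, here is a small Python sketch that reads one voice's ROOT directory and lists the .wav files in each Sub-Directory. It is only a sketch under assumptions: the function name `scan_voice_library` is invented for this example, and VAM itself would load audio through its own plugins rather than through Python.

```python
from pathlib import Path

def scan_voice_library(root):
    """Map each Sub-Directory name (a "subject") to its sorted .wav file names."""
    root = Path(root)
    library = {}
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        clips = sorted(
            f.name for f in folder.iterdir()
            if f.is_file() and f.suffix.lower() == ".wav"
        )
        if clips:                       # skip empty folders so a random pick never fails
            library[folder.name] = clips
    return library
```

Calling it on a hypothetical folder such as `scan_voice_library("Female_Voice_1")` would return a dictionary keyed by folder name, for example "Start Chat", "Questions" and "Answers". Dropping a new .wav file into a folder expands the library with no other change.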

IT'S ALIVE! (the scenes we can make with it)
This idea is not limited ONLY to AUDIO interaction. The concept is that 1 action triggers another, which leads to another action: a fake "yet natural" flow of actions.
Just like some scenes that I guess already exist from many talented creators, but MUCH MORE RANDOM... which may give more immersive and ALIVE results!
This means it could start with a chat but move on to flirting or any other idea, based on the scene / scripted voices and directed animations. But I would like to start with a SIMPLE AUDIO concept first.

Of course I would also like to make their Lip-Sync match and add some animation variety to make things more interesting, but the FIRST main focus is the AUDIO FAKE-CHAT challenge.

5️⃣ The Approach:
I'm currently thinking how to start working on it.
It's still THEORETICAL for me since I don't yet know all the related PLUGINS I should use. Unfortunately I'm not a programmer, so my guess is to use some "Logic Visual Scripting" plugins to build the main idea (if plugins like that exist), combined with some relevant AUDIO / LIP-SYNC plugins that give me access to SPECIFIC SUB-DIRECTORIES + PLAY / STOP, so that when one SOUND is finished another SOUND from another SUB-DIRECTORY can PLAY, starting the "CONVERSATION LOOP".

I hope there are some plugins out there that I can start playing with and that are not too limited for what I'm trying to do.
I believe it's possible, since it's like triggering characters to start audio or animation, but using mostly VOICES that trigger other VOICES in a way...

If you followed so far, understood my bad English, and would like to help me get started by giving me some directions, PLEASE DO!
Any step-by-step explanation of plugins or other tools will be VERY HELPFUL since I'm still learning, and video tutorials are easier for me to follow, if there are any of course.

Thanks in advance to anyone who helps, this could be a very interesting experiment!
Hi, first of all, your English is very good. And your idea is very interesting, but I am replying to tell you that it is also very ambitious. Even if you were an experienced programmer, this would be a very difficult (near impossible) project.

The problem is that creating “life like” dialogue is complex. Let’s say that you want to create a dialogue with 6 sentences. Sentence 1 has maybe 10 possibilities. Let’s say you want to create 10 possible reactions to each of these 10 sentences from sentence 1. That would mean you would need 10 x 10 = 100 sentences. If you want to add another layer, sentence 3, this would mean 10 sentences for each of the 100 combinations from 1 and 2. Now you’re at 100 x 10 = 1000 sentences for layer 3. Continuing to layer 6, you will need 10^6 = 1 million sentences. As you can see, the amount you need grows exponentially.

Another problem is that if you want to make this, you have to somehow make decisions on which sentences are logical reactions. If sentence 1 is about the weather, sentence 2 needs to be about the weather. So now you need a system which decides what sentences are good matches and which are not. If you don’t have such a system, with true random choices, you will have very bizarre nonsensical dialogue. I hope you understand at this point how complex this problem is.

If you still want to continue and experiment, you will need: logicbricks by MacGruber (statemachines, randomchoice) and also jayjaywon has very good tools (VUMLC and actiongrouper).
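
To put numbers on the estimate above, here is a short Python check of the per-layer counts, assuming 10 unique options at every step as in the example. The function name `sentences_needed` is just an illustration.

```python
def sentences_needed(options_per_turn, turns):
    """Clips needed at each layer if every line gets its own set of unique replies."""
    return [options_per_turn ** layer for layer in range(1, turns + 1)]

per_layer = sentences_needed(10, 6)
print(per_layer)       # [10, 100, 1000, 10000, 100000, 1000000]
print(sum(per_layer))  # 1111110 recordings across all six layers
```

Layer 6 alone needs one million clips, and recording every layer takes the total a little over 1.1 million.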
Thank you for your kind reply @pinosante and for taking the time to read and share your opinion,
I'm glad that you understood my English :)

I did some calculations waaaay back for the same situation you're describing, because I made something like it (years ago, 2D + Visual Scripting). Later, as I messed with it, I realized it wasn't what I was looking for. As you said, it's almost impossible and would need practically unlimited audio scripts / options / animations, etc. I totally agree, as I was there before.

BUT! 🙂
Either it's my bad English or I wasn't clear enough with my description above.
I'm not aiming for this system / idea to be a complex human-like "AI" with characters talking to each other, but something simpler, lighter and fun, yet VERY random with different situations.

As I mentioned, I actually made something like that many years ago using visual scripting (simple UI / 2D / lots of AUDIO files organized in Directories) and obviously the interaction wasn't "LIFE LIKE" but the more voices and options I added, the more interesting and VERY random it was at every run.
The voices library was HUGE but the nice thing was that I could always expand it, it was just a matter of adding more .WAV files to different sub-directories as I explained above.
I remember when I showed it to some friends about a decade ago, they thought it was a sophisticated AI, because no matter how many times they "reset" the scene, there were almost no repeats and yet the chat kept flowing flawlessly! That's because of the number of voices and some "smart" random delay between phases to make it less "robotic", along with many other things I added, such as "room noise" to blend all the sounds into one atmosphere, and more than I can count at the moment. It was pretty advanced, but I did it for fun when I was trying to learn visual scripting as a hobby (instead of making a game).

If I find it on one of my old hard drives, I'll be happy to show a video of it, but I'm not sure if I can find it or if it will still work, we'll see.

Of course the "more accurate", or should I say correct, approach would be to make tables / brainstorm and start directing an actual "script". But again, that's not what I want to do. I want to do an Experiment, starting with VERY LIMITED options, while making the whole logic system easily expandable just by adding Sub-Directories and extra files whenever I wish.
So whenever I have some spare time, I can do long recording sessions (Male + Female) and keep expanding it, almost endlessly... but that is looking too far ahead.

What I really want to try is to make it VERY simple with few options. Just SEEING that it works will surely inspire not only me but, I'm sure, many others to give it a try.

I agree with you, to make it "Life Like" it would be almost impossible (at least without AI / Training with neural network and machine learning, pre-training models and so on... ) but that's not my direction.
What I want to go for is an old-fashioned approach, as simple as possible, just to SEE it functioning.
That's why I asked all these questions about PLUGINS / Visual Scripting or a way to block it out inside VAM. I'm just a newbie, so I will have to look for them and see if there is a nice (user-friendly) way to actually connect these "LOGIC" blocks, so I can build the basic rules and add the AUDIO once it works.

That's what made me think: why not the same "fake chat" but... in VAM!


I will check out the plugins you've mentioned, as I'm curious which is the most "dynamic" or open for the things I've described, whether they're hard to learn and use, and of course how limited (or not) they are for my specific goals. What I'm trying to do isn't super complex in most visual scripting tools I've tried, but VAM... is a new beast for me, and I don't know what I can or cannot do without actual programming.
Ok, in that case, if you just want some random audio files to be played, you can do fine with MacGruber Logic Bricks (Statemachine). You can use the statemachine to set an "Idle" state for a person, and then have that transition into a few dialogue lines. Each dialogue line is a new state, which just plays its audio clip. The cool part is that you can set, for each state, which other states it is allowed to transition to. So you could have an idle state with transitions to weather-topics-1 to weather-topics-8 or something (and if multiple target states are given, a random state is picked from that list). Each of these weather topics would be an audio file playing. After playing one of these weather topics, the state machine is in that new state and can transition to something new. For each weather topic state you can set which new state to transition to, for instance a "weather-topic-response-1" or "general-agreement-response" or whatever your classification is. With just this simple building block of a statemachine, you can control the whole flow.
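
The state-machine idea described above can be modelled in a few lines of plain Python to try out a transition table on paper first. This is not the Logic Bricks API, only a sketch of the same concept, and the state names and clip file names are hypothetical.

```python
import random

# Hypothetical states: each one plays a clip (None = silence) and lists
# the states it is allowed to move to next.
STATES = {
    "idle": {"clip": None, "next": ["weather-topic-1", "weather-topic-2"]},
    "weather-topic-1": {
        "clip": "weather_01.wav",
        "next": ["weather-topic-response-1", "general-agreement-response"],
    },
    "weather-topic-2": {"clip": "weather_02.wav", "next": ["general-agreement-response"]},
    "weather-topic-response-1": {"clip": "weather_reply_01.wav", "next": ["idle"]},
    "general-agreement-response": {"clip": "agree_01.wav", "next": ["idle"]},
}

def next_state(current, rng=random):
    """Pick one of the allowed follow-up states at random."""
    return rng.choice(STATES[current]["next"])

def walk(start, steps, rng=random):
    """Return the sequence of states visited, beginning with `start`."""
    path = [start]
    for _ in range(steps):
        path.append(next_state(path[-1], rng))
    return path

for name in walk("idle", 6):
    print(name, "->", STATES[name]["clip"])
```

Because every state only lists the states that make sense after it, the random choice stays on topic without needing a separate matching system.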
Upvote 0
Thanks once again for your kind reply and explanation.

I will start by downloading the plugin you've mentioned. Obviously the harder task for me now will be to learn how to use it, so I'll look for some basic tutorials, because the best way for me to learn is visually, following step-by-step and then trying things by myself. I hope it's not too hard to use.

If I have any related questions I'll post updates in this thread, hopefully you or anyone else will be able to help me out :)