AI can bring so much into your scenes

Saint66

As I've been fiddling a lot with LLMs lately, I thought I'd share an excerpt of a scene where I just let the AI keep speaking.
I created the character in about 3 minutes for this, with the idea of a lonely girl on Christmas Eve.

Thanks to Voxta team, TacoCat for the environment and CuddleMocap for the sitting animation.

Note: the speech is generated entirely by AI; it's not a recorded voice line or anything like that.
 
Impressive. I noticed at 0:48 when she says, "I'm not asking for much" that the accent seems to slip into British English and then recover. Can we add an "EMO Girl Friend" dial to that? :ROFLMAO: For a lament, she seems pretty calm.
 
Well spotted with the British accent… It's a local TTS engine called Coqui; it sometimes prefers that for some reason, even in German lol
 
Sounds so much better than Piper and Silero... I haven't tried it yet, but I will now. Are you running it all locally? It seems pretty quick on the responses. (Are you using an xx90-series card? I knew I should've forked out more money.)
 
I also support Voxta, although I haven't had much time to mess around with it. How did you trigger the AI to give a response without any verbal input?
 
There's a function in Voxta's inspector view; I am just sending "continue" every few seconds.
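For anyone who wants to picture the "send continue every few seconds" trick outside Voxta, here is a minimal timer-loop sketch. It is only an illustration: `send_message` is a hypothetical callback standing in for whatever actually delivers text to the character, and the interval and repeat count are made-up defaults, not Voxta settings.

```python
import time
from typing import Callable


def nudge_periodically(
    send_message: Callable[[str], None],
    text: str = "continue",
    interval_seconds: float = 5.0,
    repeats: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send_message(text) `repeats` times, pausing between calls.

    send_message is a hypothetical stand-in for the real chat input;
    nothing here talks to Voxta itself.
    """
    sent = 0
    for i in range(repeats):
        send_message(text)
        sent += 1
        # Skip the pause after the final message.
        if i < repeats - 1:
            sleep(interval_seconds)
    return sent


if __name__ == "__main__":
    # Demo: print instead of sending, with a short interval.
    nudge_periodically(print, interval_seconds=0.1, repeats=3)
```

The `sleep` parameter is only there so the loop can be tried out without actually waiting.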
 
The only online service I use is Deepgram for STT (not needed in this scene).
Then Ooba for text generation and Coqui for TTS.
Sometimes I use OpenAI for text and ElevenLabs for voice, if I want perfect German or need more VRAM free…
And yes, I use a 4090. Coqui and the LLM need about 9 GB of VRAM.
 
You can have a timer set, or create a trigger that prompts for more responses at a timed interval.

I just switched to the latest version of Voxta and installed the XTTS module... incredible...
 
Sure, you can use it on any model, I'm sure. It's just an AI interface, and if you give the AI a male persona in the character description, it'll act like a male. It interfaces with the API of an LLM, which, if you are familiar with them, can be customized to impersonate anything you can think of... from an ant to a giant space alien, a straight female to a flamboyant male. Whatever your heart desires, it can be.
 
I like this a lot. But I'd like to do this with video game girls, say Lara Croft to use a typical example. The voice, however, would have to be at least one of the voices found in one of the games; the idea is that I would want to use a character that actually sounds like said character in the scene, rather than using a voice from something like Vammoan's list of voices on her, which wouldn't fit at all.

On top of that, for some characters, accents can be an obstacle. If I want to bring say Manon from SF6 in a scene like this, her French accent would destroy almost any attempt I've seen from A.I.-generated voices so far (I haven't found good A.I. generated voices where the speaker speaks English with a French accent, if there's one out there I'd want to know). Then again it wouldn't have to be a generic female voice. It'd have to be the voice from the game.

The next problem with this is that for some characters (especially in fighting games) there just aren't enough total voice samples to feed an A.I. in order to get enough unique generated sentences, since the source material is very sparse. It might work for games where there's a lengthy campaign, with lots of cut-scenes and dialogue though (say, indeed, any of the recent Tomb Raider games for Lara, or maybe Horizon Zero Dawn for Aloy would work well, etc).

I'm a sucker for video gaming girls in VAM, so my 'needs' with this would be very specific and tougher to achieve (not impossible though, just tougher, it would require lots of samples and a very good A.I. voice generator that can deal with all sorts of accents, or even could produce voices in the actual foreign languages outside of the typical base English most of them use).
 
Use XTTS: get a clean 8-second sample of her voice, and it'll do the rest. You just put a WAV file in a folder, and it learns the voice within seconds after you select it in Voxta. To get a sample, you can use YouTube and an online MP3-to-WAV converter. See the Voxta page, it has some useful info on it: https://doc.voxta.ai/docs/xtts-server/

XTTS has a language specifier in the JSON; en-fr might be good for French, though I haven't tried accents.

It doesn't require lots of samples, just one good clean sample: no "ummms", no deep breathing. Use Audacity to remove the background noise, and you'll be very pleased.
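If the clip you found is longer than the roughly 8 seconds suggested above, cutting it down can be scripted. The sketch below uses only Python's standard `wave` module and assumes an uncompressed PCM WAV file; the function name and the 8-second default are just illustrative choices, and it does no noise removal, so the Audacity cleanup still has to be done separately.

```python
import wave


def trim_wav(src_path: str, dst_path: str, max_seconds: float = 8.0) -> float:
    """Copy at most max_seconds of audio from src_path to dst_path.

    Returns the duration of the written clip in seconds.
    Only handles uncompressed PCM WAV files (what the wave module reads).
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        # Number of frames to keep, capped at the file length.
        keep = min(src.getnframes(), int(max_seconds * rate))
        frames = src.readframes(keep)

    with wave.open(dst_path, "wb") as dst:
        # Reuse channels, sample width and rate from the source;
        # the frame count in the header is corrected when writing.
        dst.setparams(params)
        dst.writeframes(frames)

    return keep / rate
```

It keeps the first seconds of the file, so pick a source clip that starts with clean speech.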
 
Ok, fair enough, that's new info for me.

Thank you very much. I'll give all of that a try in the coming days.
 
Thanks, I went ahead and started checking out Voxta. One thing I'm not clear on from reading the webpage and watching some videos: do the voices come with Voxta and if not, where are we supposed to get them from? Is there a tutorial on that topic?
 
You have to install the prerequisite applications: an LLM chat engine (Kobold, oobabooga) that will handle the AI conversation, inferencing and summarization, and a text-to-speech engine such as the XTTS mentioned above. The voices are just WAV files of a person talking, whether it's a simple interview, a short sentence, or something longer. XTTS will use that to create the voice; it doesn't need anything more than a short, clean WAV file of a person talking naturally. You'll have to experiment and research a bit to get the best quality.

You can also include an STT (speech-to-text) engine, which is included with Voxta and works pretty well. Voxta is a front end to the APIs of the other apps you install, among other things. Read the docs and google a lot. It took me a short while to figure out how to get good-sounding voices, and how to clean them up and make them sound pretty believable.
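To make the "front end to other apps" idea concrete, here is a toy sketch of the flow described above: speech in, text to the LLM, reply back out as speech. Everything in it is hypothetical (the class, the method, and the three callables are invented for illustration and are not Voxta's actual code or API); the point is only that the front end owns no model itself and just forwards between back ends.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ChatPipeline:
    """Toy front end: it owns no model, it only forwards to back ends."""

    speech_to_text: Callable[[bytes], str]  # stand-in for an STT service
    generate_reply: Callable[[str], str]    # stand-in for an LLM back end
    text_to_speech: Callable[[str], bytes]  # stand-in for a TTS engine

    def handle_audio(self, audio_in: bytes) -> bytes:
        heard = self.speech_to_text(audio_in)
        reply = self.generate_reply(heard)
        return self.text_to_speech(reply)


if __name__ == "__main__":
    # Stub back ends so the flow can be run without any real service.
    pipeline = ChatPipeline(
        speech_to_text=lambda audio: audio.decode("utf-8"),
        generate_reply=lambda text: f"You said: {text}",
        text_to_speech=lambda text: text.encode("utf-8"),
    )
    print(pipeline.handle_audio(b"hello"))
```

Swapping a local engine for a cloud one then just means passing in a different callable.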
 
There are several ways to get your own voices; it pretty much depends on which services you are using. It's quite easy.

And to add to @AWWalker's detailed reply: if you don't have a beefy setup or don't want to install local services, they just released their own cloud, hosting everything you need.
I would suggest using at least a local TTS to save on your cloud credits.
 
Ok just a small update on this.

Finally had some time to check this out. I followed the video to a T, and it works up to the point at about 2:10 in the video (the installation guide video).

It says this: " run xtts server with -d cuda and --deepspeed flags " with the specific CMD command to type.

I typed that command, and now it says: " No module named xtts_api_server "

So like I said, so far everything shown in the video works, I've done exactly as described (same download locations, and installation paths, folder names, etc, followed to a T, nothing different). But yeah I'm stuck there. Any tip for that one?

Thanks.
 
What version of Python is showing as installed? Most Python apps are version-specific. I had that error a while back.
 
The exact same as shown in the video guide. Which is python 3.11.7 amd64.
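One quick way to narrow down a "No module named xtts_api_server" error is to ask the Python that actually runs the command what it can see. The snippet below is a generic diagnostic sketch using only the standard library, not something from the install guide; run it with the same `python` you use to start the server.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if this interpreter can locate the named top-level module."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Version:", sys.version.split()[0])
    print("xtts_api_server importable:", module_available("xtts_api_server"))
```

If the last line prints False even though the install step seemed to succeed, a common cause is that the package went into a different Python installation or virtual environment than the one running the command; the interpreter path printed above shows which one is in use.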
 