AI can bring so much into your scenes

Saint66 · Dec 29, 2023

Since I've been fiddling a lot with LLMs lately, I wanted to share an excerpt of a scene where I just let the AI keep speaking.
I created the character in about 3 minutes for this, with the idea of a lonely girl on Christmas Eve.

Thanks to Voxta team, TacoCat for the environment and CuddleMocap for the sitting animation.

Note: the speech is generated entirely by AI; it is not a recorded voice line or anything like that.

AWWalker · Dec 29, 2023

This is soo coool

Those guys have taken it to the next level!

SlimerJSpud · Dec 29, 2023

Impressive. I noticed at 0:48 when she says, "I'm not asking for much" that the accent seems to slip into British English and then recover. Can we add an "EMO Girl Friend" dial to that?

For a lament, she seems pretty calm.

Saint66 · Dec 29, 2023

SlimerJSpud said:
Impressive. I noticed at 0:48 when she says, "I'm not asking for much" that the accent seems to slip into British English and then recover. Can we add an "EMO Girl Friend" dial to that? For a lament, she seems pretty calm.

Well spotted on the British accent… It's a local TTS engine called Coqui, and for some reason it sometimes prefers that accent, even in German lol

AWWalker · Dec 29, 2023

Saint66 said:
Well spotted on the British accent… It's a local TTS engine called Coqui, and for some reason it sometimes prefers that accent, even in German lol

Sounds so much better than Piper and Silero... I haven't tried it yet, but I will now. Are you running it all locally? It seems pretty quick on the responses. (Are you using an xx90 series card? I knew I shoulda forked out more money)

TreyWilly · Dec 29, 2023

I also support Voxta although I haven't had much time to mess around with it. How did you trigger the ai to give a response without any verbal input?

Saint66 · Dec 30, 2023

TreyWilly said:
I also support Voxta although I haven't had much time to mess around with it. How did you trigger the ai to give a response without any verbal input?

There's a function in Voxta's inspector view; I am just sending "continue" every few seconds.
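For anyone who wants to script the same idea outside the inspector, here is a rough sketch of the timing part only. It assumes you already have some function that delivers a text message to the chat; `send` below is a hypothetical placeholder and not a Voxta API, and the name `nudge` and its parameters are made up for illustration.

```python
import time
from typing import Callable


def nudge(send: Callable[[str], None], message: str = "continue",
          interval_s: float = 5.0, repeats: int = 3,
          sleep: Callable[[float], None] = time.sleep) -> None:
    """Call send(message) `repeats` times, pausing interval_s seconds between calls.

    `send` is a stand-in for whatever actually delivers text to the chat
    (an assumption here, not a real Voxta function).
    """
    for i in range(repeats):
        send(message)
        # No pause after the final message.
        if i < repeats - 1:
            sleep(interval_s)
```

Calling `nudge(print, interval_s=1.0)` just prints "continue" three times, one second apart, which is enough to check the timing before wiring in a real sender.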

Saint66 · Dec 30, 2023

AWWalker said:
Sounds so much better than Piper and Silero... I haven't tried it yet, but I will now. Are you running it all locally? It seems pretty quick on the responses. (Are you using an xx90 series card? I knew I shoulda forked out more money)

The only online service I use is Deepgram for STT (not needed in this scene).
Then Ooba for text generation and Coqui for TTS.
Sometimes I use OpenAI for text and ElevenLabs for speech, if I want perfect German or need to free up VRAM…
And yes, I use a 4090. Coqui and the LLM together need about 9 GB of VRAM.

AWWalker · Dec 30, 2023

TreyWilly said:
I also support Voxta although I haven't had much time to mess around with it. How did you trigger the ai to give a response without any verbal input?

I've done this too: you can set a timer, or create a trigger that prompts for more responses at a timed interval.

Saint66 said:
The only online service I use is Deepgram for STT (not needed in this scene).
Then Ooba for text generation and Coqui for TTS.
Sometimes I use OpenAI for text and ElevenLabs for speech, if I want perfect German or need to free up VRAM…
And yes, I use a 4090. Coqui and the LLM together need about 9 GB of VRAM.

I just switched to the latest version of Voxta and installed the XTTS module... incredible...

Jocks3D · Feb 18, 2024

I'm not that familiar with Voxta. Can it use male models?

AWWalker · Feb 18, 2024

Sure, you can use it on any model, I'm sure. It's just an AI interface, and if you give the AI a male persona in the character description, it'll act like a male. It talks to the API of an LLM which, if you are familiar with them, can be customized to impersonate anything you can think of... from an ant to a giant space alien, straight female to flamboyant male. Whatever your heart desires, it can be.

BStarG2 · Feb 18, 2024

I like this a lot. But I'd like to do this with video game girls, say Lara Croft to use a typical example. The voice, however, would have to be at least one of the voices found in one of the games; the idea is that I would want to use a character that actually sounds like said character in the scene, rather than using a voice from something like Vammoan's list of voices on her, which wouldn't fit at all.

On top of that, for some characters, accents can be an obstacle. If I want to bring say Manon from SF6 in a scene like this, her French accent would destroy almost any attempt I've seen from A.I.-generated voices so far (I haven't found good A.I. generated voices where the speaker speaks English with a French accent, if there's one out there I'd want to know). Then again it wouldn't have to be a generic female voice. It'd have to be the voice from the game.

The next problem with this is that for some characters (especially in fighting games) there just aren't enough voice samples to feed an A.I. in order to get enough unique generated sentences, since the source material is very sparse. It might work for games with a lengthy campaign and lots of cut-scenes and dialogue, though (say, indeed, any of the recent Tomb Raider games for Lara, or maybe Horizon Zero Dawn for Aloy would work well, etc).

I'm a sucker for video gaming girls in VAM, so my 'needs' with this would be very specific and tougher to achieve (not impossible though, just tougher, it would require lots of samples and a very good A.I. voice generator that can deal with all sorts of accents, or even could produce voices in the actual foreign languages outside of the typical base English most of them use).

AWWalker · Feb 18, 2024

Use XTTS: get a clean 8-second sample of her voice and it'll do the rest. You just put a WAV file in a folder, and it learns the voice within seconds after you select it in Voxta. You can grab the audio from YouTube and run it through an online MP3-to-WAV converter. See the Voxta docs page, it has some useful info on it: https://doc.voxta.ai/docs/xtts-server/

XTTS has a language specifier in the JSON; en-fr might be good for French, though I haven't tried accents.

It doesn't require lots of samples, just one good clean sample: no ums, no deep breathing. Use Audacity to remove the background noise, and you'll be very pleased.
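If you want to check or trim a sample before dropping it in the folder, something like this should do as a minimal sketch. It uses only Python's built-in `wave` module, so it only handles uncompressed PCM WAV files. The function names are made up for illustration, the 8-second default just mirrors the figure above, and none of this is part of XTTS or Voxta.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Length of an uncompressed PCM WAV file in seconds."""
    with wave.open(path, "rb") as reader:
        return reader.getnframes() / reader.getframerate()


def trim_wav(src: str, dst: str, max_seconds: float = 8.0) -> None:
    """Copy at most the first max_seconds of src into dst, keeping the audio format."""
    with wave.open(src, "rb") as reader:
        channels = reader.getnchannels()
        sample_width = reader.getsampwidth()
        frame_rate = reader.getframerate()
        # readframes returns fewer frames if the file is shorter than requested.
        frames = reader.readframes(int(max_seconds * frame_rate))
    with wave.open(dst, "wb") as writer:
        writer.setnchannels(channels)
        writer.setsampwidth(sample_width)
        writer.setframerate(frame_rate)
        writer.writeframes(frames)
```

This only cuts the length; noise removal still has to happen in an audio editor such as Audacity.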

BStarG2 · Feb 18, 2024

AWWalker said:
Use XTTS: get a clean 8-second sample of her voice and it'll do the rest. You just put a WAV file in a folder, and it learns the voice within seconds after you select it in Voxta. You can grab the audio from YouTube and run it through an online MP3-to-WAV converter. See the Voxta docs page, it has some useful info on it: https://doc.voxta.ai/docs/xtts-server/

XTTS has a language specifier in the JSON; en-fr might be good for French, though I haven't tried accents.

It doesn't require lots of samples, just one good clean sample: no ums, no deep breathing. Use Audacity to remove the background noise, and you'll be very pleased.

Ok, fair enough, that's new info for me.

Thank you very much. I'll give all of that a try in the coming days.

Saint66 · Feb 19, 2024

BStarG2 said:
Ok, fair enough, that's new info for me.

Thank you very much. I'll give all of that a try in the coming days.

Use this for cleaning up the voices, it's free: Enhance Speech from Adobe (podcast.adobe.com), an AI audio filter that improves spoken audio to make it sound like it was recorded in a soundproofed studio.

Jocks3D · Feb 19, 2024

AWWalker said:
Sure, you can use it on any model, I'm sure. It's just an AI interface, and if you give the AI a male persona in the character description, it'll act like a male. It talks to the API of an LLM which, if you are familiar with them, can be customized to impersonate anything you can think of... from an ant to a giant space alien, straight female to flamboyant male. Whatever your heart desires, it can be.

Thanks, I went ahead and started checking out Voxta. One thing I'm not clear on from reading the webpage and watching some videos: do the voices come with Voxta and if not, where are we supposed to get them from? Is there a tutorial on that topic?

AWWalker · Feb 19, 2024

Jocks3D said:
Thanks, I went ahead and started checking out Voxta. One thing I'm not clear on from reading the webpage and watching some videos: do the voices come with Voxta and if not, where are we supposed to get them from? Is there a tutorial on that topic?

You have to install the prerequisite applications: an LLM chat engine (Kobold, oobabooga) that will handle the AI conversation, inferencing and summarization, and a text-to-speech engine, such as the XTTS referenced above. The voices are just WAV files of a person talking, whether it be a simple interview, a short sentence, or something longer. XTTS will use that to create the voice; it doesn't need anything more than a short, clean WAV file of a person talking naturally. You'll have to experiment and research a bit on how to get the best quality. You can also include an STT (speech-to-text) engine, which is included with Voxta and works pretty well. Voxta is a front end to the APIs of the other apps you install, amongst other things. Read the docs, Google a lot. It took me a short while to figure out how to get good-sounding voices, and how to clean them up and make them sound pretty believable.

Saint66 · Feb 20, 2024

Jocks3D said:
Thanks, I went ahead and started checking out Voxta. One thing I'm not clear on from reading the webpage and watching some videos: do the voices come with Voxta and if not, where are we supposed to get them from? Is there a tutorial on that topic?

There are several ways to get your own voices; it pretty much depends on which services you are using. It's quite easy.

And to add to AWWalker's detailed reply: if you don't have a beefy setup or don't want to install local services, they just released their own cloud, hosting everything you need.
I would suggest using at least a local TTS to save on your cloud credits.

BStarG2 · Feb 22, 2024

AWWalker said:
Use XTTS: get a clean 8-second sample of her voice and it'll do the rest. You just put a WAV file in a folder, and it learns the voice within seconds after you select it in Voxta. You can grab the audio from YouTube and run it through an online MP3-to-WAV converter. See the Voxta docs page, it has some useful info on it: https://doc.voxta.ai/docs/xtts-server/

XTTS has a language specifier in the JSON; en-fr might be good for French, though I haven't tried accents.

It doesn't require lots of samples, just one good clean sample: no ums, no deep breathing. Use Audacity to remove the background noise, and you'll be very pleased.

Ok just a small update on this.

Finally had some time to check this out. I followed the video to a T, and it works up to the point at about 2:10 in the video (the installation guide video).

It says this: " run xtts server with -d cuda and --deepspeed flags " with the specific CMD command to type.

I typed that command, and now it says: " No module named xtts_api_server "

So like I said, so far everything shown in the video works, I've done exactly as described (same download locations, and installation paths, folder names, etc, followed to a T, nothing different). But yeah I'm stuck there. Any tip for that one?

Thanks.

AWWalker · Feb 23, 2024

BStarG2 said:
Ok just a small update on this.

Finally had some time to check this out. I followed the video to a T, and it works up to the point at about 2:10 in the video (the installation guide video).

It says this: " run xtts server with -d cuda and --deepspeed flags " with the specific CMD command to type.

I typed that command, and now it says: " No module named xtts_api_server "

So like I said, so far everything shown in the video works, I've done exactly as described (same download locations, and installation paths, folder names, etc, followed to a T, nothing different). But yeah I'm stuck there. Any tip for that one?

Thanks.

What version of Python is showing as installed? Cuz most apps under Python are version specific. I have had that error... a while back
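A quick way to check, as a rough sketch: "No module named ..." generally means the package isn't installed for the interpreter that is actually running the command, which may not be the one you installed it into. Save the script below and run it with the same `python` command (and the same virtual environment, if any) that you use to start the server. The helper names are made up; the module name is taken from the error message.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """True if the interpreter running this script can find the top-level module."""
    return importlib.util.find_spec(name) is not None


def report(name: str = "xtts_api_server") -> str:
    """One-line summary: Python version, interpreter path, and module status."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    status = "found" if module_available(name) else "NOT found"
    return f"Python {version} at {sys.executable}: module '{name}' {status}"


print(report())
```

If the path it prints is not the Python you installed the package into, that mismatch is the likely cause.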

BStarG2 · Feb 23, 2024

AWWalker said:
What version of Python is showing as installed? Cuz most apps under Python are version specific. I have had that error... a while back

The exact same as shown in the video guide, which is Python 3.11.7 amd64.
