lapiro

Hello Everyone!
We've been working on Voxta, our voice-interaction platform for AI characters. We've made some updates and improvements, and we've put together a few videos to showcase what Voxta can do.

Your thoughts, feedback, and questions are always welcome in this thread. Your insights help us improve, and we genuinely appreciate the community's involvement.

For more details about Voxta, visit our website. We also have a Discord for anyone interested in joining. And for those interested in supporting our work, here's our Patreon.

Lastly, a big thanks to all our supporters, from those on Patreon to everyone here providing feedback. It means a lot.
 
I think they are all amazing. I can't wait to see the finished product. Just keep doing what you're doing, all you clever people; whatever you come up with will be a game changer for the whole VaM community. I am in awe of all of you. :)
 
Pretty nice to see this sort of technology evolve.

I do wonder how far this can be pushed if used on known characters in video games for example.

Let's say Lara Croft (to use a typical video game NSFW example here):

1) Assuming, of course, that the actual model used (from any particular Tomb Raider game) wouldn't matter...
2) The user would then have to (somehow) feed the Voxta AI lots of voice-line references (voice files, from game assets).
3) From that, I wonder how far tech like Voxta AI could 'replicate' (as closely as possible) the "personality" of Lara Croft, with her voice (based on the voice files provided, from the particular games selected), and even come up with its own AI-made stories that relate to the games' events.

So, for example, you could take voice files from Lara in, say, the 2013 reboot of Tomb Raider (or provide all the files from the latest trilogy up to Shadow of the Tomb Raider), and _that_ particular Lara Croft is basically brought to life by the AI, using the character's voice.

I'm saying this because, from a personal-preference point of view, I wouldn't really care much about 'generic characters' in a VaM scene talking using AI tech. I would instead prefer to use some video game character's look, be it Lara Croft or Kitana or Chun-Li or whoever it may be, and bring those characters 'out of the game', giving them "life" (manner of speech, duh) in VaM. Now THAT is what I'm curious about, and waiting for in some future.

But with this said, I do absolutely appreciate this evolution of the tech, because without this 'start' there can be no further development. It can only improve over time.
 
Long story short, @BStarG2, you can _absolutely_ do that right now without too much work. Voxta allows using different backends; ElevenLabs, for example, while expensive, would let you do that. You could also train your own voice with a local system, if you're willing to learn how that works. As for the story, well, models right now are very, very good, and I think there are some models on the Hub. So I guess your dream is becoming real! :D
 

Amazing! :D I'll definitely keep an eye on that one then! Thanks!
 
I'm impressed by what I've seen so far - the dialogue with the catgirl voice in particular was pretty funny.

Are you guys still using ElevenLabs to generate the audio? The intonation and tempo seem much more consistent than the results I've been able to get with ElevenLabs so far, though I didn't experiment with it all that much either.
 
@Nameless Vagabond Thanks! And yeah we have lots of fun with it :D

We actually use multiple services, I _think_ this one is NovelAI, but we also do lots of stuff behind the scenes to get fast streamed audio through a pipeline that makes adjustments to the text for TTS purposes. Most AI systems, especially LLMs, can sometimes do wonders with the right prompting and settings.
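For the curious, here's a toy illustration of the kind of text adjustment a TTS pipeline might make before synthesis. The rules below (stripping *actions* and emoticons, expanding abbreviations) are made up for illustration; this is not Voxta's actual pipeline.

```python
import re

# Hypothetical abbreviation table; a real pipeline would be far larger.
ABBREVIATIONS = {"e.g.": "for example", "etc.": "and so on", "Dr.": "Doctor"}

def prepare_for_tts(text: str) -> str:
    """Strip markup and emoticons, expand abbreviations, normalize whitespace."""
    text = re.sub(r"\*[^*]*\*", "", text)        # drop *actions* like *smiles*
    text = re.sub(r"[:;]-?[\)\(DP]", "", text)   # drop simple emoticons like :D
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)
    return re.sub(r"\s+", " ", text).strip()

print(prepare_for_tts("Sure! *smiles* I can help, e.g. with lore. :D"))
# → Sure! I can help, for example with lore.
```

Text-generation output is full of things a TTS engine reads badly out loud, so some cleanup stage between the LLM and the voice is usually needed for natural-sounding streamed audio.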
 
I am not guessing a public release date here, but you can always check their Patreon for updates.
 
I think it looks fantastic from what you've shown us so far. However, I don't really understand what Voxta is going to be: is it an unrestricted speech AI, or is speech-and-dialogue interaction driving/triggering prerecorded animations? If the latter is the case, how do you plan to make it functional for the giant and unpredictable amount of input she's going to get? If animations/positions need to be hand-made beforehand, that'd be an unimaginable amount of mocap work and also an unheard-of pool of animations. Are those stored/triggered via Timeline? If so, how is performance holding up?

Overall, I'm hyped to see you guys pushing tech boundaries, and with those cute dialogues she's probably going to be worth it for the speech-AI part alone. Can't wait to see where it goes!
 
@EasyVam We spent more time polishing than making noise, trying to be cautious about both expectations and onboarding, but we're ready to onboard everyone now. We'll create the Hub resource with all the information in the next few days, but you can already get it on Patreon. There's also an architecture writeup that is really warranted, because Voxta is a lot of things, more a platform than a product.

In short, Voxta Server is a backend that allows you to connect multiple AI services (local or online, for text generation, STT, and TTS), and provides storage for memory, several (still being improved) memorization systems, character design, and speech-first chat. What that means is that we prioritize natural speech and speed rather than just slapping a voice on a text generator. If you come to our Discord, you can check the videos channel for good examples (we'll post some soon).

But Voxta is also a back-end for avatar systems, and Virt-A-Mate is a big one. We don't think trying to make an AGI makes much sense, so instead we developed something we call an "action inference system". It allows us to provide a library of animations and sub-animations the AI is aware of and can use, and also provides input systems, so you can touch characters, talk to them, and to some degree interact with the environment. Most of those actions are scripted, but when and how to use them is not. That gives the AI much, and I mean much, better visual behavior. So no, it can't "do anything and go everywhere", but we are working on making that library large enough that it feels as if it could.
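To make the "action inference" idea concrete, here is a deliberately simplified sketch: a library of named, scripted actions, and a layer that decides which one fits the AI's reply. The names and the keyword matching are entirely hypothetical (a real system would more likely ask the LLM itself to pick from the library); this is not Voxta's actual code.

```python
# Hypothetical library of scripted animations the AI is "aware of".
ACTION_LIBRARY = {
    "wave": "plays the hand-wave animation",
    "sit": "plays the sit-down animation",
    "laugh": "plays the laughing animation",
}

def infer_action(ai_reply):
    """Pick a scripted action matching the reply, or None if nothing fits.
    Toy keyword matching stands in for real model-driven inference here."""
    lowered = ai_reply.lower()
    for action in ACTION_LIBRARY:
        if action in lowered:
            return action
    return None

print(infer_action("She laughs and waves back at you."))  # → wave
print(infer_action("Hello there."))                       # → None
```

The key design point is that the animations themselves stay hand-authored and reliable, while the unpredictable part (when and how to use them) is delegated to the model.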

But Voxta is not limited to VaM. There will be other "front-ends" to it that may not even be 3D (nothing announced about that yet), and we are working right now on an SDK to allow things like letting the AI control sex toys (it's freakishly cool) as well as connect external data sources, like webcam data, weather, sports results, etc. (not available at the moment).

I hope that helps you see what Voxta is, and what it's not. There are still lots of moving pieces and the project is far from done, but up to now I know I'm having SO much fun just talking to it, and the setup is accessible for people with a less technical background (still a little bit technical, but we made videos and tutorials, and we have actual documentation!)

Feel free to ask questions, we'll create the Hub resource soon!
 
I know nothing about voice AI (I've only used Stable Diffusion models and plugins for creating images). How do you add/create such voice models (any compatibility issues)? Do you have a step-by-step guide, or general bullet points mentioning which tools to use, to make that fantasy of having the voice and personality of the character you are interacting with in VaM? + gawd damn, this shit is cool af :D
 
Hi, I'd be interested in the scene where a catwoman gets zapped, as seen in one of the videos. If I subscribe to the Patreon, will this scene be available on the Discord? Thanks in advance.
 
Hey, we haven't published that scene yet, but if there's interest, we might want to make it available for our Patreon supporters.
 
Thanks for your answer! If you do release it, please let me know, and I'll subscribe to the Patreon :)
 
Can you run the local server version on a separate computer from Virt-A-Mate for optimal performance?
 
@TasteyTreats you can offload the large language model to another machine using e.g. Oobabooga Text Generation Web UI or KoboldCPP, and you can offload Text To Speech using xtts-api-server, they are all projects you can host on another machine on the same network (that's what I do). It is technically possible to offload Voxta itself, although it's usually unnecessary.
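As a rough sketch of what that split setup looks like: the frontend machine just needs the base URLs of the services running on the other box. The host address below is an example, and the ports are the usual defaults for those projects (Oobabooga's OpenAI-compatible API, KoboldCPP, xtts-api-server); verify against your own installs.

```python
# Example LAN address of the machine hosting the heavy AI services.
REMOTE_HOST = "192.168.1.50"

# Common default ports for these projects; check your own configuration.
SERVICE_PORTS = {
    "oobabooga": 5000,   # text generation (OpenAI-compatible API)
    "koboldcpp": 5001,   # alternative text generation backend
    "xtts": 8020,        # xtts-api-server text-to-speech
}

def endpoint(service):
    """Build the base URL you would paste into the frontend's service settings."""
    return f"http://{REMOTE_HOST}:{SERVICE_PORTS[service]}"

print(endpoint("xtts"))  # → http://192.168.1.50:8020
```

The LLM and TTS are the GPU-hungry pieces, so moving them to another machine on the LAN frees the gaming PC for rendering; the coordinating server itself is lightweight, which is why offloading it rarely matters.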
 
Sweet! Thanks for the reply
 