
Suggestions for AI driven plugin

UnknownPeasant

Hi! I'm a long-time lurker but have finally decided to make some contributions.
I have been working on an AI-driven interaction plugin and am curious about what features the community would like to have in such a plugin.

These are the current features and requirements I have been working towards (and that are now mostly working):

Requirements:
* 100% locally running
* speech recognition (see the sketch after this list)
* models can talk back
* interaction with a local Ollama model
* support for mid-range systems, AMD and NVIDIA
* easy to set up
* locally running server with options to distribute processing to other devices (e.g. other computers on the network)
* less than 2-second response times
* ability to send commands that interact with VaM
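
To give an idea of what the locally running speech-recognition step could look like, here is a minimal sketch assuming a local Whisper model; the engine choice and file name are assumptions, not necessarily what the plugin actually uses.

    # Minimal local speech-to-text sketch (assumes the openai-whisper package; the plugin may use another engine).
    import whisper

    model = whisper.load_model("base")            # small model that fits mid-range hardware
    result = model.transcribe("mic_capture.wav")  # placeholder file name for captured microphone audio
    print(result["text"])                         # recognized text, ready to be sent to the LLM step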

Implemented features:
* speech-to-text integration (100%)
* speech processing and filters (100%)
* local LLM integration (Ollama, on the server) (100%)
* text-to-speech integration (100%)
* sending commands as 32-bit ints (100%) (see the sketch after this list)
* accessing the built-in animations for a person (play anim/stop anim) (100%)
* accessing available clothing (~75%)
* accessing morphs (100%)
* GUI in VaM (~75%)
* server running locally (100%)
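
As a rough illustration of the 32-bit command idea, here is a minimal sketch that packs a command id into the high 16 bits and an argument into the low 16 bits; the ids and layout are just an example, not the plugin's actual encoding.

    # Illustrative 32-bit command packing (ids and layout are examples, not the plugin's real encoding).
    CMD_PLAY_ANIM = 1
    CMD_CLOTHES_OFF = 2

    def pack_command(cmd_id: int, arg: int) -> int:
        # command id in the high 16 bits, argument (e.g. an animation index) in the low 16 bits
        return ((cmd_id & 0xFFFF) << 16) | (arg & 0xFFFF)

    def unpack_command(value: int) -> tuple[int, int]:
        return (value >> 16) & 0xFFFF, value & 0xFFFF

    assert unpack_command(pack_command(CMD_PLAY_ANIM, 7)) == (CMD_PLAY_ANIM, 7)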

Left to implement:
* GUI in VaM needs some love
* server GUI or at least console commands
* server options (set LLM paths/set TTS engine/more)
* dynamic commands set by the VaM GUI to interact with other plugins/triggers


What's currently working:
The user can have a conversation with a VaM character and will get voice responses with basic lipsync (using amplitude at the moment). Responses take between 0.5 and 3 seconds to arrive; this delay actually doesn't feel that bad. The user can request that the model does stuff, and the LLM will, if it decides to, perform those actions. The LLM has access to the following actions as of now: clothes off, clothes on, play a specific animation depending on mood/requests, and talk back and reason with the user.
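
To give an idea of what "lipsync using amplitude" means in practice, here is a minimal sketch of the idea: map the loudness (RMS) of each audio chunk to a mouth-open morph value. The gain and the mapping here are illustrative, not the plugin's exact code.

    # Amplitude-based lipsync sketch (illustrative mapping, not the plugin's exact code).
    import numpy as np

    def mouth_open_from_chunk(samples: np.ndarray, gain: float = 4.0) -> float:
        """Map the RMS loudness of one audio chunk to a 0..1 mouth-open morph value."""
        rms = float(np.sqrt(np.mean(np.square(samples.astype(np.float32)))))
        return min(1.0, rms * gain)

    quiet = np.full(1024, 0.05, dtype=np.float32)  # a quiet chunk barely opens the mouth
    loud = np.full(1024, 0.5, dtype=np.float32)    # a loud chunk opens it fully
    print(mouth_open_from_chunk(quiet), mouth_open_from_chunk(loud))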

What is essentially implemented is a way to talk to an LLM from within VaM and get an AI response back with possible actions.
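
To make that loop concrete, here is a minimal sketch of how recognized text could be forwarded to a local Ollama instance over its standard HTTP API; the model name and system prompt are placeholders, not the plugin's actual configuration.

    # Minimal sketch of forwarding user text to a local Ollama instance (model name and prompt are placeholders).
    import requests

    def ask_character(user_text: str) -> str:
        response = requests.post(
            "http://localhost:11434/api/chat",  # Ollama's default local endpoint
            json={
                "model": "llama3",              # placeholder; any locally pulled model works
                "messages": [
                    {"role": "system", "content": "You are a character in VaM. Keep replies short."},
                    {"role": "user", "content": user_text},
                ],
                "stream": False,
            },
            timeout=30,
        )
        return response.json()["message"]["content"]

    print(ask_character("Hi, can you wave at me?"))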

I do know that there are other solutions available out there for this, but from what I can tell my implementation is more straightforward and runs way better on mid-range systems. My solution is also made to run 100% locally but can easily be distributed to other devices on the network.
(For example, an old laptop runs the server, a Raspberry Pi runs the TTS engine, another machine runs the LLM, and finally your gaming computer runs VaM; or you just run everything on your gaming computer.)
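
As a purely illustrative example of such a split setup (the key names, hostnames and ports below are made up, not the plugin's real config format), a small service map could tell the plugin where each piece lives:

    # Hypothetical service map for a split setup (key names, hostnames and ports are made up for illustration).
    services = {
        "server": "http://old-laptop.local:8000",   # coordinating server
        "tts":    "http://raspberrypi.local:5002",  # text-to-speech engine
        "llm":    "http://llm-box.local:11434",     # Ollama running on another machine
        # point everything at http://localhost:... to run the whole stack on one computer
    }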

What features would you like to see in this kind of plugin?

I will update this post with videos of it all running when I find the time to do so.
 
That sounds very exciting. Things I would celebrate:

1.) responses as quick as possible
2.) setting options for SFW/NSFW answers
3.) multilingual support
4.) emotion in the voice (not a robot voice)
5.) a simple and easy UI
6.) dialogue storage for further conversations

These are the things I have in mind when I want to connect AI with VaM. I'm really excited to see what you create, and it's great that you're working on it. Good luck.
 
Thanks for the feedback; the focus is always on fast responses. The SFW/NSFW setting will highly depend on the selected LLM and its set context; this is a user configuration. As of now there is support for 10+ languages, but this also depends heavily on the models used. I will do my best to make it easy for the user to configure.
Emotions are a hard criterion to tackle, as almost no existing models with emotion support also support streaming. One can, however, choose a model with trained voices to make it sound more human.
I'm a designer at heart and will not release an unusable UI.
Storing session data is something that I'm planning to add, but it will not be part of my first release.

Thanks for your feedback!
 