Meshed has been fully transparent about what VAM 2 brings, which is a solid game/platform with a toolkit and an API.
Anything deeper than the base (and probably very long and complete) set of features will, most of the time, be up to people to implement and create.
But, for the sake of argument, let's assume for one second there was some LLM in there: an LLM is nothing without data. AI companies did a good job of making people think it's magic, but take the two examples you gave:
Scene transitioning:
- How do you transition?
- From what context?
- To what context?
- What character is what?
- What character does what?
- ...
User interaction:
- What is an interaction?
- Once an interaction is defined, how does the character react? (anim)
- When and why does it react? (constraints)
- What does it react to? (awareness)
- ...
What I'm trying to show you here is why the games that tried to implement "AI" are pretty much only doing a "chatbot": for anything beyond that, you'd end up going through the exact same process as scripting a scene by hand.
You apparently recall Voxta (which Acid made/is making), and if you had tried it, you'd realize that beyond the chat feature, everything else (interactions, motion/pathfinding, behaviors...) has to be manually scripted and authored, because "AI" is not intelligent, not smart, and cannot invent data (content) out of thin air.
To be clearer: data (content), meaning animations, behavior trees, logic, and the relationship between all of that and a basic LLM chat, is impossible to generate dynamically. Let's say you create a room with a couple of chairs and a couch, and you put a girl in there. Even though Voxta can chat, it has zero clue what a chair is, what a couch is, or what space it's in... unless you feed it a complex set of information and assets (for instance the animations) tied to triggers during the conversation.
That means defining the space (nav mesh), declaring which asset is a chair, where to go to approach the chair, what the character can do near a chair (sit? jump on it? kick it?), which animation is tied to each action (sitting animation, jumping animation)... and once you've done aaaaaaaaallll of this, for every asset and context in your scene, then you can tie it all to a dialogue/behavior tree.
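To make that concrete, here's a rough sketch of the kind of data a creator has to author by hand before any chat model can do something with a chair. This is purely illustrative Python; none of these structures or names come from Voxta or VAM 2.

```python
# Hypothetical scene metadata a human has to author by hand.
# Every name here is made up for illustration; nothing comes from Voxta or VAM 2.
scene_affordances = {
    "chair_01": {
        "type": "chair",
        "approach_point": (1.2, 0.0, 3.4),  # nav-mesh target to walk to
        "actions": {
            "sit":  {"animation": "sit_on_chair_01"},
            "kick": {"animation": "kick_object_01"},
        },
    },
    "couch_01": {
        "type": "couch",
        "approach_point": (4.0, 0.0, 1.0),
        "actions": {
            "sit":      {"animation": "sit_on_couch_01"},
            "lie_down": {"animation": "lie_on_couch_01"},
        },
    },
}
# Only once all of this exists can a dialogue/behavior tree map a chat line like
# "go sit down" to: pick a chair -> walk to its approach_point -> play the animation.
```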
Put another way: take any Voxta demo scene right now and go outside the "script" defined by the creator(s), for instance ask "make a sad face" or "jump out of the window on your right". If that hasn't been scripted (by a human) and fed into the Voxta scenario so the LLM can map the request to something, you hit a brick wall and the character will simply deny the action.
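In code, that "brick wall" looks roughly like this. Again, a hypothetical sketch and not Voxta's actual logic: only requests a human already authored can be executed, everything else gets refused.

```python
# Hypothetical dispatcher: only (object, action) pairs a human has authored can run.
AUTHORED_ACTIONS = {
    ("chair_01", "sit"): "sit_on_chair_01",  # hand-scripted by the scene creator
}

def handle_request(target: str, action: str) -> str:
    animation = AUTHORED_ACTIONS.get((target, action))
    if animation is None:
        # Nothing was scripted for this request: brick wall, the character refuses.
        return "I can't do that."
    return f"play animation: {animation}"

print(handle_request("chair_01", "sit"))   # works, because a human authored it
print(handle_request("window_01", "jump")) # denied: nobody ever scripted it
```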
Now, for something more "algorithmic" like animation/pose transitions, this could be a thing. But knowing some of the tech game studios use, like JALI for instance... this kind of implementation requires a ton of data to train and extremely solid code to handle it all. And even then, it's not something where you could go "hey, let the girl do this anim, then that, then this", click a button, and have it work out of the box. It would be a tool that still needs human interaction to script... just like Gaze or any other tool in VAM 1 at the moment.
To put it simply, and that's the reason why the LLM bubble is about to burst (for generative use): an LLM is nothing without data, since it "knows" nothing. Even if we assume some LLM tool were available for VAM 2, the data/knowledge would still be entirely up to the users/creators to feed into it. And if out of curiosity you check Voxta's dependencies: only a single scene has been released by someone other than the team members.
So, with all that context: there is a place for LLMs in VAM 2, and everybody is welcome to create tools for it. But an LLM is not something that will make VAM 2 a "one click fantasy creator". Complex scenarios and interactions in games require a massive amount of human input. And since LLMs are at the point of diminishing returns, until companies have a eureka moment with some new tech that makes this actually "smart", it's probably gonna stay that way for a long time.
