Pipeline for high quality viseme animation (question)

robbie85s

One of the things I've found most interesting over the last couple of years is how accessible AI models are becoming, and how helpful they already are at generating immersive, custom voice audio. There are a few text-to-speech models out there that give surprisingly good baseline results, and also a few applications that can brilliantly transform your own voice acting into a variety of same- or opposite-gender characters. Properly embedded, this kind of thing can really shift the immersion level in VAM (or anything, really).

I'm not a 3D dev / artist, but I'm trying to work out a fast, quality development path for matching viseme animation rigs to these audio clips. I've read from meshed that he's well ahead of the curve on this type of thing with 2.x, where these kinds of links and imports can be managed by all kinds of clever external models (ones that haven't been invented yet). But since I don't know when a beta with that kind of capability will be available, I'm playing around with it in 1.2x.

Right now I'm using an AI portal to generate videos of my characters speaking, with my custom audio as the soundtrack. d-id studios has an impressive service that does this, including the ability to upload a photo of your VAM character, so the result is useful and the animation is generally a very good baseline. I then go into acidbubbles' Timeline and manually recreate the animation using viseme / phoneme morphs. Since Timeline is so versatile, and has the clever ability to keep head audio and animation frames in sync regardless of machine performance, this actually produces a shockingly good result.
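To give a sense of what the manual Timeline step could look like if automated: below is a minimal sketch of turning timed phonemes (as you'd get from a forced aligner run against the audio and transcript) into viseme morph keyframes. The phoneme-to-morph table, morph names, and keyframe layout here are all placeholders I made up for illustration; they are not Timeline's actual format or VAM's actual morph names.

```python
# Hypothetical sketch: convert timed phonemes into viseme morph keyframes.
# Morph names and the phoneme table are illustrative placeholders only.

# Minimal phoneme -> viseme morph lookup (not exhaustive).
PHONEME_TO_VISEME = {
    "AA": "Mouth Open Wide",
    "IY": "Mouth Smile",
    "UW": "Lips Pucker",
    "M":  "Lips Together",
    "F":  "Lip Bite",
}

def phonemes_to_keyframes(timed_phonemes, strength=1.0):
    """timed_phonemes: list of (start_sec, end_sec, phoneme) tuples,
    e.g. from a forced-alignment pass over the audio and transcript.
    Returns (time, morph_name, value) keyframes, ramping each viseme
    in from zero at the phoneme start, peaking mid-phoneme, and
    ramping back out at the end."""
    keyframes = []
    for start, end, ph in timed_phonemes:
        morph = PHONEME_TO_VISEME.get(ph)
        if morph is None:
            continue  # unmapped phoneme: leave the mouth where it is
        mid = (start + end) / 2.0
        keyframes.append((start, morph, 0.0))       # ramp in
        keyframes.append((mid, morph, strength))    # peak
        keyframes.append((end, morph, 0.0))         # ramp out
    return sorted(keyframes)

# Example alignment for two short words (AY is deliberately unmapped):
aligned = [(0.00, 0.08, "M"), (0.08, 0.20, "AY"),
           (0.25, 0.32, "F"), (0.32, 0.50, "UW")]
for kf in phonemes_to_keyframes(aligned):
    print(kf)
```

The same triangle-shaped ramp is roughly what you end up keying by hand; batch-generating it from an alignment file is where the throughput win would come from, with a manual cleanup pass afterwards for coarticulation.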

All that said... it's really, really slow doing it that way. As I mentioned, I'm not much of a developer, so I wanted to ask the community for suggestions about how to take this further. The goal would be to take these well-centered, straight-on 2D videos of the avatars, generate animation rigs from them, and then port the facial animations directly into VAM. I know we have lfe's facial motion capture plugin for the iPhone, so maybe this idea of mine isn't too far off.

Does anyone know of a service available that can do something similar to this, or have any suggestions about how to ramp up throughput on this kind of thing? I'm interested in making hundreds of lines of dialogue eventually.
 
not what you're looking for likely, but if the end goal is video format, wouldn't it be easier & better quality to enhance the lipsync through deepfake?
 
Do we not have a plugin that will generate audio from text? Something that you could feed custom string / text info into? I thought we had this? I remember seeing a chatbot-type scene a few weeks ago but I didn't try it - I presumed it did this already (received a text line and fed it to the model).
 