My dream VAM plugin/scenario

@Hedgepig could you just use Amazon Sumerian for the TTS? What is the quality like? Right now I'm using Microsoft Azure, which is a lot better than IBM Watson or Google WaveNet. I looked at Amazon Polly, but that TTS engine sucks. So would Amazon Sumerian be better? I checked out their page and it looks like some kind of 3D online tool. I am only interested in the TTS part, however...

Is it better than this? https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech/#features
Using Aria (Neural) with the Cheerful setting? Because that is what I'm using right now... and it's the best one I could find online. I hope Amazon Sumerian sounds better; I'd love to hear your opinion.
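For reference, the Aria + Cheerful combination is selected through SSML. A minimal sketch with Azure's Python Speech SDK, with placeholder key/region values (the `mstts:express-as` element is what picks the speaking style):

```python
# Minimal Azure TTS sketch: en-US-AriaNeural with the "cheerful" style.
# Subscription key and region are placeholders.
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="YOUR_REGION")
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

ssml = """
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
  <voice name="en-US-AriaNeural">
    <mstts:express-as style="cheerful">
      I think using artificial intelligence opens a world of possibilities.
    </mstts:express-as>
  </voice>
</speak>
"""
synthesizer.speak_ssml_async(ssml).get()  # plays through the default speaker
```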

I can't look at that; Azure is all about the code and, these days, it does my head in. Totally my fault, but I'm a writer and visual thinker; I only do XML and AIML. They are language-related and make sense to me. If you want to share a video of you using Azure, please do, I'd love to see it.

Sumerian has a feature that allows you to annotate text with different kinds of emotional emphasis, and you can set the speed of the spoken text. It's not perfect but it's near enough.

Sumerian is free to use, but if you need higher quality, there are companies offering this kind of thing: "4 Best AI Voice Generators (Text-to-Speech) for 2022" on Victory Tale.

But it looks like you'll be paying $30.00 per month to be able to use it outside the sandbox.

Please let me know how you get on with TTS generation.
 
It would be worth doing a few QC checks first to ensure that the problem is not your data (a rough sketch for automating the first few checks follows the list).
  1. Did you split your audio into smaller samples (usually 2 to 10 s long)?
  2. Did you trim the beginning of your audio (or text) to remove audio that doesn't match the text?
  3. Did you check the samples to ensure that the text matches the audio?
  4. Did you use transfer learning, or did you start training a new model?
  5. For how many epochs did you train the model?
  6. Were all those clips from the same person and single-voiced?
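Here's a rough sketch of how one might automate checks 1-3, assuming an LJSpeech-style dataset layout (a metadata.csv of `id|transcript` lines next to a wavs/ folder); the paths and thresholds below are placeholders, not a fixed recipe:

```python
# QC sketch: flag clips that are too short/long, missing, or untranscribed.
# Assumes LJSpeech-style layout: my_dataset/metadata.csv + my_dataset/wavs/*.wav
import csv
import wave
from pathlib import Path

DATASET = Path("my_dataset")   # placeholder dataset root
MIN_S, MAX_S = 2.0, 10.0       # typical clip-length range for TTS training

with open(DATASET / "metadata.csv", newline="", encoding="utf-8") as f:
    for row in csv.reader(f, delimiter="|"):
        clip_id, text = row[0], row[-1]
        wav_path = DATASET / "wavs" / f"{clip_id}.wav"
        if not wav_path.exists():
            print(f"{clip_id}: missing audio file")
            continue
        with wave.open(str(wav_path)) as w:
            seconds = w.getnframes() / w.getframerate()
        if not MIN_S <= seconds <= MAX_S:
            print(f"{clip_id}: {seconds:.1f}s is outside the {MIN_S}-{MAX_S}s range")
        if not text.strip():
            print(f"{clip_id}: empty transcript")
```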
Another workflow that I played with a bit today is voice conversion. It gives you some control over the intonation. What you do is train two voices: yours and the voice you want. Then you can say something and it will convert your audio directly to the other voice while keeping the emotion.
Regarding 1 to 6, I did most of that, but they were smallish samples with a few words per line. The quality was fine and they matched syllable for syllable. I only ran it for 500 epochs though, so that might matter. Right now I'm doing tests with a female writer of whom I have 10+ hours of text + audio. To see how much it all matters, I'm going to train a few models:

- 0.5 hours of source material @ 500 epochs
- 2 hours of source material @ 500 epochs
- 0.5 hours of source material @ 3000 epochs
- 2 hours of source material @ 3000 epochs

Then I'm going to compare the results for these cases so I get a better feel for what matters most for quality.
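If it helps, that grid is easy to script; this is only a schematic sketch (the `train_tts.py` command is a placeholder for whatever trainer entry point and config you actually use):

```python
# Schematic 2x2 experiment grid: dataset size vs. number of epochs.
# The printed command is a placeholder, not a real trainer invocation.
from itertools import product

for hours, epochs in product([0.5, 2.0], [500, 3000]):
    run_name = f"clone_{hours}h_{epochs}ep"
    print(
        f"python train_tts.py --run_name {run_name} "
        f"--meta_file subset_{hours}h.csv --epochs {epochs}"
    )
```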

I also used a pretrained model for the sample.

However: I really like your voice-conversion approach. It sounds super interesting. How does that work? Can you point me to some material?
 
The page I gave you, if you scroll down, allows you to try out the different voices. You can just click on a voice and type what it has to say. So if you're willing, maybe you can look at it again. In the meantime, I'll try to find some YouTube videos detailing the Amazon Sumerian software.
 
@checkking @Hedgepig
First results are in, and I'm pretty stoked to be honest. This looks very promising.

2 hours of source material @ 500 epochs:

"I'm happy to show you my ass sir"

"Did you know I always swallow when I suck dick?"

"I think using artificial intelligence opens a world of possibilities."
 
Just tried it; the SSML is similar to that in Sumerian. The Microsoft voices are way too harsh for me.

To get more expression than the AWS Polly voices, you could try CereProc with RT-Voice in Unity. It's a bit fiddly to set up, but even a non-coder like me managed it, after a day-long blitz of red console warnings. :)
 
That looks promising!!
 
Directly in the VITS Colab example, there is a voice conversion example with the multi-voice model. Beware, that notebook is a bit messy and you need to manually change the active folder a few times to get it to work.

The example uses voice actor 81 as the seed, then uses an audio-to-audio approach to convert the voice to the other speakers' voices. It works very well, but sadly all those examples are boring narrative voices. A practical usage would be to train at least two voices, your own and the voice you want, and add those two voices to the model. Then you just record with your own voice and convert it.
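If you want to try the audio-to-audio step without wrestling with that notebook: newer Coqui TTS releases also ship a dedicated voice-conversion model (FreeVC-based, so not the VITS speaker-embedding approach the notebook uses). A minimal sketch, with all file names as placeholders:

```python
# Voice-conversion sketch with the Coqui TTS pip package (pip install TTS).
# Uses the FreeVC-based VC model, not the VITS notebook from this thread.
from TTS.api import TTS

vc = TTS(model_name="voice_conversion_models/multilingual/vctk/freevc24")
vc.voice_conversion_to_file(
    source_wav="my_recording.wav",           # what you said (keeps the intonation)
    target_wav="target_speaker_sample.wav",  # whose voice you want
    file_path="converted.wav",
)
```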

I did try with a very emotional sample, and it did keep the emotion, but I didn't train for that speaker, so the result is not good. That was just a quick test to figure out how much emotion/intonation can be transferred that way, and it seems to work.
 
Yes, very promising for only 500 epochs. I can still hear that Tacotron distortion that has been bugging me all the time.

Maybe I am more sensitive to it. I keep seeing people everywhere stating that Tacotron DDC > VITS, yet I listen to their cherry-picked samples and, to me, VITS is many times cleaner. Maybe because the quality is more consistent. With Tacotron, sometimes the voice is very clean, then you hear a small distortion, and that's what kills the magic.
 
Just a demo; the results are expected to be really bad.

I took a small audio sample from gonewildaudio and used voice ID 21, which is not that similar but kinda worked. No training on any voice; this is directly from the Colab notebook.

Original audio:

ID 12

ID 15

It keeps the intonation. Now, you can hear how good it sounds on trained voices in the notebook, even with a different gender, so... lots of potential.
 
I've been experimenting with Replika. Using regolos engine plugin (disabling speech and possibly expressions) and adding microphone input and rt_lipsync in VR, I can get a walking, interacting "AI", and I can talk to her and she responds. The way I'm getting live audio into VAM is very slow though; the delay is quite a few seconds.
 
I'm still busy training a voice clone. Within another week I will hopefully have successfully trained a model at max quality, and if so, I can share some results.
 
@pinosante any success with training? It sounds very cool! Can you share the model?
It looks like YourTTS is much easier to use, but at the moment it only works on Colab; I can't get it to run offline.
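If the local route pans out, YourTTS also ships with the Coqui TTS pip package, so a run outside Colab should look roughly like this (untested sketch; the reference clip and output paths are placeholders):

```python
# Local YourTTS sketch via the Coqui TTS package (pip install TTS).
# speaker_wav is a short reference clip of the target voice.
from TTS.api import TTS

tts = TTS(model_name="tts_models/multilingual/multi-dataset/your_tts")
tts.tts_to_file(
    text="I think using artificial intelligence opens a world of possibilities.",
    speaker_wav="reference_clip.wav",
    language="en",
    file_path="output.wav",
)
```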
 
Yes, here it is :)

 