Free neural text to speech with emotion!

Don't you wish VAM had more story-based content? I think the major thing holding this back is the lack of voice acting. Are you sick of using speech bubbles in VAM? Don't you wish there was an easy way to generate TTS (text to speech) that sounds natural and even includes emotions? Even better, don't you wish you could do it for free? Well, you're in luck!

Microsoft Azure has amazing neural text to speech, and on the Free tier you get 500,000 characters per month! This includes ALL features: many voices in virtually every language, voices speaking with accents from other languages (for example, English with a Spanish accent), and the emotions (speaking styles) from the list below:

[Screenshot: list of available emotions (speaking styles)]


If you want to hear how the voices sound without any setup (you won't be able to download the audio, though), try the demo here:

Text to Speech – Realistic AI Voice Generator | Microsoft Azure

Start by setting up a free Azure trial. It's completely free for the first year, and once that first year is up, just set your subscription to the 'no support' option and it will remain free!

Azure Free Trial | Microsoft Azure

Then create a Speech service resource:

Create Speech Services - Microsoft Azure

[Screenshot: Create Speech Services page]


Make sure you select the Free tier (F0).

Once that's set up, go to Speech Studio and start creating:

Speech Studio - Microsoft Azure

[Screenshot: Speech Studio home page, with a red arrow pointing at 'Audio Content Creation']


Click 'Audio Content Creation' (the one the red arrow is pointing at).

Below is what the editor looks like. Select a voice and just start typing!

[Screenshot: the Audio Content Creation editor with a sample script]


I've attached audio samples so you can hear how it sounds; the transcript is in the screenshot above. It's all point and click, and very simple! Just highlight the text you want spoken with a different emotion, then pick the emotion from the 'Speaking style' drop-down on the right.
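
For the curious: Azure's speech service takes SSML (Speech Synthesis Markup Language) as input, and a speaking style is applied with the `mstts:express-as` element, which is roughly what the editor builds for you when you highlight text and pick a style. Below is a minimal Python sketch, assuming you want to mix styles within a single line. The voice `en-US-JennyNeural` and the styles `cheerful` and `sad` are only examples (each voice supports its own set of styles), and the helper name `build_ssml` is hypothetical.

```python
from typing import List, Optional, Tuple
from xml.sax.saxutils import escape, quoteattr

# Namespaces used by Azure's SSML (mstts = Microsoft TTS extensions).
SSML_NS = "http://www.w3.org/2001/10/synthesis"
MSTTS_NS = "https://www.w3.org/2001/mstts"


def build_ssml(voice: str, segments: List[Tuple[Optional[str], str]],
               lang: str = "en-US") -> str:
    """Build one SSML document for one voice, mixing speaking styles.

    Each segment is (style, text). style=None keeps the voice's default
    delivery; any other value wraps the text in <mstts:express-as>.
    """
    parts = []
    for style, text in segments:
        safe_text = escape(text)  # escape &, <, > so the XML stays well-formed
        if style is None:
            parts.append(safe_text)
        else:
            parts.append(
                f"<mstts:express-as style={quoteattr(style)}>{safe_text}</mstts:express-as>"
            )
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xmlns:mstts="{MSTTS_NS}" xml:lang="{lang}">'
        f"<voice name={quoteattr(voice)}>{''.join(parts)}</voice>"
        "</speak>"
    )


# Example: one line that starts cheerful and turns sad.
ssml = build_ssml(
    "en-US-JennyNeural",
    [("cheerful", "You're finally home! "), ("sad", "I missed you so much.")],
)
print(ssml)
```

You don't need any of this to use the editor; it just shows that "different emotions within the same line" is nothing more than several styled segments inside one `voice` element.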

It also stores your projects, so your script is always backed up, and you can back up the generated audio files too!

If you break your script into paragraphs (shown by the numbers on the left), then when you export the audio you can choose to have each paragraph saved as a separate audio file. So put each character's lines in their own paragraph, do the export, and bam, you've got every clip in one shot!
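
If you draft your script elsewhere first, a little bookkeeping helps you keep track of which exported file belongs to which character. Here is a minimal Python sketch, assuming your draft is plain text with one `Name: line` per line (a hypothetical format of my own, not something Azure requires). It strips the speaker names, numbers the lines the way the editor numbers paragraphs, and suggests a file name per clip. Paste each clip's `text` as its own paragraph in the editor. The `NNN_Speaker.wav` naming and the function name `split_script` are assumptions; rename the exported files to match if you like.

```python
import re
from typing import List, NamedTuple


class Clip(NamedTuple):
    index: int      # 1-based paragraph number, like the editor's left margin
    speaker: str
    text: str
    filename: str   # hypothetical naming scheme, e.g. "002_Ben.wav"


# "Name: text"; the name stops at the first colon, so the text may contain colons.
LINE_RE = re.compile(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$")


def split_script(script: str) -> List[Clip]:
    """Turn a 'Name: line' script into numbered clips, one per spoken line."""
    clips: List[Clip] = []
    for raw in script.splitlines():
        match = LINE_RE.match(raw)
        if not match:
            continue  # skip blank lines and stage directions without "Name:"
        speaker, text = match.groups()
        index = len(clips) + 1
        clips.append(Clip(index, speaker, text, f"{index:03d}_{speaker}.wav"))
    return clips


script = """Anna: You're finally home!
(She runs to the door)
Ben: Sorry I'm late: traffic was awful."""

for clip in split_script(script):
    print(clip.index, clip.filename, clip.text)
```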

[Screenshots: audio export settings]


Also, this is all API-based... so if any of you devs out there would like to write a VAM plugin... :)
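
For the devs: a minimal Python sketch of calling the Speech service's REST text-to-speech endpoint directly, using only the standard library. It assumes you have your Speech resource's key and region (both shown on the resource's 'Keys and Endpoint' page in the Azure portal) and posts an SSML document; the response body is the audio itself. `riff-24khz-16bit-mono-pcm` returns a .wav, and `ogg-24khz-16bit-mono-opus` would return an .ogg instead. The function names and the `User-Agent` value are hypothetical.

```python
import urllib.request


def make_tts_request(ssml: str, key: str, region: str,
                     output_format: str = "riff-24khz-16bit-mono-pcm") -> urllib.request.Request:
    """Build (but don't send) a POST to the Azure TTS REST endpoint."""
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": key,           # key from your Speech resource
        "Content-Type": "application/ssml+xml",     # the body is SSML
        "X-Microsoft-OutputFormat": output_format,  # audio container and codec
        "User-Agent": "vam-tts-sketch",             # hypothetical app name
    }
    return urllib.request.Request(url, data=ssml.encode("utf-8"),
                                  headers=headers, method="POST")


def synthesize_to_file(ssml: str, key: str, region: str, path: str) -> None:
    """Send the request and write the returned audio bytes to `path`."""
    request = make_tts_request(ssml, key, region)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

Usage would look like `synthesize_to_file(ssml, os.environ["SPEECH_KEY"], "eastus", "line01.wav")`, where `SPEECH_KEY` is an environment variable you set yourself and `"eastus"` stands in for whatever region your resource is in. For pre-made scene audio, generating files once like this (or in the editor) is still simpler than calling the API live.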

I hope this is helpful!

A 1-minute video clip of a VAM character using the Azure AI-generated audio:


Part 2:
Author: Bob Nothing
Version: 2


Latest updates

  1. Added video

    I've updated this guide with a 1 minute long video of a VAM character doing the Azure AI...

Latest reviews

Brilliant piece of info!
I still have some questions: did you just save the custom audio clips as .ogg files and play them on an AudioSource, or did you use something else? I think we need another guide on how to implement this in the game with model emotions.
Bob Nothing
Typically, I write up an entire script using various emotions. (You can even use different emotions within the same line.) Then I export .wav files and select the 'each paragraph is its own file' option. Then, in animation triggers, I use HeadAudio on the Person atom to play each audio file, with the RT Lip Sync plugin handling the lip sync. It works really well. For a back-and-forth conversation, I'll have my first audio file play at, say, 5 seconds in: I hit play, then hit stop once the audio file is done playing. Then I hit play again for a second or so and stop, and on the next audio trigger I click the 'current time' button, and so on, so the audio spacing is correct.
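
The manual spacing above boils down to simple arithmetic: each clip's trigger time is the previous trigger time plus the previous clip's length plus a short pause. Here is a minimal Python sketch, assuming the clips were exported as PCM .wav files (which the standard `wave` module can read) and are passed in playback order. `first_start=5.0` and `gap=1.0` mirror the "5 seconds in" and "a second or so" above; the function names are hypothetical, and the output is just numbers to type into your triggers, not a VAM plugin.

```python
import wave
from pathlib import Path
from typing import List, Sequence, Tuple


def wav_duration(path) -> float:
    """Length of a PCM .wav file in seconds."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def trigger_times(paths: Sequence, first_start: float = 5.0,
                  gap: float = 1.0) -> List[Tuple[str, float]]:
    """Return (file name, start time in seconds) for each clip, in order.

    Each clip starts once the previous one has finished, plus `gap` seconds.
    """
    schedule = []
    start = first_start
    for p in paths:
        schedule.append((Path(p).name, round(start, 3)))
        start += wav_duration(p) + gap
    return schedule
```

For example, `trigger_times(sorted(glob.glob("export/*.wav")))` would list start times for every exported clip, assuming the exported file names sort in paragraph order.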
Excellent!
Thanks for the tip!
Great guide. I'm going to try it right now! I've been using ttsfree.com. They source from various places (IBM, Microsoft, etc., I think). They have a pretty good number of voices but no emotion/inflection options. I agree 100%: voiced scenes are much better than speech bubbles, and it's incredible how good the voices are these days. Death to speech bubbles!!!
After going through all those sign-ups, I realized I could just record the voices from the demo page into my DAW and make my own audio files for the same result. Thanks for sharing this, it's really going to make my scenes more interesting!
Thanks soo much for pointing this out!
Now my girls never stop talking ;)

I hope they add emotions to languages other than US English soon, since my girls are from Germany, lol.

Do you know if there’s some kind of community sharing custom voices?

Anyway, you are my hero!
Bob Nothing
Well, hopefully I'm about to make your day. Some of the US English voices are multilingual. There is a "Jenny Multilingual" voice, for example, that speaks German. Also, you can give the English-speaking voices that aren't multilingual a German accent, if that helps.
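
In SSML terms, a multilingual voice is told which language to speak with the `<lang xml:lang="...">` element. A minimal Python sketch, assuming the voice name `en-US-JennyMultilingualNeural` (voice names change over time, so check the current list in Speech Studio); the helper name `multilingual_ssml` is hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def multilingual_ssml(text: str, speak_lang: str = "de-DE",
                      voice: str = "en-US-JennyMultilingualNeural") -> str:
    """Wrap `text` so a multilingual voice reads it in `speak_lang`."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
        f"<voice name={quoteattr(voice)}>"
        f"<lang xml:lang={quoteattr(speak_lang)}>{escape(text)}</lang>"  # switch spoken language
        "</voice></speak>"
    )


print(multilingual_ssml("Schön, dich zu sehen!"))
```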
Thank you for taking the time to make a detailed tutorial. I've been looking for a solution like this for a long time. Google offers a similar service, but Microsoft wasn't on my radar. Thanks for that!
Bob Nothing
I haven't tested Google's yet, but Azure is light-years ahead of AWS Polly: it's free, and there are a lot more voice options, plus emotions. Honestly, it crushes AWS. I'm really looking forward to seeing some content from you that uses it! :-)
Thanks for the guide! Microsoft Azure's text to speech is very realistic. Not robotic at all.
Bob Nothing
I agree, it's the best I've tested. I was using AWS Polly for a bit since I had a free year to try it out. It was only costing me about $0.35 a month with my usage, but I thought, hey, why not see if there's anything else out there that's free, and the Azure stuff is SOOOO much better!
God, I wish this could be standalone/portable.
Bob Nothing
Well, once you generate your audio files, it is standalone/portable: you download the audio files and then use them in VAM. Via the API it could be done in real time, but I doubt anyone would want to front the cost of that, and I don't really see the advantage.