Voice Model (Text-To-Speech, Neural Network based)



Hi everyone!

This is a voice model based on a neural network. You can make it say whatever you like. There is a lot of TTS software out there and I tried just about everything I could find, but I ended up training my own model using a neural network. After about 700 hours of training, I'm sharing the model.

First time use (installation):
  • Download the latest release from https://github.com/BenAAndrew/Voice-Cloning-App/releases/tag/v1.1.1
    • You can choose the version with or without GPU support. If you only use this to generate voices, you can download the CPU-only version.
  • Put the .exe in a directory you like, but IMPORTANT: NOT in C:\Program Files or C:\Program Files (x86). Choose a directory like C:\Voice or C:\Games\Voice or C:\VAM\Voice.
  • Run the .exe and wait until it opens a web browser, then close it again (IMPORTANT: be patient. For about a minute it looks like it's doing nothing, but it needs that time to create some directories!)
  • The app will have created some directories ('data' with the subdirectories 'datasets', 'hifigan', 'languages', 'models', 'results' and 'training')
  • Download these two files: g_02500000 and config.json and save them somewhere (not important where)
  • Now download "VAM Voice Model v2.zip" from mega.nz and unzip it into the "models" directory. You should end up with YOUR_PATH/data/models/VAM Voice Model v2.0/checkpoint_510426
  • IMPORTANT: delete the original zip file after unzipping!
  • Start the app again (run the same .exe) and wait until a web browser appears
  • Click on Synthesis
  • In the last row, "Vocoder", click "Add more" next to it; this opens a new menu
    • In the section "Add a Hifi_gan Vocoder" click next to "Hifi-gan model" and select the g_02500000 file you downloaded earlier
    • Click next to "Hifi-gan config" and select the config.json file you downloaded earlier
    • As a name, choose whatever you like, but "g_model" is probably a smart choice
    • Click on "Submit" directly below the Hifi-gan config
    • Click on back (on the top left)
  • Done!
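If the app can't find the model, a small script can double-check the folder layout from the steps above. This is a minimal Python sketch, not part of the Voice-Cloning-App: `check_install` is a hypothetical helper, and it assumes the checkpoint is a single file named checkpoint_510426 inside "VAM Voice Model v2.0". It also reports any subfolder inside the model folder, on the assumption that unzipping into an extra folder is an easy mistake to make.

```python
from pathlib import Path

# Names taken from the installation steps above.
MODEL_DIR_NAME = "VAM Voice Model v2.0"
CHECKPOINT_NAME = "checkpoint_510426"


def check_install(app_dir):
    """Return a list of problems found under app_dir; an empty list means the layout looks OK."""
    app_dir = Path(app_dir)
    problems = []

    # The steps above warn against installing under Program Files or Program Files (x86).
    if "program files" in str(app_dir).lower():
        problems.append("app directory is under Program Files")

    models = app_dir / "data" / "models"
    if not models.is_dir():
        # data/models is created the first time the .exe runs.
        problems.append("data/models not found; run the .exe once first")
        return problems

    model_dir = models / MODEL_DIR_NAME
    if not (model_dir / CHECKPOINT_NAME).is_file():
        problems.append(f"checkpoint file missing: {model_dir / CHECKPOINT_NAME}")

    # Assumption: the model folder holds files only, so any subfolder is reported.
    if model_dir.is_dir():
        for sub in model_dir.iterdir():
            if sub.is_dir():
                problems.append(f"unexpected nested folder: {sub}")

    # The original zip should be deleted after unzipping.
    for leftover in models.glob("*.zip"):
        problems.append(f"zip file still present: {leftover}")

    return problems
```

Usage: `print(check_install(r"C:\Voice"))`. An empty list means the layout matches the steps above.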
Generating voices:
  • Run the app
  • Go to "Synthesis"
  • Click on Submit (this loads the voice model)
  • Write a sentence in the text box, and click Submit
  • Click on the play button next to the line graph to hear what is being said
  • Click on the three dots on the right to download the clip if you like it
  • IMPORTANT: always end your sentences with a dot (.)
Caveats:
  • The voice can sound a bit "raspy" / "tinny"; I solve this by having music in the background. v2.0 improved the sound quality of the voice a lot!
  • Trying to combine two sentences can make the voice model trip up and generate garbage. v2.0 of the model should solve this in about 95% of cases.
  • Sometimes the voice sounds very low (not girly at all). This is due to the source material I used: in the audiobook, a female narrator also voiced the male characters. Solution: add "You are the best,", "I love you," or "Did you know," before the sentence; this will often raise the pitch of the generated voice (you can experiment a little). v2.0 of the model should solve this!
  • Not ending your sentence with a dot (.) will generate garbage most of the time. The model needs to know when the sentence ends.
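Since each sentence works best as its own request and must end with a dot, it can help to prepare longer text before pasting it into the app. A minimal Python sketch: `prepare_lines` is a hypothetical helper, and splitting after ., ! or ? followed by whitespace is a simple heuristic, not a real sentence tokenizer. Following the advice above literally, it swaps a trailing !, ? or comma for a dot; drop that step if you want to experiment with other punctuation.

```python
import re


def prepare_lines(text):
    """Split text into one sentence per line, each ending with a dot."""
    # Heuristic: break after ., ! or ? when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    lines = []
    for part in parts:
        part = part.strip()
        if not part:
            continue
        if not part.endswith("."):
            # Follow the advice literally: replace trailing !, ?, commas etc. with a dot.
            part = part.rstrip("!?,;: ") + "."
        lines.append(part)
    return lines
```

Each returned line can then be submitted separately, and the clips joined afterwards.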
Some tips:
  • If you don't like how the text is spoken, just submit the same text again. Even if the text is exactly the same every time, the generated speech will be different, with different accents and tone. Sometimes I run a line a few times, to pick the generated speech I like best.
  • If the generated speech doesn't sound how you want it, consider breaking it up into smaller sentences.
  • Sometimes you can also combine different sentences to make a good one. Say you want speech for "I just love doing some programming" and it doesn't come out right. You can then generate "I just love you" and "I am doing some programming", use Audacity to cut out the 'you' and the 'I am', and join the two pieces.
  • Experiment with word order and commas.
  • The tone of a sentence can change by what is being said. "I love apples" can sound different from "I love what you are doing."
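If you'd rather script the cut-and-join trick than do it by hand in Audacity, Python's built-in `wave` module can handle it. A minimal sketch, assuming the downloaded clips are uncompressed WAV files that share the same format; `splice` and `read_segment` are hypothetical helpers, and you still need to find the cut points by listening (for example in Audacity).

```python
import wave


def read_segment(path, start_s, end_s):
    """Return (params, frames) for the part of a WAV file between start_s and end_s seconds."""
    with wave.open(path, "rb") as w:
        params = w.getparams()
        total = w.getnframes()
        # Convert seconds to frame positions, clamped to the clip length.
        start = min(int(start_s * params.framerate), total)
        end = min(max(int(end_s * params.framerate), start), total)
        w.setpos(start)
        frames = w.readframes(end - start)
    return params, frames


def splice(segments, out_path):
    """Join (path, start_s, end_s) pieces into one WAV file, in the order given."""
    if not segments:
        raise ValueError("no segments given")
    fmt = None
    chunks = []
    for path, start_s, end_s in segments:
        params, frames = read_segment(path, start_s, end_s)
        this_fmt = (params.nchannels, params.sampwidth, params.framerate)
        if fmt is None:
            fmt = this_fmt
        elif this_fmt != fmt:
            raise ValueError(f"{path} has a different audio format than the first clip")
        chunks.append(frames)
    with wave.open(out_path, "wb") as out:
        out.setnchannels(fmt[0])
        out.setsampwidth(fmt[1])
        out.setframerate(fmt[2])
        out.writeframes(b"".join(chunks))
```

For the example above, something like `splice([("i_just_love.wav", 0.0, 0.9), ("doing_programming.wav", 0.6, 2.4)], "combined.wav")`, where the file names and timestamps are made up for illustration.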
Training your own voice:
  • Disclaimer: it is a lot of work and not for the faint of heart!
  • The general approach is this: find an audiobook you like, and use that to train a voice on.
  • You will need a very good (Nvidia) GPU, or do this online with Google Colab (which is a pain in the ass).
  • You will need at least 4 hours of good quality material to work with.
  • Training will take at least 30 full days (30 x 24h) to get some reasonable result.
  • If you are interested in training your own voice, this is the Discord channel where you can ask for help: https://discord.com/invite/wQd7zKCWxT
  • You can also ask me here or send me a PM.
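For a sense of scale: 30 full days is 30 × 24 = 720 hours of training, in line with the roughly 700 hours this model took. To check whether your own material reaches the 4-hour minimum, here is a minimal Python sketch. It assumes the clips are WAV files somewhere under one folder; `dataset_hours` and `enough_material` are hypothetical helpers, not part of the app.

```python
import wave
from pathlib import Path

MIN_HOURS = 4  # the minimum amount of material mentioned above


def dataset_hours(folder):
    """Total duration in hours of all .wav files in folder and its subfolders."""
    total_seconds = 0.0
    for path in Path(folder).rglob("*.wav"):
        with wave.open(str(path), "rb") as w:
            total_seconds += w.getnframes() / w.getframerate()
    return total_seconds / 3600


def enough_material(folder, min_hours=MIN_HOURS):
    """True if the folder holds at least min_hours of audio."""
    return dataset_hours(folder) >= min_hours
```

This only measures quantity; whether the audio is "good quality" still has to be judged by ear.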
Resources:
A small tutorial below (showing the v1.0 installation, but v2.0 works the same way):
Author: pinosante
Downloads: 2,010
Views: 15,288
Version: 2.0
Rating: 4.78 stars (18 ratings)


Latest updates

  1. Very minor update, updated the instructions

    Thanks to some good feedback from @JimjackS0N I have updated the instructions. This will avoid...
  2. Improved the model (a lot!) and retrained it, please download this improved model!

    Hi everyone, It took me another long time, but I've been cleaning up the source audio, using...

Latest reviews

Very good
Wow, I was very impressed seeing that... I hope to see content in the near future from some talented creators using this. That would add another level of realism to VaM. Thanks for bringing this one to our crowd.
pinosante
I hope the same! Luckily there are already some people training their own voices, so that is more than I was hoping for!
Works well for me, though it takes a lot of tries until the speech sounds smoother. But overall, great work.
It would be nice to have a male voice too... just saying...
pinosante
Yes, it definitely pays off to cherry pick the best sentences from a bunch of tries. Male voice is a good point, in the future I'd be willing to do that. Unfortunately I have limited GPU time available so this will be something I might look into next year.
Didn't work for me. I get this error during this step:
Run the app
Go to "Synthesis"
Click on Submit


OOPS! AN ERROR OCCURED
Please share the following error in an issue at https://github.com/BenAAndrew/Voice-Cloning-App

Type: PermissionError
Text: [Errno 13] Permission denied: 'C:\\Users\\...\\Desktop\\data\\models\\VAM Voice Model\\VAM Voice Model'
Full: Traceback (most recent call last):
  File "flask\app.py", line 1950, in full_dispatch_request
  File "flask\app.py", line 1936, in dispatch_request
  File "application\views.py", line 291, in synthesis_setup_post
  File "synthesis\synthesize.py", line 45, in load_model
  File "torch\serialization.py", line 594, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "torch\serialization.py", line 230, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "torch\serialization.py", line 211, in __init__
    super(_open_file, self).__init__(open(name, mode))
PermissionError: [Errno 13] Permission denied: 'C:\\Users\\...\\Desktop\\data\\models\\VAM Voice Model\\VAM Voice Model'
pinosante
Hi, I DM’d you, but you haven’t replied yet. The problem you are having looks solvable to me.
I get this error:
[6420] WARNING: file already exists but should not: C:\Users\###\AppData\Local\Temp\_MEI64202\torch\_C.cp36-win_amd64.pyd
but then nothing happens; nothing opens, just nothing.
pinosante
Same for me, doesn’t matter!
Finally, someone with a bit of sense to make a tutorial easy to understand and follow. I tripped up on the "start the app" step, not realizing I was supposed to "run the exe from before", and I definitely didn't expect the thing to run in my browser. But I'm a dummy.
And even a dummy like me managed to get this working. THAT'S how good this person's tutorial is!
Also, thanks for this wonderful feature ^^
pinosante
Hah, great feedback, thanks! I am happy that the guide was clear enough! Have fun with the voice :).
This is a game changer. The free online TTS are limiting and I can't afford the subscription ones. Now I can create dialog to my heart's content. Thank you!
pinosante
Great to hear!
Love the idea. I currently use several TTS engines to voice my scenes, but will definitely give this a try too. Always open to new stuff.
pinosante
Awesome, I am curious how you will like it.
Fantastic. I got it set up and working in 3 minutes. It will take some time to get lines to sound the way I want them to. I would be very interested to see it refined further!
pinosante
Awesome, I’m glad you got it to work! I will look how I can improve the model.
Thanks for the great guide! I've got an Nvidia 3090, I'm gonna take a crack at the scrappy little nobody :-)
pinosante
Nice! My 3080 had to work a little bit harder :).