Hi everyone,
This took a long time again, but I've been cleaning up the source audio using some custom scripts I wrote. Long story short, I fixed the "pitch" of the source material so it is all in the same range, and removed the "breath" noises caused by inhaling before speaking. I then retrained the model on this improved source data, which improved its quality a lot.
Improvements:
- Less raspy, cleaner audio
- No more sudden pitch drops (male-sounding voice)
- Putting two sentences in one prompt (separated by a period) works a lot better (the old model would generate garbage if you did that)
Installation instructions:
- See the original instructions
- Download "VAM Voice Model v2.zip" from Mega and unzip it in the "models" directory. You should end up with YOUR_PATH/data/models/VAM Voice Model v2.0/checkpoint_510426
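
If you want to double-check that the files landed in the right place, here is a small optional sketch in Python. It is not part of the original instructions. The function names `model_dir` and `is_installed` are made up for this example, and it only assumes the folder layout given above. Since it isn't stated whether checkpoint_510426 is a single file, a folder, or a filename prefix, the check accepts anything in the model folder whose name starts with it.

```python
from pathlib import Path

# Names taken from the install instructions above.
MODEL_DIR = "VAM Voice Model v2.0"
CHECKPOINT = "checkpoint_510426"


def model_dir(install_root):
    """Return the expected model folder under your install root (YOUR_PATH)."""
    return Path(install_root) / "data" / "models" / MODEL_DIR


def is_installed(install_root):
    """True if the model folder exists and contains an entry whose name
    starts with the checkpoint name (file, folder, or prefixed files)."""
    folder = model_dir(install_root)
    return folder.is_dir() and any(folder.glob(CHECKPOINT + "*"))


if __name__ == "__main__":
    # Replace "." with your own YOUR_PATH before running.
    root = "."
    status = "found" if is_installed(root) else "NOT found"
    print(f"Checkpoint {status} under {model_dir(root)}")
```

If it reports "NOT found", the most likely cause is that the zip was extracted one folder too deep or too shallow, so compare the printed path with where the files actually are.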