I was curious about this as well, given the recent progress in decent text generation and voice generation models that you can run locally:
https://github.com/KoboldAI/KoboldAI-Client
https://github.com/neonbjb/tortoise-tts
You'd need a beefy computer to run both of these and VAM...