StyleTTS 2
Text-to-speech model using style diffusion and adversarial training
About
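StyleTTS 2 is a text-to-speech model that generates human-level speech. It models speaking style as a latent random variable sampled with a diffusion model, so it can produce a style suited to the input text without requiring reference speech. It also uses large pre-trained speech language models (SLMs), such as WavLM, as discriminators during adversarial training to improve the naturalness of the generated speech.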
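The idea of sampling a style as a latent random variable through diffusion can be illustrated with a toy sampler. This is a minimal sketch under stated assumptions, not StyleTTS 2's implementation. It assumes styles follow a Gaussian N(mu(text), sigma_data^2 I) around a text-dependent mean. It replaces the learned, text-conditioned denoiser network with the closed-form optimal denoiser for that Gaussian. It also uses a hypothetical `toy_text_embedding` in place of a real text encoder. The noise schedule and Euler probability-flow ODE sampler follow the EDM formulation of Karras et al. All names and parameter values (`dim`, `n_steps`, `sigma_min`, `sigma_max`, `sigma_data`) are illustrative assumptions.

```python
import random


def karras_sigmas(n_steps, sigma_min=0.002, sigma_max=10.0, rho=7.0):
    """Noise levels decreasing from sigma_max to sigma_min (EDM-style schedule)."""
    hi = sigma_max ** (1.0 / rho)
    lo = sigma_min ** (1.0 / rho)
    return [(hi + i / (n_steps - 1) * (lo - hi)) ** rho for i in range(n_steps)]


def toy_text_embedding(text, dim):
    """HYPOTHETICAL stand-in for a text encoder: one deterministic mean style per text."""
    rng = random.Random(text)  # str seeds are hashed with SHA-512, so this is reproducible
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def toy_denoiser(x, sigma, text_mean, sigma_data=0.5):
    """Optimal denoiser E[s | x] when styles s ~ N(text_mean, sigma_data^2 I).

    HYPOTHETICAL stand-in for the learned, text-conditioned denoising network.
    """
    w = sigma_data ** 2 / (sigma_data ** 2 + sigma ** 2)
    return [m + w * (xi - m) for xi, m in zip(x, text_mean)]


def sample_style(text, dim=8, n_steps=30, seed=None, sigma_data=0.5):
    """Sample a style vector for `text` by integrating the probability-flow ODE with Euler steps."""
    rng = random.Random(seed)
    text_mean = toy_text_embedding(text, dim)
    sigmas = karras_sigmas(n_steps)
    # Start from pure noise at the highest noise level; no reference audio is used.
    x = [rng.gauss(0.0, sigmas[0]) for _ in range(dim)]
    for s_cur, s_next in zip(sigmas, sigmas[1:]):
        denoised = toy_denoiser(x, s_cur, text_mean, sigma_data)
        slope = [(xi - di) / s_cur for xi, di in zip(x, denoised)]  # dx/dsigma
        x = [xi + (s_next - s_cur) * gi for xi, gi in zip(x, slope)]
    # Final denoising step at sigma_min.
    return toy_denoiser(x, sigmas[-1], text_mean, sigma_data)


if __name__ == "__main__":
    a = sample_style("Hello there.", seed=0)
    b = sample_style("Hello there.", seed=1)
    print([round(v, 3) for v in a])
    print([round(v, 3) for v in b])  # same text, different noise: a different style
```

Because each call starts from fresh noise, the same text yields different style vectors for different seeds, while the samples stay centred on the text-dependent mean. In this toy setting, that is how a diffusion-based style model can produce varied, text-appropriate styles without a reference clip.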