StyleTTS 2

Text-to-speech model using style diffusion and adversarial training

About

StyleTTS 2 is a text-to-speech model that targets human-level speech synthesis. It models speaking style as a latent random variable sampled through diffusion, so it can generate a style suited to the input text without requiring reference speech. It also uses large pre-trained speech language models as discriminators during adversarial training to improve naturalness.