https://ai.facebook.com/blog/voicebox-generative-ai-model-speech/

Introducing Voicebox: The first generative AI model for speech to generalize across tasks with state-of-the-art performance

🔊🤖 Meet Voicebox: The groundbreaking generative AI model for speech that outshines single-purpose tools. From high-quality audio clips to noise removal and style conversion, Voicebox raises the bar for speech tech! 🎙️🚀 #AI #SpeechRecognition #Innovation

Voicebox is a generative AI model for speech that generalizes to speech-generation tasks it was not specifically trained on, with state-of-the-art performance.
It creates high-quality audio clips, synthesizes speech across six languages, and performs tasks like noise removal, content editing, and style conversion.
Voicebox is built on Flow Matching, a method for training non-autoregressive generative models, and learns from raw audio paired with its transcription, whereas earlier speech generators required specific training for each task on carefully prepared data.
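
The summary does not spell out the training objective, so the following is only a minimal sketch of a Flow Matching loss, assuming the straight-line (optimal-transport) conditional path commonly used with the method. The names `ot_path_sample` and `flow_matching_loss`, the `sigma_min` default, and the `model` callable are hypothetical and chosen for illustration; real training would operate on batched spectrogram tensors rather than Python lists.

```python
import random


def ot_path_sample(x0, x1, t, sigma_min=1e-5):
    """Return the point x_t on a straight path from noise x0 to data x1,
    plus the velocity a model would be asked to predict there.

    Assumed (not from the source) path:
      x_t    = (1 - (1 - sigma_min) * t) * x0 + t * x1
      target = x1 - (1 - sigma_min) * x0
    """
    scale = 1.0 - (1.0 - sigma_min) * t
    x_t = [scale * a + t * b for a, b in zip(x0, x1)]
    target = [b - (1.0 - sigma_min) * a for a, b in zip(x0, x1)]
    return x_t, target


def flow_matching_loss(model, x1, rng, sigma_min=1e-5):
    """Mean squared error between the model's predicted velocity and the target.

    `model` is a hypothetical callable (x_t, t) -> list of floats.
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # noise sample
    t = rng.random()                        # time in [0, 1)
    x_t, target = ot_path_sample(x0, x1, t, sigma_min)
    pred = model(x_t, t)
    return sum((p - u) ** 2 for p, u in zip(pred, target)) / len(x1)


if __name__ == "__main__":
    # A stand-in "model" that always predicts zero velocity.
    zero_model = lambda x_t, t: [0.0] * len(x_t)
    print(flow_matching_loss(zero_model, [0.5, -0.5], random.Random(0)))
```

At generation time, a model trained this way would be integrated from noise toward data with an ODE solver; that step is omitted here.
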
On zero-shot text-to-speech, it outperforms the previous state-of-the-art English model, VALL-E, on both intelligibility (1.9 percent vs. 5.9 percent word error rate) and audio similarity (0.681 vs. 0.580), while running up to 20 times faster.
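
Word error rate is the metric behind the intelligibility comparison. As a minimal sketch (the function name `word_error_rate` is illustrative, and a real evaluation would first normalize casing and punctuation and would obtain the hypothesis by running a speech recognizer on the generated audio), it can be computed as the word-level edit distance divided by the reference length:

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("a b c d", "a x c d"))  # one substitution in four words
```

A lower value means the transcript of the generated audio is closer to the intended text.
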
Voicebox is trained on more than 50,000 hours of recorded speech and transcripts from public domain audiobooks in English, French, Spanish, German, Polish, and Portuguese, and it supports text-to-speech synthesis and cross-lingual style transfer.
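Because it is trained to predict a speech segment from the surrounding audio and the segment's transcript, the model can generate speech in context, which enables tasks like speech denoising, editing, and diverse speech sampling.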
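
In-context generation can be pictured as masking a span of audio frames, generating a replacement, and splicing it back between the untouched frames. The sketch below only illustrates that bookkeeping, under the assumption that each frame is a single number; `mask_span` and `splice` are hypothetical helpers, not part of any Voicebox interface, and the generated frames are stand-in values rather than model output.

```python
def mask_span(frames, start, end, fill=0.0):
    """Hide frames[start:end]; return the masked context and a boolean mask."""
    if not 0 <= start <= end <= len(frames):
        raise ValueError("span must lie within the frame sequence")
    mask = [start <= i < end for i in range(len(frames))]
    context = [fill if m else f for f, m in zip(frames, mask)]
    return context, mask


def splice(context, mask, generated):
    """Keep original frames outside the mask; take generated frames inside it."""
    return [g if m else c for c, m, g in zip(context, mask, generated)]


if __name__ == "__main__":
    frames = [1.0, 2.0, 3.0, 4.0, 5.0]
    context, mask = mask_span(frames, 1, 3)
    # Stand-in for frames a generative model would produce for the full sequence.
    generated = [9.0] * len(frames)
    print(splice(context, mask, generated))
```

Denoising and editing fit the same pattern: the noisy or misspoken span is the masked region, and everything outside it is left unchanged.
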
Speech recognition models trained on Voicebox-generated synthetic speech perform almost as well as models trained on real speech, with 1 percent error rate degradation compared with 45 to 70 percent for synthetic speech from previous text-to-speech models.
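Citing the potential risks of misuse, Meta AI researchers are not releasing the Voicebox model or code publicly at this time; they are instead sharing audio samples and a research paper, which also describes a highly effective classifier that distinguishes authentic speech from Voicebox-generated audio.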
Voicebox represents a significant advancement in generative AI for speech and paves the way for future innovations and responsible AI development.
The blog post acknowledges the collaborative efforts of the researchers involved and invites further exploration and application of Voicebox technology.