Python Encoder Preprocess.py

Scaling Multilingual Visual Speech Recognition

We introduce MultiVSR, a large-scale dataset for multilingual visual speech recognition. MultiVSR comprises ~12,000 hours of video data paired with word-aligned transcripts from 13 languages. We ...

Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions

Large-scale pre-trained language models have been shown to be helpful in improving the naturalness of text-to-speech (TTS) models by enabling them to produce more naturalistic prosodic patterns.
