Corpus: LIA Commonvoice
This is the LIA Commonvoice corpus, intended for Automatic Speech Recognition (ASR) and Speaker Verification (SV). It is based on the French Common Voice dataset, with the text and speaker labels re-annotated to better reflect the actual content of the recordings.
- 152,361 audio segments (173 hours of audio)
- Text and speaker re-annotation that corrects the data provided by Mozilla
- Dictionary with pronunciations
- Acoustic and language models
- Kaldi recipe to reproduce the models
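
As a rough sanity check on the figures listed above, the segment count and total duration imply an average segment length of about four seconds. The short sketch below works this out; the function and variable names are illustrative only and are not part of the corpus tooling, and the 173-hour total is taken as the rounded value stated above, so the result is approximate.

```python
# Figures taken from the corpus description; the total duration is a rounded value.
NUM_SEGMENTS = 152_361
TOTAL_HOURS = 173


def mean_segment_seconds(num_segments: int, total_hours: float) -> float:
    """Return the average segment duration in seconds."""
    return total_hours * 3600 / num_segments


mean_seconds = mean_segment_seconds(NUM_SEGMENTS, TOTAL_HOURS)
print(f"Average segment length: {mean_seconds:.2f} s")  # prints "Average segment length: 4.09 s"
```

Segments of this length are typical of read-sentence recordings, which is consistent with the Common Voice origin of the data.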