The National Research Council (NRC) is currently involved in a three-year research project on speech synthesis (text-to-speech) technology. To support Indigenous language revitalization, it has developed a custom version of Common Voice, the open-source voice data collection tool.
Common Voice is an open-source project that aims to create a freely available database of human voices that can be used to train machine learning models for speech technology. The project is designed to increase the diversity of voices available for these models, particularly for underrepresented languages and accents. That could be vital in supporting Indigenous languages, which have fewer and fewer native speakers each year.
With this custom version, communities involved in the project can set their own data management policy instead of adopting Mozilla’s policy of making all voices freely available. We were brought on board to build a reproducible deployment process and assist in hosting the application.