About the sample rate

Adal93 · June 25, 2022, 3:58am

Hello everybody

Are the sample rates of the audio files the same for every language in Mozilla Common Voice? And what is that value?

bozden · May 21, 2024, 6:00am

Old version: They are all 48 kHz.
New: Actually, we recently found out that it was changed to 32 kHz in 2020 to fix some server problems.

Adal93 · June 25, 2022, 3:51pm

Thanks

How can I lower this sample rate to 16 kHz?

bozden · May 21, 2024, 6:01am

You can lower it easily with offline tools like ffmpeg or any other sound processing utility.
The dataset itself is [old: 48 kHz] and that cannot be changed. NEW: In 2020, it was changed to 32 kHz for new recordings.
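
As a rough illustration of the ffmpeg route, here is a minimal Python sketch that builds and runs the command. It assumes ffmpeg is installed and on your PATH; the function names and the 16 kHz mono defaults are my own choices for the example, not anything Common Voice prescribes. `-ar` sets the output sample rate and `-ac` the number of channels.

```python
import subprocess


def ffmpeg_resample_cmd(src, dst, sample_rate=16000, channels=1):
    """Build the ffmpeg argument list that resamples src into dst."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", str(src),           # input file, e.g. a Common Voice MP3 clip
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-ac", str(channels),     # output channel count (1 = mono)
        str(dst),                 # output format is inferred from the extension
    ]


def resample(src, dst, sample_rate=16000, channels=1):
    """Run ffmpeg; raises CalledProcessError if ffmpeg reports a failure."""
    cmd = ffmpeg_resample_cmd(src, dst, sample_rate, channels)
    subprocess.run(cmd, check=True)
```

Calling `resample("clip.mp3", "clip_16k.wav")` (hypothetical file names) should then write a 16 kHz mono WAV next to the original.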

48 kHz is the highest sampling rate commonly available on everyday devices. You can downsample without much information loss. If it were e.g. 16 kHz, upsampling would not help: by the Nyquist theorem a 16 kHz recording contains nothing above 8 kHz, and resampling cannot bring that back.
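
To put numbers on that, here is a small sketch of the arithmetic. The helper names are made up for this post; the only facts used are that a sample rate fs can represent frequencies up to fs/2, and that a rate change can be written as a ratio of integers.

```python
from fractions import Fraction


def nyquist_hz(sample_rate_hz):
    """Upper bound (fs / 2) on the frequencies a sample rate can represent."""
    return sample_rate_hz / 2


def resample_ratio(src_rate_hz, dst_rate_hz):
    """Return (up, down) with dst_rate = src_rate * up / down, in lowest terms."""
    ratio = Fraction(dst_rate_hz, src_rate_hz)
    return ratio.numerator, ratio.denominator


for src in (48000, 32000):
    up, down = resample_ratio(src, 16000)
    print(f"{src} Hz -> 16000 Hz: up={up}, down={down}, "
          f"content kept up to {nyquist_hz(16000):.0f} Hz")
```

Going the other way, `resample_ratio(16000, 48000)` gives `(3, 1)`, so the file gets three times as many samples, but it still holds only what was below 8 kHz in the 16 kHz recording.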

bozden · June 25, 2022, 4:09pm

BTW, most voice AI libraries work on WAV files, while the CV datasets contain MP3 files, so the clips have to be preprocessed anyway. That step is also the best place to downsample…

For example, in Coqui STT (the new DeepSpeech), it is done here, using sox:

https://github.com/coqui-ai/STT/blob/main/bin/import_cv2.py

#!/usr/bin/env python
"""
Broadly speaking, this script takes the audio downloaded from Common Voice
for a certain language, in addition to the *.tsv files output by CorporaCreator,
and the script formats the data and transcripts to be in a state usable by
train.py
Use "python3 import_cv2.py -h" for help
"""
import csv
import os
import subprocess
import unicodedata
from multiprocessing import Pool

import progressbar
import sox
from coqui_stt_ctcdecoder import Alphabet
from coqui_stt_training.util.downloader import SIMPLE_BAR
from coqui_stt_training.util.importers import (
    get_counter,

This file has been truncated.
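
Since the excerpt stops before the interesting part, here is a rough sketch of what an MP3-to-WAV step with the `sox` Python package (pysox) could look like. It is not copied from import_cv2.py: the helper names, the output layout and the 16 kHz mono defaults are assumptions for illustration, and it needs the SoX command-line tool installed with MP3 support.

```python
import os


def wav_path_for(mp3_path, out_dir):
    """Map e.g. clips/abc.mp3 to <out_dir>/abc.wav."""
    stem = os.path.splitext(os.path.basename(mp3_path))[0]
    return os.path.join(out_dir, stem + ".wav")


def mp3_to_wav(mp3_path, out_dir, sample_rate=16000, channels=1):
    """Convert one MP3 clip to a resampled WAV file and return its path."""
    import sox  # pysox; imported here so wav_path_for works without it

    wav_path = wav_path_for(mp3_path, out_dir)
    os.makedirs(out_dir, exist_ok=True)
    transformer = sox.Transformer()
    # Ask SoX to resample and downmix as part of the format conversion.
    transformer.convert(samplerate=sample_rate, n_channels=channels)
    transformer.build(mp3_path, wav_path)
    return wav_path
```

Doing the resampling in the same pass as the MP3 decoding means each clip is read and written only once.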

Adal93 · June 25, 2022, 4:32pm

Thank you so much …

Adal93 · June 25, 2022, 7:22pm

Do you know how to do this in MATLAB?

bozden · June 25, 2022, 7:52pm

No, but I found this:
https://www.mathworks.com/help/signal/ug/changing-signal-sample-rate.html

Adal93 · June 26, 2022, 3:33am

Thank you
It’s a good reference.