Training on Common Voice Error - Error after importing

An RTX GPU is useless if it can’t be fed data fast enough. We’ve had reports of huge slowdowns from people using specific PCIe configurations, depending on TensorFlow, CUDA, or our model changes (we don’t know which of these is responsible). So I’m just warning people.

Also, a real model really needs more than one RTX GPU if you have a serious amount of data. With ~250h of French, it takes ~4h to train a model on 2x RTX 2080 Ti.

Please make an effort and use proper code formatting. Some important information might be mangled by the Markdown parsing.

That’s indeed the case here: your error is incomplete, so I cannot help you.

@lissyx Thanks for the warning. I think my PCIe configuration is fine. For the moment I have ~250h and an RTX 2080 Super.

With ~250h of French, it takes ~4h to train a model on 2x RTX 2080 Ti.

Do you mean training a model or training an epoch?

One model. Please check https://github.com/Common-Voice/commonvoice-fr/blob/master/DeepSpeech/Dockerfile.train.fr for details.

Apologies for that:

The code:
(deepspeech-train-venv) chabani@chabani-VirtualBox:~/DeepSpeech/DeepSpeech$ bin/import_cv2.py --filter_alphabet alphabet.txt /media/sf_en/

/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or ‘1type’ as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / ‘(1,)type’.
_np_qint8 = np.dtype([(“qint8”, np.int8, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or ‘1type’ as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / ‘(1,)type’.
_np_quint8 = np.dtype([(“quint8”, np.uint8, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or ‘1type’ as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / ‘(1,)type’.
_np_qint16 = np.dtype([(“qint16”, np.int16, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or ‘1type’ as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / ‘(1,)type’.
_np_quint16 = np.dtype([(“quint16”, np.uint16, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or ‘1type’ as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / ‘(1,)type’.
_np_qint32 = np.dtype([(“qint32”, np.int32, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or ‘1type’ as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / ‘(1,)type’.
np_resource = np.dtype([(“resource”, np.ubyte, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or ‘1type’ as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / ‘(1,)type’.
_np_qint8 = np.dtype([(“qint8”, np.int8, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or ‘1type’ as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / ‘(1,)type’.
_np_quint8 = np.dtype([(“quint8”, np.uint8, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or ‘1type’ as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / ‘(1,)type’.
_np_qint16 = np.dtype([(“qint16”, np.int16, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or ‘1type’ as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / ‘(1,)type’.
_np_quint16 = np.dtype([(“quint16”, np.uint16, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or ‘1type’ as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / ‘(1,)type’.
_np_qint32 = np.dtype([(“qint32”, np.int32, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or ‘1type’ as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / ‘(1,)type’.
np_resource = np.dtype([(“resource”, np.ubyte, 1)])
Loading TSV file: /media/sf_en/train.tsv
Saving new DeepSpeech-formatted CSV file to: /media/sf_en/clips/train.csv
Importing mp3 files…

Traceback (most recent call last):
File “bin/import_cv2.py”, line 166, in
_preprocess_data(PARAMS.tsv_dir, AUDIO_DIR, label_filter_fun, PARAMS.space_after_every_character)
File “bin/import_cv2.py”, line 43, in _preprocess_data
_maybe_convert_set(input_tsv, audio_dir, label_filter, space_after_every_character)
File “bin/import_cv2.py”, line 100, in _maybe_convert_set
bar = progressbar.ProgressBar(max_value=num_samples, widgets=SIMPLE_BAR)
TypeError: init() got an unexpected keyword argument ‘max_value’

I hope the post is appropriate. This is the output I receive from the terminal.

Please use proper code formatting. This is unreadable, and the Markdown parser is eating important Python information.

@JohnWayne Please use ``` your code ```.

Okay, I hope it’s readable now.

```
(deepspeech-train-venv) chabani@chabani-VirtualBox:~/DeepSpeech/DeepSpeech$ bin/import_cv2.py --filter_alphabet alphabet.txt /media/sf_en/

  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
Loading TSV file:  /media/sf_en/train.tsv
Saving new DeepSpeech-formatted CSV file to:  /media/sf_en/clips/train.csv
Importing mp3 files...
Traceback (most recent call last):
  File "bin/import_cv2.py", line 166, in <module>
    _preprocess_data(PARAMS.tsv_dir, AUDIO_DIR, label_filter_fun, PARAMS.space_after_every_character)
  File "bin/import_cv2.py", line 43, in _preprocess_data
    _maybe_convert_set(input_tsv, audio_dir, label_filter, space_after_every_character)
  File "bin/import_cv2.py", line 100, in _maybe_convert_set
    bar = progressbar.ProgressBar(max_value=num_samples, widgets=SIMPLE_BAR)
TypeError: __init__() got an unexpected keyword argument 'max_value'
```

I placed the inverted commas as @reyxuan suggested. They are quite different on Linux from on Mac. I hope it works now.

No, it’s still the same; you have used the wrong ones.

@JohnWayne you may have missed it, but you can also typically edit a post.
That way you avoid a whole repost; you just need to go in and type the correct character. My guess is that you’ve somehow got inverted commas that are “smart” (i.e. adjusted for opening and closing quotes), and those are not the ones to use :slightly_smiling_face:

Oh, thanks for the help, and thanks to @reyxuan as well. The formatting is correct now.

Have you properly set up your virtualenv? It looks like you have an incompatible progressbar package.

I tried to set up my virtualenv again and still had the same issue. A user on Stack Overflow pointed out that progressbar2 accepts max_value, while the progressbar package I had on GNU/Linux accepts maxval.

It works now after installing progressbar2.

Thank you for the help
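
The `max_value` / `maxval` mismatch described above can be checked without running the importer. The following is a minimal sketch and not part of DeepSpeech: `accepts_keyword`, `bar_kwargs`, `OldStyleBar` and `NewStyleBar` are hypothetical names, and the two classes are stand-ins that only mimic the constructor keywords mentioned in this thread, so the snippet runs without either package installed.

```python
import inspect


class OldStyleBar:
    """Stand-in (assumption) mimicking a constructor that takes `maxval`."""

    def __init__(self, maxval=None, widgets=None):
        self.total = maxval


class NewStyleBar:
    """Stand-in (assumption) mimicking a constructor that takes `max_value`."""

    def __init__(self, max_value=None, widgets=None):
        self.total = max_value


def accepts_keyword(bar_class, name):
    """Return True if bar_class's constructor declares a parameter `name`."""
    return name in inspect.signature(bar_class).parameters


def bar_kwargs(bar_class, total):
    """Pick whichever keyword the given class declares for the total count."""
    for name in ("max_value", "maxval"):
        if accepts_keyword(bar_class, name):
            return {name: total}
    raise TypeError("no known keyword for the total count")


if __name__ == "__main__":
    # Both stand-ins can be constructed once the right keyword is chosen.
    for cls in (OldStyleBar, NewStyleBar):
        bar = cls(**bar_kwargs(cls, 12123))
        print(cls.__name__, bar_kwargs(cls, 12123), bar.total)
```

Against a real installation, the same check would presumably be `accepts_keyword(progressbar.ProgressBar, "max_value")`; a `False` result would suggest that the older progressbar package is the one being imported instead of the progressbar2 listed in requirements.txt.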

Exactly as I said: requirements.txt lists progressbar2, which confirms that your virtualenv was not set up correctly.

My apologies, I didn’t read that properly in the documentation.

Lastly, my terminal hangs when importing the mp3 files. I have tried the process multiple times and it hangs at the same point every time.

Any idea why this would happen? I didn’t have this issue in the macOS terminal.

```
Loading TSV file:  /media/sf_en/train.tsv
Saving new DeepSpeech-formatted CSV file to:  /media/sf_en/clips/train.csv
Importing mp3 files...
Progress |#################################################### |  98% completedWriting CSV file for DeepSpeech.py as:  /media/sf_en/clips/train.csv
Progress |#####################################################| 100% completed
Imported 12123 samples.
Skipped 103 samples that failed on transcript validation.
Skipped 12 samples that were longer than 10 seconds.
Final amount of imported audio: 14:52:21.
Loading TSV file:  /media/sf_en/test.tsv
Saving new DeepSpeech-formatted CSV file to:  /media/sf_en/clips/test.csv
Importing mp3 files...
Progress |#################################################### |  98% completedWriting CSV file for DeepSpeech.py as:  /media/sf_en/clips/test.csv
Progress |#####################################################| 100% completed
Imported 6810 samples.
Skipped 360 samples that failed on transcript validation.
Skipped 206 samples that were longer than 10 seconds.
Final amount of imported audio: 10:21:17.
Loading TSV file:  /media/sf_en/dev.tsv
Saving new DeepSpeech-formatted CSV file to:  /media/sf_en/clips/dev.csv
Importing mp3 files...
Progress |###########################################          |  82% completed
```

Stops at the same % all the time.

No idea, sorry, this is not something we experienced.

@JohnWayne Try printing which mp3 is being read so you know which one the script is failing on, just to make sure that the problem is not the file itself.
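
One way to do that, as a rough sketch only: announce each file before it is converted and flush the output, so the last name printed is the one the script is stuck on. `convert_with_trace` and `convert_one` are hypothetical names used for illustration; this does not reflect how `bin/import_cv2.py` is actually structured.

```python
import sys


def convert_with_trace(filenames, convert_one, stream=sys.stderr):
    """Call convert_one on each filename, announcing each file first.

    `filenames` must be a list; `convert_one` is whatever function does
    the real per-file work (a placeholder here).
    """
    converted = 0
    total = len(filenames)
    for index, name in enumerate(filenames, start=1):
        # Flush so the name is visible even if the next call never returns.
        print(f"[{index}/{total}] converting {name}", file=stream, flush=True)
        convert_one(name)
        converted += 1
    return converted


if __name__ == "__main__":
    # Dummy conversion step that does nothing, just to show the trace.
    convert_with_trace(["a.mp3", "b.mp3"], lambda name: None)
```

If the import hangs, the last line printed names the file that was being processed at that moment, which can then be tested on its own.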

Thank you, issue is sorted now.

Could you please share more details? It can be useful if others run into the same issue.

I rebooted the system and ran the code again. It worked properly afterwards.