Training on Common Voice Error - Error after importing

Hi

Hope you are well

I am currently getting an error when training the model on Common Voice data.

I am not sure why the training script can't pick up the extracted CSV files after the import.

Can anyone tell me where this is going wrong?

Loading Code
Loading TSV file: /users/chabani/Desktop/Deepspeech/en/train.tsv
Saving new DeepSpeech-formatted CSV file to: /users/chabani/Desktop/Deepspeech/en/clips/train.csv
100% completed
Imported 12123 samples.
Skipped 103 samples that failed on transcript validation.

Loading TSV file: /users/chabani/Desktop/Deepspeech/en/test.tsv
Saving new DeepSpeech-formatted CSV file to: /users/chabani/Desktop/Deepspeech/en/clips/test.csv
100% completed
Imported 6810 samples.
Skipped 360 samples that failed on transcript validation.
Skipped 206 samples that were longer than 10 seconds.
Final amount of imported audio: 10:21:17.

Loading TSV file: /users/chabani/Desktop/Deepspeech/en/dev.tsv
Saving new DeepSpeech-formatted CSV file to: /users/chabani/Desktop/Deepspeech/en/clips/dev.csv
100% completed
Imported 6940 samples.
Skipped 3 samples that failed upon conversion.
Skipped 326 samples that failed on transcript validation.
Skipped 73 samples that were longer than 10 seconds.
Final amount of imported audio: 9:25:57.

Training Code
(deepspeech-train-venv) Chabanis-MBP:DeepSpeech chabani$ ./DeepSpeech.py --train_files …/users/chabani/Desktop/Deepspeech/en/clips/train.csv --dev_files …/users/chabani/Desktop/Deepspeech/en/clips/dev.csv --test_files /users/chabani/Desktop/Deepspeech/en/clips/test.csv

Error:
FileNotFoundError: [Errno 2] File b'…/users/chabani/Desktop/Deepspeech/en/clips/train.csv' does not exist: b'…/users/chabani/Desktop/Deepspeech/en/clips/train.csv'

Kind Regards

@JohnWayne It seems your paths are wrong. Why do you have the leading ../ ? Also, please note you won’t be able to do any seriously intensive training on macOS, since there is no TensorFlow CUDA support on that platform.

Try calling os.path.exists on each of those paths to make sure they are written correctly.
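As a minimal sketch of that check (the helper name `check_csv_paths` and the example paths are hypothetical, modelled on the ones posted above), something like this could be run before launching training:

```python
import os


def check_csv_paths(paths):
    """Return a dict mapping each path (made absolute) to whether it exists."""
    results = {}
    for path in paths:
        # Resolve "~" and relative parts such as "../" against the current directory.
        absolute = os.path.abspath(os.path.expanduser(path))
        results[absolute] = os.path.exists(absolute)
    return results


if __name__ == "__main__":
    # Hypothetical paths for illustration only; replace them with your own.
    candidates = [
        "/users/chabani/Desktop/Deepspeech/en/clips/train.csv",
        "../users/chabani/Desktop/Deepspeech/en/clips/train.csv",
    ]
    for absolute, exists in check_csv_paths(candidates).items():
        print(("OK      " if exists else "MISSING ") + absolute)
```

Printing the absolute form makes it obvious when a relative prefix such as ../ resolves somewhere other than where the CSV files were written.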

@lissyx and @reyxuan , okay will get the paths correct.

With regard to training, I have Linux running in a virtual machine. My hardware is currently a limitation because I only have 8GB of RAM.
I wanted to ask whether it is better to upgrade the RAM to 16GB or to get an external GPU.

Thanks

I guess that if you are training a real model, neither is a good option. I fear an external GPU might behave erratically, since we’ve had reports of exotic hardware configurations (PCIe-related) that hurt performance a lot. Also, 16GB is not a lot either, but the more RAM the better.

@lissyx Isn’t an RTX for example powerful enough for a real model?

Okay, thanks for the suggestion. Will also take into consideration the question posed if an RTX is also suitable.

Lastly, I have attempted to import the data for training on Ubuntu 19.04, but the following error appears. I have been trying to correct it, to no avail:

Code

(deepspeech-train-venv) chabani@chabani-VirtualBox:~/DeepSpeech/DeepSpeech$ bin/import_cv2.py --filter_alphabet alphabet.txt /media/sf_en/

Loading TSV file: /media/sf_en/train.tsv
Saving new DeepSpeech-formatted CSV file to: /media/sf_en/clips/train.csv
Importing mp3 files…

Traceback (most recent call last):
File “bin/import_cv2.py”, line 166, in
_preprocess_data(PARAMS.tsv_dir, AUDIO_DIR, label_filter_fun, PARAMS.space_after_every_character)
File “bin/import_cv2.py”, line 43, in _preprocess_data
_maybe_convert_set(input_tsv, audio_dir, label_filter, space_after_every_character)
File “bin/import_cv2.py”, line 100, in _maybe_convert_set
bar = progressbar.ProgressBar(max_value=num_samples, widgets=SIMPLE_BAR)

An RTX is useless if it cannot be fed data fast enough. We’ve had reports from people using specific PCIe configurations who saw huge slowdowns depending on TensorFlow / CUDA / our model changes (we don’t know which of these is responsible). So I’m just warning people.

Also, a real model really needs more than just one RTX GPU if you have a serious amount of data. With ~250h of French, it takes ~4h to train a model on 2x RTX 2080 Ti.

Please make an effort and use proper code formatting. Some important information might be mangled by the Markdown parsing.

That’s indeed the case, your error is incomplete, I cannot help you.

@lissyx Thanks for the warning. I think my PCIe configuration is fine. For the moment I have ~250h and an RTX 2080 Super.

With ~250h of French, it takes ~4h to train a model on 2x RTX 2080 Ti.

Do you mean training a model or training an epoch?

One model. Please check https://github.com/Common-Voice/commonvoice-fr/blob/master/DeepSpeech/Dockerfile.train.fr for details.


Apologies for that:

The code:
(deepspeech-train-venv) chabani@chabani-VirtualBox:~/DeepSpeech/DeepSpeech$ bin/import_cv2.py --filter_alphabet alphabet.txt /media/sf_en/

/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or ‘1type’ as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / ‘(1,)type’.
_np_qint8 = np.dtype([(“qint8”, np.int8, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or ‘1type’ as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / ‘(1,)type’.
_np_quint8 = np.dtype([(“quint8”, np.uint8, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or ‘1type’ as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / ‘(1,)type’.
_np_qint16 = np.dtype([(“qint16”, np.int16, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or ‘1type’ as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / ‘(1,)type’.
_np_quint16 = np.dtype([(“quint16”, np.uint16, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or ‘1type’ as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / ‘(1,)type’.
_np_qint32 = np.dtype([(“qint32”, np.int32, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or ‘1type’ as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / ‘(1,)type’.
np_resource = np.dtype([(“resource”, np.ubyte, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or ‘1type’ as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / ‘(1,)type’.
_np_qint8 = np.dtype([(“qint8”, np.int8, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or ‘1type’ as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / ‘(1,)type’.
_np_quint8 = np.dtype([(“quint8”, np.uint8, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or ‘1type’ as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / ‘(1,)type’.
_np_qint16 = np.dtype([(“qint16”, np.int16, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or ‘1type’ as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / ‘(1,)type’.
_np_quint16 = np.dtype([(“quint16”, np.uint16, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or ‘1type’ as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / ‘(1,)type’.
_np_qint32 = np.dtype([(“qint32”, np.int32, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or ‘1type’ as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / ‘(1,)type’.
np_resource = np.dtype([(“resource”, np.ubyte, 1)])
Loading TSV file: /media/sf_en/train.tsv
Saving new DeepSpeech-formatted CSV file to: /media/sf_en/clips/train.csv
Importing mp3 files…

Traceback (most recent call last):
File “bin/import_cv2.py”, line 166, in
_preprocess_data(PARAMS.tsv_dir, AUDIO_DIR, label_filter_fun, PARAMS.space_after_every_character)
File “bin/import_cv2.py”, line 43, in _preprocess_data
_maybe_convert_set(input_tsv, audio_dir, label_filter, space_after_every_character)
File “bin/import_cv2.py”, line 100, in _maybe_convert_set
bar = progressbar.ProgressBar(max_value=num_samples, widgets=SIMPLE_BAR)
TypeError: init() got an unexpected keyword argument ‘max_value’

Hope the posting is appropriate. This is the output I receive from terminal

Please use proper code formatting. This is unreadable, and the Markdown parser is eating important Python information.

@JohnWayne Please use ``` your code ```.

Okay, hope it's readable now.

(deepspeech-train-venv) chabani@chabani-VirtualBox:~/DeepSpeech/DeepSpeech$ bin/import_cv2.py --filter_alphabet alphabet.txt /media/sf_en/

  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/chabani/tmp/deepspeech-train-venv/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
Loading TSV file:  /media/sf_en/train.tsv
Saving new DeepSpeech-formatted CSV file to:  /media/sf_en/clips/train.csv
Importing mp3 files...
Traceback (most recent call last):
  File "bin/import_cv2.py", line 166, in <module>
    _preprocess_data(PARAMS.tsv_dir, AUDIO_DIR, label_filter_fun, PARAMS.space_after_every_character)
  File "bin/import_cv2.py", line 43, in _preprocess_data
    _maybe_convert_set(input_tsv, audio_dir, label_filter, space_after_every_character)
  File "bin/import_cv2.py", line 100, in _maybe_convert_set
    bar = progressbar.ProgressBar(max_value=num_samples, widgets=SIMPLE_BAR)
TypeError: __init__() got an unexpected keyword argument 'max_value'

Placed the inverted commas as @reyxuan suggested. They are quite different on Linux compared to Mac. Hope it works now.

No, it’s still the same; you have used the wrong ones.

@JohnWayne you may have missed it, but you can also typically edit a post.
That way you avoid a whole repost; you just need to go in and type the correct character. My guess is that you’ve somehow got inverted commas that are “smart” (i.e. adjusted for opening and closing quotes) and those are not the ones to use :slightly_smiling_face:
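For anyone wanting to check pasted text before posting, here is a rough sketch that spots "smart" quotes and swaps them for plain ones. The names `find_smart_quotes` and `straighten_quotes` are made up for illustration, and the mapping only covers the four most common typographic quote characters.

```python
# Map typographic ("smart") quotes to their plain ASCII equivalents.
SMART_QUOTES = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}


def find_smart_quotes(text):
    """Return (index, character) pairs for every smart quote in text."""
    return [(i, ch) for i, ch in enumerate(text) if ch in SMART_QUOTES]


def straighten_quotes(text):
    """Replace smart quotes with plain ASCII quotes."""
    return text.translate(str.maketrans(SMART_QUOTES))


if __name__ == "__main__":
    pasted = "unexpected keyword argument \u2018max_value\u2019"
    print(find_smart_quotes(pasted))
    print(straighten_quotes(pasted))
```

Note that this only deals with quote characters; a code fence still needs three plain backtick characters on their own line, which no quote character can stand in for.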

Oh, thanks for the help, both you and @reyxuan. I have got the formatting done correctly now.


Have you properly set up your virtualenv? It looks like you have an incompatible progressbar.

I tried to set up my virtualenv again, but still had the same issue. A user on Stack Overflow pointed out that progressbar2 accepts max_value, while the progressbar package I had on GNU/Linux accepts maxval.

It works now after installing progressbar2.
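In case it helps others hitting the same TypeError, here is a small sketch of how one might check which constructor style the installed `progressbar` module exposes. The helper `accepts_keyword` and the two stand-in classes are hypothetical; they only mimic the `maxval` / `max_value` difference described above, and the final check assumes both packages are imported under the module name `progressbar`.

```python
import inspect


def accepts_keyword(callable_obj, name):
    """Return True if callable_obj explicitly declares a parameter called name."""
    try:
        parameters = inspect.signature(callable_obj).parameters
    except (TypeError, ValueError):
        # Some callables do not expose a signature; treat that as "unknown".
        return False
    return name in parameters


# Stand-in classes that mimic the two constructor styles from this thread.
class OldStyleBar:
    def __init__(self, maxval=None, widgets=None):
        self.maxval = maxval


class NewStyleBar:
    def __init__(self, max_value=None, widgets=None):
        self.max_value = max_value


if __name__ == "__main__":
    print(accepts_keyword(OldStyleBar, "max_value"))
    print(accepts_keyword(NewStyleBar, "max_value"))
    try:
        import progressbar
    except ImportError:
        print("progressbar is not installed in this environment")
    else:
        # If this reports False, the environment likely has the older package.
        print(accepts_keyword(progressbar.ProgressBar, "max_value"))
```

If the last check reports False inside the training virtualenv, installing progressbar2 there (for example with pip) is what resolved it in my case.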

Thank you for the help