Make your own simple model

Freegs_Box · February 23, 2021, 3:30pm

Should I pull the image from here? https://hub.docker.com/r/mozilla/deepspeech-train/tags?page=1&ordering=last_updated

kreid · February 23, 2021, 9:53pm

So, taking things one step at a time:

The DeepSpeech version that is in Docker is the same, but it’s packaged with other dependencies required for training.
In the PlayBook you are instructed to run the command $ docker pull mozilla/deepspeech-train:latest. This pulls down a Docker image.
In the PlayBook, there is a line that states:

You will now see the mozilla/deepspeech-train image when you run the command docker image ls:

$ docker image ls
REPOSITORY                             TAG              IMAGE ID       CREATED         SIZE
mozilla/deepspeech-train               latest           7cdc0bb1fe2a   7 days ago      4.77GB

In the PlayBook, this command shows that the IMAGE_ID of this image is 7cdc0bb1fe2a. Your image will have a different IMAGE_ID, and that is why you’re receiving the error message:

Unable to find image '7cdc0bb1fe2a:latest' locally.

You need to run the command docker image ls to find the IMAGE_ID of your image and substitute it. For example, if the IMAGE_ID of your image is 501979231a7b then you would run:

$ docker run  -it \
  --entrypoint /bin/bash \
  --name deepspeech-training \
  --gpus all \
  --mount type=bind,source="$(pwd)"/deepspeech-data,target=/DeepSpeech/deepspeech-data \
  501979231a7b
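As a rough illustration of the lookup step described above, the sketch below pulls the IMAGE ID for a given repository and tag out of the text printed by docker image ls. The helper name find_image_id and the sample listing are made up for this example; the listing simply reuses the 501979231a7b ID from the command above.

```python
def find_image_id(ls_output, repository, tag="latest"):
    """Return the IMAGE ID for repository:tag from `docker image ls` text, or None."""
    for line in ls_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # The first three whitespace-separated fields are REPOSITORY, TAG, IMAGE ID.
        if len(fields) >= 3 and fields[0] == repository and fields[1] == tag:
            return fields[2]
    return None


# Illustrative listing only; your own output will show a different IMAGE ID.
SAMPLE = """\
REPOSITORY                 TAG       IMAGE ID       CREATED       SIZE
mozilla/deepspeech-train   latest    501979231a7b   7 days ago    4.77GB
"""

print(find_image_id(SAMPLE, "mozilla/deepspeech-train"))
```

To use it on a real machine, you could feed it the output of subprocess.run(["docker", "image", "ls"], capture_output=True, text=True).stdout. Docker also accepts the repository:tag name (mozilla/deepspeech-train:latest) in place of the IMAGE ID, which avoids the substitution altogether.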

Freegs_Box · February 24, 2021, 8:36am

I have pulled the Docker image.

root@localhost:~# docker image ls
REPOSITORY                 TAG                 IMAGE ID            CREATED             SIZE
mozilla/deepspeech-train   latest              8cdc37f75f1d        11 days ago         4.77GB

That is my image ID: 8cdc37f75f1d

$ docker run  -it \
      --entrypoint /bin/bash \
      --name deepspeech-training \
      --mount type=bind,source="$(pwd)"/deepspeech-data,target=/DeepSpeech/deepspeech-data \
      8cdc37f75f1d

As you may have noticed, I removed the --gpus all line because I don’t have a GPU on this server.

root@localhost:~# docker run  -it \
>   --entrypoint /bin/bash \
>   --name deepspeech-training \
>   --mount type=bind,source="$(pwd)"/deepspeech-data,target=/DeepSpeech/deepspeech-data \
>   8cdc37f75f1d

________                               _______________
___  __/__________________________________  ____/__  /________      __
__  /  _  _ \_  __ \_  ___/  __ \_  ___/_  /_   __  /_  __ \_ | /| / /
_  /   /  __/  / / /(__  )/ /_/ /  /   _  __/   _  / / /_/ /_ |/ |/ /
/_/    \___//_/ /_//____/ \____//_/    /_/      /_/  \____/____/|__/


WARNING: You are running this container as root, which can cause new files in
mounted volumes to be created as the root user on your host machine.

To avoid this, run the container by specifying your user's userid:

$ docker run -u $(id -u):$(id -g) args...

root@6accbb76ff97:/DeepSpeech#

What should I do now?

kreid · February 24, 2021, 10:48am

Follow the next steps in the PlayBook.

You are up to here

Freegs_Box · February 24, 2021, 3:47pm

I have downloaded the German language dataset (about 4 GB) and unzipped it in the deepspeech-data directory, and it is visible in the Docker container:

root@294bb9e108fd:/DeepSpeech# ls -la deepspeech-data/
total 72144
drwxr-xr-x 3 root root    36864 Feb 24 15:31 .
drwxr-xr-x 1 root root     4096 Feb 24 15:04 ..
drwxrwxr-x 2 1000 1000 27156480 Feb 24 14:38 clips
-rw-rw-r-- 1 1000 1000   735847 Feb 25  2019 dev.tsv
-rw-rw-r-- 1 1000 1000  1837461 Feb 25  2019 invalidated.tsv
-rw-rw-r-- 1 1000 1000       62 Feb 25  2019 other.tsv
-rw-rw-r-- 1 1000 1000   725953 Feb 25  2019 test.tsv
-rw-rw-r-- 1 1000 1000   868206 Feb 25  2019 train.tsv
-rw-rw-r-- 1 1000 1000 42490887 Feb 25  2019 validated.tsv
root@294bb9e108fd:/DeepSpeech#

When I run the importer [import_cv2.py], I run into this issue:

root@294bb9e108fd:/DeepSpeech# bin/import_cv2.py deepspeech-data/
/bin/sh: 1: sox: not found
SoX could not be found!

    If you do not have SoX, proceed here:
     - - - http://sox.sourceforge.net/ - - -

    If you do (or think that you should) have SoX, double-check your
    path variables.

Loading TSV file:  /DeepSpeech/deepspeech-data/test.tsv
Importing mp3 files...
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
This install of SoX cannot process .mp3 files.
This install of SoX cannot process .mp3 files.
This install of SoX cannot process .mp3 files.
This install of SoX cannot process .mp3 files.
This install of SoX cannot process .mp3 files.
This install of SoX cannot process .mp3 files.
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "bin/import_cv2.py", line 65, in one_sample
    _maybe_convert_wav(mp3_filename, wav_filename)
  File "bin/import_cv2.py", line 185, in _maybe_convert_wav
    transformer.build(mp3_filename, wav_filename)
  File "/usr/local/lib/python3.6/dist-packages/sox/transform.py", line 594, in build
    input_filepath, input_array, sample_rate_in
  File "/usr/local/lib/python3.6/dist-packages/sox/transform.py", line 496, in _parse_inputs
    input_format['channels'] = file_info.channels(input_filepath)
  File "/usr/local/lib/python3.6/dist-packages/sox/file_info.py", line 82, in channels
    output = soxi(input_filepath, 'c')
  File "/usr/local/lib/python3.6/dist-packages/sox/core.py", line 149, in soxi
    stderr=subprocess.PIPE
  File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.6/subprocess.py", line 423, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1364, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'sox': 'sox'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "bin/import_cv2.py", line 221, in <module>
    main()
  File "bin/import_cv2.py", line 216, in main
    _preprocess_data(PARAMS.tsv_dir, audio_dir, PARAMS.space_after_every_character)
  File "bin/import_cv2.py", line 172, in _preprocess_data
    set_samples = _maybe_convert_set(dataset, tsv_dir, audio_dir, space_after_every_character)
  File "bin/import_cv2.py", line 127, in _maybe_convert_set
    for i, processed in enumerate(pool.imap_unordered(one_sample, samples), start=1):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 735, in next
    raise value
FileNotFoundError: [Errno 2] No such file or directory: 'sox': 'sox'
root@294bb9e108fd:/DeepSpeech#

lissyx · February 24, 2021, 3:48pm

This container is really a bare version; you need to add the dependencies for import_cv2.py, such as sox.
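A quick way to confirm that before starting a long import is to check whether the sox executable is on the PATH. The sketch below is only a pre-flight check with a made-up helper name (missing_tools); it does not install anything, and it cannot tell whether the SoX build supports mp3, which the "This install of SoX cannot process .mp3 files." lines in the log suggest is also required.

```python
import shutil


def missing_tools(tools):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# "sox" is the executable the traceback above failed to launch.
missing = missing_tools(["sox"])
if missing:
    print("Missing command-line tools:", ", ".join(missing))
else:
    print("All required tools found.")
```

On a Debian/Ubuntu-based image the missing pieces are typically the sox and libsox-fmt-mp3 packages, but that is an assumption about the base image rather than something stated in this thread.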

Freegs_Box · February 24, 2021, 4:01pm

Great, it works after I installed sox.
But I see this warning message:

WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.

Should I abort or keep running?

root@294bb9e108fd:/DeepSpeech# bin/import_cv2.py deepspeech-data/
Loading TSV file:  /DeepSpeech/deepspeech-data/test.tsv
Importing mp3 files...
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
Progress |################################################################################################################################################################# |  99% completed
mported 2267 samples.
Skipped 2 samples that were longer than 10 seconds.
Final amount of imported audio: 2:51:31 from 2:51:51.
Saving new DeepSpeech-formatted CSV file to:  /DeepSpeech/deepspeech-data/clips/test.csv
Writing CSV file for DeepSpeech.py as:  /DeepSpeech/deepspeech-data/clips/test.csv
Progress |##################################################################################################################################################################| 100% completed
Loading TSV file:  /DeepSpeech/deepspeech-data/dev.tsv
Importing mp3 files...
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
Progress |################################################################################################################################################################# |  99% completed
mported 2269 samples.
Final amount of imported audio: 2:41:06 from 2:41:06.
Saving new DeepSpeech-formatted CSV file to:  /DeepSpeech/deepspeech-data/clips/dev.csv
Writing CSV file for DeepSpeech.py as:  /DeepSpeech/deepspeech-data/clips/dev.csv
Progress |##################################################################################################################################################################| 100% completed
Loading TSV file:  /DeepSpeech/deepspeech-data/train.tsv
Importing mp3 files...
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.

lissyx · February 24, 2021, 4:02pm

You should read the doc and understand what you are doing.

Freegs_Box · February 24, 2021, 4:02pm

It stopped by itself!

root@294bb9e108fd:/DeepSpeech# bin/import_cv2.py deepspeech-data/
Loading TSV file:  /DeepSpeech/deepspeech-data/test.tsv
Importing mp3 files...
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
Progress |################################################################################################################################################################# |  99% completed
mported 2267 samples.
Skipped 2 samples that were longer than 10 seconds.
Final amount of imported audio: 2:51:31 from 2:51:51.
Saving new DeepSpeech-formatted CSV file to:  /DeepSpeech/deepspeech-data/clips/test.csv
Writing CSV file for DeepSpeech.py as:  /DeepSpeech/deepspeech-data/clips/test.csv
Progress |##################################################################################################################################################################| 100% completed
Loading TSV file:  /DeepSpeech/deepspeech-data/dev.tsv
Importing mp3 files...
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
Progress |################################################################################################################################################################# |  99% completed
mported 2269 samples.
Final amount of imported audio: 2:41:06 from 2:41:06.
Saving new DeepSpeech-formatted CSV file to:  /DeepSpeech/deepspeech-data/clips/dev.csv
Writing CSV file for DeepSpeech.py as:  /DeepSpeech/deepspeech-data/clips/dev.csv
Progress |##################################################################################################################################################################| 100% completed
Loading TSV file:  /DeepSpeech/deepspeech-data/train.tsv
Importing mp3 files...
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
Progress |####################################################################################################################################################             |  92% completed
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "bin/import_cv2.py", line 65, in one_sample
    _maybe_convert_wav(mp3_filename, wav_filename)
  File "bin/import_cv2.py", line 185, in _maybe_convert_wav
    transformer.build(mp3_filename, wav_filename)
  File "/usr/local/lib/python3.6/dist-packages/sox/transform.py", line 594, in build
    input_filepath, input_array, sample_rate_in
  File "/usr/local/lib/python3.6/dist-packages/sox/transform.py", line 493, in _parse_inputs
    file_info.validate_input_file(input_filepath)
  File "/usr/local/lib/python3.6/dist-packages/sox/file_info.py", line 249, in validate_input_file
    "input_filepath {} does not exist.".format(input_filepath)
OSError: input_filepath deepspeech-data/clips/8d31fe1fe37219527930900426bf0614eb87342f3443ab33540b47900fa6f993f2a78e3699ef73714272fa93b9d90854c010f9d1ad7977ec97c7f2aa2619e64d.mp3 does not exist.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "bin/import_cv2.py", line 221, in <module>
    main()
  File "bin/import_cv2.py", line 216, in main
    _preprocess_data(PARAMS.tsv_dir, audio_dir, PARAMS.space_after_every_character)
  File "bin/import_cv2.py", line 172, in _preprocess_data
    set_samples = _maybe_convert_set(dataset, tsv_dir, audio_dir, space_after_every_character)
  File "bin/import_cv2.py", line 127, in _maybe_convert_set
    for i, processed in enumerate(pool.imap_unordered(one_sample, samples), start=1):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 735, in next
    raise value
OSError: input_filepath deepspeech-data/clips/8d31fe1fe37219527930900426bf0614eb87342f3443ab33540b47900fa6f993f2a78e3699ef73714272fa93b9d90854c010f9d1ad7977ec97c7f2aa2619e64d.mp3 does not exist.
Progress |##################################################################################################################################################################| 100% completed
Progress |##################################################################################################################################################################| 100% completed
Progress |#################################################################################################################################################################| 100% completed
root@294bb9e108fd:/DeepSpeech#

lissyx · February 24, 2021, 4:03pm

Read your error messages, please.

And search on our docs: Search — DeepSpeech 0.9.3 documentation
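The OSError above says that a clip listed in train.tsv is not present in the clips directory. One way to see how many such entries there are before re-running the importer is to compare the TSV against the directory. This is a minimal sketch, assuming the TSV has a path column holding bare file names as in Common Voice; the function name and the throwaway demo data are made up for illustration.

```python
import csv
import os
import tempfile


def find_missing_clips(tsv_path, clips_dir):
    """Return names from the TSV's `path` column that are not files in clips_dir."""
    missing = []
    with open(tsv_path, newline="", encoding="utf-8") as tsv_file:
        # QUOTE_NONE keeps quotation marks inside sentences from confusing the parser.
        reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            name = row["path"]
            if not os.path.isfile(os.path.join(clips_dir, name)):
                missing.append(name)
    return missing


# Self-contained demo on a temporary dataset (file names are invented).
with tempfile.TemporaryDirectory() as root:
    clips = os.path.join(root, "clips")
    os.mkdir(clips)
    open(os.path.join(clips, "a.mp3"), "wb").close()  # only a.mp3 exists on disk
    tsv = os.path.join(root, "train.tsv")
    with open(tsv, "w", encoding="utf-8", newline="") as f:
        f.write("client_id\tpath\tsentence\n")
        f.write("1\ta.mp3\thallo welt\n")
        f.write("2\tb.mp3\tguten tag\n")
    demo_missing = find_missing_clips(tsv, clips)
    # repr() makes stray whitespace or odd characters in a name visible.
    print([repr(name) for name in demo_missing])
```

On the real data you would call find_missing_clips("deepspeech-data/train.tsv", "deepspeech-data/clips") from the /DeepSpeech directory, since the path in the error message is relative to the working directory.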

Freegs_Box · February 24, 2021, 4:03pm

I did read the docs, and I’m trying things out to get a better understanding of how this works. Your help would be great.

lissyx · February 24, 2021, 4:05pm

So explain to us what is not clear in the docs about validate_label_locale. If you don’t share feedback, we can’t know what you found but failed to understand, and we can’t improve. This is not good for either you or the project.
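For readers following along: as far as I recall, --validate_label_locale takes the path to a Python file that defines a validate_label function, which returns the cleaned transcript or None to drop the sample. Treat that description as an assumption and confirm it against the importer's --help output and the docs. The allowed character set and the cleaning rules below are purely illustrative choices for German, not the project's own.

```python
import re
import unicodedata

# Characters this hypothetical German alphabet allows (an assumption for illustration).
ALLOWED = re.compile(r"^[a-zäöüß' ]+$")


def validate_label(label):
    """Return a normalized transcript, or None to signal the sample should be dropped."""
    label = unicodedata.normalize("NFC", label).lower()
    label = label.replace("-", " ")
    # Strip common punctuation, including German quotation marks.
    label = re.sub(r"[.,;:!?\"„“”]", "", label)
    # Collapse runs of whitespace left behind by the removals.
    label = re.sub(r"\s+", " ", label).strip()
    if not label or ALLOWED.match(label) is None:
        return None  # e.g. digits or symbols outside the alphabet
    return label


print(validate_label("Guten Tag!"))
print(validate_label("Es kostet 5 Euro"))
```

The point of such a function is that every transcript ends up using only characters present in the alphabet you train with; what to do with digits (drop the sample, as here, or spell them out) is a per-dataset decision.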

Utpal_Rudra · July 12, 2021, 5:58am

@lissyx hello,
I am working on Bengali speech-to-text. I have a dataset that is not structured like the Common Voice dataset. It contains flac files and the corresponding text in a CSV. I took some (2518) of the flac files, converted them to mp3, and created three separate TSV files: train (85%), test (5%), and dev (10%). The TSV files contain path (filenames), client_id (some random numbers), and sentence (the actual text), as in the Common Voice TSVs. The directory layout is the same as the Common Voice directory.
Then, when I try to create the corresponding train, test, and dev files using import_cv2.py (which also handles converting the mp3 files to wav), it says:
OSError: input_filepath dataset/clips/(a_random_filename).mp3 does not exist.
I checked the directory, and the file is there.
So I removed the file from both the directory and the TSV. Then it shows the same problem for another file that is also present in the directory.
Would you please give me a hand?
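For reference, an 85/10/5 split like the one described above could be produced along these lines. This is a generic sketch, not the script actually used for this dataset; the function name, the fixed seed, and the integer stand-in rows are assumptions made for the example.

```python
import random


def split_rows(rows, train=0.85, dev=0.10, seed=0):
    """Shuffle rows reproducibly and split them into (train, dev, test) lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train)
    n_dev = int(len(rows) * dev)
    # Whatever remains after train and dev becomes the test set.
    return rows[:n_train], rows[n_train:n_train + n_dev], rows[n_train + n_dev:]


# 2518 stand-in rows, matching the number of files mentioned above.
train_rows, dev_rows, test_rows = split_rows(range(2518))
print(len(train_rows), len(dev_rows), len(test_rows))
```

Each resulting list would then be written to its own TSV with the path, client_id, and sentence columns, keeping every file name identical to the name of the clip on disk.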

lissyx · July 12, 2021, 8:33am

This is a problem in your dataset. I can’t help and I don’t have time (I’m not working on this anymore).

othiele · July 12, 2021, 1:01pm

If you want help, check out the successor to this project: Coqui.

Check this post.