Using DeepSpeech with 1 Epoch

kromox · August 18, 2020, 10:31am

Hello,

I trained my own model on the Spanish Common Voice dataset (https://commonvoice.mozilla.org/es/datasets), version es_521h_2020-06-22.

I followed all the steps in the “Training Your Own Model” section of the documentation (https://mozilla-voice-stt.readthedocs.io/en/latest/TRAINING.html).

So I started the training with the following command:

python3 DeepSpeech.py --train_files es/clips/train.csv --dev_files es/clips/dev.csv --test_files es/clips/test.csv --export_dir es/export --checkpoint_dir es/checkpoint

After about 20 days, the process stopped (froze) during Epoch 1. I am not using a GPU.

After interrupting the process in the terminal, I got these errors:

$ python3 DeepSpeech.py --train_files es/clips/train.csv --dev_files es/clips/dev.csv --test_files es/clips/test.csv --export_dir es/export --checkpoint_dir es/checkpoint
I Could not find best validating checkpoint.
I Could not find most recent checkpoint.
I Initializing all variables.
I STARTING Optimization
Epoch 0 |   Training | Elapsed Time: 13 days, 4:35:14 | Steps: 48800 | Loss: 153.517502
Epoch 0 | Validation | Elapsed Time: 2:44:46 | Steps: 5378 | Loss: 153.074079 | Dataset: es/clips/dev.csv
I Saved new best validating model with loss 153.074079 to: es/checkpoint/best_dev-48800
--------------------------------------------------------------------------------
Epoch 1 |   Training | Elapsed Time: 8 days, 17:19:17 | Steps: 37285 | Loss: 135.934994 ^C
Process ForkPoolWorker-20:
Process ForkPoolWorker-21:
Process ForkPoolWorker-24:
Process ForkPoolWorker-23:
Process ForkPoolWorker-22:
Process ForkPoolWorker-19:
Process ForkPoolWorker-17:
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 108, in worker
    task = get()
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 108, in worker
    task = get()
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 108, in worker
    task = get()
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 108, in worker
    task = get()
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 334, in get
    with self._rlock:
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 334, in get
    with self._rlock:
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 334, in get
    with self._rlock:
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 334, in get
    with self._rlock:
  File "/usr/lib/python3.6/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/usr/lib/python3.6/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/usr/lib/python3.6/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 108, in worker
    task = get()
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 334, in get
    with self._rlock:
  File "/usr/lib/python3.6/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
KeyboardInterrupt
KeyboardInterrupt
KeyboardInterrupt
KeyboardInterrupt
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 108, in worker
    task = get()
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 334, in get
    with self._rlock:
  File "/usr/lib/python3.6/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 108, in worker
    task = get()
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 334, in get
    with self._rlock:
  File "/usr/lib/python3.6/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Process ForkPoolWorker-18:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 108, in worker
    task = get()
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 335, in get
    res = self._reader.recv_bytes()
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
KeyboardInterrupt
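The progress lines at the top of that log can be turned into rough per-step timings. A minimal Python sketch, assuming the `Elapsed Time: N days, H:MM:SS | Steps: N` layout shown above; `parse_progress` and `PROGRESS_RE` are hypothetical names, not part of DeepSpeech.

```python
import re

# Matches the progress-bar fields printed during training/validation, e.g.
# "Elapsed Time: 13 days, 4:35:14 | Steps: 48800" (the days part is optional).
PROGRESS_RE = re.compile(
    r"Elapsed Time: (?:(\d+) days?, )?(\d+):(\d{2}):(\d{2}) \| Steps: (\d+)"
)


def parse_progress(line):
    """Return (elapsed_seconds, steps) for one progress line (hypothetical helper)."""
    m = PROGRESS_RE.search(line)
    if m is None:
        raise ValueError("no progress fields found in: " + line)
    days, hours, minutes, seconds, steps = m.groups()
    elapsed = (int(days or 0) * 86400 + int(hours) * 3600
               + int(minutes) * 60 + int(seconds))
    return elapsed, int(steps)


epoch0 = "Epoch 0 |   Training | Elapsed Time: 13 days, 4:35:14 | Steps: 48800 | Loss: 153.517502"
epoch1 = "Epoch 1 |   Training | Elapsed Time: 8 days, 17:19:17 | Steps: 37285 | Loss: 135.934994"

for label, line in (("epoch 0", epoch0), ("epoch 1", epoch1)):
    elapsed, steps = parse_progress(line)
    print(f"{label}: {elapsed} s over {steps} steps = {elapsed / steps:.1f} s/step")
```

For the two training lines this works out to roughly 23.4 s/step in epoch 0 and 20.2 s/step in epoch 1 (the latter only up to the moment the run was interrupted).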

The “es” directory contains the Common Voice files; the export and checkpoint folders are also located there.

These are the files contained in the “es” folder:

es/
├── checkpoint
├── clips
├── dev.tsv
├── export
├── invalidated.tsv
├── other.tsv
├── reported.tsv
├── test.tsv
├── train.tsv
└── validated.tsv

The “export” directory is empty, but “checkpoint” contains:

es/checkpoint/
├── best_dev-48800.data-00000-of-00001
├── best_dev-48800.index
├── best_dev-48800.meta
├── best_dev_checkpoint
├── checkpoint
├── flags.txt
├── train-85997.data-00000-of-00001
├── train-85997.index
├── train-85997.meta
├── train-86019.data-00000-of-00001
├── train-86019.index
├── train-86019.meta
├── train-86041.data-00000-of-00001
├── train-86041.index
├── train-86041.meta
├── train-86063.data-00000-of-00001
├── train-86063.index
├── train-86063.meta
├── train-86085.data-00000-of-00001
└── train-86085.index
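The step numbers in these file names line up with the log: 48800 steps in epoch 0 plus 37285 in epoch 1 gives 86085, the newest `train-` checkpoint, while `best_dev-48800` holds the weights saved after epoch 0 validation. A minimal sketch that picks the newest step for a given prefix, assuming the `<prefix>-<step>.index` naming shown above (the listing is abbreviated and `latest_step` is a hypothetical helper). In TensorFlow itself, `tf.train.latest_checkpoint(checkpoint_dir)` reads the `checkpoint` state file instead.

```python
import re

# Abbreviated copy of the es/checkpoint/ listing above.
listing = [
    "best_dev-48800.data-00000-of-00001", "best_dev-48800.index", "best_dev-48800.meta",
    "best_dev_checkpoint", "checkpoint", "flags.txt",
    "train-85997.index", "train-86019.index", "train-86041.index",
    "train-86063.index", "train-86085.data-00000-of-00001", "train-86085.index",
]


def latest_step(filenames, prefix):
    """Return the highest N among '<prefix>-N.index' files, or None (hypothetical helper)."""
    pattern = re.compile(re.escape(prefix) + r"-(\d+)\.index$")
    steps = []
    for name in filenames:
        m = pattern.match(name)
        if m:
            steps.append(int(m.group(1)))
    return max(steps) if steps else None


best = latest_step(listing, "best_dev")   # saved after epoch 0 validation
latest = latest_step(listing, "train")    # last periodic save during epoch 1
print(best, latest, latest - best)        # 48800 86085 37285
```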

In the DeepSpeech usage documentation (https://mozilla-voice-stt.readthedocs.io/en/latest/USING.html), I see that it is used as follows:

deepspeech --model deepspeech-0.8.1-models.pbmm --scorer deepspeech-0.8.1-models.scorer --audio my_audio_file.wav
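The same model and scorer can also be loaded from Python. A minimal sketch, assuming the `deepspeech` 0.8.x Python package (`Model`, `enableExternalScorer`, `sampleRate`, `stt`), numpy, and a 16-bit mono WAV at the model's sample rate (16 kHz for the released models); `read_pcm16` and `transcribe` are hypothetical helpers.

```python
import wave


def read_pcm16(path, expected_rate=16000):
    """Return raw 16-bit mono PCM bytes from a WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        if wav.getframerate() != expected_rate:
            raise ValueError(f"expected {expected_rate} Hz, got {wav.getframerate()} Hz")
        return wav.readframes(wav.getnframes())


def transcribe(model_path, scorer_path, audio_path):
    """Run DeepSpeech on one WAV file; assumes deepspeech 0.8.x and numpy are installed."""
    # Imported here so read_pcm16 works without these packages installed.
    import numpy as np
    from deepspeech import Model

    ds = Model(model_path)
    ds.enableExternalScorer(scorer_path)
    pcm = read_pcm16(audio_path, expected_rate=ds.sampleRate())
    return ds.stt(np.frombuffer(pcm, dtype=np.int16))


# Example with the file names from the docs command:
# print(transcribe("deepspeech-0.8.1-models.pbmm",
#                  "deepspeech-0.8.1-models.scorer",
#                  "my_audio_file.wav"))
```

For a Spanish model, the model path would point at your own exported graph and the scorer at a Spanish scorer, not the English release files.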

My question is: can I generate the files needed to test DeepSpeech with my Spanish model using Epoch 0, which finished correctly? How can I do this?

I would like to use the checkpoint for this so I don't have to wait another 13 days.

Thank you very much…

othiele · August 18, 2020, 11:23am

You probably stopped the process yourself?

Export happens only if the training finishes normally, but the checkpoint directory holds the current best model. Search the forum/docs for how to export directly from a checkpoint, but I think it works by just giving the checkpoint dir and an export dir, without the train/dev/test files. If you don’t find anything, ask again.
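Following that suggestion, an export-only run passes the checkpoint and export directories but no train/dev/test CSVs. A minimal sketch that builds and runs that command, assuming DeepSpeech 0.8's `DeepSpeech.py` skips training and testing when those flags are absent. Any non-default flags used during training (for example a custom `--alphabet_config_path` or `--n_hidden`) would presumably need to be repeated so the exported graph matches the checkpoint. `export_command` and `run_export` are hypothetical helpers.

```python
import subprocess


def export_command(checkpoint_dir, export_dir, extra_flags=()):
    """Build an export-only DeepSpeech.py invocation (hypothetical helper)."""
    cmd = [
        "python3", "DeepSpeech.py",
        "--checkpoint_dir", checkpoint_dir,
        "--export_dir", export_dir,
    ]
    # Repeat any non-default training flags so the graph matches the weights.
    cmd.extend(extra_flags)
    return cmd


def run_export(checkpoint_dir, export_dir, extra_flags=()):
    # check=True raises CalledProcessError if DeepSpeech.py exits with an error.
    subprocess.run(export_command(checkpoint_dir, export_dir, extra_flags), check=True)


# Example, run from the DeepSpeech checkout used for training:
# run_export("es/checkpoint", "es/export")
```

The equivalent shell command is `python3 DeepSpeech.py --checkpoint_dir es/checkpoint --export_dir es/export`.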

The .pbmm model in the usage docs is the exported model converted so that it loads quicker, but the .pb from the export dir works as well.
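The DeepSpeech training docs produce the `.pbmm` file with TensorFlow's `convert_graphdef_memmapped_format` tool. A minimal sketch, assuming that binary is on `PATH` and that the export wrote `output_graph.pb` (the default export file name, assumed here); `memmap_command` and `convert_to_pbmm` are hypothetical helpers.

```python
import os
import subprocess


def memmap_command(export_dir, graph_name="output_graph"):
    """Build the .pb -> .pbmm conversion command (hypothetical helper)."""
    pb_path = os.path.join(export_dir, graph_name + ".pb")
    pbmm_path = os.path.join(export_dir, graph_name + ".pbmm")
    return [
        "convert_graphdef_memmapped_format",
        "--in_graph=" + pb_path,
        "--out_graph=" + pbmm_path,
    ]


def convert_to_pbmm(export_dir, graph_name="output_graph"):
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(memmap_command(export_dir, graph_name), check=True)


# Example, once es/export/output_graph.pb exists:
# convert_to_pbmm("es/export")
```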

You have to build your own scorer or search the forum for a Spanish scorer. The English one won’t work.
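A scorer is built from a plain-text corpus with one sentence per line, which the DeepSpeech 0.8 tooling (`data/lm/generate_lm.py`, then `generate_scorer_package`) turns into a KenLM model and a scorer package. A minimal sketch of that first step, assuming the Common Voice `.tsv` files have a `sentence` column and that the alphabet is lowercase. It reads only `train.tsv` so dev/test sentences don't leak into the language model (a precaution, not a DeepSpeech requirement). `write_lm_corpus` is a hypothetical helper, and a useful scorer would normally need a much larger text corpus than Common Voice alone.

```python
import csv


def write_lm_corpus(tsv_paths, out_path):
    """Write one lowercased, deduplicated sentence per line (hypothetical helper)."""
    seen = set()
    with open(out_path, "w", encoding="utf-8") as out:
        for tsv_path in tsv_paths:
            with open(tsv_path, encoding="utf-8", newline="") as f:
                # QUOTE_NONE: sentences may contain quote characters.
                reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
                for row in reader:
                    # Lowercase and collapse runs of whitespace.
                    sentence = " ".join(row["sentence"].lower().split())
                    # Skip empty lines and sentences already written
                    # (the same sentence is often recorded by several speakers).
                    if sentence and sentence not in seen:
                        seen.add(sentence)
                        out.write(sentence + "\n")
    return len(seen)


# Example: write_lm_corpus(["es/train.tsv"], "es/lm_corpus.txt")
```

Deduplicating keeps repeated recordings of one prompt from inflating its n-gram counts; drop the `seen` check if you prefer raw frequencies.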

But you should run the training for 10-20 epochs to get reasonably good results. I guess you won’t get much after just one.
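For scale, a rough projection using the epoch-0 training time from the log (13 days, 4:35:14, i.e. 1,139,714 s), assuming every CPU epoch takes about as long and ignoring the roughly 2 h 45 min of validation per epoch:

$$
t_{\text{epoch}} = \frac{1\,139\,714\ \text{s}}{86\,400\ \text{s/day}} \approx 13.19\ \text{days},
\qquad 10\,t_{\text{epoch}} \approx 132\ \text{days},
\qquad 20\,t_{\text{epoch}} \approx 264\ \text{days}
$$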

lissyx · August 18, 2020, 1:17pm

You won’t be able to train without GPUs on that volume of data.
Please be patient, we might be able to help on that matter soon.

kromox · August 18, 2020, 2:44pm

Hello, thank you very much for your answer.

No, I started the process on 2020-07-16 and it stopped (as shown in the log above) 22 days later.

Today, after waiting a long time for the elapsed time to update or for anything else to happen, I stopped the process.

I understand. For now I’m just testing whether DeepSpeech works for Spanish, even a little. That’s why I’m doing it without a GPU. If it works for what I need, I could buy a GPU to speed up the process.

Do you know whether there are already trained Common Voice models available to download, the same way I downloaded the English example?

Best.

kromox · August 18, 2020, 2:46pm

Thank you very much, lissyx, I’ll keep an eye out.

othiele · August 18, 2020, 3:56pm

I know that using the search function is not that popular, but if you did that, you would find dan.bmh’s excellent models: