Fine-tuning on a small dataset with a different alphabet

I am working on a dataset that contains numbers as well as the characters . and ’, so I made an alphabet.txt containing the characters a-z, 0-9, . and ’. Then I tried to continue training from the pre-trained model, but it gave me the output below when running this command:

python DeepSpeech.py --train_files …/tts/train.csv --train_batch_size 24 --test_files …/tts/test.csv --test_batch_size 48 --dev_files …/tts/dev.csv --dev_batch_size 48 --checkpoint_dir …/models/checkpoint/ --export_dir models/ --epoch -3 --learning_rate 0.0001 --dropout_rate 0.15 --lm_alpha 0.75 --lm_beta 1.85

Output:
InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
E
E Assign requires shapes of both tensors to match. lhs shape= [2048,40] rhs shape= [2048,29]
E [[node save/Assign_32 (defined at DeepSpeech.py:418) ]]
E [[node save/restore_all/NoOp_1 (defined at DeepSpeech.py:418) ]]
E
E The checkpoint in …/models/checkpoint/model.v0.5.1 does not match the shapes of the model. Did you change alphabet.txt or the --n_hidden parameter between train runs using the same checkpoint dir? Try moving or removing the contents of …/models/checkpoint/model.v0.5.1.

So I removed --checkpoint_dir from the command and tried to run it again, and it gave me this error:

Output
WARNING:tensorflow:From /home/reasearch/anaconda3/envs/tf13/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py:429: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, use
tf.py_function, which takes a python function which manipulates tf eager
tensors instead of numpy arrays. It’s easy to convert a tf eager tensor to
an ndarray (just call tensor.numpy()) but having access to eager tensors
means tf.py_functions can use accelerators such as GPUs as well as
being differentiable using a gradient tape.

WARNING:tensorflow:From /home/reasearch/anaconda3/envs/tf13/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py:358: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/reasearch/anaconda3/envs/tf13/lib/python3.6/site-packages/tensorflow/contrib/rnn/python/ops/lstm_ops.py:696: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
I Initializing variables…
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000 Segmentation fault (core dumped)

I don’t know why this is happening. I have tried debugging it more than once and I feel lost.

Any ideas?

The first error happens because the pre-trained model was trained to predict 29 labels, while with your new alphabet the same network would need to predict 40, so the checkpoint’s 2048x29 output layer can’t be restored into a 2048x40 one. That’s not going to work with the released checkpoint.
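To see where those numbers come from: the output layer has one unit per character in alphabet.txt plus one for the CTC blank, so the stock English alphabet (a-z, space and apostrophe, 28 characters) gives 29, and yours (a-z, 0-9, . and ’, presumably plus space, 39 characters) gives 40. As a quick sanity check, just a sketch rather than anything from the repo:

    # Sketch: the model's output width is (characters in alphabet.txt) + 1 for the CTC blank.
    # In alphabet.txt every non-comment line is one character; lines starting with '#' are comments.
    def count_labels(alphabet_path):
        with open(alphabet_path, encoding="utf-8") as f:
            chars = [line for line in f if not line.startswith("#")]
        return len(chars) + 1  # +1 for the CTC blank

    print(count_labels("alphabet.txt"))  # 29 for the stock alphabet, 40 for yours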

Second, removing --checkpoint_dir from the command-line args doesn’t mean no checkpoint dir is used for loading a checkpoint. Check util/flags.py, you’ll find its default value there; see if there is anything problematic in that location. Also, the checkpoint dir is where your new checkpoints get saved, too. This might not be the issue, I just wanted to point it out for clarity.
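If you’d rather be explicit than rely on that default, point --checkpoint_dir at a fresh, empty directory (the name below is just a placeholder, and keep the rest of your flags as before), so the run neither tries to load the 0.5.1 checkpoint that no longer matches your alphabet nor mixes new checkpoints in with it:

    mkdir -p ~/ds-finetune-checkpoints
    python DeepSpeech.py --checkpoint_dir ~/ds-finetune-checkpoints --train_files …/tts/train.csv --dev_files …/tts/dev.csv --test_files …/tts/test.csv [rest of your flags]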

Later, when you manage to get it working, you’ll have issues with the output being garbage because you didn’t regenerate the trie. Heads up. In fact, this might already be part of the issue, since the decoder requires a trie that matches the alphabet. Try generating a trie with your new language model and alphabet.txt.
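If I remember right, the generate_trie tool that ships with native_client does this. The exact arguments have changed between releases, so check the native_client README for your version, but it’s roughly:

    ./generate_trie /path/to/your/alphabet.txt /path/to/your/lm.binary /path/to/output/trie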

I solved this problem by converting digits to words, e.g. 29 to twenty nine (roughly as in the sketch below), and I am now trying to fine-tune the pre-trained model with my own data plus the LibriSpeech dev-clean data as training files, but I got a much worse WER of 0.88 on the LibriSpeech test-clean data, almost 10 times higher.
P.S. The pre-trained model without any fine-tuning gets a WER of 0.082 on the LibriSpeech test-clean data.
Should I train from scratch?
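For reference, this is roughly how I do the digit conversion (a sketch using the third-party num2words package; the regex and clean-up are simplified and would need adapting to real transcripts):

    # Sketch: spell out every run of digits in a transcript, e.g. "29" -> "twenty nine".
    # Needs the third-party num2words package (pip install num2words).
    import re
    from num2words import num2words

    def normalize_numbers(text):
        # num2words can emit hyphens and commas ("twenty-nine", "one thousand, ..."),
        # which are not in the alphabet, so strip them back to plain words.
        spell = lambda m: num2words(int(m.group())).replace("-", " ").replace(",", "")
        return re.sub(r"\d+", spell, text)

    print(normalize_numbers("meet me at 29 baker street"))  # meet me at twenty nine baker street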

Did you fine-tune the pre-trained model with a 10^-4 learning rate? That’s not good. Go with a lower LR (10^-6).
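In the command above, that would mean for example:

    --learning_rate 0.000001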

Regenerate the trie so you don’t have to manipulate the data, mate.