Python error: Segmentation fault when training

anas9011 · December 22, 2019, 2:44pm

I’m sorry, but I can’t just explore your fork to deduce and help you

That’s valid. The TL;DR of that script is:

Get all the audio files in a directory and iterate through them.
Calculate each audio file's size.
Parse the first two numbers in the file name: <Chapter>_<verse>_<hash>.wav.
Get the transcript of that chapter/verse.
Create a CSV entry.
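
For anyone who doesn't want to dig through the fork, the steps above can be sketched roughly as follows. This is a minimal sketch, not the actual script: `NAME_RE`, `build_rows`, `write_csv` and the `transcripts` dictionary keyed by `(chapter, verse)` are hypothetical names for illustration. The column names `wav_filename`, `wav_filesize` and `transcript` are the ones DeepSpeech's CSV importers use.

```python
import csv
import re
from pathlib import Path

# Hypothetical pattern for names like <Chapter>_<verse>_<hash>.wav
NAME_RE = re.compile(r"^(\d+)_(\d+)_.+\.wav$")
FIELDS = ["wav_filename", "wav_filesize", "transcript"]


def build_rows(audio_dir, transcripts):
    """Return one row per WAV file whose chapter/verse has a transcript.

    `transcripts` is assumed to map (chapter, verse) tuples to text.
    """
    rows = []
    for wav in sorted(Path(audio_dir).glob("*.wav")):
        match = NAME_RE.match(wav.name)
        if match is None:
            continue  # file name does not follow the expected scheme
        key = (int(match.group(1)), int(match.group(2)))
        text = transcripts.get(key)
        if text is None:
            continue  # no transcript known for this chapter/verse
        rows.append({
            "wav_filename": str(wav.resolve()),
            "wav_filesize": wav.stat().st_size,  # size in bytes
            "transcript": text,
        })
    return rows


def write_csv(rows, csv_path):
    """Write the rows with a header line."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Files that don't match the naming scheme or have no transcript are skipped silently here; logging them instead would make bad entries easier to spot.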

I tested out a single bad file and got the following error:

I Initializing variables...
I STARTING Optimization
Epoch 0 |   Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000
Traceback (most recent call last):
  File "/Users/allabana/.virtualenvs/test-ds-1/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/Users/allabana/.virtualenvs/test-ds-1/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/Users/allabana/.virtualenvs/test-ds-1/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Header mismatch: Expected fmt  but found JUNK
	 [[{{node DecodeWav}}]]
	 [[tower_0/IteratorGetNext]]

That led me to a previous solution of yours here

That and the stack of the graph would suggest a bogus WAV file.
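
To find files like this before training, something along these lines could scan the data set. It is only a sketch built on an assumption drawn from the error message: that the decoder expects the `fmt ` chunk directly after the RIFF/WAVE header, so a file that starts with another chunk such as `JUNK` gets rejected. The helper names `first_chunk_id` and `has_leading_fmt` are made up for this example.

```python
def first_chunk_id(path):
    """Return the 4-byte ID of the first chunk after the RIFF/WAVE header.

    Returns None if the file is too short or is not a RIFF/WAVE file.
    """
    with open(path, "rb") as handle:
        header = handle.read(12)  # "RIFF" + 4-byte size + "WAVE"
        if len(header) < 12 or header[:4] != b"RIFF" or header[8:12] != b"WAVE":
            return None
        chunk_id = handle.read(4)
        return chunk_id if len(chunk_id) == 4 else None


def has_leading_fmt(path):
    """True if the first chunk is 'fmt ' (note the trailing space)."""
    return first_chunk_id(path) == b"fmt "
```

Running `has_leading_fmt` over every `wav_filename` in the CSV and listing the files that return `False` should point at the ones that need to be re-exported.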

I was able to run DeepSpeech successfully with a single file.

python3 -u DeepSpeech.py \
  --alphabet_config_path "$ALPHABET_PATH" \
  --lm_binary_path "$LM_BINARY_PATH" \
  --lm_trie_path "$LM_TRIE_PATH" \
  --train_files "$TRAIN_CSV_FILE" \
  --test_files "$TEST_CSV_FILE" \
  --train_batch_size 1 \
  --test_batch_size 1 \
  --n_hidden 100 \
  --epochs 35 \
  --checkpoint_dir "$checkpoint_dir" \
  "$@"

However, I get the following error when it finishes:

[scorer.cpp:77] FATAL: "(access(filename, (1<<2))) == (0)" check failed. Invalid language model path

and I’m assuming that’s because I need to provide the language model binary and trie paths.
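
The failed check in scorer.cpp is a plain read-access test (`1<<2` is 4, the value of `R_OK` on POSIX), so the same condition can be tested before launching training. A minimal sketch, with `unreadable_paths` as a hypothetical helper name:

```python
import os


def unreadable_paths(paths):
    """Return the paths that are empty or not readable by this process."""
    return [p for p in paths if not p or not os.access(p, os.R_OK)]
```

Assuming the shell variables from the command above are exported to the environment, `unreadable_paths([os.environ.get("LM_BINARY_PATH", ""), os.environ.get("LM_TRIE_PATH", "")])` would return an empty list only if both files exist and can be read. An unset variable shows up as an empty string in the result.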