KeyError: 'wav_filename' on training DeepSpeech

karthikeyank · December 5, 2018, 5:47am

Hi, I am getting a KeyError: 'wav_filename' error while trying to train / fine-tune the DeepSpeech 0.3.0 model.
Below is the screenshot of the issue…

The CSV file looks like this, and I have all the WAV files in their respective directories. The directories look like: train → train.csv, WAV files || test → test.csv, WAV files || dev → dev.csv, WAV files.

Can anyone please help me with this?
Thanks.

karthikeyank · December 5, 2018, 6:20am

Sorry, it was my mistake.
I hadn't added the header line 'wav_filename,wav_filesize,transcript' at the top of the CSV file. Now it's fine.
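
For anyone hitting the same KeyError, here is a minimal sketch of a header check that can be run before training. It assumes Python 3 and the three column names quoted above; `missing_columns` is a hypothetical helper name, not part of DeepSpeech, and the sample rows are made up for illustration.

```python
import csv
import io

# Column names taken from the header line quoted in this post.
REQUIRED_COLUMNS = ["wav_filename", "wav_filesize", "transcript"]


def missing_columns(csv_text):
    """Return the required column names that are absent from the first row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    present = {name.strip() for name in header}
    return [name for name in REQUIRED_COLUMNS if name not in present]


# Hypothetical file contents, for illustration only.
good = "wav_filename,wav_filesize,transcript\n/data/a.wav,32044,hello world\n"
bad = "/data/a.wav,32044,hello world\n"  # header line forgotten

print(missing_columns(good))  # []
print(missing_columns(bad))   # ['wav_filename', 'wav_filesize', 'transcript']
```

For a real file, the text could be read with `open(path, newline="").read()` and passed to the same function.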

karthikeyank · December 5, 2018, 6:48am

But I got a new error: AttributeError: 'Series' object has no attribute 'transcript'.

Can you please have a look at this, @lissyx @reuben?

karthikeyank · December 5, 2018, 12:56pm

The above one is also solved, and I got a new error:
undefined symbol: _ZN10tensorflow12OpDefBuilderC1EN4absl11string_viewE
This is the command I use to train the model:

python DeepSpeech.py --train_files /home/userk/DeepSpeechPro/datasets/train/train.csv --dev_files /home/userk/DeepSpeechPro/datasets/dev/dev.csv --test_files /home/userk/DeepSpeechPro/datasets/test/test.csv --n_hidden 2048 --epoch -3 --export_dir /home/userk/DeepSpeechPro/tuned_model/models/ --lm_binary_path /home/userk/DeepSpeechPro/native_client/models/lm.binary --checkpoint_dir /mnt/c/users/karthikeyan/downloads/modelss/DeepSpeech/ --decoder_library_path /home/userk/DeepSpeechPro/native_client/bin/libctc_decoder_with_kenlm.so --alphabet_config_path /home/userk/DeepSpeechPro/native_client/models/alphabet.txt --lm_trie_path /home/userk/DeepSpeechPro/native_client/models/trie --learning_rate 0.0001

And the error I am facing is:

    Traceback (most recent call last):
      File "DeepSpeech.py", line 1959, in <module>
        tf.app.run(main)
      File "/home/userk/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
        _sys.exit(main(argv))
      File "DeepSpeech.py", line 1910, in main
        initialize_globals()
      File "DeepSpeech.py", line 330, in initialize_globals
        custom_op_module = tf.load_op_library(FLAGS.decoder_library_path)
      File "/home/userk/.local/lib/python2.7/site-packages/tensorflow/python/framework/load_library.py", line 60, in load_op_library
        lib_handle = py_tf.TF_LoadLibrary(library_filename)
    tensorflow.python.framework.errors_impl.NotFoundError: /home/userk/DeepSpeechPro/native_client/bin/libctc_decoder_with_kenlm.so: undefined symbol: _ZN10tensorflow12OpDefBuilderC1EN4absl11string_viewE

System config:
Windows Subsystem for Linux: Ubuntu (CPU)

Thanks.

lissyx · December 5, 2018, 12:59pm

Can you please avoid using screenshots?

lissyx · December 5, 2018, 1:00pm

This is the classic, documented mismatch between libctc_decoder_with_kenlm.so and the TensorFlow Python package. Without more context on your setup, it is hard to help.

karthikeyank · December 5, 2018, 1:41pm

Okay. I am trying to fine-tune the DeepSpeech 0.3.0 model; the dataset is around 2 hr 19 min long. My system configuration is:

4 GB RAM and an Intel Core i3 CPU
TensorFlow version: 1.12.0rc2
Python 2.7
All the packages listed in the DeepSpeech repo's requirements.txt are installed
Platform: Windows Subsystem for Linux, Ubuntu 16.04

My directory structure is:

home
- userk
  - DeepSpeechPro
    - DeepSpeech // this is the core project directory
    - native_client
      - bin
        - deepspeech
        - generate_trie
        - libctc_decoder_with_kenlm.so
        - libdeepspeech.so
        - LICENSE
        - native_client.tar.xz
        - README.mozilla
      - models
        - output_graph.pb
        - output_graph.pbmm
        - other files
    - tuned_model // export_directory
    - datasets // the train, test, and dev sets live here

I am trying to train the model from /home/userk/DeepSpeechPro/DeepSpeech/

-> The command recommended by the DeepSpeech README file:

    python DeepSpeech.py \
      --train_files /home/userk/DeepSpeechPro/datasets/train/train.csv \
      --dev_files /home/userk/DeepSpeechPro/datasets/dev/dev.csv \
      --test_files /home/userk/DeepSpeechPro/datasets/test/test.csv \
      --n_hidden 2048 \
      --epoch -3 \
      --learning_rate 0.0001

produces the following error:

    ERROR: The decoder library file does not exist. Make sure you have downloaded or built the native client binaries and pass the appropriate path to the binaries in the --decoder_library_path parameter.
    Traceback (most recent call last):
      File "DeepSpeech.py", line 1959, in <module>
        tf.app.run(main)
      File "/home/userk/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
        _sys.exit(main(argv))
      File "DeepSpeech.py", line 1910, in main
        initialize_globals()
      File "DeepSpeech.py", line 330, in initialize_globals
        custom_op_module = tf.load_op_library(FLAGS.decoder_library_path)
      File "/home/userk/.local/lib/python2.7/site-packages/tensorflow/python/framework/load_library.py", line 60, in load_op_library
        lib_handle = py_tf.TF_LoadLibrary(library_filename)
    tensorflow.python.framework.errors_impl.NotFoundError: native_client/libctc_decoder_with_kenlm.so: cannot open shared object file: No such file or directory

-> The command I saw on the Mozilla Discourse forum:

    python DeepSpeech.py \
      --train_files /home/userk/DeepSpeechPro/datasets/train/train.csv \
      --dev_files /home/userk/DeepSpeechPro/datasets/dev/dev.csv \
      --test_files /home/userk/DeepSpeechPro/datasets/test/test.csv \
      --n_hidden 2048 \
      --epoch -30 \
      --export_dir /home/userk/DeepSpeechPro/tuned_model/models/ \
      --lm_binary_path /home/userk/DeepSpeechPro/native_client/models/lm.binary \
      --checkpoint_dir /mnt/c/users/karthikeyan/downloads/modelss/DeepSpeech/ \
      --decoder_library_path /home/userk/DeepSpeechPro/native_client/bin/libctc_decoder_with_kenlm.so \
      --alphabet_config_path /home/userk/DeepSpeechPro/native_client/models/alphabet.txt \
      --lm_trie_path /home/userk/DeepSpeechPro/native_client/models/trie \
      --learning_rate 0.0001

produces the following error:

    Traceback (most recent call last):
      File "DeepSpeech.py", line 1959, in <module>
        tf.app.run(main)
      File "/home/userk/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
        _sys.exit(main(argv))
      File "DeepSpeech.py", line 1910, in main
        initialize_globals()
      File "DeepSpeech.py", line 330, in initialize_globals
        custom_op_module = tf.load_op_library(FLAGS.decoder_library_path)
      File "/home/userk/.local/lib/python2.7/site-packages/tensorflow/python/framework/load_library.py", line 60, in load_op_library
        lib_handle = py_tf.TF_LoadLibrary(library_filename)
    tensorflow.python.framework.errors_impl.NotFoundError: /home/userk/DeepSpeechPro/native_client/bin/libctc_decoder_with_kenlm.so: undefined symbol: _ZN10tensorflow12OpDefBuilderC1EN4absl11string_viewE

I hope this information is sufficient. If you need more, please ask.
I have gone through a lot of GitHub issues but couldn't find a solution. I hope you can help me out with this.
Thank you.

lissyx · December 5, 2018, 2:23pm

How did you fetch libctc_decoder_with_kenlm? 0.3.0 was based on TensorFlow r1.11, not 1.12; it seems you are using 0.3.0 binaries (the libctc decoder) with current master for training.
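
The mismatch described here can be caught before training starts. Below is a minimal sketch, assuming that comparing the major.minor part of the installed TensorFlow version against the version the release binaries were built for (r1.11 for 0.3.0, as stated above) is a reasonable first check. `major_minor` and `same_release_line` are hypothetical helper names, not part of DeepSpeech or TensorFlow.

```python
def major_minor(version):
    """Extract (major, minor) from strings such as '1.12.0rc2' or 'r1.11'."""
    parts = version.lstrip("rv").split(".")
    return int(parts[0]), int(parts[1])


def same_release_line(installed, built_against):
    """True when both version strings share the same major.minor pair."""
    return major_minor(installed) == major_minor(built_against)


# Values taken from this thread: the installed package vs. the 0.3.0 base.
print(same_release_line("1.12.0rc2", "r1.11"))  # False
```

In a real environment the first argument would come from `tensorflow.__version__`. A matching major.minor does not guarantee that the shared library loads; it only rules out the obvious mismatch.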

karthikeyank · December 5, 2018, 3:23pm

I built the entire project by following this blog post. In that post the author built it using the DeepSpeech 0.1.0 model, but I wanted to use the latest model, so I cloned from master, which was 0.3.0, on 30.10.2018.

The libctc_decoder_with_kenlm.so file comes preloaded in the native_client.amd64.cpu.linux.tar.xz from the DeepSpeech 0.3.0 release.

I am using the DeepSpeech 0.3.0 files from the releases, and my checkout is not the current one; I cloned it on 30.10.2018.

I tried extracting a new native_client.tar.xz file as you suggested, but it doesn't come with the libctc_decoder_with_kenlm.so file.

I would be very happy if you could point me to the correct steps to train the model. I am really sorry that I posted the same question on several forums, which annoyed you; I had been stuck on this issue for almost a day. I hope you can direct me to the correct path.
Thank you.

karthikeyank · December 6, 2018, 5:44am

Will building the project again from the released source code tar.gz and using the native_client.amd64.cpu.linux.tar.xz from the release resolve this issue?

karthikeyank · December 6, 2018, 7:14am

I have now built a completely new project from the DeepSpeech 0.3.0 release, with the packages from its requirements.txt and the native_client.amd64.cpu.linux.tar.xz file from the releases. And I am getting this error:

    ('Preprocessing', ['/home/userk/DeepSpeechPro/datasets/train/train.csv'])
    Traceback (most recent call last):
      File "DeepSpeech.py", line 1988, in <module>
        tf.app.run(main)
      File "/home/userk/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
        _sys.exit(main(argv))
      File "DeepSpeech.py", line 1944, in main
        train()
      File "DeepSpeech.py", line 1468, in train
        hdf5_cache_path=FLAGS.train_cached_features_path)
      File "/home/userk/DeepSpeechPro/DeepSpeech2/DeepSpeech-0.3.0/util/preprocess.py", line 68, in preprocess
        out_data = pmap(step_fn, source_data.iterrows())
      File "/home/userk/DeepSpeechPro/DeepSpeech2/DeepSpeech-0.3.0/util/preprocess.py", line 13, in pmap
        results = pool.map(fun, iterable)
      File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
        return self.map_async(func, iterable, chunksize).get()
      File "/usr/lib/python2.7/multiprocessing/pool.py", line 567, in get
        raise self._value
    AttributeError: 'Series' object has no attribute 'transcript'

lissyx · December 6, 2018, 3:24pm

This is likely because your train.csv file is incorrect and has no transcript column.

el.abed.houssem94 · March 28, 2019, 12:08pm

Please, how did you solve the AttributeError: 'Series' object has no attribute 'transcript' error? It is still not solved for me.

karthikeyank · March 31, 2019, 5:55am

Hi @el.abed.houssem94. That was actually an issue with my CSV file. I later solved it by carefully checking the CSV rows for any empty transcript fields. I would recommend using OpenOffice or LibreOffice to inspect the file for better results.
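
The same check can be scripted instead of done by eye. This is a minimal sketch, assuming Python 3 and a CSV with the `wav_filename,wav_filesize,transcript` header mentioned earlier in the thread; `rows_with_empty_transcript` is a hypothetical helper name and the sample rows are made up for illustration.

```python
import csv
import io


def rows_with_empty_transcript(csv_text):
    """Return 1-based data-row numbers whose transcript is missing or blank."""
    reader = csv.DictReader(io.StringIO(csv_text))
    bad = []
    for number, row in enumerate(reader, start=1):
        # DictReader fills fields missing from a short row with None.
        transcript = row.get("transcript")
        if transcript is None or not transcript.strip():
            bad.append(number)
    return bad


# Hypothetical file contents: row 2 has a blank transcript,
# row 3 lacks the transcript field entirely.
sample = (
    "wav_filename,wav_filesize,transcript\n"
    "/data/a.wav,32044,hello world\n"
    "/data/b.wav,28012,\n"
    "/data/c.wav,30110\n"
)

print(rows_with_empty_transcript(sample))  # [2, 3]
```

The row numbers count data rows only, so row 1 is the first line after the header.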
