System requirements for training Indian accent English over DeepSpeech pre-trained model checkpoints?

DeepSpeech does not perform well on Indian-accented English. I am trying to train it with more than 5,000 audio files of Indian-accented English, each about 5 seconds long.

What should be my system requirements?
Any idea how much time it will take?

I’m sure it can work with an Indian accent. It works with various accents and languages, so I doubt an Indian accent as such will trip it up. I think the issue is hyperparameters + data size + data quality.

Could you describe what you are doing in more detail?

Sorry, I think I was not clear about what I wanted to convey.

I tried the DeepSpeech pre-trained model on some Indian-accented audio files, but it gave a word error rate (WER) of around 35%.
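For context, WER is the minimum number of word substitutions, insertions, and deletions needed to turn the hypothesis into the reference transcript, divided by the number of reference words. A self-contained sketch of the metric (not DeepSpeech’s own scoring code):

```python
# Minimal word-level WER via Levenshtein edit distance; a sketch, not
# DeepSpeech's evaluation pipeline.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One deleted word out of six reference words ≈ 0.167
print(wer("the cat sat on the mat", "the cat sat on mat"))
```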

To reduce this WER, I plan to continue training the model from its checkpoints, or with a frozen-model approach, on Indian-accented audio (~10 hours).

My system is a Mac with 10 GB of RAM, a 1.5 GB Intel graphics card, and around 100 GB of free storage.
Will this be enough? Or will I have to use a different system with better specs?

Oh, I see now.

I agree the pre-trained model is unlikely to work well on Indian accented English and needs to be “fine-tuned”, as it looks like you are about to do.

Unfortunately, Intel graphics cards are not supported (a TensorFlow limitation). So training will run only on the CPU and will likely take a long time.

I’d suggest trying to get your hands on something like a 1080Ti or more powerful if possible.

Yes, I am planning to fine-tune the model using the checkpoints.

I have around 88,000 audio files (~5 s in length each).
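As a sanity check on those numbers, 88,000 clips of ~5 seconds each is far more than the ~10 hours mentioned earlier:

```python
# Back-of-the-envelope total duration of the dataset described above.
clips = 88_000
seconds_per_clip = 5  # approximate, per the thread
total_hours = clips * seconds_per_clip / 3600
print(round(total_hours, 1))  # ~122.2 hours of audio
```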

I am not able to arrange a higher-capability system for now. Is it possible for me to fine-tune the pre-trained model with these files on Google Cloud or Amazon AWS?

@kdavis @lissyx @reuben

@pra978 Yes it should work on Google cloud or Amazon AWS.

Hey @kdavis, I am getting a KeyError: 'CTCBeamSearchDecoderWithLM' while trying to restore the checkpoints. Does anyone have any idea how to resolve this?

@pra978 Could you give the full text of the error? It will help in debugging.

I ran this to try to import the meta file from the latest release:

saver = tf.train.import_meta_graph('model.ckpt-97999.meta')

and i got this error:


KeyError                                  Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 saver = tf.train.import_meta_graph('model.ckpt-97999.meta')

/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py in import_meta_graph(meta_graph_or_file, clear_devices, import_scope, **kwargs)
   1836       clear_devices=clear_devices,
   1837       import_scope=import_scope,
-> 1838       **kwargs)
   1839   if meta_graph_def.HasField("saver_def"):
   1840     return Saver(saver_def=meta_graph_def.saver_def, name=import_scope)

/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/meta_graph.py in import_scoped_meta_graph(meta_graph_or_file, clear_devices, graph, import_scope, input_map, unbound_inputs_col_name, restore_collections_predicate)
    658   importer.import_graph_def(
    659       input_graph_def, name=(import_scope or ""), input_map=input_map,
--> 660       producer_op_list=producer_op_list)
    661
    662   scope_to_prepend_to_names = "/".join(

/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py in new_func(*args, **kwargs)
    314           'in a future version' if date is None else ('after %s' % date),
    315           instructions)
--> 316       return func(*args, **kwargs)
    317     return tf_decorator.make_decorator(func, new_func, 'deprecated',
    318         _add_deprecated_arg_notice_to_docstring(

/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/importer.py in import_graph_def(graph_def, input_map, return_elements, name, op_dict, producer_op_list)
    431   if producer_op_list is not None:
    432     # TODO(skyewm): make a copy of graph_def so we're not mutating the argument?
--> 433     _RemoveDefaultAttrs(op_dict, producer_op_list, graph_def)
    434
    435   graph = ops.get_default_graph()

/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/importer.py in _RemoveDefaultAttrs(op_dict, producer_op_list, graph_def)
    209     # Remove any default attr values that aren't in op_def.
    210     if node.op in producer_op_dict:
--> 211       op_def = op_dict[node.op]
    212       producer_op_def = producer_op_dict[node.op]
    213       # We make a copy of node.attr to iterate through since we may modify

KeyError: 'CTCBeamSearchDecoderWithLM'

I am trying to further train this model from its checkpoints using Indian-accented datasets on a Mac (macOS High Sierra 10.13.2). Will 'CTCBeamSearchDecoderWithLM' work on this?

What command line arguments did you give DeepSpeech.py?

I didn’t give any command line arguments to DeepSpeech.py.

I just ran this in a Jupyter notebook:

saver = tf.train.import_meta_graph('model.ckpt-97999.meta')

and got this error:

KeyError: 'CTCBeamSearchDecoderWithLM'

Could you provide a link to the notebook?

If you’re writing your own training code you’ll need to load the decoder module as the training graph depends on it. See tf.load_op_library.
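The KeyError arises because 'CTCBeamSearchDecoderWithLM' is a custom op compiled into a native library, not part of stock TensorFlow, so the graph importer cannot find it until that library is registered. A minimal sketch of doing this by hand; the library path here is an assumption and depends on where the DeepSpeech native client package was unpacked:

```python
import tensorflow as tf

# Assumed path to the native library that registers the custom decoder op;
# adjust to wherever the native_client package was extracted.
decoder_lib = tf.load_op_library('native_client/libctc_decoder_with_kl.so')

# With the op registered, importing the meta graph can resolve
# 'CTCBeamSearchDecoderWithLM' instead of raising a KeyError.
saver = tf.train.import_meta_graph('model.ckpt-97999.meta')
```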

I just ran these three commands in the Jupyter notebook, nothing else:

import tensorflow as tf
sess = tf.Session()
saver = tf.train.import_meta_graph('model.ckpt-97999.meta')

and got this error:

KeyError: 'CTCBeamSearchDecoderWithLM'

I am not writing my own training code; I am just trying to load the pre-trained model from the checkpoints provided in the latest release, so that I can then train it on Indian-accented audio afterwards.

Do I still need to load the decoder module?

> Do I still need to load the decoder module?

Yes.

In that case, I don’t understand why you’re trying to load the checkpoint with your own code. Just use DeepSpeech.py and specify --checkpoint_dir and it’ll load the decoder module and the weights appropriately.
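A sketch of such an invocation; the file paths are placeholders, and exact flag names vary between DeepSpeech releases, so check ./DeepSpeech.py --help for the version you have checked out:

```shell
# Hypothetical CSV paths; substitute your own train/dev/test manifests
# and the directory containing the released checkpoint files.
python3 DeepSpeech.py \
  --checkpoint_dir /path/to/deepspeech-checkpoint \
  --train_files indian_accent_train.csv \
  --dev_files indian_accent_dev.csv \
  --test_files indian_accent_test.csv \
  --learning_rate 0.0001
```

A lower learning rate than the default is commonly used when fine-tuning so the pre-trained weights are not wiped out; treat the value above as a starting point to tune, not a recommendation.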

Thanks a lot, I will try that for sure. I was supposed to run it from the terminal, but I was trying to do it in Python, which I now see is possible but a longer process to follow.

Since this was not working out, I was also trying to create my own model. I got all the necessary files by following a French-model tutorial from one of the Discourse threads. When I run the model, I get a text error which, as I came to learn, happens due to wrong encoding of the text files. I will keep you updated on that. Thanks :blush:.
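For that encoding problem, one common approach is to normalise every transcript file to UTF-8 before training. A small sketch, assuming the broken files might be in a legacy encoding such as Windows-1252 or Latin-1:

```python
# Decode raw transcript bytes, trying UTF-8 first and falling back to
# common legacy encodings; re-save the result as UTF-8 for training.
def to_utf8(raw: bytes) -> str:
    for enc in ("utf-8", "utf-8-sig", "cp1252", "latin-1"):
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte, so this line is unreachable in practice
    raise ValueError("undecodable input")

# A Windows-1252 file containing "café" decodes cleanly:
text = to_utf8("café".encode("cp1252"))
print(text)  # café
```

Writing the decoded text back with open(path, "w", encoding="utf-8") then gives files the importer can read consistently.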

Hi @pra978, how are you going with this? I’m trying to do the same with an Australian accent.