System requirements for training Indian accent English over DeepSpeech pre-trained model checkpoints?

I’m sure it can handle an Indian accent; it works with various accents and languages, so I doubt an Indian accent will trip it up. I think the issue is hyperparameters + data size + data quality.

Could you describe what you are doing in more detail?

Sorry, I don’t think I was clear about what I wanted to convey.

I tried the DeepSpeech pre-trained model on some Indian-accent audio files; however, it gave a WER of around 35%.
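For reference, WER (word error rate) is the word-level edit distance between the recognized transcript and the reference transcript, divided by the number of reference words. A minimal sketch in plain Python (no DeepSpeech dependency):

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance divided by
    the number of words in the reference transcript."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words:
print(wer("the cat sat on the mat", "the cat sit on mat"))  # 0.333...
```

A WER of 0.35 means roughly one in three words is wrong, so there is plenty of room for fine-tuning to help.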

To reduce this WER, I plan to continue training the model from its checkpoints, or to use the frozen-model approach, with Indian-accent audio (~10 hours).

My system is a Mac with 10 GB of RAM, a 1.5 GB Intel graphics card, and around 100 GB of free storage.
Will this be enough, or will I have to use a different system with better specs?

Oh, I see now.

I agree the pre-trained model is unlikely to work well on Indian accented English and needs to be “fine-tuned”, as it looks like you are about to do.

Unfortunately, Intel graphics cards are not supported (a TensorFlow limitation), so training will run on the CPU only and will likely take a long time.

I’d suggest trying to get your hands on something like a 1080Ti or more powerful if possible.

Yes, I am planning to fine-tune the model using the checkpoints.

I have around 88,000 audio files (each roughly 5 s long).
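That is far more than the ~10 hours mentioned earlier. A quick back-of-the-envelope check (assuming the 5 s figure is the average clip length):

```python
num_clips = 88_000
avg_clip_seconds = 5  # assumed average length per clip
total_hours = num_clips * avg_clip_seconds / 3600
print(f"{total_hours:.1f} hours of audio")  # about 122.2 hours
```

Around 120 hours of accent-matched audio is a respectable fine-tuning corpus, which also makes the GPU question more pressing.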

I am not able to arrange a higher-capability system for now. Is it possible for me to fine-tune the pre-trained model using these files on Google Cloud or Amazon AWS?

@kdavis @lissyx @reuben

@pra978 Yes, it should work on Google Cloud or Amazon AWS.

Hey @kdavis, I am getting a KeyError (“CTCBeamSearchDecoderWithLM”) while trying to restore the checkpoints. Does anyone have any idea how to resolve this?

@pra978 Could you give the full text of the error? It will help in debugging.

I ran this to import the meta file from the latest release:

```python
saver = tf.train.import_meta_graph('model.ckpt-97999.meta')
```

and I got this error:


```
KeyError                                  Traceback (most recent call last)
in ()
----> 1 saver = tf.train.import_meta_graph('model.ckpt-97999.meta')

/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py in import_meta_graph(meta_graph_or_file, clear_devices, import_scope, **kwargs)
   1836         clear_devices=clear_devices,
   1837         import_scope=import_scope,
-> 1838         **kwargs)
   1839     if meta_graph_def.HasField("saver_def"):
   1840         return Saver(saver_def=meta_graph_def.saver_def, name=import_scope)

/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/meta_graph.py in import_scoped_meta_graph(meta_graph_or_file, clear_devices, graph, import_scope, input_map, unbound_inputs_col_name, restore_collections_predicate)
    658   importer.import_graph_def(
    659       input_graph_def, name=(import_scope or ""), input_map=input_map,
--> 660       producer_op_list=producer_op_list)
    661
    662   scope_to_prepend_to_names = "/".join(

/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py in new_func(*args, **kwargs)
    314           'in a future version' if date is None else ('after %s' % date),
    315           instructions)
--> 316       return func(*args, **kwargs)
    317     return tf_decorator.make_decorator(func, new_func, 'deprecated',
    318         _add_deprecated_arg_notice_to_docstring(

/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/importer.py in import_graph_def(graph_def, input_map, return_elements, name, op_dict, producer_op_list)
    431   if producer_op_list is not None:
    432     # TODO(skyewm): make a copy of graph_def so we're not mutating the argument?
--> 433     _RemoveDefaultAttrs(op_dict, producer_op_list, graph_def)
    434
    435   graph = ops.get_default_graph()

/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/importer.py in _RemoveDefaultAttrs(op_dict, producer_op_list, graph_def)
    209   # Remove any default attr values that aren't in op_def.
    210   if node.op in producer_op_dict:
--> 211     op_def = op_dict[node.op]
    212     producer_op_def = producer_op_dict[node.op]
    213     # We make a copy of node.attr to iterate through since we may modify

KeyError: 'CTCBeamSearchDecoderWithLM'
```

I am trying to train this model further from the checkpoints using Indian-accent datasets on a Mac (macOS High Sierra 10.13.2). Will "CTCBeamSearchDecoderWithLM" work on this?

What command line arguments did you give DeepSpeech.py?

I didn’t give any command line arguments to DeepSpeech.py.

I just ran this in a Jupyter notebook:

```python
saver = tf.train.import_meta_graph('model.ckpt-97999.meta')
```

and got this error:

KeyError: 'CTCBeamSearchDecoderWithLM'

Could you provide a link to the notebook?

If you’re writing your own training code you’ll need to load the decoder module as the training graph depends on it. See tf.load_op_library.
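For illustration, registering the custom op before importing the meta graph would look roughly like this. The library path below is an assumption based on the native-client layout of that era; adjust it to wherever your build puts the decoder library:

```python
import tensorflow as tf

# Assumed path to the compiled decoder op library from the DeepSpeech
# native client; the actual filename/location depends on your install.
tf.load_op_library('native_client/libctc_decoder_with_kenlm.so')

# With the custom op registered, the 'CTCBeamSearchDecoderWithLM' node
# in the meta graph can now be resolved.
saver = tf.train.import_meta_graph('model.ckpt-97999.meta')
```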

I just ran these three commands in the Jupyter notebook, nothing else:

```python
import tensorflow as tf
sess = tf.Session()
saver = tf.train.import_meta_graph('model.ckpt-97999.meta')
```

and got this error:

KeyError: 'CTCBeamSearchDecoderWithLM'

I am not writing my own training code; I am just trying to load the pre-trained model from the checkpoints provided in the latest release, so that I can train it on some Indian-accent audio afterwards.

Do I still need to load the decoder module?

> Do I still need to load the decoder module?

Yes.

In that case, I don’t understand why you’re trying to load the checkpoint with your own code. Just use DeepSpeech.py and specify --checkpoint_dir and it’ll load the decoder module and the weights appropriately.
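For reference, a fine-tuning invocation would look something like the following. The flag names are taken from the DeepSpeech training script of that era and the paths are placeholders; check `python3 DeepSpeech.py --help` for the exact flags your release supports:

```shell
python3 DeepSpeech.py \
  --checkpoint_dir /path/to/deepspeech-checkpoint \
  --train_files indian_accent/train.csv \
  --dev_files indian_accent/dev.csv \
  --test_files indian_accent/test.csv \
  --learning_rate 0.0001
```

Pointing `--checkpoint_dir` at the released checkpoint makes training resume from the pre-trained weights instead of starting from scratch.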

Thanks a lot, I will try that for sure. I was supposed to use it from the terminal, but I was trying to do it in Python, which I now see is possible but a longer process to follow.

Since this was not working out, I was trying to create my own model. I prepared all the necessary files based on a French-model tutorial from one of the Discourse threads. When I run the model, I get a text error, which I have learned happens due to wrong encoding of the text files. I will keep you updated on that. Thanks :blush:.

Hi @pra978, how are you going with this? I’m trying to do the same with an Australian accent.

Hi @pra978, have you trained the model with an Indian accent?
Also, can you please share how you collected the dataset?