Bilingual Speech Recognition using DeepSpeech (English - Urdu)

Hi, how can I add support for recognizing two languages (English and Urdu)?

I need to recognize both pure Urdu and English sentences, as well as Urdu sentences with English words mixed in.

Also, how many hours of audio data are needed to build something meaningful for near-real-world usage?

Thanks in advance.

Do you want the model to be able to recognize both at the same time, or to tell you which language it is? The amount of data we have for the English model is circa 6,000 hours.

Thanks for your response @lissyx.

My ultimate goal is to recognize pure English (Pakistani accent), pure Urdu, and Urdu with English words substituted in. The Urdu is to be written in Roman Urdu.

What method would you suggest? :confused:

I can get at most 10 hours of Urdu speech data. Waiting for your quick response.

That’s something that will mostly interest @josh_meyer

@ameerhamza.rz - I’d recommend two approaches, depending on how experienced you are with ASR and DeepSpeech.

(1) Easier - train a DeepSpeech model with transfer learning (i.e. use the transfer-learning branch on GitHub), and make the alphabet the union of the English and Roman Urdu character sets.

(2) Harder - this solution is more elegant, but probably harder. Use the utf8 branch of DeepSpeech, and train a single model on Urdu and English. This branch is experimental, but it will allow English and Roman Urdu to share utf-8 encodings at the output layer.
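For approach (1), the alphabet is just a plain-text file with one character per line. A minimal sketch of deriving the combined alphabet from training transcripts (my own helper, not the official DeepSpeech tooling; the file name and transcript lists are hypothetical):

```python
# Sketch: derive a combined English + Roman Urdu alphabet.txt from the
# training transcripts. DeepSpeech's alphabet format is one character
# per line; lines starting with '#' are treated as comments.
def build_combined_alphabet(transcripts, out_path="alphabet_combined.txt"):
    chars = set()
    for text in transcripts:
        chars.update(text.lower())
    ordered = sorted(chars)
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("# One character per line; note the space character below.\n")
        for c in ordered:
            f.write(c + "\n")
    return ordered

# e.g. build_combined_alphabet(english_lines + roman_urdu_lines)
```

Since Roman Urdu uses Latin script, the combined set will overlap heavily with the English one; deriving it from the actual transcripts just guarantees no character in the training data is missing from the alphabet.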


Hi @josh_meyer
I am trying to do the same with German language.
Will it be possible to start transfer learning from the DS 0.4.1 checkpoints (which were trained on English), since the number of alphabet characters differs between English and German?

Hi @josh_meyer, what is the difference between these two branches? Can you point me to a paper or article?

Will the resulting model infer both English and Urdu? Thanks

Hi @josh_meyer, using the transfer-learning branch, do I also need to drop layers? If yes, which ones?

Thanks, waiting for your quick response.

Hi there,

If you want bilingual ASR, you first need to be more specific. Do you want a model that can switch between “English ASR” and “Urdu ASR”? That is, the input audio is either completely English or completely Urdu.

Or, do you want a model that can recognize a sentence which has both English words and Urdu words in it? This is called code-mixing or code-switching. If you want to be able to decode code-mixed speech, what do you want the output to look like? Should the English words be written in Urdu script? Should the Urdu words be written in English script? Should the words each be written in their own alphabet? If you want code-mixed decoding, then you will likely need code-mixed training data. And you will definitely want a code-mixed language model.
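As an illustration of the "each word in its own alphabet" option, here is a small sketch (my own helper, not part of DeepSpeech) that tags each word of a decoded sentence by script, using Unicode character names:

```python
import unicodedata

# Sketch: tag each word of a code-mixed sentence by script, assuming
# Urdu words stay in Arabic script, Hindi words in Devanagari, and
# English words in Latin script.
def word_script(word):
    for ch in word:
        name = unicodedata.name(ch, "")
        if name.startswith("ARABIC"):
            return "urdu"
        if name.startswith("DEVANAGARI"):
            return "hindi"
    return "latin"

def tag_sentence(sentence):
    return [(w, word_script(w)) for w in sentence.split()]
```

This kind of tagging is also handy for auditing code-mixed transcripts before training, e.g. to count how many mixed sentences the corpus actually contains.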

Using transfer learning will help, but not in the way you are probably thinking. If you transfer from English to Urdu, there is no reason to believe the model will “remember” any of the English that is originally learned.

If you can be more specific then we might be able to help more.

-josh

I am working on bilingual ASR with code-mixing of Hindi words in English.

I want the output words to be written in their own alphabet.
For ex.
that तो i know
i wanted to tell you की you are great guy

Currently I have a custom LM built, and I have tuned the acoustic model (the DeepSpeech 0.5.0 acoustic model) for an Indian English accent using the transfer-learning branch. I am getting a WER of 30% as of now, but that isn’t a concern yet, because the data is very limited.

What should I exactly do for code mixing?

Will combining the Hindi and English alphabets for the language model help, along with transfer learning on code-mixed data?
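A toy word-bigram count (a sketch for illustration, not a real KenLM build) shows why the language model itself must see code-mixed text: an English-only LM has never observed transitions like ("that", "तो"), so they get no probability mass at decode time.

```python
from collections import Counter

# Sketch: word-bigram counts over code-mixed training text. A real
# DeepSpeech LM would be built with KenLM over such a corpus; this toy
# version only shows that mixed-language word transitions must appear
# in the training text to be assigned any probability.
def bigram_counts(sentences):
    counts = Counter()
    for s in sentences:
        words = ["<s>"] + s.split() + ["</s>"]
        counts.update(zip(words, words[1:]))
    return counts

mixed = ["that तो i know", "i wanted to tell you की you are great guy"]
counts = bigram_counts(mixed)
# ('that', 'तो') is observed here, but never in an English-only corpus.
```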

If you change the alphabet, you cannot directly re-use the trained model. I’m wondering how @josh_meyer’s branch works wrt that.


Thanks for the reply,

Alright, what’s your view on:

  1. Going for transfer learning with pure Hindi data on the currently trained model?
    If that is done, will the trained model forget its previous English training?

  2. Going for transfer learning with code-mixed data, i.e. some Hindi words in English audio?

I am about to begin the first approach, but I am a bit confused about it.

I am getting an error with the first approach.
I used the transferlearning2 branch, but I can’t use the checkpoints after changing the alphabet, as you said!

The checkpoint in /home/nikhil/DeepSpeech-transfer-learning2/deepspeech-0.5.0-checkpoint/train-484525 does not match the shapes of the model. Did you change alphabet.txt or the --n_hidden parameter between train runs using the same checkpoint dir? Try moving or removing the contents of /home/nikhil/DeepSpeech-transfer-learning2/deepspeech-0.5.0-checkpoint/train-484525.

Can’t help if you don’t explain what you are doing. The error message suggests you are doing much more than you claim above.

I tuned the acoustic model for an Indian English accent earlier.

I am trying to apply transfer learning on that model and train it on Hindi data.
For that I added the Hindi characters to the original English alphabet file, because the transcripts are in Hindi.

But I got the error stated previously with this procedure.

I have no idea what codebase you are using, which version, etc. … the error explicitly states you are loading a model with incompatible shapes. So, again, please give better context.

I am using transferlearning2 branch, version 0.5.0

And your command line ? Seriously.

python3 DeepSpeech.py --checkpoint_dir /home/nikhil/DeepSpeech-transfer-learning2/deepspeech-0.5.0-checkpoint/ --epoch 1 --train_files /home/nikhil/Desktop/train1.csv --test_files /home/nikhil/Desktop/test1.csv --dev_files /home/nikhil/Desktop/dev.csv --alphabet /home/nikhil/Desktop/hindi_alphabet.txt --export_dir /home/nikhil/Desktop/ 
WARNING: Logging before flag parsing goes to stderr.
W0210 16:46:20.231066 140318949906240 deprecation_wrapper.py:119] From DeepSpeech.py:884: The name tf.app.run is deprecated. Please use tf.compat.v1.app.run instead.

W0210 16:46:20.255008 140318949906240 deprecation_wrapper.py:119] From /home/nikhil/DeepSpeech-transfer-learning2/util/config.py:60: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

W0210 16:46:20.256026 140318949906240 deprecation_wrapper.py:119] From DeepSpeech.py:866: The name tf.reset_default_graph is deprecated. Please use tf.compat.v1.reset_default_graph instead.

W0210 16:46:20.256174 140318949906240 deprecation_wrapper.py:119] From DeepSpeech.py:867: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.

W0210 16:46:20.419441 140318949906240 deprecation.py:323] From /home/nikhil/anaconda3/lib/python3.7/site-packages/tensorflow/python/data/ops/dataset_ops.py:494: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
    options available in V2.
    - tf.py_function takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.
    - tf.numpy_function maintains the semantics of the deprecated tf.py_func
    (it is not differentiable, and manipulates numpy arrays). It drops the
    stateful argument making all functions stateful.
    
W0210 16:46:20.448142 140318949906240 deprecation_wrapper.py:119] From /home/nikhil/DeepSpeech-transfer-learning2/util/feeding.py:44: The name tf.read_file is deprecated. Please use tf.io.read_file instead.

W0210 16:46:20.504619 140318949906240 deprecation_wrapper.py:119] From DeepSpeech.py:389: The name tf.data.Iterator is deprecated. Please use tf.compat.v1.data.Iterator instead.

W0210 16:46:20.504805 140318949906240 deprecation.py:323] From DeepSpeech.py:389: DatasetV1.output_types (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_types(dataset)`.
W0210 16:46:20.504961 140318949906240 deprecation.py:323] From DeepSpeech.py:390: DatasetV1.output_shapes (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_shapes(dataset)`.
W0210 16:46:20.505082 140318949906240 deprecation.py:323] From DeepSpeech.py:391: DatasetV1.output_classes (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_classes(dataset)`.
W0210 16:46:20.507122 140318949906240 deprecation.py:323] From /home/nikhil/anaconda3/lib/python3.7/site-packages/tensorflow/python/data/ops/iterator_ops.py:348: Iterator.output_types (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_types(iterator)`.
W0210 16:46:20.507293 140318949906240 deprecation.py:323] From /home/nikhil/anaconda3/lib/python3.7/site-packages/tensorflow/python/data/ops/iterator_ops.py:349: Iterator.output_shapes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_shapes(iterator)`.
W0210 16:46:20.507420 140318949906240 deprecation.py:323] From /home/nikhil/anaconda3/lib/python3.7/site-packages/tensorflow/python/data/ops/iterator_ops.py:351: Iterator.output_classes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_classes(iterator)`.
W0210 16:46:20.611987 140318949906240 deprecation_wrapper.py:119] From DeepSpeech.py:415: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

W0210 16:46:20.615381 140318949906240 deprecation_wrapper.py:119] From DeepSpeech.py:208: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

W0210 16:46:20.725610 140318949906240 deprecation.py:506] From /home/nikhil/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0210 16:46:21.559910 140318949906240 deprecation_wrapper.py:119] From DeepSpeech.py:189: The name tf.nn.ctc_loss is deprecated. Please use tf.compat.v1.nn.ctc_loss instead.

W0210 16:46:21.591238 140318949906240 deprecation_wrapper.py:119] From DeepSpeech.py:285: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead.

W0210 16:46:21.619580 140318949906240 deprecation_wrapper.py:119] From DeepSpeech.py:344: The name tf.summary.histogram is deprecated. Please use tf.compat.v1.summary.histogram instead.

W0210 16:46:21.644130 140318949906240 deprecation_wrapper.py:119] From DeepSpeech.py:437: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.

W0210 16:46:21.686030 140318949906240 deprecation_wrapper.py:119] From DeepSpeech.py:441: The name tf.summary.merge_all is deprecated. Please use tf.compat.v1.summary.merge_all instead.

W0210 16:46:21.686893 140318949906240 deprecation_wrapper.py:119] From DeepSpeech.py:443: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.

W0210 16:46:21.687747 140318949906240 deprecation_wrapper.py:119] From DeepSpeech.py:448: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.

W0210 16:46:21.765034 140318949906240 deprecation.py:323] From /home/nikhil/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
I0210 16:46:21.766776 140318949906240 saver.py:1280] Restoring parameters from /home/nikhil/DeepSpeech-transfer-learning2/deepspeech-0.5.0-checkpoint/train-484525
E Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
E 
E Assign requires shapes of both tensors to match. lhs shape= [102] rhs shape= [29]
E 	 [[node save/Assign_12 (defined at DeepSpeech.py:448) ]]
E 
E Errors may have originated from an input operation.
E Input Source operations connected to node save/Assign_12:
E  layer_6/bias/Adam (defined at DeepSpeech.py:438)
E 
E Original stack trace for 'save/Assign_12':
E   File "DeepSpeech.py", line 884, in <module>
E     tf.app.run(main)
E   File "/home/nikhil/anaconda3/lib/python3.7/site-packages/tensorflow/python/platform/app.py", line 40, in run
E     _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
E   File "/home/nikhil/anaconda3/lib/python3.7/site-packages/absl/app.py", line 300, in run
E     _run_main(main, args)
E   File "/home/nikhil/anaconda3/lib/python3.7/site-packages/absl/app.py", line 251, in _run_main
E     sys.exit(main(argv))
E   File "DeepSpeech.py", line 868, in main
E     train()
E   File "DeepSpeech.py", line 448, in train
E     checkpoint_saver = tf.train.Saver(max_to_keep=FLAGS.max_to_keep)
E   File "/home/nikhil/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 825, in __init__
E     self.build()
E   File "/home/nikhil/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 837, in build
E     self._build(self._filename, build_save=True, build_restore=True)
E   File "/home/nikhil/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 875, in _build
E     build_restore=build_restore)
E   File "/home/nikhil/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 508, in _build_internal
E     restore_sequentially, reshape)
E   File "/home/nikhil/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/saver.py", line 350, in _AddRestoreOps
E     assign_ops.append(saveable.restore(saveable_tensors, shapes))
E   File "/home/nikhil/anaconda3/lib/python3.7/site-packages/tensorflow/python/training/saving/saveable_object_util.py", line 72, in restore
E     self.op.get_shape().is_fully_defined())
E   File "/home/nikhil/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/state_ops.py", line 227, in assign
E     validate_shape=validate_shape)
E   File "/home/nikhil/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/gen_state_ops.py", line 66, in assign
E     use_locking=use_locking, name=name)
E   File "/home/nikhil/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
E     op_def=op_def)
E   File "/home/nikhil/anaconda3/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
E     return func(*args, **kwargs)
E   File "/home/nikhil/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
E     op_def=op_def)
E   File "/home/nikhil/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
E     self._traceback = tf_stack.extract_stack()
E 
E The checkpoint in /home/nikhil/DeepSpeech-transfer-learning2/deepspeech-0.5.0-checkpoint/train-484525 does not match the shapes of the model. Did you change alphabet.txt or the --n_hidden parameter between train runs using the same checkpoint dir? Try moving or removing the contents of /home/nikhil/DeepSpeech-transfer-learning2/deepspeech-0.5.0-checkpoint/train-484525.
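The shape mismatch above (lhs [102] vs rhs [29]) is the alphabet change itself: DeepSpeech’s output layer has one unit per alphabet character plus one for the CTC blank, so the English checkpoint’s layer_6 bias is 28 + 1 = 29, while the combined Hindi+English alphabet here presumably has 101 entries, giving 102. A sketch of the arithmetic (my own helper, not DeepSpeech code):

```python
# Sketch: expected size of DeepSpeech's output layer for a given
# alphabet.txt (one character per line; '#' lines are comments).
def output_layer_size(alphabet_path):
    with open(alphabet_path, encoding="utf-8") as f:
        chars = [line for line in f
                 if not line.startswith("#") and line != "\n"]
    return len(chars) + 1  # +1 for the CTC blank label

# The English alphabet (space + a-z + apostrophe = 28 chars) gives 29,
# matching the checkpoint; an alphabet with 101 characters gives 102,
# which is why restoring layer_6 fails. The transfer-learning branch
# exists precisely to drop and re-initialize this final layer instead
# of restoring it -- see its flags for dropping source layers.
```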

So you don’t drop any layer and you don’t pass any fine-tune related argument ? You’re just not enabling the feature then.

You really need to get into the details of how this is working. Have you read about the differences?