Ah, this will be the cause of some problems. Ideally, all chunks (audio segments/WAVs) have roughly the same length, e.g. 4-8 or 10-15 seconds. I would recommend 5-10 seconds.
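As a rough illustration of the length recommendation above, the following sketch measures WAV durations with Python's standard `wave` module and separates files that fall outside a chosen range. The function names and the 5-10 second defaults are illustrative assumptions, not part of DeepSpeech.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def split_by_duration(paths, min_s=5.0, max_s=10.0):
    """Partition paths into (kept, rejected) lists of (path, duration)."""
    kept, rejected = [], []
    for path in paths:
        duration = wav_duration_seconds(path)
        target = kept if min_s <= duration <= max_s else rejected
        target.append((path, duration))
    return kept, rejected
```

The rejected list can then be used to drop or re-segment the corresponding rows of the training CSV before starting a run.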
OK, I will update this post with my findings after I normalize for audio length.
I tried a batch size of 8 and it still fails. It also fails at a similar point when using 16 and 24: it fails near the end.
Epoch 0 | Training | Elapsed Time: 1:50:51 | Steps: 3471 | Loss: inf
E The following files caused an infinite (or NaN) loss: /home/anon/Downloads/jaSTTDatasets/processedAudio/18311.wav,/home/anon/Downloads/jaSTTDatasets/processedAudio/14902.wav,/home/anon/Downloads/jaSTTDatasets/processedAudio/13702.wav
Epoch 0 | Training | Elapsed Time: 1:52:00 | Steps: 3482 | Loss: inf
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[13384,2048] and type bool on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[{{node tower_0/dropout_3/GreaterEqual}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
     [[concat/concat/_119]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
  (1) Resource exhausted: OOM when allocating tensor with shape[13384,2048] and type bool on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[{{node tower_0/dropout_3/GreaterEqual}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/DeepSpeech/training/deepspeech_training/train.py", line 982, in run_script
    absl.app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/DeepSpeech/training/deepspeech_training/train.py", line 954, in main
    train()
  File "/DeepSpeech/training/deepspeech_training/train.py", line 607, in train
    train_loss, _ = run_set('train', epoch, train_init_op)
  File "/DeepSpeech/training/deepspeech_training/train.py", line 572, in run_set
    feed_dict=feed_dict)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[13384,2048] and type bool on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[node tower_0/dropout_3/GreaterEqual (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
     [[concat/concat/_119]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
  (1) Resource exhausted: OOM when allocating tensor with shape[13384,2048] and type bool on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[node tower_0/dropout_3/GreaterEqual (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
0 successful operations.
0 derived errors ignored.

Original stack trace for 'tower_0/dropout_3/GreaterEqual':
  File "DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/DeepSpeech/training/deepspeech_training/train.py", line 982, in run_script
    absl.app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/DeepSpeech/training/deepspeech_training/train.py", line 954, in main
    train()
  File "/DeepSpeech/training/deepspeech_training/train.py", line 484, in train
    gradients, loss, non_finite_files = get_tower_results(iterator, optimizer, dropout_rates)
  File "/DeepSpeech/training/deepspeech_training/train.py", line 317, in get_tower_results
    avg_loss, non_finite_files = calculate_mean_edit_distance_and_loss(iterator, dropout_rates, reuse=i > 0)
  File "/DeepSpeech/training/deepspeech_training/train.py", line 244, in calculate_mean_edit_distance_and_loss
    logits, _ = create_model(batch_x, batch_seq_len, dropout, reuse=reuse, rnn_impl=rnn_impl)
  File "/DeepSpeech/training/deepspeech_training/train.py", line 204, in create_model
    layers['layer_5'] = layer_5 = dense('layer_5', output, Config.n_hidden_5, dropout_rate=dropout[5], layer_norm=FLAGS.layer_norm)
  File "/DeepSpeech/training/deepspeech_training/train.py", line 93, in dense
    output = tf.nn.dropout(output, rate=dropout_rate)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/nn_ops.py", line 4229, in dropout
    return dropout_v2(x, rate, noise_shape=noise_shape, seed=seed, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/nn_ops.py", line 4313, in dropout_v2
    keep_mask = random_tensor >= rate
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_math_ops.py", line 4481, in greater_equal
    "GreaterEqual", x=x, y=y, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()
I have excluded files larger than 2 MB. It shouldn't be possible for 8 x 2 MB = 16 MB to make a GPU with 4 GB of RAM run out of memory; correct me if there is some behaviour I am unaware of. Most files are around 250 kB.
The fact that it OOMs towards the end makes me suspect some kind of memory leak.
Will retry with a batch size of 4 …
Try to run the files in reverse. There is a flag for that. If the error then occurs at the start, it is caused by a file.
Batch size might not be the cause. But DeepSpeech, like most ML systems, uses the same feature size for all inputs. Therefore the largest file determines the memory. Try to exclude larger files.
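To make the point about the largest file concrete, here is a back-of-the-envelope sketch of how file size translates into padded activation size. Everything in it is an assumption for illustration: 16 kHz 16-bit mono PCM with a 44-byte header, one feature frame per 20 ms, float32 activations, and a layer width of 2048 (the second dimension of the tensor shape in the OOM message). The helper names are hypothetical, and the result covers a single activation of a single layer only.

```python
def estimate_time_steps(wav_bytes, sample_rate=16000, bytes_per_sample=2,
                        header_bytes=44, step_ms=20):
    """Approximate feature frames for a mono PCM WAV of the given size."""
    samples = max(wav_bytes - header_bytes, 0) // bytes_per_sample
    seconds = samples / sample_rate
    return int(seconds * 1000 / step_ms)


def padded_batch_rows(file_sizes, batch_size):
    """Rows (batch x time) when every item is padded to the longest file."""
    longest = max(estimate_time_steps(size) for size in file_sizes)
    return batch_size * longest


def activation_bytes(rows, width=2048, dtype_bytes=4):
    """Bytes needed for one [rows, width] activation tensor."""
    return rows * width * dtype_bytes


# Seven typical 250 kB files plus one 2 MB file in a batch of 8:
rows = padded_batch_rows([250_000] * 7 + [2_000_000], batch_size=8)
print(rows, activation_bytes(rows) / 2**20, "MiB for one layer's output")
```

Under these assumptions a single 2 MB file makes every item in its batch as long as itself, and the per-layer cost is then multiplied by the number of layers and by the tensors kept for backpropagation, so the raw file sizes are not a good guide to GPU memory use.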
Using the --reverse_train flag immediately causes the program to go OOM. Apparently DeepSpeech sorts the files so that large files are at the bottom - https://github.com/mozilla/DeepSpeech/issues/2513.
Anything above a batch size of 4 crashes.
I think the --reverse_train flag would be a useful tip for topics such as "What is the ideal batch size?"
Try a train batch size of 1. It will take longer, but with reverse you'll see whether it would work.
It's working now with a batch size of 4, reversed, and it hasn't crashed, so I am pretty confident it will complete 1 epoch this time.
Also, I just wanted to confirm that I have converted to UTF-8 correctly. I converted this
ただまごまごするだけであった。夫人はそれを見澄してこういった。「誤解しちゃいけませんよ。私は私、
Into
\xE3\x81\x9F\xE3\x81\xA0\xE3\x81\xBE\xE3\x81\x94\xE3\x81\xBE\xE3\x81\x94\xE3\x81\x99\xE3\x82\x8B\xE3\x81\xA0\xE3\x81\x91\xE3\x81\xA7\xE3\x81\x82\xE3\x81\xA3\xE3\x81\x9F\xE3\x80\x82\xE5\xA4\xAB\xE4\xBA\xBA\xE3\x81\xAF\xE3\x81\x9D\xE3\x82\x8C\xE3\x82\x92\xE8\xA6\x8B\xE6\xBE\x84\xE3\x81\x97\xE3\x81\xA6\xE3\x81\x93\xE3\x81\x86\xE3\x81\x84\xE3\x81\xA3\xE3\x81\x9F\xE3\x80\x82\xE3\x80\x8C\xE8\xAA\xA4\xE8\xA7\xA3\xE3\x81\x97\xE3\x81\xA1\xE3\x82\x83\xE3\x81\x84\xE3\x81\x91\xE3\x81\xBE\xE3\x81\x9B\xE3\x82\x93\xE3\x82\x88\xE3\x80\x82\xE7\xA7\x81\xE3\x81\xAF\xE7\xA7\x81\xE3\x80\x81
So each record in the CSV file looks like this -
/home/anon/Downloads/jaSTTDatasets/processedAudio/19752.wav,100070,\xE7\xB4\xA0\xE6\x99\xB4\xE3\x82\x89\xE3\x81\x97\xE3\x81\x84\xE8\xAA\x95\xE7\x94\x9F\xE6\x97\xA5\xE3\x82\x92\xE8\xBF\x8E\xE3\x81\x88\xE3\x82\x89\xE3\x82\x8C\xE3\x81\xBE\xE3\x81\x99\xE3\x82\x88\xE3\x81\x86\xE3\x81\xAB\xE3\x80\x82
Is this correct?
ただまごまごするだけであった。夫人はそれを見澄してこういった。「誤解しちゃいけませんよ。私は私、
It needs to stay this way, as plain UTF-8 text.
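If transcripts have already been written with literal `\xHH` escapes, as in the CSV record above, they can be turned back into real characters. The following is a minimal sketch, assuming the escapes are exactly a backslash, an `x`, and two hex digits, and that each run of escapes is valid UTF-8; the function name is hypothetical.

```python
import re

# One or more consecutive literal "\xHH" escapes (backslash, x, two hex digits).
_ESCAPED_RUN = re.compile(r"(?:\\x[0-9A-Fa-f]{2})+")


def unescape_hex_utf8(text):
    """Replace runs of literal backslash-x escapes with the text they encode."""
    def _decode(match):
        hex_digits = match.group(0).replace("\\x", "")
        return bytes.fromhex(hex_digits).decode("utf-8")
    return _ESCAPED_RUN.sub(_decode, text)


# The first two escaped characters of the transcript shown above.
print(unescape_hex_utf8(r"\xE3\x81\x9F\xE3\x81\xA0"))
```

Text outside the escaped runs, such as the file path and size columns, passes through unchanged, so the function can be applied to whole CSV lines.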
Update:
- Files causing inf loss - Fixed by converting my transcripts from hex notation back to properly UTF-8 encoded characters.
- OOM errors - Fixed by reducing the batch size to something my GPU could handle, in my case 4.
I was able to get it to train for 2 epochs successfully; however, I encountered another issue that I have been stuck on.
I wanted to test the model after 2 epochs. When the tests are run, it returns this error -
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_6/bias
I Loading variable from checkpoint: layer_6/weights
Testing model on /home/anon/Downloads/jaSTTDatasets/final-test.csv
Test epoch | Steps: 0 | Elapsed Time: 0:00:00
Traceback (most recent call last):
  File "DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/DeepSpeech/training/deepspeech_training/train.py", line 982, in run_script
    absl.app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/DeepSpeech/training/deepspeech_training/train.py", line 958, in main
    test()
  File "/DeepSpeech/training/deepspeech_training/train.py", line 682, in test
    samples = evaluate(FLAGS.test_files.split(','), create_model)
  File "/DeepSpeech/training/deepspeech_training/evaluate.py", line 132, in evaluate
    samples.extend(run_test(init_op, dataset=csv))
  File "/DeepSpeech/training/deepspeech_training/evaluate.py", line 114, in run_test
    cutoff_prob=FLAGS.cutoff_prob, cutoff_top_n=FLAGS.cutoff_top_n)
  File "/usr/local/lib/python3.6/dist-packages/ds_ctcdecoder/__init__.py", line 228, in ctc_beam_search_decoder_batch
    for beam_results in batch_beam_results
  File "/usr/local/lib/python3.6/dist-packages/ds_ctcdecoder/__init__.py", line 228, in <listcomp>
    for beam_results in batch_beam_results
  File "/usr/local/lib/python3.6/dist-packages/ds_ctcdecoder/__init__.py", line 227, in <listcomp>
    [(res.confidence, alphabet.Decode(res.tokens)) for res in beam_results]
  File "/usr/local/lib/python3.6/dist-packages/ds_ctcdecoder/__init__.py", line 138, in Decode
    return res.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: invalid continuation byte
Naturally I assumed it's a UTF-8 issue and that I need to fix my files. However, I have tried almost everything to fix the file and nothing seems to work. Note that it works for training and validation; it only fails during tests.
anon@anon-Lenovo-Legion-Y540-15IRH-PG0:~$ isutf8 ./Downloads/jaSTTDatasets/final-dev.csv
anon@anon-Lenovo-Legion-Y540-15IRH-PG0:~$ isutf8 ./Downloads/jaSTTDatasets/final-train.csv
anon@anon-Lenovo-Legion-Y540-15IRH-PG0:~$ isutf8 ./Downloads/jaSTTDatasets/final-test.csv
anon@anon-Lenovo-Legion-Y540-15IRH-PG0:~$ isutf8 ./Downloads/jaSTTDatasets/new
newLogs.txt  newutf.csv
anon@anon-Lenovo-Legion-Y540-15IRH-PG0:~$ isutf8 ./Downloads/jaSTTDatasets/newutf.csv
anon@anon-Lenovo-Legion-Y540-15IRH-PG0:~$ iconv -f UTF-8 ./Downloads/jaSTTDatasets/newutf.csv -o /dev/null; echo $?
0
anon@anon-Lenovo-Legion-Y540-15IRH-PG0:~$ iconv -f UTF-8 ./Downloads/jaSTTDatasets/final-test.csv -o /dev/null; echo $?
0
anon@anon-Lenovo-Legion-Y540-15IRH-PG0:~$ iconv -f UTF-8 ./Downloads/jaSTTDatasets/final-train.csv -o /dev/null; echo $?
0
anon@anon-Lenovo-Legion-Y540-15IRH-PG0:~$ iconv -f UTF-8 ./Downloads/jaSTTDatasets/final-dev.csv -o /dev/null; echo $?
0
As you can see, the files seem to be properly UTF-8 encoded.
Here is the test csv file - final-test.zip (349 Bytes)
My Dockerfile, if you're curious about the environment I am building in -
# Please refer to the TRAINING documentation, "Basic Dockerfile for training"

FROM tensorflow/tensorflow:1.15.4-gpu-py3
ENV DEBIAN_FRONTEND=noninteractive

ENV DEEPSPEECH_REPO=https://github.com/mozilla/DeepSpeech.git
ENV DEEPSPEECH_SHA=origin/master

RUN apt-get update && apt-get install -y --no-install-recommends \
    apt-utils \
    bash-completion \
    build-essential \
    cmake \
    curl \
    git \
    libboost-all-dev \
    libbz2-dev \
    locales \
    python3-venv \
    unzip \
    wget

# We need to remove it because it's breaking deepspeech install later with
# weird errors about setuptools
RUN apt-get purge -y python3-xdg

# Install dependencies for audio augmentation
RUN apt-get install -y --no-install-recommends libopus0 libsndfile1

# Try and free some space
RUN rm -rf /var/lib/apt/lists/*

WORKDIR /
RUN git clone $DEEPSPEECH_REPO DeepSpeech

WORKDIR /DeepSpeech
RUN git checkout $DEEPSPEECH_SHA

# Build CTC decoder first, to avoid clashes on incompatible versions upgrades
RUN cd native_client/ctcdecode && make NUM_PROCESSES=$(nproc) bindings
RUN pip3 install --upgrade native_client/ctcdecode/dist/*.whl

# Prepare deps
RUN pip3 install --upgrade pip==20.2.2 wheel==0.34.2 setuptools==49.6.0

# Install DeepSpeech
# - No need for the decoder since we did it earlier
# - There is already correct TensorFlow GPU installed on the base image,
#   we don't want to break that
RUN DS_NODECODER=y DS_NOTENSORFLOW=y pip3 install --upgrade -e .

# Tool to convert output graph for inference
RUN python3 util/taskcluster.py --source tensorflow --branch r1.15 \
    --artifact convert_graphdef_memmapped_format --target .

# Build KenLM to generate new scorers
WORKDIR /DeepSpeech/native_client
RUN rm -rf kenlm && \
    git clone https://github.com/kpu/kenlm && \
    cd kenlm && \
    git checkout 87e85e66c99ceff1fab2500a7c60c01da7315eec && \
    mkdir -p build && \
    cd build && \
    cmake .. && \
    make -j $(nproc)

WORKDIR /DeepSpeech

ENV TF_FORCE_GPU_ALLOW_GROWTH=true

RUN apt-get update
RUN apt-get install vim -y
RUN sed -i 's/tfv1.nn.ctc_loss(labels=batch_y, inputs=logits, sequence_length=batch_seq_len)/tfv1.nn.ctc_loss(labels=batch_y, inputs=logits, sequence_length=batch_seq_len, ignore_longer_outputs_than_inputs=True)/g' training/deepspeech_training/train.py
RUN sed -i 's/sequence_length=batch_x_len)/sequence_length=batch_x_len, ignore_longer_outputs_than_inputs=True)/g' training/deepspeech_training/evaluate.py
Glad to see that training is working now. Here are some ideas:
- Do a test with data from training. You know that it is working, so it has to be something else.
- Run just a test set from the checkpoint to not train again.
- It happens during beam search, so maybe the scorer. Use current master or a release. I don't know what commit that is.
- Check how you built the scorer. The input for that has to be correct UTF-8 as well. Your Dockerfile doesn't show how you do that.
Do a test with data from training. You know that it is working, so it has to be something else.
I already did this, but I got the same issue…
Run just a test set from the checkpoint to not train again.
I don't understand, could you rephrase?
It happens during beam search, so maybe the scorer. Use current master or release. Don’t know what commit that is.
Currently I am not using a scorer (not sure if it uses a default scorer). The Dockerfile pulls from current master.
Check how you built the scorer. The input for that has to be correct UTF-8 as well. Your docker doesn’t show how you do that.
Currently not using a scorer.
PS
Is it OK if I build the scorer after some training, or will I need to retrain after I build the scorer? Will I have to do training all over again if I start using a scorer?
Also, the loss is 270, so maybe DeepSpeech is predicting some undefined characters which cannot be decoded as UTF-8.
Maybe I should retry after I get a loss under 100 to get valid results?
- A high loss after 2 epochs is not concerning; you have to look at both losses over time.
- The error will persist even with a loss of 10, as it has to do with how files are read. Check the points mentioned above.
Currently I am not using a scorer (not sure if it uses a default scorer). The Dockerfile pulls from current master.
OK, didn’t read this at first. As we don’t have the command you use it is hard to tell. How do you start testing?
I don't understand, could you rephrase?
How do you test without training?
Is it OK if I build the scorer after some training, or will I need to retrain after I build the scorer? Will I have to do training all over again if I start using a scorer?
Please read the docs carefully and understand how DeepSpeech works. Currently you don’t. This will probably lead to bad results and a bad model …
python -u DeepSpeech.py --test_files /home/anon/Downloads/jaSTTDatasets/final-test.csv --test_batch_size 4 --epochs 5 --bytes_output_mode --checkpoint_dir /home/anon/Downloads/jaSTTDatasets/checkpoint/
So when I run this command, it starts testing.
My command for training is
python -u DeepSpeech.py --train_files /home/anon/Downloads/jaSTTDatasets/final-train.csv --train_batch_size 4 --dev_files /home/anon/Downloads/jaSTTDatasets/final-dev.csv --dev_batch_size 4 --test_files /home/anon/Downloads/jaSTTDatasets/final-test.csv --test_batch_size 4 --epochs 5 --bytes_output_mode --checkpoint_dir /home/anon/Downloads/jaSTTDatasets/checkpoint
You have to understand how deep learning works. Please start reading somewhere, maybe in Japanese, about what happens in machine learning. DeepSpeech is currently not yet an end-user product. You won't get results if you continue without any knowledge of what happens.
You are asking about high loss due to a UTF-8 error. You use a batch size of 4 for a single line. You set epochs to 5 for a single test run.
And again, 70 hours of input is not much, you won’t get results anywhere near what Google, … do.
Update on this -
Seems like these guys were having the same issue - Training Traditional Chinese for Common Voice using Deep Speech
I used their 'ignore' solution and added a few debug statements in /usr/local/lib/python3.6/dist-packages/ds_ctcdecoder/__init__.py
def Decode(self, input):
    '''Decode a sequence of labels into a string.'''
    res = super(UTF8Alphabet, self).Decode(input)
    print("utf8 Decode function called")
    print(res)
    return res.decode('utf-8','ignore')
My test CSV has only 1 record.
When I test, the following logs get printed -
root@6e061f9543ba:/DeepSpeech# python -u DeepSpeech.py --test_files /home/anon/Downloads/jaSTTDatasets/final-test.csv --test_batch_size 4 --epochs 1 --bytes_output_mode --checkpoint_dir /home/anon/Downloads/jaSTTDatasets/checkpoint/
I Loading best validating checkpoint from /home/anon/Downloads/jaSTTDatasets/checkpoint/best_dev-13477
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_6/bias
I Loading variable from checkpoint: layer_6/weights
Testing model on /home/anon/Downloads/jaSTTDatasets/final-test.csv
Test epoch | Steps: 0 | Elapsed Time: 0:00:00
utf8 Decode function called
b'\xe3\x81\xe3\x81\xe3\x81\xe3\x81\xe3\x81\xe3\x81\xe3\x81\xe3\x81\xe3\x81\xe3\x81\xe3\x81\x8b\xe3\x80\x82'
utf8 Decode function called
b'\xe3\x81\x93\xe3\x81\xae\xe6\x96\x99\xe7\x90\x86\xe3\x81\xaf\xe5\x8d\xb5\xe3\x82\x92\xe4\xba\x8c\xe5\x80\x8b\xe4\xbd\xbf\xe3\x81\x84\xe3\x81\xbe\xe3\x81\x99\xe3\x80\x82'
Test epoch | Steps: 1 | Elapsed Time: 0:00:21
Test on /home/anon/Downloads/jaSTTDatasets/final-test.csv - WER: 1.000000, CER: 0.928571, loss: 116.681183
--------------------------------------------------------------------------------
Best WER:
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.928571, loss: 116.681183
 - wav: file:///home/anon/Downloads/jaSTTDatasets/processedAudio/1254.wav
 - src: "この料理は卵を二個使います。"
 - res: "か。"
--------------------------------------------------------------------------------
Median WER:
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.928571, loss: 116.681183
 - wav: file:///home/anon/Downloads/jaSTTDatasets/processedAudio/1254.wav
 - src: "この料理は卵を二個使います。"
 - res: "か。"
--------------------------------------------------------------------------------
Worst WER:
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.928571, loss: 116.681183
 - wav: file:///home/anon/Downloads/jaSTTDatasets/processedAudio/1254.wav
 - src: "この料理は卵を二個使います。"
 - res: "か。"
--------------------------------------------------------------------------------
When you decode the hex from the first call using https://mothereff.in/utf-8, it is invalid; however, the hex from the second call decodes to この料理は卵を二個使います。, which matches the transcript in my CSV.
Also, the loss is 270, so maybe DeepSpeech is predicting some undefined characters which cannot be decoded as UTF-8.
I am assuming the function is called once to decode the output predicted by the model and a second time to decode the transcript in the CSV. This supports my hypothesis.
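The two byte strings from the log can be checked directly in Python. This is only a small sketch of the behaviour described above: the variable names are made up, and which call corresponds to the model output versus the transcript is the poster's assumption, not something the log states.

```python
# Byte strings copied from the two "utf8 Decode function called" log entries.
predicted = b'\xe3\x81\xe3\x81\xe3\x81\xe3\x81\xe3\x81\xe3\x81\xe3\x81\xe3\x81\xe3\x81\xe3\x81\xe3\x81\x8b\xe3\x80\x82'
reference = b'\xe3\x81\x93\xe3\x81\xae\xe6\x96\x99\xe7\x90\x86\xe3\x81\xaf\xe5\x8d\xb5\xe3\x82\x92\xe4\xba\x8c\xe5\x80\x8b\xe4\xbd\xbf\xe3\x81\x84\xe3\x81\xbe\xe3\x81\x99\xe3\x80\x82'


def strict_decode_error(raw):
    """Return the UnicodeDecodeError reason for raw, or None if it is valid UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.reason
    return None


# 0xE3 starts a three-byte sequence, so "e3 81" followed by another 0xE3 is
# a truncated character and strict decoding fails on the first pair.
print(strict_decode_error(predicted))
print(strict_decode_error(reference))

# errors="ignore" drops the truncated sequences and keeps the valid tail.
lenient = predicted.decode("utf-8", "ignore")
print(lenient)
```

With `--bytes_output_mode` the labels are individual bytes, so a model that has not converged yet can plausibly emit such truncated sequences; dropping them is what leaves the short `res` string seen in the test report.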
I am happy that you put in the time to solve this. Only you have all the information, and if you give us only bits and pieces, it is hard to suggest solutions.
If you have the time, put in a PR that solves this issue for future Japanese/Chinese models.
I am not sure this is the best solution, since I am not a Python programmer. The same decode function is used in a lot of places, and in some of those places we may not want to 'ignore'. Also, I am not sure how the Python package manager works; this code was from the ctcdecoder package and not from DeepSpeech.
Basically, I am not sure I would be the best person to make the change.