Decoding predictions pending

(VickyGu) #1

Hey guys,
I am training a model using the thchs30 dataset, but it has been stuck at the test decoding-predictions step for almost a day.

These are my parameter settings:

CUDA_VISIBLE_DEVICES=1 python3 -u DeepSpeech.py \
  --alphabet_config_path $ALPHABET \
  --lm_binary_path $LM_BINARY \
  --lm_trie_path $LM_TRIE \
  --train_files $TRAIN_CSV \
  --dev_files $DEV_CSV \
  --test_files $TEST_CSV \
  --checkpoint_dir "$DATA_DIR/checkpoint" \
  --export_dir "$DATA_DIR/model" \
  --train_batch_size 40 \
  --dev_batch_size 40 \
  --test_batch_size 20 \
  --n_hidden 512 \
  --learning_rate 0.0001 \
  --epoch 20 \
  --log_level 0 \
  --summary_secs 10 \
  "$@"

And this is the log:
Preprocessing done
2019-01-22 09:04:11.321570: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-01-22 09:04:11.321669: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-22 09:04:11.321694: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-01-22 09:04:11.321701: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-01-22 09:04:11.321973: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 21551 MB memory) -> physical GPU (device: 0, name: Tesla P40, pci bus id: 0000:04:00.0, compute capability: 6.1)
Computing acoustic model predictions…
100% (124 of 124) |##################################################################################################################################################| Elapsed Time: 0:00:43 Time: 0:00:43
Decoding predictions…
N/A% (0 of 124) | | Elapsed Time: 0:00:00 ETA: --:--:--

Has anybody met the same problem, or does anyone know how to deal with it?

(Lissyx) #2

Can you:

  • use proper code formatting to make this readable and avoid missing information?
  • document your setup better?
  • document how long you waited?

(VickyGu) #3

I found the reason, and in the end I think it is not a problem: after waiting two days, this step finally started running. :sweat_smile:

Maybe the only thing I can do is add another GPU to help with the computation.

Thank you for answering my question. :kissing_smiling_eyes:

(Lissyx) #4

This won’t help; the decoder does not use the GPU.
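If decoding speed itself is the concern, one knob that can help, assuming your DeepSpeech build exposes it (check the flag list of `DeepSpeech.py` before relying on it), is the decoder's beam width; a smaller beam decodes roughly proportionally faster on the CPU, at some cost in accuracy:

```shell
# Hypothetical tweak to the command above: shrink the CTC beam search.
# --beam_width is assumed to exist in your build (the default in this
# era was 1024); verify the flag name in your version's --help output.
CUDA_VISIBLE_DEVICES=1 python3 -u DeepSpeech.py \
  --beam_width 256 \
  ...   # rest of the flags unchanged
```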

(VickyGu) #5

:thinking:
Are there any good ways to speed it up, other than waiting?

(Lissyx) #6

It would help if you replied to my first post; there’s no good reason it should take two days …

(Raul Tang Lc) #7

Thanks for this great project. I can currently train on a sampled subset of data_thchs30 with good results for Mandarin Chinese, but when I try the full data set I run into this pending issue too. I am not sure where I went wrong.

I Training epoch 29...
I Training of Epoch 29 - loss: 1.401871
I FINISHED Optimization - training time: 4:52:49
100% (1250 of 1250) |####################| Elapsed Time: 0:20:36 Time:  0:20:36
Preprocessing ['./data/thchs30//test.csv']
Preprocessing done
Computing acoustic model predictions...
100% (1247 of 1247) |####################| Elapsed Time: 0:07:09 Time:  0:07:09
Decoding predictions...
  0% (5 of 1247) |              | Elapsed Time: 4:42:36 ETA:  54 days, 15:41:58

My training setup:

Data set: Chinese mandarin corpus data_thchs30
Deepspeech: v0.4.1
Cuda 9.0
Tensorflow 1.12.0

----------------sample of train.csv------------
wav_filename,wav_filesize,transcript
/data/data_thchs30/train/C6_639.wav,270044,檀 野 麻 子 骂 笠 冈 窝 囊 笠 冈 也 认 为 自 己 对 松 野 泰 造 之 死 负 有 不 可 推 卸 的 责 任
/data/data_thchs30/train/C31_736.wav,219884,这 也 难 怪 该 厂 已 走 马 灯 似 的 换 了 十 几 位 厂 长 没 一 个 有 能 耐 把 厂 子 整 好
/data/data_thchs30/train/C21_517.wav,262044,他 曾 为 火 药 仓 库 研 究 避 雷 装 置 为 国 家 造 币 厂 研 究 减 少 金 币 磨 损 消 耗 的 办 法
/data/data_thchs30/train/B8_404.wav,330044,液 体 乳 也 开 发 了 超 高 温 灭 菌 奶 婴 儿 配 方 奶 维 生 素 强 化 奶 可 可 奶 果 汁 奶 等
/data/data_thchs30/train/C7_726.wav,276044,如 果 超 过 成 本 目 标 一 概 否 决 要 么 干 要 么 让 位 要 么 受 奖 要 么 挨 罚

----------------sample of vocabulary.txt------------
一个民族 要 有 民族文化 一个 企业 要 有 企业文化 而 一个 家 也 要 有 家庭 文化
一九 九三年 二月 二十三日 上午 四川省 安岳县 岳 源 乡 五村 彭家 姑嫂 五人 进城 购置 衣服
一九 九三年 台湾 茶 饮料 销售额 达 五亿 美元 是 各类 饮料 中 增幅 最 快 的 产品
一九 九五年 一月 温州 市委 副 秘书长 翁 锦 武 来 接任 瓯海 区委书记
一九 八二年 夏 黄胄 先生 约 我 去 藻 鉴 堂 画画 恰逢 中国画 研究院 的 院庆

----------------sample of alphabet.txt------------
# Each line in this file represents the Unicode codepoint (UTF-8 encoded)
# associated with a numeric label.
# A line that starts with # is a comment. You can escape it with \# if you wish
# to use '#' as a label.

=
l
一
丁
七
万
丈
三
上
下
不
与
丑
专
且
世
业
丛
东
丝
丢
两
严
丧
个
丫
中
丰
串
临

...

龄
龈
龙
龚
# The last (non-comment) line needs to end with a newline.

-----------language model steps------------
lmplz --text vocabulary.txt --arpa lm.arpa --o 5 -S 50% --discount_fallback

build_binary -T -s lm.arpa lm.binary

../../../native_client/generate_trie alphabet.txt lm.binary trie


-----------training steps------------
python -u DeepSpeech.py \
  --alphabet_config_path ./data/thchs30/${part}/alphabet.txt \
  --lm_binary_path ./data/thchs30/${part}/lm.binary \
  --lm_trie_path ./data/thchs30/${part}/trie \
  --train_files ./data/thchs30/${part}/train.csv \
  --dev_files ./data/thchs30/${part}/dev.csv \
  --test_files ./data/thchs30/${part}/test.csv \
  --train_batch_size 4 \
  --dev_batch_size 2 \
  --test_batch_size 2 \
  --n_hidden 2048 \
  --learning_rate 0.0001 \
  --checkpoint_dir $checkpoint_dir \
  --checkpoint_secs 600 \
  --validation_step 50 \
  --export_dir $checkpoint_dir \
  --epoch 30

I’d appreciate any clues. :slight_smile:
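For what it's worth, here is a back-of-the-envelope sketch of why a full Mandarin character alphabet makes the beam search so much heavier than an English-style run. This is an illustrative cost model of my own, not DeepSpeech's actual decoder implementation:

```python
# Rough cost model: assume each decoded utterance costs on the order of
# time_steps * beam_width * alphabet_size candidate extensions.
def decode_steps(time_steps: int, beam_width: int, alphabet_size: int) -> int:
    return time_steps * beam_width * alphabet_size

# ~28 symbols for an English-style alphabet vs ~4000 Chinese characters,
# both at an assumed default beam width of 1024 over 500 time steps:
english = decode_steps(500, 1024, 28)
mandarin = decode_steps(500, 1024, 4000)
print(mandarin // english)  # ~142x more work per utterance
```

Under this (admittedly crude) model, the alphabet size alone accounts for two orders of magnitude of slowdown, before any language-model scoring cost is considered.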

(Raul Tang Lc) #8

OK, I see the same issue here: