How much disk space is required for training a DeepSpeech model


(Megha ) #1

Hi all,

I am continuing training from the pre-trained model, initializing from the provided frozen graph. I am training on Common Voice data and I am not using a GPU. When I start training with the command below, it throws a memory error.

./DeepSpeech.py --n_hidden 2048 --initialize_from_frozen_model models/output_graph.pb --checkpoint_dir fine_tuning_checkpoints --epoch 3 --train_files cv_corpus_v1/cv-valid-train.csv,cv_corpus_v1/cv-other-train.csv --dev_files cv_corpus_v1/cv-valid-dev.csv --test_files cv_corpus_v1/cv-valid-test.csv

How much disk space is required to do this process?

Thank you.


(Lissyx) #2

What is this “memory error”? We cannot diagnose an error that you do not show us…


(Megha ) #3

./DeepSpeech.py --n_hidden 2048 --initialize_from_frozen_model models/output_graph.pb --checkpoint_dir fine_tuning_checkpoints --epoch 3 --train_files cv_corpus_v1/cv-valid-train.csv,cv_corpus_v1/cv-other-train.csv --dev_files cv_corpus_v1/cv-valid-dev.csv --test_files cv_corpus_v1/cv-valid-test.csv

W Parameter --validation_step needs to be >0 for early stopping to work
I Initializing from frozen model: models/output_graph.pb
Traceback (most recent call last):
File “./DeepSpeech.py”, line 1861, in <module>
tf.app.run()
File “/home/megha/git-lfs-2.3.4/deepspeech-venv/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py”, line 124, in run
_sys.exit(main(argv))
File “./DeepSpeech.py”, line 1818, in main
train()
File “./DeepSpeech.py”, line 1592, in train
session.run(init_from_frozen_model_op, feed_dict=feed_dict)
File “/home/megha/git-lfs-2.3.4/deepspeech-venv/local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py”, line 539, in run
run_metadata=run_metadata)
File “/home/megha/git-lfs-2.3.4/deepspeech-venv/local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py”, line 1013, in run
run_metadata=run_metadata)
File “/home/megha/git-lfs-2.3.4/deepspeech-venv/local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py”, line 1104, in run
raise six.reraise(*original_exc_info)
File “/home/megha/git-lfs-2.3.4/deepspeech-venv/local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py”, line 1089, in run
return self._sess.run(*args, **kwargs)
File “/home/megha/git-lfs-2.3.4/deepspeech-venv/local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py”, line 1153, in run
feed_dict, options)
File “/home/megha/git-lfs-2.3.4/deepspeech-venv/local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py”, line 1179, in _call_hook_before_run
request = hook.before_run(run_context)
File “/home/megha/git-lfs-2.3.4/deepspeech-venv/local/lib/python2.7/site-packages/tensorflow/python/training/basic_session_run_hooks.py”, line 434, in before_run
"graph.pbtxt")
File “/home/megha/git-lfs-2.3.4/deepspeech-venv/local/lib/python2.7/site-packages/tensorflow/python/framework/graph_io.py”, line 69, in write_graph
text_format.MessageToString(graph_def))
File “/home/megha/git-lfs-2.3.4/deepspeech-venv/local/lib/python2.7/site-packages/google/protobuf/text_format.py”, line 163, in MessageToString
printer.PrintMessage(message)
File “/home/megha/git-lfs-2.3.4/deepspeech-venv/local/lib/python2.7/site-packages/google/protobuf/text_format.py”, line 352, in PrintMessage
self.PrintField(field, element)
File “/home/megha/git-lfs-2.3.4/deepspeech-venv/local/lib/python2.7/site-packages/google/protobuf/text_format.py”, line 383, in PrintField
self.PrintFieldValue(field, value)
File “/home/megha/git-lfs-2.3.4/deepspeech-venv/local/lib/python2.7/site-packages/google/protobuf/text_format.py”, line 419, in PrintFieldValue
self._PrintMessageFieldValue(value)
File “/home/megha/git-lfs-2.3.4/deepspeech-venv/local/lib/python2.7/site-packages/google/protobuf/text_format.py”, line 404, in _PrintMessageFieldValue
self.PrintMessage(value)
File “/home/megha/git-lfs-2.3.4/deepspeech-venv/local/lib/python2.7/site-packages/google/protobuf/text_format.py”, line 349, in PrintMessage
self.PrintField(field, entry_submsg)
File “/home/megha/git-lfs-2.3.4/deepspeech-venv/local/lib/python2.7/site-packages/google/protobuf/text_format.py”, line 383, in PrintField
self.PrintFieldValue(field, value)
File “/home/megha/git-lfs-2.3.4/deepspeech-venv/local/lib/python2.7/site-packages/google/protobuf/text_format.py”, line 419, in PrintFieldValue
self._PrintMessageFieldValue(value)
File “/home/megha/git-lfs-2.3.4/deepspeech-venv/local/lib/python2.7/site-packages/google/protobuf/text_format.py”, line 404, in _PrintMessageFieldValue
self.PrintMessage(value)
File “/home/megha/git-lfs-2.3.4/deepspeech-venv/local/lib/python2.7/site-packages/google/protobuf/text_format.py”, line 354, in PrintMessage
self.PrintField(field, value)
File “/home/megha/git-lfs-2.3.4/deepspeech-venv/local/lib/python2.7/site-packages/google/protobuf/text_format.py”, line 383, in PrintField
self.PrintFieldValue(field, value)
File “/home/megha/git-lfs-2.3.4/deepspeech-venv/local/lib/python2.7/site-packages/google/protobuf/text_format.py”, line 419, in PrintFieldValue
self._PrintMessageFieldValue(value)
File “/home/megha/git-lfs-2.3.4/deepspeech-venv/local/lib/python2.7/site-packages/google/protobuf/text_format.py”, line 404, in _PrintMessageFieldValue
self.PrintMessage(value)
File “/home/megha/git-lfs-2.3.4/deepspeech-venv/local/lib/python2.7/site-packages/google/protobuf/text_format.py”, line 354, in PrintMessage
self.PrintField(field, value)
File “/home/megha/git-lfs-2.3.4/deepspeech-venv/local/lib/python2.7/site-packages/google/protobuf/text_format.py”, line 383, in PrintField
self.PrintFieldValue(field, value)
File “/home/megha/git-lfs-2.3.4/deepspeech-venv/local/lib/python2.7/site-packages/google/protobuf/text_format.py”, line 419, in PrintFieldValue
self._PrintMessageFieldValue(value)
File “/home/megha/git-lfs-2.3.4/deepspeech-venv/local/lib/python2.7/site-packages/google/protobuf/text_format.py”, line 404, in _PrintMessageFieldValue
self.PrintMessage(value)
File “/home/megha/git-lfs-2.3.4/deepspeech-venv/local/lib/python2.7/site-packages/google/protobuf/text_format.py”, line 354, in PrintMessage
self.PrintField(field, value)
File “/home/megha/git-lfs-2.3.4/deepspeech-venv/local/lib/python2.7/site-packages/google/protobuf/text_format.py”, line 383, in PrintField
self.PrintFieldValue(field, value)
File “/home/megha/git-lfs-2.3.4/deepspeech-venv/local/lib/python2.7/site-packages/google/protobuf/text_format.py”, line 437, in PrintFieldValue
out.write(text_encoding.CEscape(out_value, out_as_utf8))
File “/home/megha/git-lfs-2.3.4/deepspeech-venv/local/lib/python2.7/site-packages/google/protobuf/text_encoding.py”, line 79, in CEscape
return ''.join(_cescape_byte_to_str[Ord(c)] for c in text)
MemoryError


(Lissyx) #4

This is hardly readable. Can you make sure you are using proper code formatting? Some valuable information might also be lost in reformatting if you are not using proper code format.

Can you give information on your system’s specs?


(Megha ) #5

$ cat /proc/cpuinfo

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 78
model name : Intel® Core™ i7-6500U CPU @ 2.50GHz
stepping : 3
microcode : 0x8a
cpu MHz : 471.789
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs :
bogomips : 5184.00
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 78
model name : Intel® Core™ i7-6500U CPU @ 2.50GHz
stepping : 3
microcode : 0x8a
cpu MHz : 425.134
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs :
bogomips : 5184.00
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 78
model name : Intel® Core™ i7-6500U CPU @ 2.50GHz
stepping : 3
microcode : 0x8a
cpu MHz : 413.391
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs :
bogomips : 5184.00
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 78
model name : Intel® Core™ i7-6500U CPU @ 2.50GHz
stepping : 3
microcode : 0x8a
cpu MHz : 399.902
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs :
bogomips : 5184.00
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:


(Lissyx) #6

Formatting, please. And you have not given specs for your storage or RAM.


(Megha ) #7

dmidecode 3.0

Getting SMBIOS data from sysfs.
SMBIOS 2.8 present.

Handle 0x0026, DMI type 16, 23 bytes
Physical Memory Array
Location: System Board Or Motherboard
Use: System Memory
Error Correction Type: None
Maximum Capacity: 16 GB
Error Information Handle: No Error
Number Of Devices: 2

Handle 0x0027, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x0026
Error Information Handle: No Error
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: Unknown
Set: None
Locator: ChannelA-DIMM0
Bank Locator: BANK 0
Type: Unknown
Type Detail: None
Speed: Unknown
Manufacturer: Not Specified
Serial Number: Not Specified
Asset Tag: Not Specified
Part Number: Not Specified
Rank: Unknown
Configured Clock Speed: Unknown
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown

Handle 0x0028, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x0026
Error Information Handle: No Error
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: Unknown
Set: None
Locator: ChannelA-DIMM1
Bank Locator: BANK 1
Type: Unknown
Type Detail: None
Speed: Unknown
Manufacturer: Not Specified
Serial Number: Not Specified
Asset Tag: Not Specified
Part Number: Not Specified
Rank: Unknown
Configured Clock Speed: Unknown
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown

Handle 0x0029, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x0026
Error Information Handle: No Error
Total Width: 64 bits
Data Width: 64 bits
Size: 8192 MB
Form Factor: SODIMM
Set: None
Locator: ChannelB-DIMM0
Bank Locator: BANK 2
Type: DDR3
Type Detail: Synchronous
Speed: 1600 MHz
Manufacturer: Samsung
Serial Number: 23156510
Asset Tag: 9876543210
Part Number: M471B1G73EB0-YK0
Rank: 2
Configured Clock Speed: 1600 MHz
Minimum Voltage: 1.25 V
Maximum Voltage: 1.35 V
Configured Voltage: 1.35 V

Handle 0x002A, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x0026
Error Information Handle: No Error
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: Unknown
Set: None
Locator: ChannelB-DIMM1
Bank Locator: BANK 3
Type: Unknown
Type Detail: None
Speed: Unknown
Manufacturer: Not Specified
Serial Number: Not Specified
Asset Tag: Not Specified
Part Number: Not Specified
Rank: Unknown
Configured Clock Speed: Unknown
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: Unknown


(Lissyx) #8

8GB is the amount of (GPU) memory required for training on a GPU. I don’t think you are running out of disk space, but out of memory. How much swap do you have configured?


(Megha ) #9

$ swapon -s

Filename     Type        Size     Used     Priority
/dev/sda4    partition   499708   310104   -1

(Lissyx) #10

So, 8GB of memory and 500M of swap? That’s likely not enough, I fear. Try to reduce the size of your dataset: --limit_train x (and same for dev and test sets).
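
As a rough check of the swap figure: the Size and Used columns of `swapon -s` are reported in KiB, so the output posted above works out to about 488 MiB of swap in total, of which roughly 185 MiB was still free. A minimal sketch of that arithmetic follows; the function name `swap_summary` is hypothetical, and the sample text is copied from the post above with the whitespace normalised.

```python
def swap_summary(swapon_output):
    """Return (total_kib, used_kib) summed over all swap areas.

    Expects the text printed by `swapon -s`: one header row, then one
    row per swap area with the columns Filename, Type, Size, Used, Priority.
    """
    total_kib = 0
    used_kib = 0
    for line in swapon_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue  # ignore blank or malformed rows
        total_kib += int(fields[2])  # Size column, in KiB
        used_kib += int(fields[3])   # Used column, in KiB
    return total_kib, used_kib


# Sample taken from the thread (whitespace normalised).
SAMPLE = """Filename Type Size Used Priority
/dev/sda4 partition 499708 310104 -1
"""

total_kib, used_kib = swap_summary(SAMPLE)
free_mib = (total_kib - used_kib) / 1024.0
print("swap total: %.0f MiB, free: %.0f MiB" % (total_kib / 1024.0, free_mib))
```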


(Megha ) #11

Okay, I will try that. Thank you for your response. But just for my own knowledge: if I don’t limit the dataset size, what is the minimum disk space I would need? Can you tell me?

I am planning to buy a new laptop and dedicate it to this speech recognition project, so I would like to keep this information in mind.


(Lissyx) #12

There are too many variables in your problem; I guess it depends mostly on the dataset. Also, your first message is still incorrectly formatted, so it’s possible that valuable information needed to understand the stack trace is missing.


(Lissyx) #13

Also, I don’t think you can expect to train on the full Common Voice dataset (hundreds of hours of audio) on a single laptop without a GPU: training on that amount of data with four dedicated TITAN X GPUs would already take one full week.


(Megha ) #14

Yes, I am also planning to get a laptop with a GPU. That is the reason I am upgrading to a new laptop. So a few hardware specifications would help me know what to look for.


(Lissyx) #15

Also, @erogol spotted that it is actually when reading the model that it is failing. Obviously not enough memory. Heap consumption on loading the protobuf model is peaking at ~4.5GiB of memory. That’s just to read the file.

@meghagowda5193 I would strongly advise against running serious training on a laptop: their GPUs are not as powerful as desktop ones, and the heat constraints are really going to be a big issue. You might kill your hardware.
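
To compare a machine against the ~4.5 GiB peak mentioned above before starting a run, one option is to read `/proc/meminfo` on Linux. The sketch below is only an illustration: the helper names are hypothetical, the 4.5 GiB threshold is the figure quoted in this thread for reading the model file (training itself will need more), and free swap is counted as headroom even though relying on swap would be very slow.

```python
def parse_meminfo(text):
    """Map /proc/meminfo field names to their integer values.

    Most fields are in KiB; a few (e.g. HugePages_Total) are plain counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def headroom_gib(meminfo_text):
    """Available RAM plus free swap, in GiB."""
    info = parse_meminfo(meminfo_text)
    # MemAvailable is missing on old kernels; fall back to MemFree there.
    available_kib = info.get("MemAvailable", info.get("MemFree", 0))
    return (available_kib + info.get("SwapFree", 0)) / (1024.0 * 1024.0)


# Peak quoted in the thread for loading the frozen model only (assumption:
# it is used here purely as a lower bound, not as a training requirement).
REQUIRED_GIB = 4.5

if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            have = headroom_gib(f.read())
    except IOError:
        print("/proc/meminfo is not available on this system")
    else:
        verdict = "enough" if have >= REQUIRED_GIB else "not enough"
        print("headroom: %.1f GiB (%s to load the model)" % (have, verdict))
```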


(Megha ) #16

Okay, thank you :)
If I use the limit parameter, how does it select from the dataset? Once the limit I specified has been processed, is the next portion of the data picked up automatically? Is there a thread on this topic that I can read through?


(Lissyx) #17

As far as I can remember, the limit applies to the overall set of data.
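
One way to picture a limit that applies to the overall set: the rows of all the listed CSV files are treated as a single collection, and the limit caps that collection as a whole rather than each file separately. The sketch below only illustrates this idea and is not DeepSpeech's actual code; `read_limited`, the column names, and the row order are assumptions made for the example.

```python
import csv
import io


def read_limited(csv_sources, limit):
    """Concatenate rows from several CSV file objects, keeping at most `limit`.

    A limit of 0 or less means "no limit". Rows are kept in file order,
    which is an assumption of this sketch.
    """
    rows = []
    for source in csv_sources:
        for row in csv.DictReader(source):
            rows.append(row)
            if 0 < limit <= len(rows):
                return rows  # the cap applies to the combined set
    return rows


# Two small in-memory "files" standing in for cv-valid-train.csv and
# cv-other-train.csv (contents are made up for illustration).
valid = io.StringIO("wav_filename,transcript\na.wav,one\nb.wav,two\n")
other = io.StringIO("wav_filename,transcript\nc.wav,three\nd.wav,four\n")

subset = read_limited([valid, other], 3)
print([row["wav_filename"] for row in subset])
```

With a limit of 3, the subset holds both rows from the first file and only one from the second, because the cap counts rows across both files together.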