One pleasant surprise when using DeepSpeech is how fast the model loads (the DeepSpeech Model is instantiated from a memory-mapped file).
Typically, I see load times of 20-30 milliseconds on a fairly standard cloud machine with non-SSD persistent disks.
I’m not a computer scientist and have no real understanding of memory-mapped files, so I’m wondering what allows the model to load so quickly when the model file itself is ~190 MB. Surely the disk read speed is far lower than the load time would suggest (it would need to be in the GB/s range). I’m also not seeing any lag on the initial streaming recognition, which would otherwise suggest some sort of page-fault lazy loading.
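For what it’s worth, here is a small experiment I think illustrates the lazy-loading idea (this is just my own sketch in Python, nothing to do with DeepSpeech internals): mapping a file the size of the model returns almost immediately, because `mmap()` only sets up page-table entries and no data is actually read from disk until a byte is touched.

```python
import mmap
import os
import tempfile
import time

# Create a model-sized (~190 MB) file to map. truncate() makes a sparse
# file, which is fine here since we only time the mapping call itself.
SIZE = 190 * 1024 * 1024
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.truncate(SIZE)

with open(path, "rb") as f:
    t0 = time.perf_counter()
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    t1 = time.perf_counter()
    # The map call finishes in well under a millisecond on most systems,
    # regardless of file size: no bytes have been read yet.
    print(f"mmap of {SIZE // (1024 * 1024)} MB took {(t1 - t0) * 1e3:.3f} ms")

    # Only when a page is actually touched does the OS fault it in
    # and issue the corresponding disk read.
    first_byte = mm[0]
    mm.close()

os.remove(path)
```

If that intuition is right, it would explain both the fast "load" and why any disk cost is paid incrementally as the model's pages are first accessed during inference.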
Can any disk I/O or OS expert offer an explanation for this?