Training Deep Speech on custom data to detect tongue words: is it better to train on isolated words or on sentences?

I want to train a model with my own data.
What I need is to find, within the spoken data, words whose only consonant is a tongue consonant.
E.g. detect le/la/ll/ne/na/nn/me/ma/mmm (all of those can be detected as a single word).
The other words can be anything, in any spoken language, and I do not need to classify them.
How can I train such a model?
Is there any way I can use the pre-trained models?

When training, is it better to use full sequences that include these ‘words’ alongside other speech, or to cut them out so that each clip contains one specific word?
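
To make the two options concrete, here is a minimal sketch of how I picture the training manifests. It assumes the three-column CSV layout (wav_filename, wav_filesize, transcript) used by Mozilla DeepSpeech training files. The file paths, the file sizes, and the filler label "x" for speech I do not care about are all made up for illustration.

```python
import csv
import io

# Column names assumed from the DeepSpeech training CSV layout.
FIELDS = ["wav_filename", "wav_filesize", "transcript"]


def write_manifest(rows, out):
    """Write (wav_filename, wav_filesize, transcript) rows as a CSV manifest."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(FIELDS)
    writer.writerows(rows)


# Option A: full utterances. Target 'words' are kept, everything else is
# replaced by a hypothetical filler label "x" (my own assumption).
full_sequences = [
    ("clips/utt_001.wav", 96044, "x la x x ne"),
]

# Option B: clips cut so that each one contains a single target 'word'.
isolated_words = [
    ("clips/la_001.wav", 16044, "la"),
    ("clips/ne_001.wav", 15020, "ne"),
]

# Build the manifest in memory; in practice `out` would be an open file.
buf = io.StringIO()
write_manifest(full_sequences + isolated_words, buf)
manifest_text = buf.getvalue()
print(manifest_text)
```

Both kinds of rows can sit in the same manifest, so mixing the two options is also possible if that turns out to work better.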