Adding custom words to language model

Thanks for the heads up! That also needs to be mentioned in the docs!

I’m going to give up at this point. I don’t have time to set up a Linux box or use an online one, and I need an OS-independent way to add custom words. I just don’t have the time at the moment.

If anyone knows the value ranges to be used with addHotWord, please post! Thanks

Too bad you did not follow the guidelines we have documented on Discourse for requesting help from the very beginning; knowing you are using Windows would have helped.

Yes, as @othiele said, training on Windows is not supported, and we welcome PRs for that.

That being said, wget exists in PowerShell; just use PowerShell and you get it.

gunzip to get the raw text, cat >> to append. Yes, those are Linux commands; I don’t use Windows, sorry.
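For anyone who needs the same two steps without Linux tools, here is a minimal sketch using only the Python standard library. The function name `build_corpus` and the file names in the usage note are hypothetical, chosen for illustration; nothing here is part of DeepSpeech itself.

```python
import gzip
import shutil
from pathlib import Path


def build_corpus(gz_path, custom_sentences, out_path):
    """Decompress gz_path into out_path, then append the custom sentences."""
    out_path = Path(out_path)

    # Equivalent of `gunzip`: stream the compressed text into a plain file.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)

    # Check whether the decompressed text already ends with a newline,
    # so the first appended sentence does not get glued to the last line.
    needs_newline = False
    if out_path.stat().st_size > 0:
        with open(out_path, "rb") as f:
            f.seek(-1, 2)
            needs_newline = f.read(1) != b"\n"

    # Equivalent of `cat >>`: append one sentence per line.
    with open(out_path, "a", encoding="utf-8", newline="\n") as dst:
        if needs_newline:
            dst.write("\n")
        for sentence in custom_sentences:
            dst.write(sentence.strip() + "\n")

    return out_path
```

A call could look like `build_corpus("corpus.txt.gz", ["tycho crater"], "corpus.txt")`, where both file names are placeholders for whatever text you downloaded. The resulting plain-text file would then go through the usual scorer generation steps.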

I never said I don’t know how to control the word filtering, I said we have no code in place in generate_lm.py to do it.

Some of the people who worked on it are still there, but they are not working on DeepSpeech anymore. @reuben and I are doing that in our spare time now.

In the past, I could have taken the time to add your feature quite quickly. Unfortunately, I don’t have the time now.

We don’t; it’s application-specific, so you need to experiment with your own use case.

Why use PowerShell when it can just be downloaded in a browser?

I found a post by JRMeyer about hot-word boosting values from last August that should go in your doc file!

As things are, it would take little effort to improve the user experience with DeepSpeech, but I don’t see that happening with the current politics.

You asked how to use wget, I’m helping you.

Thanks for your help, I’m glad I have taken time to try and get you something that works and be rewarded like that.

Sorry, but this is just poor management. Over five months ago you were posting with JRMeyer and reuben about addHotWord: you posted that you weren’t sure it “even works”, reuben posted that he was unsure the coefficient could be meaningfully applied, and JRMeyer confidently posted value ranges. Here we are, and nothing has been done to resolve this issue at all. How is that good or useful?

I’m going to save the bytes on my LTE backup to do actual work, until my FTTH connection hopefully comes back.

Actually, FTTH is back.

So, two months after the layoff, and while we were unable to know if Mozilla would continue DeepSpeech.

Link please? Josh worked on this feature. I’m not sure what discussion you are referring to.

What issue?

For smaller, domain-specific language models, I found that boost values in the range of 1-20 were enough to greatly improve recognition of the wanted words. Mostly I stay in the 1-10 range, actually.
Anything higher than that gave me worse results.
Keep in mind that a boost word doesn’t work with a space in it.
But that’s just my personal experience, and it’s limited to smallish language models.
Do you really need the general language model though? It really sounds to me like you have a very specific use case?
Wouldn’t it be easier to create a scorer from just the sentences that you actually expect?
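To make that advice concrete, here is a small sketch of how the boosts could be applied. It assumes the DeepSpeech 0.9 Python binding, where a `Model` object exposes `addHotWord(word, boost)`; the helper name `apply_hot_words` and the `max_boost` cap are hypothetical, and the cap of 20 simply mirrors the range reported above rather than any documented limit.

```python
def apply_hot_words(model, boosts, max_boost=20.0):
    """Register hot words on `model`, returning the entries actually applied.

    `model` is assumed to expose addHotWord(word, boost), as the DeepSpeech
    0.9 Python binding does. `boosts` maps each word to its boost value.
    """
    applied = {}
    for word, boost in boosts.items():
        word = word.strip()
        # Per the experience reported above, entries containing a space
        # do not work as hot words, so skip them instead of registering.
        if not word or any(ch.isspace() for ch in word):
            continue
        # Cap the boost at max_boost; this cap is an assumption based on
        # the range reported in this thread, not an official limit.
        value = min(float(boost), max_boost)
        model.addHotWord(word, value)
        applied[word] = value
    return applied
```

With a loaded model, `apply_hot_words(model, {"tycho": 5, "schumacher": 8})` would register both words, while an entry such as `"tycho crater"` would be skipped. The right values are still application-specific, so treat the cap as a starting point for experiments.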

If you are looking for off-the-shelf plug-and-play 24-7 support, sorry, but this is not what we can yet provide.

We have a general-purpose 3D scene application, and we use standard language to control it, but certain content also has custom words: crater names, biological words, etc.

Come on, you don’t even know the parameter usage for your own functions! This is just poor management, and why rockets and planes blow up and crash, and maybe why Mozilla bosses gave up, I wonder. Let’s not make excuses, you need to sort this critical function out.

Your posts about the boost value for addHotWord are in the PR titled “enable hot-word boosting #3297”. It’s on your GitHub.

This is a long PR, I’m not sure I remember all of it.

Rudeness is not going to get you any help, you know. Can we please focus on what is actionable?


Try testing that with words like schrödinger (or schroedinger), schumacher and tycho

It does look like you still don’t understand: we are no longer working on this as part of our job.

I currently don’t have much spare time to allocate to this kind of feature, sorry. I burnt myself out over 2020, especially preparing the 1.0 release that the layoff wrecked, so now I can’t push too hard and I physically need rest.