I want to do this project with Python or Cross or … Thank you for your help
The problem is this: when a person says a word or sentence in Persian into a microphone, the program should recognize that word or sentence using its dictionary of words and return the address of the recognized word in the dictionary.
From the searches I did, I want to use a model such as DeepSpeech with the Mozilla Persian dataset, but I do not know what I should do. Please help me.
First, could you tell me what steps I need to take to complete this project?
Adding to @masoud_parpanchi: you generally need a Persian DeepSpeech model trained from audio (I don’t think there is one freely available) and a language model built from text. Why not search this forum for Persian and connect with others who want to build one?
It looks like you don’t have much programming experience. It will be really hard for you to train a model, as this isn’t a turnkey solution like Word yet. I would start by contacting these people who posted before via private message and connecting with them:
@masoud_parpanchi already linked to the docs. If you have some programming experience you should be able to follow the steps. Start with running some tests.
If you don’t understand the docs, I am sorry, but DeepSpeech might not be right for you at the moment. Maybe there will be an easier way in the future.
If you don’t have hundreds of hours of transcribed Persian and some GPU computing power and some programming/devops expertise, you might be better off using Google or any other API providing speech to text for Persian.
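Whichever engine ends up producing the transcript (a trained DeepSpeech model or a hosted API), the last step from the original question, returning the position of the recognized word in your dictionary, is plain Python. Here is a minimal sketch of that step only. It assumes the transcript already arrives as a string; `normalize` and `find_word_index` are hypothetical helper names, and the 0.8 similarity cutoff is an arbitrary starting point, not a recommended value.

```python
import difflib
import unicodedata
from typing import List, Optional


def normalize(text: str) -> str:
    """Normalize Unicode and unify common Arabic/Persian letter variants."""
    text = unicodedata.normalize("NFC", text).strip()
    # Arabic yeh (U+064A) and kaf (U+0643) are often typed instead of
    # Persian yeh (U+06CC) and keheh (U+06A9); map them to the Persian forms.
    return text.replace("\u064a", "\u06cc").replace("\u0643", "\u06a9")


def find_word_index(
    transcript: str, dictionary: List[str], cutoff: float = 0.8
) -> Optional[int]:
    """Return the index of the dictionary entry matching the transcript.

    Tries an exact match first, then the closest entry whose similarity
    ratio is at least `cutoff`. Returns None if nothing is close enough.
    """
    target = normalize(transcript)
    entries = [normalize(word) for word in dictionary]
    if target in entries:
        return entries.index(target)
    close = difflib.get_close_matches(target, entries, n=1, cutoff=cutoff)
    return entries.index(close[0]) if close else None


if __name__ == "__main__":
    # Example dictionary; in practice this would be your own word list.
    words = ["سلام", "کتاب", "مدرسه"]
    print(find_word_index("کتاب", words))
```

The fuzzy fallback is there because speech recognizers often get a word slightly wrong; if you only want exact matches, drop the `difflib` step.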