Speech Recognition for Roman Urdu with a limited dataset

Hi, I have a limited dataset of Urdu speech transcribed in the Roman alphabet. Urdu sentences generally include many English words, such as "book" and "school". Is it possible to build a reasonable ASR system that recognizes both the Urdu and the English words and outputs them in the Roman alphabet? What approach should I take: fine-tuning or transfer learning, and which would be better? Should I train all layers or only some of them?

Have a good day! Thanks.

To train a model from scratch you would need at least a couple hundred hours of transcribed Urdu audio. If you don't have that much, try fine-tuning or transfer learning from an existing model instead.
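
One thing that may help when choosing between the two: since the transcripts are already in Roman letters, it could be worth checking whether they fit inside the character set of the model you start from. If they do, the existing output layer can probably be kept and fine-tuned; if not, the output layer would likely have to be replaced. Below is a rough sketch of such a check. The alphabet (a-z, space, apostrophe) is only an assumption about a typical English model, and the sample sentences and function names are made up for illustration.

```python
import string

# Assumed character set of a hypothetical English pretrained model:
# lowercase a-z, space, and apostrophe. Replace with the real alphabet.
PRETRAINED_ALPHABET = set(string.ascii_lowercase + " '")


def normalize(text):
    """Lowercase and collapse runs of whitespace into single spaces."""
    return " ".join(text.lower().split())


def out_of_alphabet(transcripts, alphabet=PRETRAINED_ALPHABET):
    """Return the sorted characters used in transcripts but missing from alphabet."""
    used = set()
    for line in transcripts:
        used.update(normalize(line))
    return sorted(used - alphabet)


# Made-up Roman Urdu transcripts, for illustration only.
samples = [
    "Main school ja raha hoon",
    "Yeh book bohat achi hai!",
]

print(out_of_alphabet(samples))
```

An empty result would suggest the transcripts can reuse the source alphabet as-is. Any characters that show up would either need to be cleaned out of the transcripts or added to the alphabet, and adding them changes the size of the output layer.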