Arghhhh…
My friend, it’s totally impossible!
For example: with 2,600 samples from only 1 speaker, covering 200 sentences with nearly 500 different words over an alphabet of 26 to 30 characters…
you could reach 40 to 60% accuracy at most.
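To get a sense of scale, here is a quick back-of-the-envelope count of how many recordings each sentence gets in that example. It assumes the samples are spread evenly across the 200 sentences, which may not match the actual dataset.

```python
# Back-of-the-envelope numbers from the example above (single speaker).
num_samples = 2600
num_sentences = 200

# Average recordings per sentence, assuming an even spread (an assumption).
samples_per_sentence = num_samples / num_sentences
print(f"~{samples_per_sentence:.0f} samples per sentence")
```

That works out to roughly a dozen examples per sentence, which is very little to learn from.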
With 200 different sentences and 4 people, it’s impossible without a bigger model.
If it were my problem, I’d work with at least
10,000 samples for each person, split 70/20/10% into train/dev/test.
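A minimal sketch of that 70/20/10 split, assuming you split within each speaker so that all 4 people appear in train, dev and test. The speaker names, sample IDs and the `split_70_20_10` helper are hypothetical placeholders. Holding out whole speakers instead is another valid choice, depending on what you want to measure.

```python
import random

def split_70_20_10(samples, seed=0):
    """Shuffle a list of samples and split it 70/20/10 into train/dev/test."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # reproducible shuffle
    n = len(items)
    # Integer arithmetic avoids float rounding surprises in the split sizes.
    n_train = n * 70 // 100
    n_dev = n * 20 // 100
    train = items[:n_train]
    dev = items[n_train:n_train + n_dev]
    test = items[n_train + n_dev:]  # the remaining ~10%
    return train, dev, test

# Hypothetical data: 4 speakers x 10,000 sample IDs each.
speakers = ["spk1", "spk2", "spk3", "spk4"]
splits = {}
for spk in speakers:
    samples = [f"{spk}_{i:05d}" for i in range(10_000)]
    splits[spk] = split_70_20_10(samples)

for spk, (train, dev, test) in splits.items():
    print(spk, len(train), len(dev), len(test))
```

With 10,000 samples per person this gives 7,000 / 2,000 / 1,000 per speaker, so 28,000 / 8,000 / 4,000 across all 4 people.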
Sorry for the bad news.