Fine-tuning data requirements

I just looked at the Common Voice dataset. The accent column in validated.tsv has a lot of NaN values, so what is the solution? I could only find 16,000 Indian-accent samples out of 490,000.
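To check the numbers yourself, a minimal sketch using only the standard library might look like the one below. It assumes the column is named `accent` (some releases may use `accents` instead, so check the header) and that Indian-accent rows can be found by a case-insensitive substring match on "india". Both the column name and the match string are assumptions, and `accent_stats` is a hypothetical helper, not part of any Common Voice tooling. An empty accent cell here is what pandas would display as NaN.

```python
import csv
import io
from typing import TextIO


def accent_stats(tsv: TextIO, accent_col: str = "accent", match: str = "india"):
    """Return (total rows, rows with an empty accent, rows whose accent contains `match`).

    Assumption: `accent_col` and `match` are illustrative defaults; adjust them to
    whatever your release's header and accent labels actually contain.
    """
    # QUOTE_NONE: transcript sentences can contain stray quote characters,
    # so treat every character literally instead of as CSV quoting.
    reader = csv.DictReader(tsv, delimiter="\t", quoting=csv.QUOTE_NONE)
    total = 0
    missing = 0
    matched = []
    for row in reader:
        total += 1
        # A missing column or missing trailing field comes back as None.
        value = (row.get(accent_col) or "").strip()
        if not value:
            missing += 1  # empty cell; pandas would read this as NaN
        elif match.lower() in value.lower():
            matched.append(row)
    return total, missing, matched


# Tiny made-up sample with the same tab-separated layout (not real data).
sample = (
    "client_id\tpath\tsentence\taccent\n"
    "a\t1.mp3\tHello there\t\n"
    "b\t2.mp3\tGood morning\tindian\n"
    'c\t3.mp3\tSay "hi"\tus\n'
)
total, missing, indian = accent_stats(io.StringIO(sample))
print(total, missing, len(indian))  # 3 1 1
```

On the real file you would open `validated.tsv` with `encoding="utf-8"` and `newline=""` and pass the file object in place of the `StringIO`. The ratio of `missing` to `total` then shows how much of the data carries no accent label at all.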