Replacing non-ASCII characters when processing a corpus does not use the nearest character

Hi, here is a link to LapsBM, but it is a small dataset :frowning: