Replacing non-ASCII characters when processing the corpus does not substitute the nearest ASCII character
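
For reference, here is a minimal sketch of the behaviour the title expects, assuming Python and only the standard `unicodedata` module. The function name `to_nearest_ascii` is hypothetical and is not taken from the project's code. It decomposes each character, drops the combining accents, and discards anything that still has no ASCII form.

```python
import unicodedata


def to_nearest_ascii(text: str) -> str:
    """Map accented characters to their unaccented ASCII base letters.

    Hypothetical helper for illustration; not the project's own function.
    """
    # NFKD splits a character such as "ç" into "c" plus a combining cedilla.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks so that only the base letters remain.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything that still has no ASCII equivalent is discarded.
    return stripped.encode("ascii", "ignore").decode("ascii")


print(to_nearest_ascii("ação"))
```

Characters with no decomposition (for example "ø" or "ß") are dropped by this approach rather than mapped, so a real corpus may need an explicit replacement table on top of it.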

The FalaBrasil download links are currently broken, but I found archived copies of LapsBM and MailBenchmark on archive.org. I reached out to the maintainers, and they told me they’re re-uploading the data.