A bit late, but there’s an option nobody seems to have mentioned. The way I’d do it (not that I’ve got far enough to try it myself) would be to train a model whose output isn’t used directly. This is easier to explain with some examples, and since I don’t speak Portuguese, I’ll use Spanish examples instead (you should still be able to tell what they’re doing).
First, I’d make training data in Portuguese, both for the acoustic model and for the scorer model, in which some sentences contain the names I want it to be able to recognize. At this point I’d have sentences like these (shown in Spanish here):
Escucha música de Amy Winehouse.
un concierto de U2
Then, when preprocessing this data, rather than just removing punctuation (unless you pronounce the punctuation marks) and lowercasing everything, I’d also replace each name with a spelling of its pronunciation. So in the first sentence I’d replace “Amy Winehouse” with “eimi wainjaus”, because that’s the closest I can get with Spanish spelling, and in the second sentence I’d write “u dos”, since Spanish speakers do seem to say the 2 in Spanish in the name of the band U2.
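To make the preprocessing step a bit more concrete, here is a minimal Python sketch of how I imagine it. I haven’t run this against a real training pipeline; the `RESPELLINGS` table and the `preprocess` function are names I made up for illustration, and a real list of names would be much longer.

```python
import re

# Hypothetical table: lowercase name -> respelling in the target language's orthography.
RESPELLINGS = {
    "amy winehouse": "eimi wainjaus",
    "u2": "u dos",
}

def preprocess(sentence: str) -> str:
    """Lowercase, respell known names, then strip punctuation."""
    text = sentence.lower()
    # Try longer names first so a short name can't shadow part of a longer one.
    for name in sorted(RESPELLINGS, key=len, reverse=True):
        text = re.sub(r"\b" + re.escape(name) + r"\b", RESPELLINGS[name], text)
    # Replace anything that isn't a letter, digit, underscore or whitespace.
    text = re.sub(r"[^\w\s]", " ", text)
    # Collapse the whitespace left behind by removed punctuation.
    return " ".join(text.split())

print(preprocess("Escucha música de Amy Winehouse."))
print(preprocess("un concierto de U2"))
```

The names are replaced before the punctuation is stripped so that names containing punctuation could still be matched as they appear in the table.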
Next, I’d train both the acoustic model and the language/scorer model. If I don’t have a bunch of audio of people talking about foreign musicians, I’d make a more general model first and maybe fine-tune it a little bit. After this, if I get those models to transcribe speech, they should hopefully be able to write those names, although with the Spanish spelling of the English pronunciation rather than the real spelling.
Finally, I’d run the output through some other tool that has a list of the musician/band names in their English and mangled spellings, and replaces the mangled spellings with the correct ones. Such a tool shouldn’t be very difficult to make.
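As a rough idea of what that last tool could look like, here is a small Python sketch. Again, this is only my guess at an approach, not something I’ve tested on real transcripts; `CORRECT_SPELLINGS` and `restore_names` are made-up names, and the table only covers the two examples above.

```python
import re

# Hypothetical table: mangled spelling (as the model outputs it) -> correct name.
CORRECT_SPELLINGS = {
    "eimi wainjaus": "Amy Winehouse",
    "u dos": "U2",
}

# One pattern matching any mangled spelling as a whole phrase, longest first.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(k) for k in sorted(CORRECT_SPELLINGS, key=len, reverse=True))
    + r")\b"
)

def restore_names(transcript: str) -> str:
    """Replace each mangled spelling in the transcript with the correct name."""
    return _PATTERN.sub(lambda m: CORRECT_SPELLINGS[m.group(1)], transcript)

print(restore_names("escucha música de eimi wainjaus"))
print(restore_names("un concierto de u dos"))
```

One thing to watch out for: a mangled spelling made of ordinary words (like “u dos”) could in principle also show up in a sentence that isn’t about the band, so a plain lookup like this might occasionally replace something it shouldn’t.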
I think I’ve read that some commercial speech recognition tools let you add words to their vocabularies by entering how each word is spelled and how it sounds when pronounced. So they might be doing some variant of this. I’ve never used those tools myself, though.