Adding custom words to the language model

Maybe generate_lm.py could benefit from a PR that:

  • allows merging in an additional text file (e.g. a list of custom words)
  • does so without applying top_k filtering to the words from that file
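
To make the idea concrete, here is a rough sketch of the vocabulary step such a PR might add. It is not the actual generate_lm.py code: the function name build_vocab, its parameters, and the lowercasing are assumptions made for illustration. Only the main corpus goes through the top_k cut-off; every word from the extra file is kept.

```python
from collections import Counter


def build_vocab(corpus_lines, custom_lines, top_k):
    """Hypothetical helper: top_k corpus words plus all custom words."""
    counter = Counter()
    for line in corpus_lines:
        # Lowercasing is an assumption about the normalization in use.
        counter.update(line.lower().split())

    # Frequency filtering applies to the main corpus only.
    vocab = [word for word, _ in counter.most_common(top_k)]

    # Custom words bypass the top_k cut-off; skip ones already kept.
    seen = set(vocab)
    for line in custom_lines:
        for word in line.lower().split():
            if word not in seen:
                seen.add(word)
                vocab.append(word)
    return vocab


if __name__ == "__main__":
    corpus = ["the cat sat", "the cat", "the dog"]
    custom = ["kubernetes", "the"]
    print(build_vocab(corpus, custom, top_k=2))
```

The merged list could then be written out as the vocabulary file used for filtering the model, so rare domain-specific words survive even if they never appear in the main corpus.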