What is the purpose of the apostrophe here? More precisely, what should each line of the vocabulary file contain? Should each line simply contain a word that appears in the language corpus, or is there any special formatting required?
I looked at the README.md and generated my vocabulary file using the given command. However, I was somewhat confused to see apostrophes in the middle of words in the provided data/lm/vocab.txt file.
My language corpus does not contain any apostrophes. So if my vocabulary file contains just the unique words that appear in the corpus, that should be okay, right?
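
To make the question concrete, here is a rough sketch of what I mean by "unique words that appear in the corpus". It assumes the corpus is plain text, that words are separated by whitespace, and that the vocabulary file holds one word per line; that last point is exactly the assumption I would like confirmed. The function names and the file names in the usage note are hypothetical and not taken from the repository.

```python
def build_vocab(corpus_lines):
    """Return the sorted unique whitespace-separated words in the corpus."""
    words = set()
    for line in corpus_lines:
        # str.split() with no arguments splits on any run of whitespace
        # and ignores leading/trailing whitespace.
        words.update(line.split())
    return sorted(words)


def write_vocab(words, path):
    """Write one word per line (assumed format, not confirmed by the README)."""
    with open(path, "w", encoding="utf-8") as f:
        for word in words:
            f.write(word + "\n")
```

Usage would be something like `write_vocab(build_vocab(open("corpus.txt", encoding="utf-8")), "my_vocab.txt")`, where both file names are placeholders.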