Generate_lm doesn't support "--skip_symbols"

Hello
I am doing some tests with a Portuguese DeepSpeech model that I trained.
I built a large vocabulary.txt, modeled on the English version, but my text also contains some symbols, so I got this error:

Converting to lowercase and counting word occurrences ...
| |                              #              | 23838383 Elapsed Time: 0:14:29

Saving top 500000 words ...

Calculating word statistics ...
  Your text file has 603695917 words in total
  It has 7083721 unique words
  Your top-500000 words are 97.6812 percent of all words
  Your most common word "de" occurred 29191195 times
  The least common word in your top-k is "super-heroínas;" with 16 times
  The first word with 17 occurrences is "✤" at place 488285

Creating ARPA file ...
=== 1/5 Counting and sorting n-grams ===
Reading /content/sample_data/DeepSpeech/data/lm/lower.txt.gz
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
tcmalloc: large alloc 2043600896 bytes == 0x56449cc22000 @  0x7faf03e611e7 0x56449a48f7a2 0x56449a42a51e 0x56449a4092eb 0x56449a3f5066 0x7faf01ffabf7 0x56449a3f6baa
tcmalloc: large alloc 9536798720 bytes == 0x564516910000 @  0x7faf03e611e7 0x56449a48f7a2 0x56449a47e7ca 0x56449a47f208 0x56449a409308 0x56449a3f5066 0x7faf01ffabf7 0x56449a3f6baa
**********/content/sample_data/DeepSpeech/kenlm/lm/builder/corpus_count.cc:179 in void lm::builder::{anonymous}::ComplainDisallowed(StringPiece, lm::WarningAction&) threw FormatLoadException.
Special word <s> is not allowed in the corpus.  I plan to support models containing <unk> in the future.  Pass --skip_symbols to convert these symbols to whitespace.
Traceback (most recent call last):
  File "generate_lm.py", line 210, in <module>
    main()
  File "generate_lm.py", line 201, in main
    build_lm(args, data_lower, vocab_str)
  File "generate_lm.py", line 97, in build_lm
    subprocess.check_call(subargs)
  File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/content/sample_data/DeepSpeech/kenlm/cmake/bin/lmplz', '--order', '5', '--temp_prefix', '.', '--memory', '85%', '--text', './lower.txt.gz', '--arpa', './lm.arpa', '--prune', '0', '0', '1', '--discount_fallback']' died with <Signals.SIGABRT: 6>.

So I tried passing --skip_symbols to generate_lm.py, but the script doesn't seem to accept this flag. How can I deal with this?
Being able to skip the symbols would help me a lot.

generate_lm.py is just a wrapper around the KenLM tools. Check its source to see what it does and call KenLM directly, or (the better way) get rid of the special characters, such as the literal <s> token the error complains about. They are usually things you don't want in your language model anyway.
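If you go the "call KenLM directly" route, here is a minimal sketch. It assumes you reuse the lmplz arguments from the failed command in the log above and append --skip_symbols, the flag the KenLM error message itself suggests. The helper names build_lmplz_command and run_lmplz are hypothetical. Note that --skip_symbols only turns <s>, </s> and <unk> into whitespace; other symbols such as "✤" still end up as words. generate_lm.py also runs further steps on the ARPA file after lmplz, so check build_lm in its source for what still has to happen.

```python
import subprocess
from typing import List, Sequence


def build_lmplz_command(lmplz: str, text_path: str, arpa_path: str,
                        order: int = 5, memory: str = "85%",
                        prune: Sequence[str] = ("0", "0", "1"),
                        temp_prefix: str = ".") -> List[str]:
    """Rebuild the lmplz call from the log (hypothetical helper), adding --skip_symbols."""
    return [
        lmplz,
        "--order", str(order),
        "--temp_prefix", temp_prefix,
        "--memory", memory,
        "--text", text_path,
        "--arpa", arpa_path,
        "--prune", *prune,
        "--discount_fallback",
        # Treat <s>, </s> and <unk> in the corpus as whitespace instead of aborting.
        "--skip_symbols",
    ]


def run_lmplz(cmd: List[str]) -> None:
    """Run lmplz; raises CalledProcessError on failure, like generate_lm.py does."""
    subprocess.check_call(cmd)
```

For example: run_lmplz(build_lmplz_command("/content/sample_data/DeepSpeech/kenlm/cmake/bin/lmplz", "./lower.txt.gz", "./lm.arpa")).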

Ok, thank you!
I’m going to get rid of the special characters.
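A minimal sketch of such a cleanup pass, run on the corpus before generate_lm.py. It removes KenLM's reserved tokens (<s>, </s>, <unk>) first, then replaces every character that is not a letter, apostrophe or hyphen with a space. That character policy is an assumption; ideally it should match the alphabet the model was trained on. The function names and file paths are hypothetical.

```python
import gzip
import re

# Tokens KenLM reserves for itself; lmplz aborts if they appear in the corpus.
RESERVED = re.compile(r"<s>|</s>|<unk>")


def clean_line(line: str) -> str:
    """Lowercase, drop reserved tokens, then drop disallowed characters."""
    # Remove the tokens before the character filter, otherwise "<s>" would leave a stray "s".
    line = RESERVED.sub(" ", line.lower())
    # Hypothetical policy: keep letters (including accented ones like "í"), "'" and "-".
    kept = "".join(ch if ch.isalpha() or ch in "'-" else " " for ch in line)
    # Collapse runs of whitespace left behind by removed symbols.
    return " ".join(kept.split())


def clean_corpus(src_path: str, dst_path: str) -> None:
    """Stream a gzipped UTF-8 corpus, writing only non-empty cleaned lines."""
    with gzip.open(src_path, "rt", encoding="utf-8") as fin, \
         gzip.open(dst_path, "wt", encoding="utf-8") as fout:
        for line in fin:
            cleaned = clean_line(line)
            if cleaned:
                fout.write(cleaned + "\n")
```

For instance, clean_line("Super-heroínas; ✤ <s> de") gives "super-heroínas de". Streaming line by line keeps memory flat, which matters for a corpus of roughly 600 million words.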