Recent changes of model geometry (last layer dimension)

I am updating from 0.7.x to the current master in preparation for the 0.9 release, but I have a problem with my checkpoint. As described in the release notes, checkpoints may be incompatible between releases, but the error I get is interesting.

It says:

ValueError: Cannot feed value of shape (32,) for Tensor 'layer_6/bias/Initializer/zeros:0', which has shape '(33,)'

But according to the documentation, the last layer dimension did not change between 0.7.x and current master: it is still the number of characters in the alphabet plus 1 for the blank. I also checked how alphabet files are processed now compared to 0.7.x, and there is no difference there either (or I haven’t found any).

Could you give me a hint as to which change could be causing this error?

We made no change, are you sure you are passing the same alphabet file?

Yes, it is the same file. Interesting. Do you have any experience with the backward compatibility of checkpoints? Are older checkpoints, like 0.7.x, still compatible?

I think we already explained that it should work.

No, but I insist that your error does look like a different alphabet being used. Maybe you have an extra blank line, etc.?

I found the issue. I had a spurious empty line at the end of alphabet.txt, which produced one more label in older versions. This empty line is not parsed in newer versions (>=0.8.2), hence one character less.
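To make the off-by-one concrete, here is a minimal Python sketch of the two behaviours described above. It is not the actual DeepSpeech alphabet parser: `read_labels` and its `keep_empty_lines` flag are hypothetical names, escaped `\#` entries are ignored, and the alphabet text is rebuilt from the hexdump posted further down in this thread.

```python
import io
import string

# Alphabet contents reconstructed from the hexdump in this thread:
# "#start", a space, ä ü ö ß, a..z, "#end", then one empty line.
ALPHABET = (
    "#start\n \n"
    + "".join(c + "\n" for c in "äüöß" + string.ascii_lowercase)
    + "#end\n\n"
)


def read_labels(text, keep_empty_lines):
    """Hypothetical parser: one label per line, '#' lines are comments."""
    labels = []
    for line in io.StringIO(text):
        line = line.rstrip("\n")
        if line.startswith("#"):
            continue  # comment line such as "#start" or "#end"
        if line == "" and not keep_empty_lines:
            continue  # assumed newer behaviour: skip empty lines
        labels.append(line)
    return labels


with_empty = read_labels(ALPHABET, keep_empty_lines=True)
without_empty = read_labels(ALPHABET, keep_empty_lines=False)

# Output layer size = number of labels + 1 for the blank.
print(len(with_empty) + 1, len(without_empty) + 1)
```

With this file the two label counts are 32 and 31, so the output layer sizes are 33 and 32, the same two numbers that appear in the error message.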

I would like to reproduce this empty line so that I can keep using the old checkpoints until I am able to retrain the model without this spurious character.

Do you have an idea how I could provide an empty-line character as UTF-8 in the alphabet, given that it is no longer parsed? I am not that confident with the different encodings, unfortunately, so I am asking for help. Or maybe you could point me to the file where the alphabet is processed so that I can revert this change (for now).

What is the code of that char?

This is code that has changed for 0.8+, so maybe we should consider that a regression, depending on exactly what your “empty line” was.

This is what hexdump gives me (I am not sure if it is the best way to get the char code):

00000000  23 73 74 61 72 74 0a 20  0a c3 a4 0a c3 bc 0a c3  |#start. ........|
00000010  b6 0a c3 9f 0a 61 0a 62  0a 63 0a 64 0a 65 0a 66  |.....a.b.c.d.e.f|
00000020  0a 67 0a 68 0a 69 0a 6a  0a 6b 0a 6c 0a 6d 0a 6e  |.g.h.i.j.k.l.m.n|
00000030  0a 6f 0a 70 0a 71 0a 72  0a 73 0a 74 0a 75 0a 76  |.o.p.q.r.s.t.u.v|
00000040  0a 77 0a 78 0a 79 0a 7a  0a 23 65 6e 64 0a 0a     |.w.x.y.z.#end..|

It would be 0x0a then.

Right now this change saves me from getting a spurious newline character, which I don’t actually want, so maybe it is not that big a regression after all. It does reduce the flexibility of the alphabet, though.

We’re on PTO today and tomorrow, but I’m sure @reuben agrees this would look like a regression.

There are two 0x0A bytes at the end of your dump. If you could look closely at the C++ code that reads this file, that would be cool.
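For anyone who wants to check their own alphabet file for this, here is a small Python sketch that reports which lines are empty. The helper name `empty_line_numbers` is made up for illustration, and the bytes are copied from the hexdump above; for a real file you would read it with `open(path, "rb").read()` instead.

```python
# Bytes copied from the hexdump posted above.
DUMP = bytes.fromhex(
    "23 73 74 61 72 74 0a 20 0a c3 a4 0a c3 bc 0a c3 "
    "b6 0a c3 9f 0a 61 0a 62 0a 63 0a 64 0a 65 0a 66 "
    "0a 67 0a 68 0a 69 0a 6a 0a 6b 0a 6c 0a 6d 0a 6e "
    "0a 6f 0a 70 0a 71 0a 72 0a 73 0a 74 0a 75 0a 76 "
    "0a 77 0a 78 0a 79 0a 7a 0a 23 65 6e 64 0a 0a"
)


def empty_line_numbers(data):
    """Return the 1-based numbers of lines that contain no bytes at all."""
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        # The final 0x0A only terminates the last line; it is not a line itself.
        lines.pop()
    return [number for number, line in enumerate(lines, start=1) if line == b""]


print(empty_line_numbers(DUMP))   # line numbers of the empty lines
print(DUMP.endswith(b"\n\n"))     # True when the file ends with two 0x0A bytes
```

Note that the line holding a single space (0x20) is a real label and is not reported. Only the line after `#end`, which contains nothing before its 0x0A, counts as empty.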

My money is on


I’d say not a regression, just an accidental bug fix. We don’t want to have empty letters in the alphabet or duplicated symbols…