Native-messaging API: bytes or characters or both for message measurements?

anon48797620 · January 5, 2021, 5:18am

In the native-messaging API, where it reads,

Each message is serialized using JSON, UTF-8 encoded and is preceded with a 32-bit value containing the message length in native byte order.

The maximum size of a single message from the application is 1 MB. The maximum size of a message sent to the application is 4 GB.
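The framing in the quoted documentation can be sketched in C for the receiving side of the application. This is a minimal sketch, not code from the API docs. `read_message` is a hypothetical helper name. It assumes the stream is already in binary mode (on Windows that usually means switching stdin to binary first) and that the caller frees the returned buffer. Reading the prefix straight into a `uint32_t` picks up native byte order automatically.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read one native-messaging frame from `in`.
 * The 32-bit prefix is read in native byte order and counts BYTES of
 * UTF-8 encoded JSON, not characters. Returns a NUL-terminated buffer
 * the caller must free, or NULL on EOF, short read or allocation failure. */
char *read_message(FILE *in, uint32_t *out_len)
{
    uint32_t len;
    if (fread(&len, sizeof len, 1, in) != 1)
        return NULL;                          /* EOF or truncated prefix */

    if ((size_t)len > SIZE_MAX - 1)           /* guard the +1 on 32-bit size_t */
        return NULL;

    char *buf = malloc((size_t)len + 1);      /* +1 for the terminating NUL */
    if (buf == NULL)
        return NULL;

    if (fread(buf, 1, len, in) != len) {      /* payload shorter than prefix */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    if (out_len != NULL)
        *out_len = len;
    return buf;
}
```

For example, the JSON text `{"a":"é"}` has 9 characters but is 10 bytes in UTF-8, and 10 is what the prefix holds.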

does this mean that the 32-bit value gives the length as a number of characters, while the 1 MB maximum is measured in bytes regardless of the number of characters?

I tried to find out whether JSON encoded as UTF-8 can use more than one byte per character; some of what I read said it is always one byte per character, and some said it could be up to four bytes per character.
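UTF-8 uses one to four bytes per character (code point): ASCII characters such as 'b' take one byte, everything else takes two, three or four. The helper below is a hypothetical sketch, not part of any API; it only shows the encoded size of one code point and sums it over a sequence, which is the kind of byte total the 32-bit prefix carries.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes UTF-8 needs for one Unicode code point.
 * Returns 0 above U+10FFFF. Surrogates (U+D800..U+DFFF) are not
 * rejected here, to keep the sketch short. */
size_t utf8_len_of_code_point(uint32_t cp)
{
    if (cp < 0x80)      return 1;  /* ASCII, e.g. 'b' */
    if (cp < 0x800)     return 2;  /* e.g. U+00E9, e with acute */
    if (cp < 0x10000)   return 3;  /* e.g. U+20AC, euro sign */
    if (cp <= 0x10FFFF) return 4;  /* e.g. U+1F600, an emoji */
    return 0;                      /* not a valid code point */
}

/* Total encoded size of a sequence of code points: a byte count,
 * not a character count. */
size_t utf8_len_of_code_points(const uint32_t *cps, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += utf8_len_of_code_point(cps[i]);
    return total;
}
```

So four characters, 'b', U+00E9, U+20AC and U+1F600, encode to 1 + 2 + 3 + 4 = 10 bytes.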

This is likely a stupid question. I've been using this code in C and hadn't thought much more about it until I considered what to do if a message exceeded 1 MB. I can send 1,048,576 letter 'b's without error, but that's not much of a test.

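```c
const uint32_t len = strlen( response );
if ( ( rc = fwrite( &len, sizeof len, 1, stdout ) ) != 1 ||
     ( rc = fwrite( response, sizeof *response, len, stdout ) ) != len ) {
    /* handle the write error */
}
fflush( stdout );  /* make sure the whole message reaches the browser */
```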

I figured that if all the double quotes are escaped in C and the JSON (or a large part of it) is passed as a string, the components could be joined in JS and the result parsed as JSON. The question then is which measure, bytes or character length, should be used to break up strings that exceed 1 MB.
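If a long string is split by bytes, each cut has to fall on a character boundary; otherwise a multi-byte UTF-8 sequence is torn in half and neither piece is valid UTF-8. A minimal sketch, assuming the input is valid UTF-8; `utf8_split_point` is a hypothetical name. It backs up from the byte limit past any continuation bytes (bytes of the form `10xxxxxx`).

```c
#include <stddef.h>

/* Hypothetical helper: given valid UTF-8 `s` of `len` bytes, return the
 * largest cut position <= max_bytes that does not split a multi-byte
 * character. Returns len if the whole string already fits.
 * With max_bytes >= 4 the result is always > 0, so a splitting loop
 * built on it always makes progress. */
size_t utf8_split_point(const char *s, size_t len, size_t max_bytes)
{
    if (len <= max_bytes)
        return len;

    size_t cut = max_bytes;
    /* A piece may not end right before a continuation byte 10xxxxxx. */
    while (cut > 0 && ((unsigned char)s[cut] & 0xC0) == 0x80)
        cut--;
    return cut;
}
```

The limit passed in should leave headroom for escaping and for the surrounding JSON, since those bytes are added after the split.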

Also, should the escape '\' characters count against the size limit? For example, if the JSON string on the C side is broken into components and the double quotes are escaped, as in { "component":1, "string":"{\"property_name\":\"value\"}" }, do the four '\' count toward the total size? I should be able to test it better shortly, but thought I'd ask here since it has to be considered as a whole. I had better also determine how C will count them when computing the uint32 value, and hope both sides count the same way, or else make an adjustment.

The escape characters count against the limit.
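Since escape characters count against the limit, the escaped size can be computed before sending. A minimal sketch (hypothetical helper, not part of any API) of how many bytes a UTF-8 string occupies once written as a JSON string value, including its two quotes. It assumes the escaping rules `JSON.stringify` follows: `\"` and `\\` for quote and backslash, short forms `\b \f \n \r \t` for those control characters, `\u00XX` for other control characters below 0x20, and all other bytes, including multi-byte UTF-8, passed through unchanged.

```c
#include <stddef.h>

/* Hypothetical helper: bytes needed to write `s` (len bytes of UTF-8)
 * as a JSON string literal, including the two enclosing quotes. */
size_t json_escaped_len(const char *s, size_t len)
{
    size_t total = 2;                        /* opening and closing '"' */
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        switch (c) {
        case '"': case '\\':                 /* become \" and \\ */
        case '\b': case '\f': case '\n': case '\r': case '\t':
            total += 2;                      /* two-byte short escapes */
            break;
        default:
            total += (c < 0x20) ? 6 : 1;     /* other controls: \u00XX */
        }
    }
    return total;
}
```

For the example in the question, `{"property_name":"value"}` is 25 bytes; escaped it becomes 29 bytes, plus 2 for the enclosing quotes, so the four backslashes are part of the total.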

Thank you.

zombie · January 26, 2021, 8:40pm

It’s the number of bytes after JSONification and encoding into UTF-8. It should be the same as the size of the file you get from calling JSON.stringify on your data and writing the result to a file with UTF-8 encoding.

Anyway, it shouldn’t matter much: if there’s a chance your messages come close to any limits, you should do your own serialization and splitting.

juraj.masiar · January 27, 2021, 8:56am

If you are interested in transmitting binary data, or are just curious about JSON data encoding, check this out:

anon48797620 · February 2, 2021, 8:12am

Thank you.

What I wasn’t understanding is that strlen doesn’t return the length as a number of characters but as a number of bytes, at least according to Windows. I looked up strlen in the <string.h> header file. Since I’m using MinGW-w64, there is only one line, which simply uses Windows’ strlen function, and Microsoft’s documentation for it states that

“strlen interprets the string as a single-byte character string, so its return value is always equal to the number of bytes, even if the string contains multibyte characters.”

I didn’t know that. I don’t know why it isn’t named strbytes. So, the tricky part isn’t getting the number of bytes, but the number of characters in those bytes. Fortunately, right now at least, I don’t need to know that.
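If the character count is ever needed, it can be recovered from the bytes: in valid UTF-8 every character (code point) starts with exactly one byte that is not a continuation byte (`10xxxxxx`), so counting those bytes counts characters. A minimal sketch, assuming valid, NUL-terminated UTF-8 input; `utf8_count_code_points` is a hypothetical name. Code points are not always what a reader sees as one character (combining accents, some emoji are several code points).

```c
#include <stddef.h>

/* Hypothetical helper: number of code points in a NUL-terminated,
 * valid UTF-8 string. strlen() counts bytes; this counts characters. */
size_t utf8_count_code_points(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)  /* skip continuation bytes */
            count++;
    return count;
}
```

For example, `strlen("h\xc3\xa9")` is 3 (an 'h' followed by the two bytes of U+00E9), while `utf8_count_code_points` returns 2.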