Thank you Martin! That helped.
So this is actually much harder than I thought. Luckily, some smart people have written useful libraries like base-x, which can convert to any base. And based on the StackOverflow post, there are 94 characters that can be represented by a single byte in JSON.
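The idea behind base-x style conversion can be sketched as follows. This is not base-x's actual source, only a minimal illustration that assumes the input is a byte array and the alphabet is a string of distinct single-code-unit characters; `encodeBaseN` and `decodeBaseN` are hypothetical names. The bytes are read as one big integer and rewritten digit by digit in the new base, with leading zero bytes kept as leading `alphabet[0]` characters so the round trip is lossless.

```javascript
// Sketch only: base-N conversion in the spirit of base-x, not its source.
// The bytes are read as one big unsigned integer (BigInt) and rewritten
// in base `alphabet.length`.
function encodeBaseN(bytes, alphabet) {
  const base = BigInt(alphabet.length);
  // Leading zero bytes vanish in the integer, so emit one alphabet[0]
  // per leading zero byte to keep the conversion reversible.
  let zeros = 0;
  while (zeros < bytes.length && bytes[zeros] === 0) zeros++;

  let n = 0n;
  for (const b of bytes) n = (n << 8n) | BigInt(b);

  const digits = [];
  while (n > 0n) {
    digits.push(alphabet[Number(n % base)]);
    n /= base; // BigInt division truncates
  }
  return alphabet[0].repeat(zeros) + digits.reverse().join("");
}

function decodeBaseN(str, alphabet) {
  const base = BigInt(alphabet.length);
  let zeros = 0;
  while (zeros < str.length && str[zeros] === alphabet[0]) zeros++;

  let n = 0n;
  for (const ch of str) {
    const digit = alphabet.indexOf(ch);
    if (digit < 0) throw new Error(`Character not in alphabet: ${JSON.stringify(ch)}`);
    n = n * base + BigInt(digit);
  }

  const bytes = [];
  while (n > 0n) {
    bytes.push(Number(n & 0xffn)); // lowest byte first
    n >>= 8n;
  }
  return Uint8Array.from([...new Array(zeros).fill(0), ...bytes.reverse()]);
}

// Usage with the 94-character alphabet from this post.
const alphabet =
  "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" +
  "~_!$()+,;@.:=^*?&<>[]{}%#|`/\u007f '-";
const encoded = encodeBaseN(new TextEncoder().encode("hello"), alphabet);
console.log(new TextDecoder().decode(decodeBaseN(encoded, alphabet))); // hello
```

With a 94-character alphabet each byte costs about 8 / log2(94) ≈ 1.22 characters, compared with about 1.33 for Base64. Both this sketch and base-x's digit-by-digit loop take time roughly quadratic in the input size, so large inputs get slow quickly.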
I’ve just finished testing base-x, and it does seem to work nicely with the following alphabet:
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz~_!$()+,;@.:=^*?&<>[]{}%#|`/\u007f '-
I’ve found it here as Base95:
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz~_!$()+,;@.:=^*?&<>[]{}%#|`/\ "'-
But to make it compatible with JSON, I’ve removed " and \ (JSON escapes both, so each would take two bytes) and added the “DEL” character, “\u007f”, in place of the backslash.
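The alphabet change can be checked mechanically. A small sketch, using only the two alphabets quoted above: drop `"`, put DEL where `\` was (which reproduces the exact order of the 94-character alphabet), and confirm that `JSON.stringify` leaves every remaining character unescaped.

```javascript
// Base95 alphabet as quoted above: all printable ASCII, 0x20-0x7E.
const BASE95 =
  "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" +
  "~_!$()+,;@.:=^*?&<>[]{}%#|`/\\ \"'-";

// JSON.stringify escapes `"` and `\`, so each would take two characters.
// Drop `"`, and put DEL (U+007F), which JSON leaves as-is, where `\` was.
const ALPHABET = [...BASE95]
  .filter((ch) => ch !== '"')
  .map((ch) => (ch === "\\" ? "\u007f" : ch))
  .join("");

// Every character should serialize as exactly one character between quotes.
const allSingle = [...ALPHABET].every((ch) => JSON.stringify(ch).length === 3);

console.log(BASE95.length, ALPHABET.length, allSingle); // 95 94 true
```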
To test it, I ran:
await browser.storage.sync.set({t: "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz~_!$()+,;@.:=^*?&<>[]{}%#|`/\u007f '-"})
await browser.storage.sync.getBytesInUse()
// 97 bytes = 1 (key) + 2 (quotes) + 94 (characters) :)
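The 97-byte result matches a simple model: the key length plus the byte length of the JSON-serialized value, which is roughly how Chrome's documentation describes the storage.sync per-item quota. A sketch of that model, assuming UTF-8 byte counting; `estimateItemBytes` is a hypothetical helper, not part of the extension API, and it does not model the Chrome-specific "<" cost from the EDIT.

```javascript
// Hypothetical helper: estimate what getBytesInUse() reports for one item,
// assuming the count is the key length plus the UTF-8 length of
// JSON.stringify(value). Chrome's "<" quirk is not modeled.
function estimateItemBytes(key, value) {
  const utf8 = new TextEncoder();
  return utf8.encode(key).length + utf8.encode(JSON.stringify(value)).length;
}

const alphabet =
  "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" +
  "~_!$()+,;@.:=^*?&<>[]{}%#|`/\u007f '-";

console.log(estimateItemBytes("t", alphabet)); // 97
console.log(estimateItemBytes("t", '"'));      // 5: the quote is escaped as \"
console.log(estimateItemBytes("t", "\u00e9")); // 5: U+00E9 takes 2 bytes in UTF-8
```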
So if I’m right, this is the most efficient way to store data in storage.sync.
…but it’s extremely slow: it takes about 5 seconds to encode 50KB of data.
Anyway, I’ve saved 30KB just by changing the encoding, so I’m super happy! And with LZMA compression I can now store 160KB of data as 50KB, which fits into storage.sync (well, after you chunk it into pieces under 8KB)! With so many operations involved, I’m actually surprised it works.
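The chunking step can be sketched like this, assuming the 8,192-byte per-item limit (storage.sync's `QUOTA_BYTES_PER_ITEM`) and a byte model of key length plus the UTF-8 length of the JSON-serialized value, which is consistent with the 97-byte measurement. `chunkForSync` and the `chunk_` key prefix are hypothetical. The returned object can be passed to `browser.storage.sync.set()`; reading it back means fetching the keys in index order and joining the values. Since the 94-character alphabet contains "<", a Chrome-safe version would have to charge that character more.

```javascript
// storage.sync per-item limit (QUOTA_BYTES_PER_ITEM).
const QUOTA_BYTES_PER_ITEM = 8192;

// Hypothetical helper: split `encoded` into { chunk_0, chunk_1, ... } so that
// each item's key length plus UTF-8 length of its JSON value fits the limit.
// Assumption: that is how the browser counts bytes (Chrome's extra cost
// for "<" is not modeled).
function chunkForSync(encoded, prefix = "chunk_", limit = QUOTA_BYTES_PER_ITEM) {
  const utf8 = new TextEncoder();
  const bytes = (s) => utf8.encode(s).length;
  const items = {};
  let index = 0;
  let current = "";
  let currentBytes = 0; // serialized size of `current`, without its quotes

  for (const ch of encoded) {
    const cost = bytes(JSON.stringify(ch)) - 2; // drop the two quotes
    const overhead = bytes(prefix + index) + 2; // key + the value's quotes
    if (current && overhead + currentBytes + cost > limit) {
      items[prefix + index] = current; // this chunk is full, start a new one
      index++;
      current = "";
      currentBytes = 0;
    }
    current += ch;
    currentBytes += cost;
  }
  items[prefix + index] = current; // last (possibly only) chunk
  return items;
}

const items = chunkForSync("a".repeat(20000));
console.log(Object.keys(items).join(", ")); // chunk_0, chunk_1, chunk_2
```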
EDIT:
After tracking down a strange bug in Chrome, I just found out that somehow (Chrome only!) the "<" character is encoded with 5 bytes, not 1.
I actually wrote a script that goes through the first 256 characters, stores each one and checks how many bytes it takes. And yes, there are 93 single-byte characters, in this order:
" !#$%&'()*+,-./0123456789:;=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~\u007f"
EDIT 2:
I’ve reported the bug to Chromium. It seems to be part of Chrome’s XSS protection.
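The character scan from the first EDIT can be approximated offline. This sketch does not call the extension API (the real test stored each character and read `getBytesInUse()`); it assumes a character costs the UTF-8 length of its `JSON.stringify` form minus the two quotes, and it hard-codes the Chrome-only "<" finding. `singleByteChars` is a hypothetical name. Under those assumptions it yields the same 93 characters in the same order.

```javascript
// Offline approximation of the scan over the first 256 characters.
// Assumed cost model: UTF-8 bytes of JSON.stringify(ch) minus the two quotes,
// plus the Chrome-only finding that "<" is not a 1-byte character.
function singleByteChars() {
  const utf8 = new TextEncoder();
  let result = "";
  for (let code = 0; code < 256; code++) {
    const ch = String.fromCharCode(code);
    const cost = utf8.encode(JSON.stringify(ch)).length - 2;
    if (cost === 1 && ch !== "<") result += ch;
  }
  return result;
}

const chars = singleByteChars();
console.log(chars.length); // 93
```

Control characters below 0x20 are escaped by JSON, `"` and `\` need a backslash, and everything from 0x80 up takes two bytes in UTF-8. That leaves the 96 characters from 0x20 to 0x7F, minus `"`, `\` and `<`.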