Efficient storage of ArrayBuffer / Uint8Array

juraj.masiar · May 10, 2020, 7:25am

I’m building a new extension that will use the Crypto API, which returns an ArrayBuffer, and I also need to store the initialization vector, which is a Uint8Array.
I will be storing these in the browser.storage.sync area, which has only 100KB of space, so I want to use it wisely and not waste any.

However there are two issues:

1. I’m not sure whether storing these types there will really use only the space required (and I can’t measure it because the getBytesInUse function is missing).
2. The main issue is that I’m trying to stay compatible with Chrome, but Chrome cannot serialize those types at all.

To solve both problems, I would like to convert those values into something simple and more predictable, maybe a string?
But in a way that won’t take more space, so definitely not a base64 string.
Or is there a better way?

freaktechnik · May 10, 2020, 7:57pm

Before we had ArrayBuffers and friends we’d always use strings to store byte data in JS. Essentially one character maps to one byte. You can convert to and from using String.prototype.charCodeAt and String.fromCharCode. Now, depending on the representation the storage uses for JS strings, one byte of information takes either one byte or, if the strings are saved as “UCS-2” strings, two bytes. Obviously you could also use a number, possibly a BigInt, though I’m not sure there are any benefits compared to the very raw byte storage of strings.

juraj.masiar · May 10, 2020, 8:29pm

Thanks!

I’ve just spent a lot of time trying to convert it to a string and back (using TextDecoder/TextEncoder), but without success.
I will try charCodeAt, that sounds simple enough to work, but I need to check UCS-2 first (which doesn’t look like light reading).

I really like the idea of storing numbers, but I guess it won’t be easier or smaller. Sadly, storing a BigInt is again not supported in Chrome.

But maybe if I use a Uint32Array view and convert it to an array of numbers, it could be good enough? I need to find out how big a container is used internally for numbers (maybe 4 bytes?).
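
For reference, a Uint32Array view can only be created over a buffer whose byte length (and byte offset) is a multiple of 4, which is one likely source of alignment errors with this approach. A minimal sketch, assuming zero-padding is acceptable and the original byte length is stored alongside the numbers; the function names are made up for illustration:

```typescript
// Hypothetical helpers: pack bytes into 32-bit numbers and back.
// Assumption: the caller stores the original byte length separately.
function bytesToUint32Numbers(bytes: Uint8Array): number[] {
  // Round the length up to a multiple of 4; the extra bytes stay zero.
  const paddedLength = Math.ceil(bytes.length / 4) * 4;
  const padded = new Uint8Array(paddedLength); // fresh buffer, offset 0
  padded.set(bytes);
  const view = new Uint32Array(padded.buffer);
  const out: number[] = [];
  for (let i = 0; i < view.length; i++) out.push(view[i]);
  return out;
}

function uint32NumbersToBytes(nums: number[], byteLength: number): Uint8Array {
  const view = new Uint32Array(nums);
  // Drop the padding by viewing only the first byteLength bytes.
  // Caveat: the numbers depend on platform endianness.
  return new Uint8Array(view.buffer, 0, byteLength);
}
```

If the stored size is measured on the JSON form, each 32-bit number can take up to 10 digits plus a separator for 4 bytes of data, so this is unlikely to beat a compact string encoding.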

…it’s study time!

EDIT:
So using Uint32Array numbers didn’t work either, something about not being aligned…
The only thing that worked was processing it byte by byte into a string:

export function bufferToString(buf: Uint8Array | ArrayBuffer) {
  return String.fromCharCode(...new Uint8Array(buf));
  // return String.fromCharCode.apply(null, new Uint8Array(buf));
}

export function stringToUint8Array(str: string) {
  const buf = new ArrayBuffer(str.length);
  const bufView = new Uint8Array(buf);
  for (let i = 0, strLen = str.length; i < strLen; i++) {
    bufView[i] = str.charCodeAt(i);
  }
  return bufView;
}

Inspired by code here, but modified to use 8-bit units instead of 16-bit (which again caused the alignment issue).
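
One caveat with passing every byte as a separate argument to String.fromCharCode: engines limit how many arguments a call can take, so very large buffers may throw a RangeError. A hedged variant that converts in chunks is sketched below; the function names and the 0x8000 chunk size are arbitrary choices for illustration, not part of the code above.

```typescript
// Sketch: same byte-per-character mapping, but converted in chunks so
// no single call receives more than chunkSize arguments.
function bufferToStringChunked(buf: Uint8Array | ArrayBuffer, chunkSize = 0x8000): string {
  const bytes = new Uint8Array(buf);
  const parts: string[] = [];
  for (let i = 0; i < bytes.length; i += chunkSize) {
    const chunk = bytes.subarray(i, i + chunkSize);
    // apply() accepts array-likes at runtime; the cast only satisfies the type checker.
    parts.push(String.fromCharCode.apply(null, chunk as unknown as number[]));
  }
  return parts.join("");
}

// Reverse direction: one character back to one byte.
function binaryStringToBytes(str: string): Uint8Array {
  const out = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) {
    out[i] = str.charCodeAt(i) & 0xff;
  }
  return out;
}
```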

juraj.masiar · August 9, 2020, 2:25pm

So storing strings actually yields very mixed results.

Now that browser.storage.sync.getBytesInUse() is implemented, I can measure precisely how much space my data occupies:

await browser.storage.sync.set({t: 1})
// 2 bytes
await browser.storage.sync.set({t: 11})
// 3 bytes
await browser.storage.sync.set({t: '0'})
// 4 bytes
await browser.storage.sync.set({t: String.fromCharCode(0)}) // "\u0000"
// 9 bytes - that's a whole 6 bytes just to store a zero :(

Any ideas how to improve this?
I can imagine a crazy idea where I cherry-pick 256 “single character” characters and somehow map them to all 256 Uint8 values.

freaktechnik · August 9, 2020, 2:32pm

I would imagine that the storage usage also includes overhead for storing the key and possibly some other things the database does. So I’m not sure whether consecutive readings after changing the value reliably reflect only the space the current value takes up.

Either way, if there is some kind of “non monotone” storage usage, that seems like something developers should be aware of. I think it would be good to have more detailed documentation on what exactly the quotas include, and how you can try to minimize your quota usage as a developer.

juraj.masiar · August 9, 2020, 2:41pm

Actually it is defined and you can measure it and it totally works.
It’s the key size plus the value size.
From the MDN:

Name	Description	Value in bytes
Maximum total size	The maximum total amount of data that each extension is allowed to store in the sync storage area, as measured by the JSON stringification of every value plus every key’s length.	102400
Maximum item size	The maximum size of any one item that each extension is allowed to store in the sync storage area, as measured by the JSON stringification of the item’s value plus the length of its key.	8192
Maximum number of items	The maximum number of items that each extension is allowed to store in the sync storage area.	512

Test it in your console:

await browser.storage.sync.set({t: 555})
await browser.storage.sync.getBytesInUse()

This will give you 4 bytes: 1 character for the key plus 3 characters for the value. Same in Firefox and Chrome.
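
The rule quoted from MDN can be approximated offline. A rough sketch, assuming the quota is simply key length plus the length of JSON.stringify(value), and that everything is ASCII so string length equals byte count; estimateSyncBytes is a made-up helper, not a browser API:

```typescript
// Hypothetical estimator: key length + JSON-stringified value length.
// Assumes ASCII-only data and values that JSON.stringify can serialize.
function estimateSyncBytes(items: Record<string, unknown>): number {
  let total = 0;
  for (const key of Object.keys(items)) {
    total += key.length + JSON.stringify(items[key]).length;
  }
  return total;
}

estimateSyncBytes({ t: 1 });                      // 2
estimateSyncBytes({ t: 11 });                     // 3
estimateSyncBytes({ t: "0" });                    // 4 (the quotes count)
estimateSyncBytes({ t: String.fromCharCode(0) }); // 9 ("\u0000" is escaped)
estimateSyncBytes({ t: 555 });                    // 4
```

These match the getBytesInUse() readings reported in this thread, but the real accounting is up to the browser, so measure in the target browser before relying on it.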

freaktechnik · August 9, 2020, 3:09pm

Based on the documentation I am confused about this then:

EDIT: oh, it’s counting the quotes from the JSON stringification, thus numbers are more efficient. Got it.

freaktechnik · August 9, 2020, 3:12pm

You may be interested in base85 then… https://stackoverflow.com/questions/1443158/binary-data-in-json-string-something-better-than-base64

juraj.masiar · August 9, 2020, 9:46pm

Thank you Martin! That helped.
So this is actually much harder than I thought. But luckily there are some smart people implementing useful libraries, like base-x, which can convert to any base. And based on the StackOverflow post, there are 94 characters that can each be represented by a single byte in JSON.

I’ve just finished testing it and it does seem to work nicely with the following alphabet:

0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz~_!$()+,;@.:=^*?&<>[]{}%#|`/\u007f '-

I’ve found it here as Base95:

0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz~_!$()+,;@.:=^*?&<>[]{}%#|`/\ "'-

But to make it compatible with JSON I’ve removed " and \ (both get escaped in JSON) and added the “DEL” character, “\u007f”.
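
The alphabet can be cross-checked without touching storage. A small sketch, assuming standard JSON.stringify escaping is what determines the cost; the names are illustrative only:

```typescript
// The 94-character alphabet from this post (DEL written as \u007f).
const ALPHABET_94 =
  "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz~_!$()+,;@.:=^*?&<>[]{}%#|`/\u007f '-";

// Characters a one-character string occupies in JSON, not counting the two quotes.
function jsonCost(ch: string): number {
  return JSON.stringify(ch).length - 2;
}

// All ASCII characters (codes 0..127) that JSON.stringify leaves unescaped.
function singleByteAsciiChars(): string {
  let out = "";
  for (let code = 0; code < 128; code++) {
    const ch = String.fromCharCode(code);
    if (jsonCost(ch) === 1) out += ch;
  }
  return out;
}

// Same set of characters, just in a different order.
const sameSet = ALPHABET_94.split("").sort().join("") === singleByteAsciiChars();
```

This only checks JSON escaping. A storage backend may escape more than JSON.stringify does, as the Chrome behaviour for "<" described in the EDIT further down shows.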

To test it, I’ve run:

await browser.storage.sync.set({t: "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz~_!$()+,;@.:=^*?&<>[]{}%#|`/\u007f '-"})
await browser.storage.sync.getBytesInUse()
// 97 bytes = 1 key + 2 quotes + 94 characters :)

So if I’m right, this is the most efficient way to store data in storage.sync.
…but it’s extremely slow, it takes about 5 seconds to encode 50KB of data.

But anyway I’ve just saved 30KB just by changing the encoding, so I’m super happy! And with LZMA compression I can now store 160KB of data as 50KB, which fits into storage.sync! (well, after you chunk it into <8KB pieces). With so many operations involved, I’m actually surprised it works.
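
To illustrate the idea (this is not base-x’s actual source), here is a generic base-N conversion treating the whole buffer as one big number, plus a simple chunker. All names are hypothetical. The nested loops make the cost grow quadratically with input size, which is consistent with the slowness mentioned above. With 94 symbols each character carries about 6.55 bits, compared with 6 bits for base64.

```typescript
const ALPHABET =
  "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz~_!$()+,;@.:=^*?&<>[]{}%#|`/\u007f '-";

function encodeBaseN(bytes: Uint8Array, alphabet: string = ALPHABET): string {
  const base = alphabet.length;
  let zeros = 0; // leading zero bytes are kept as leading "zero" characters
  while (zeros < bytes.length && bytes[zeros] === 0) zeros++;
  const digits: number[] = []; // base-N digits, least significant first
  for (let i = zeros; i < bytes.length; i++) {
    let carry = bytes[i];
    for (let j = 0; j < digits.length; j++) {
      carry += digits[j] * 256; // shift the number left by one byte
      digits[j] = carry % base;
      carry = Math.floor(carry / base);
    }
    while (carry > 0) {
      digits.push(carry % base);
      carry = Math.floor(carry / base);
    }
  }
  let out = "";
  for (let k = 0; k < zeros; k++) out += alphabet.charAt(0);
  for (let q = digits.length - 1; q >= 0; q--) out += alphabet.charAt(digits[q]);
  return out;
}

function decodeBaseN(text: string, alphabet: string = ALPHABET): Uint8Array {
  const base = alphabet.length;
  let zeros = 0;
  while (zeros < text.length && text.charAt(zeros) === alphabet.charAt(0)) zeros++;
  const bytes: number[] = []; // least significant byte first
  for (let i = zeros; i < text.length; i++) {
    let carry = alphabet.indexOf(text.charAt(i));
    if (carry < 0) throw new Error("Character not in alphabet at index " + i);
    for (let j = 0; j < bytes.length; j++) {
      carry += bytes[j] * base;
      bytes[j] = carry & 0xff;
      carry >>= 8;
    }
    while (carry > 0) {
      bytes.push(carry & 0xff);
      carry >>= 8;
    }
  }
  const out = new Uint8Array(zeros + bytes.length);
  for (let k = 0; k < bytes.length; k++) out[zeros + k] = bytes[bytes.length - 1 - k];
  return out;
}

// Split an encoded string into pieces small enough for one storage item.
function chunkString(text: string, maxChars: number): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += maxChars) chunks.push(text.slice(i, i + maxChars));
  return chunks;
}
```

When picking maxChars, remember that the 8192-byte item limit also covers the key and the two quotes around the value, so something a little below 8192 (for example 8000) leaves headroom.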

EDIT:
So after tracking down a strange bug in Chrome, I just found out that somehow (Chrome only!) the "<" character is encoded with 5 bytes, not 1.
I actually wrote an algorithm that goes through the first 256 characters and tries to store each one to see how it goes, and yes, there are 93 of them, in this order:

" !#$%&'()*+,-./0123456789:;=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~\u007f"

EDIT 2:
Bug reported to Chromium. It seems to be part of XSS protection.
