Is it possible to cache JS::Stencil objects?

We have embedded Spidermonkey 102.7 ESR in our product.

We run a pool of JS threads to execute JS code. Each JS thread allocates a single JSContext. We want to cache compiled scripts in memory and we don’t need to persist them across the process runs. Hence, we experimented with a singleton cache object of the type:

std::map<std::string, RefPtr<JS::Stencil>> cache;

When we don’t have a script in the cache we create a stencil with

RefPtr<JS::Stencil> st = JS::CompileGlobalScriptToStencil(_cx, opts, source);

and then create a JS::RootedScript script with

script = JS::InstantiateGlobalStencil(_cx, instantiateOptions, st);

at the same time, we store the stencil in our cache object.
When we have a cached stencil, we take it from the cache and call

JS::InstantiateGlobalStencil(_cx, instantiateOptions, st);

Everything works well. We tested the approach under a heavy load and saw no problems. However, our process crashes during the shutdown. Depending on how we access our JS pool of threads, the crash happens either in

~StringBox at firefox-102.7.0/js/src/vm/SharedImmutableStringsCache.h:265

or in

JSRuntime::destroyRuntime at firefox-102.7.0/js/src/vm/Runtime.cpp:292

The former is in MOZ_RELEASE_ASSERT(refcount == 0) — refcount is 1, the latter is in MOZ_ASSERT(scriptDataTable(lock).empty());

So something, in Spidermonkey code, keeps holding shared objects.

We’ve rewritten our code by adding JS::EncodeStencil and JS::DecodeStencil pair of functions and saving uint8_t * buffer instead of stencils in our cache.

The process stopped crashing.

Does our initial approach have flaws?

As we don’t need a persistent cache we want to avoid extra steps of encoding and decoding stencil. But maybe it is not possible because of the design of the stencils?

Thank you.

Hello.

Thank you for using the new JS::Stencil feature :slight_smile:

JS::Stencil holds a reference to the bytecode compiled from JS code, and the bytecode is registered to JSRuntime's “script data table” to de-duplicate same bytecodes across multiple compilations, in order to reduce the memory consumption.

That “script data table” is what you see in the 2nd assertion failure.
The assertion is to verify that all bytecodes are freed before shutting down,
which requires all bytecode no longer being referred from any JS::Stencil or any runtime objects.

The 1st assertion failure is similar thing, where, the assertion is to verify that all "shared string"s, such as source code or filename used during compilation, are freed before shutting down.

So, possible situation is that, there’s JS::Stencil reference alive when shutting down, which makes the JS::Stencil's ref-count non-zero, and which makes the bytecode’s ref-count greater than 1 (stencil’s reference + script data table’s reference), and the script data table doesn’t become empty.

Is your JS::Stencil cache cleared before starting the shutdown?
If not, try clearing the map before shutting down to see if it solves the issue.

Let me know if it doesn’t solve the issue.

Thank you for your answer. We were caching compiled scripts with Spidermonkey 45 and before. However, Spidermonkey 45 required creating a cache in every JS thread which led to significant memory consumption as we run many threads. The stencils are very promising.

Yes, the first JS thread that goes down clears the map that holds stencils. The map is a singleton so the call clears stencils created in other JSContexts too. Then the thread calls JS_DestroyContext(_cx) on _cx that belongs to the thread. When other JS threads go down, there are no stencils in the map.

Okay, so, the cache is shared across multiple JSContexts, and which means that, a JS::Stencil compiled in thread A’s JSContext is used by thread B’s JSContext, right?

What’s the lifetime model of each thread’s JSContext, and how are multiple JSContext's lifetime overlap?
When a thread is shutting down its JSContext, is there any other thread that is still running scripts, or holds a reference to their JSScript, or perhaps just that JSScript is not yet GC-ed ?

The possible case I can think of is the following:

  1. When thread A compiles script, the script’s bytecode has ref-count 2, which is from:
  • JS::Stencil referred from the cache map
  • A’s JSRuntime's script data table
  1. When thread A instantiates the script, the script’s bytecode has ref-count 3:
  • JS::Stencil referred from the cache map
  • A’s JSRuntime's script data table
  • (new) JSScript belongs to A’s JSRuntime
  1. When thread B gets JS::Stencil from cache and instantiate it, the script’s bytecode has ref-count 4:
  • JS::Stencil referred from the cache map
  • A’s JSRuntime's script data table
  • JSScript belongs to A’s JSRuntime
  • (new) JSScript belongs to B’s JSRuntime
  1. When thread A is shutting down, JSScript in A’s JSRuntime gets GC-ed, but JSScript in B's JSRuntimedoesn't get GC-ed. so, even if the cache map is cleared, the ref-count is still 2, not 1, and that doesn't meet the requirement [1] to remove the item from theJSRuntime`'s script data table, and the table doesn’t become empty, and the assertion fails:
  • A’s JSRuntime's script data table
  • JSScript belongs to B’s JSRuntime

[1] https://searchfox.org/mozilla-esr102/rev/85c472e28639d76c8dcfa9b72fdb50cb7b164af0/js/src/vm/JSScript.cpp#2192,2202-2204

void js::SweepScriptData(JSRuntime* rt) {
...
    if (sharedData->refCount() == 1) {
      sharedData->Release();
      e.removeFront();

So, if this is the case, what needs to be done is to ensure that the JS::Stencil compiled by thread A’s JSContext is not referred by anything when calling JS_DestroyContext on A’s JSContext.

If the lifetime of all threads are almost same (such as, all threads and JSContexts are destroyed at once when shutting down the application), that could be achieved by:

  1. keep the JSContext until all threads finishes running JS code
  2. GC all JSScripts before calling JS_DestroyContext on any thread, in order to remove the reference to bytecode from runtime objects
  3. clear the stencil cache map before calling JS_DestroyContext on any thread, in order to remove the reference to bytecode from JS::Stencils
  4. call JS_DestroyContext on all contexts

Then, now we’re developing more flexible JSContext-free Stencil APIs, which uses singleton script data table, instead of per-JSRuntime table (bug 1773319 and followups), and that will make the situation simpler, where it doesn’t require managing the lifetime of JSContexts like the above solution.
The new APIs are still under development, but hopefully it will be available in the next ESR (according to the release calendar, it will be 115).

So, until the new API arrives, the workaround would be either:

  • (a) cache the encoded data, as you’re doing, which makes the decoded JS::Stencil belongs to the thread’s script data table, and there’s no complex dependencies between multiple JSContexts
  • (b) use the above “call JS_DestroyContext at once” way

Thank you!
I know that in one case our process crashes when thread B is still holding the stencil—that’s when we are hit by the assertion in ~StringBox. I was suspecting that that was bad. Thank you for your confirmation.
However, in the other case, I can’t see that. I need to investigate further. I will follow your suggestions and will let you know.