Is it possible to cache JS::Stencil objects?

dmitri · March 1, 2023, 2:29am

We have embedded Spidermonkey 102.7 ESR in our product.

We run a pool of JS threads to execute JS code. Each JS thread allocates a single JSContext. We want to cache compiled scripts in memory, and we don’t need to persist them across process runs. Hence, we experimented with a singleton cache object of the type:

std::map<std::string, RefPtr<JS::Stencil>> cache;

When we don’t have a script in the cache we create a stencil with

RefPtr<JS::Stencil> st = JS::CompileGlobalScriptToStencil(_cx, opts, source);

and then create a JS::RootedScript script with

script = JS::InstantiateGlobalStencil(_cx, instantiateOptions, st);

at the same time, we store the stencil in our cache object.
When we have a cached stencil, we take it from the cache and call

JS::InstantiateGlobalStencil(_cx, instantiateOptions, st);
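
A minimal sketch of such a shared cache, assuming it is reached from several JS threads at once and therefore needs a lock. FakeStencil, StencilCache and getOrCompile are hypothetical names used only for illustration, and std::shared_ptr stands in for RefPtr<JS::Stencil>; in real code the compile callback would wrap the JS::CompileGlobalScriptToStencil call shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Stand-in for JS::Stencil (hypothetical), so the sketch builds without
// SpiderMonkey headers.
struct FakeStencil {
  std::string source;
};

// Hypothetical process-wide cache. std::shared_ptr plays the role of
// RefPtr<JS::Stencil>.
class StencilCache {
 public:
  using StencilPtr = std::shared_ptr<const FakeStencil>;
  using Compiler = std::function<StencilPtr(const std::string&)>;

  // Returns the cached entry for `source`, compiling it on a miss.
  // The lock is held while compiling: simple, but it serializes compiles.
  StencilPtr getOrCompile(const std::string& source, const Compiler& compile) {
    std::lock_guard<std::mutex> guard(lock_);
    auto it = entries_.find(source);
    if (it != entries_.end()) {
      return it->second;
    }
    StencilPtr stencil = compile(source);
    if (stencil) {
      entries_.emplace(source, stencil);
    }
    return stencil;
  }

  // Drops the cache's references; callers may still hold their own.
  void clear() {
    std::lock_guard<std::mutex> guard(lock_);
    entries_.clear();
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> guard(lock_);
    return entries_.size();
  }

 private:
  mutable std::mutex lock_;
  std::map<std::string, StencilPtr> entries_;
};
```

Note that clear() only releases the references owned by the map. Any thread that still holds a pointer it got from getOrCompile keeps that entry alive, which matters for the shutdown behaviour discussed in this thread.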

Everything works well. We tested the approach under a heavy load and saw no problems. However, our process crashes during the shutdown. Depending on how we access our JS pool of threads, the crash happens either in

~StringBox at firefox-102.7.0/js/src/vm/SharedImmutableStringsCache.h:265

or in

JSRuntime::destroyRuntime at firefox-102.7.0/js/src/vm/Runtime.cpp:292

The former is in MOZ_RELEASE_ASSERT(refcount == 0) — refcount is 1, the latter is in MOZ_ASSERT(scriptDataTable(lock).empty());

So something, in Spidermonkey code, keeps holding shared objects.

We’ve rewritten our code to use the JS::EncodeStencil and JS::DecodeStencil pair of functions, and we now store the encoded uint8_t* buffers in our cache instead of the stencils.

The process stopped crashing.

Does our initial approach have flaws?

As we don’t need a persistent cache we want to avoid extra steps of encoding and decoding stencil. But maybe it is not possible because of the design of the stencils?

Thank you.

arai · March 1, 2023, 5:32pm

Hello.

Thank you for using the new JS::Stencil feature.

JS::Stencil holds a reference to the bytecode compiled from JS code, and the bytecode is registered to JSRuntime's “script data table” to de-duplicate same bytecodes across multiple compilations, in order to reduce the memory consumption.

That “script data table” is what you see in the 2nd assertion failure.
The assertion verifies that all bytecode is freed before shutting down,
which requires that no bytecode is still referenced by any JS::Stencil or any runtime object.

The 1st assertion failure is similar: that assertion verifies that all "shared strings", such as the source code or filename used during compilation, are freed before shutting down.

So a possible situation is that a JS::Stencil reference is still alive when shutting down. That keeps the JS::Stencil's ref-count non-zero, which keeps the bytecode’s ref-count greater than 1 (the stencil’s reference plus the script data table’s reference), so the script data table doesn’t become empty.

Is your JS::Stencil cache cleared before starting the shutdown?
If not, try clearing the map before shutting down to see if it solves the issue.

Let me know if it doesn’t solve the issue.

dmitri · March 1, 2023, 11:51pm

Thank you for your answer. We were caching compiled scripts with Spidermonkey 45 and before. However, Spidermonkey 45 required creating a cache in every JS thread which led to significant memory consumption as we run many threads. The stencils are very promising.

Yes, the first JS thread that goes down clears the map that holds stencils. The map is a singleton so the call clears stencils created in other JSContexts too. Then the thread calls JS_DestroyContext(_cx) on _cx that belongs to the thread. When other JS threads go down, there are no stencils in the map.

arai · March 2, 2023, 12:44am

Okay, so the cache is shared across multiple JSContexts, which means that a JS::Stencil compiled in thread A’s JSContext is used by thread B’s JSContext, right?

What’s the lifetime model of each thread’s JSContext, and how do the lifetimes of multiple JSContexts overlap?
When a thread is shutting down its JSContext, is there any other thread that is still running scripts, or still holds a reference to its JSScript, or perhaps whose JSScript is just not yet GC-ed?

The possible case I can think of is the following:

When thread A compiles script, the script’s bytecode has ref-count 2, which is from:

JS::Stencil referred from the cache map
A’s JSRuntime's script data table

When thread A instantiates the script, the script’s bytecode has ref-count 3:

JS::Stencil referred from the cache map
A’s JSRuntime's script data table
(new) JSScript belongs to A’s JSRuntime

When thread B gets the JS::Stencil from the cache and instantiates it, the script’s bytecode has ref-count 4:

JS::Stencil referred from the cache map
A’s JSRuntime's script data table
JSScript belongs to A’s JSRuntime
(new) JSScript belongs to B’s JSRuntime

When thread A is shutting down, the JSScript in A’s JSRuntime gets GC-ed, but the JSScript in B’s JSRuntime doesn’t get GC-ed. So, even if the cache map is cleared, the ref-count is still 2, not 1. That doesn’t meet the requirement [1] for removing the item from the JSRuntime's script data table, so the table doesn’t become empty and the assertion fails. The remaining references are:

A’s JSRuntime's script data table
JSScript belongs to B’s JSRuntime

[1] https://searchfox.org/mozilla-esr102/rev/85c472e28639d76c8dcfa9b72fdb50cb7b164af0/js/src/vm/JSScript.cpp#2192,2202-2204

void js::SweepScriptData(JSRuntime* rt) {
...
    if (sharedData->refCount() == 1) {
      sharedData->Release();
      e.removeFront();
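
The counting above can be replayed with a toy model. This is only a sketch under stated assumptions: ToyRuntime, Bytecode and replayShutdownOfA are hypothetical names, std::shared_ptr::use_count() stands in for the bytecode's ref-count, and the sweep copies only the refCount() == 1 condition from the excerpt, not the real SpiderMonkey implementation.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Toy model: none of these types are SpiderMonkey classes.
struct Bytecode {};
using BytecodeRef = std::shared_ptr<Bytecode>;

struct ToyRuntime {
  std::vector<BytecodeRef> scriptDataTable;  // per-runtime de-duplication table
  std::vector<BytecodeRef> scripts;          // stands in for live JSScripts

  // An entry is dropped only when the table holds the last reference.
  void sweepScriptData() {
    auto it = scriptDataTable.begin();
    while (it != scriptDataTable.end()) {
      if (it->use_count() == 1) {
        it = scriptDataTable.erase(it);
      } else {
        ++it;
      }
    }
  }
};

struct ReplayResult {
  std::vector<long> counts;  // ref-count after each step
  bool tableEmpty;           // is A's table empty after A's final sweep?
};

// Replays the sequence from the post. If releaseScriptInBFirst is true,
// B's script is collected before A shuts down.
ReplayResult replayShutdownOfA(bool releaseScriptInBFirst) {
  ToyRuntime a, b;
  ReplayResult result;

  BytecodeRef cached = std::make_shared<Bytecode>();  // held by the cache map
  a.scriptDataTable.push_back(cached);                // A compiles
  result.counts.push_back(cached.use_count());
  a.scripts.push_back(cached);                        // A instantiates
  result.counts.push_back(cached.use_count());
  b.scripts.push_back(cached);                        // B instantiates
  result.counts.push_back(cached.use_count());

  if (releaseScriptInBFirst) {
    b.scripts.clear();
  }
  a.scripts.clear();  // A's JSScript is collected
  cached.reset();     // the cache map is cleared
  result.counts.push_back(a.scriptDataTable.front().use_count());

  a.sweepScriptData();
  result.tableEmpty = a.scriptDataTable.empty();
  return result;
}
```

With releaseScriptInBFirst set to false the model reproduces the 2, 3, 4, 2 sequence and leaves A's table non-empty; with true, the last count drops to 1 and the sweep empties the table.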

So, if this is the case, what needs to be done is to ensure that the JS::Stencil compiled by thread A’s JSContext is no longer referenced by anything when calling JS_DestroyContext on A’s JSContext.

If the lifetimes of all threads are almost the same (for example, all threads and JSContexts are destroyed at once when shutting down the application), that could be achieved by:

keep the JSContext until all threads finish running JS code
GC all JSScripts before calling JS_DestroyContext on any thread, in order to remove the reference to bytecode from runtime objects
clear the stencil cache map before calling JS_DestroyContext on any thread, in order to remove the reference to bytecode from JS::Stencils
call JS_DestroyContext on all contexts
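
The ordering of these steps can be sketched with plain C++ threads. This is a minimal illustration, not SpiderMonkey code: ShutdownCoordinator and runCoordinatedShutdown are hypothetical names, and the recorded strings stand in for the real operations named in the comments.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper that enforces: every worker collects its scripts,
// then the cache is cleared once, then every worker destroys its context.
class ShutdownCoordinator {
 public:
  explicit ShutdownCoordinator(int workers) : pending_(workers) {}

  // Called by a worker after it stopped running JS and collected its
  // scripts. Blocks until the shared cache has been cleared.
  void workerReadyAndWait() {
    std::unique_lock<std::mutex> lock(mutex_);
    if (--pending_ == 0) {
      cv_.notify_all();
    }
    cv_.wait(lock, [this] { return cacheCleared_; });
  }

  // Called by the owner of the cache. Waits for all workers, clears the
  // cache, then releases the workers.
  template <typename ClearFn>
  void clearCacheWhenAllReady(ClearFn clearCache) {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return pending_ == 0; });
    clearCache();
    cacheCleared_ = true;
    cv_.notify_all();
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  int pending_;
  bool cacheCleared_ = false;
};

// Runs the sequence with `workers` threads and returns the event order.
std::vector<std::string> runCoordinatedShutdown(int workers) {
  std::vector<std::string> events;
  std::mutex eventsMutex;
  auto record = [&](const std::string& event) {
    std::lock_guard<std::mutex> guard(eventsMutex);
    events.push_back(event);
  };

  ShutdownCoordinator coordinator(workers);
  std::vector<std::thread> threads;
  for (int i = 0; i < workers; ++i) {
    threads.emplace_back([&] {
      record("gc");  // stands in for collecting this thread's JSScripts
      coordinator.workerReadyAndWait();
      record("destroy");  // stands in for JS_DestroyContext on this thread
    });
  }
  // Stands in for clearing the shared stencil cache map.
  coordinator.clearCacheWhenAllReady([&] { record("clear"); });
  for (auto& t : threads) {
    t.join();
  }
  return events;
}
```

In this model no "destroy" event can be recorded before "clear", and "clear" cannot be recorded before every worker has recorded "gc", which is the ordering the list above asks for.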

We’re now developing more flexible JSContext-free Stencil APIs, which use a singleton script data table instead of a per-JSRuntime table (bug 1773319 and followups). That will make the situation simpler, because it doesn’t require managing the lifetimes of JSContexts as in the solution above.
The new APIs are still under development, but hopefully they will be available in the next ESR (according to the release calendar, it will be 115).

So, until the new API arrives, the workaround would be either:

(a) cache the encoded data, as you’re doing, so that each decoded JS::Stencil belongs to the decoding thread’s script data table and there are no complex dependencies between multiple JSContexts
(b) use the “call JS_DestroyContext at once” approach described above

dmitri · March 2, 2023, 4:04am

Thank you!
I know that in one case our process crashes when thread B is still holding the stencil—that’s when we are hit by the assertion in ~StringBox. I was suspecting that that was bad. Thank you for your confirmation.
However, in the other case, I can’t see that. I need to investigate further. I will follow your suggestions and will let you know.

dmitri · February 21, 2024, 9:18am

I’ve got some time to look into the issue further.
I verified the behaviour with firefox-115.7.0 and the changeset: 775392:0981a1f2fb82 (Feb 16 2024) and can see a problem similar to the one I saw before.

I run the test with two threads. The main thread owns a cache of stencils and spawns a child thread. Both threads compile scripts at a random time, populate the cache and execute scripts taken from the cache. I observe a crash if the child thread creates and caches a stencil before the main thread does. The crash happens when the child thread destroys the JSContext object in

JSRuntime::destroyRuntime, MOZ_ASSERT(scriptDataTableHolder().getWithoutLock().empty());

(this is a debug build of Spidermonkey.)

The scriptDataTable is not empty, as stencils (RefPtr<JS::Stencil>) are kept alive by the main thread’s cache.

Currently, it is possible to compile a script on one thread, cache it and execute it on another. However, the dependency on the JSContext still exists. To be precise, scripts depend on the JSRuntime owned by the JSContext. That dependency complicates the design of a stencil cache shared by threads.

Can the scriptDataTable be moved to the parent JSRuntime, or can the table ownership be offloaded to the caller? Is there a need for the table at all?

arai-b · February 21, 2024, 11:33am

Can you provide details about how you compile?
Do you use JSContext when compiling in the child thread, or JS::FrontendContext?

The API added by bug 1773319 is based on JS::FrontendContext, which uses a global SharedScriptDataTableHolder instance that is not associated with any JSRuntime and is shared across all non-main threads.

See testFrontendCompileStencil.cpp for example usage.

dmitri · February 22, 2024, 10:53am

I use JSContext in my test.
I tried reproducing the example from testFrontendCompileStencil.cpp. However, it required includes from outside the distribution and, in the end, failed to link with the error:

undefined reference to JS::CompilationStorage::~CompilationStorage()

It is a local text symbol:

$ nm libmozjs-124a1.so | grep CompilationStorage | c++filt
00000000024a8c70 t JS::CompilationStorage::~CompilationStorage()
00000000024a8c70 t JS::CompilationStorage::~CompilationStorage()
00000000024a90f0 T JS::CompileGlobalScriptToStencil(js::FrontendContext*, JS::ReadOnlyCompileOptions const&, JS::SourceText<char16_t>&, JS::CompilationStorage&)
...

I built my copy of Spidermonkey following Firefox Contributors’ Quick Reference

Is it an experimental feature that is not released yet? Have you introduced FrontendContext for Stencil compilation only? If so, why not use the global SharedScriptDataTable every time you need to compile a stencil?

arai · February 22, 2024, 11:24am

Looks like there’s an issue with visibility.

JS::CompilationStorage would require JS_PUBLIC_API.

Can you test by replacing that line in js/public/experimental/CompileScript.h with the following, then building the library and linking against it?

struct JS_PUBLIC_API CompilationStorage {

FrontendContext is for operations that are not tied to JSContext (runtime data and GC), like off-thread compilation, JSON parsing (not yet available in 115), etc.
JSStencil.h and CompileScript.h contain the related APIs.

It’s still marked experimental, but it is mostly stable in the latest central (except for some issues like the visibility one above. Thank you for discovering the issue!).

SharedScriptDataTableHolder is used for sharing bytecode between multiple compilations, to reduce memory consumption when there is a lot of duplicate bytecode (top-level script or function).

Compilation with JS::FrontendContext uses the global table with a lock, so it can be used for off-thread compilation on multiple threads.
Compilation with JSContext uses the context’s own table without a lock.
In Firefox, small files are compiled on the main thread with JSContext and large files are compiled off the main thread with JS::FrontendContext.

dmitri · February 23, 2024, 7:23am

I compiled Spidermonkey with the JS_PUBLIC_API and my test worked with no crashes. However, when I tried to change my code to use a single FrontendContext, it crashed again. I was hoping to have a single FrontendContext to perform compilation of all scripts.

I guess each thread should use its own FrontendContext, so I need to allocate a JSContext and a FrontendContext for every thread that compiles and executes scripts. Is that correct?
Something like so:

thread_function(…)
{
    JSContext *cx = …;
    FrontendContext *fc = …;

    stencil = CompileGlobalScriptToStencil(fc, …);
    …
    JS_ExecuteScript(cx, …);
}

arai · February 23, 2024, 7:50am

Thank you for verifying!
I’ve filed bug 1881682 for the fix.

Can you provide the details of the crash,
such as a backtrace or, if it fails with an assertion failure, the assertion’s message?

Basically each thread is supposed to have its own JS::FrontendContext.

JS::FrontendContext can be passed between threads (for example, created in the main thread, passed to an off-main thread for compilation, and passed back to the main thread), but it’s not protected by a mutex and cannot be used concurrently. So, if compilation is supposed to happen concurrently, there should be multiple JS::FrontendContexts.

With the current API set, that would be a reasonable solution for the situation.

dmitri · February 23, 2024, 9:13am

Here is the assertion:

Assertion failure: stackLimitThreadId_ == GetTid(), at /home/dmitri/mozilla-unified/js/src/frontend/FrontendContext.cpp:221
#01: ???[mozilla-unified/obj-x86_64-pc-linux-gnu/dist/bin/libmozjs-124a1.so +0x24c717f]
#02: JS::CompileGlobalScriptToStencil(js::FrontendContext*, JS::ReadOnlyCompileOptions const&, JS::SourceText<mozilla::Utf8Unit>&, JS::CompilationStorage&)[mozilla-unified/obj-x86_64-pc-linux-gnu/dist/bin/libmozjs-124a1.so +0x24a903c]

Here is the stack:

Program terminated with signal SIGSEGV, Segmentation fault.
#0 js::FrontendContext::assertNativeStackLimitThread (this=<optimized out>) at mozilla-unified/js/src/frontend/FrontendContext.cpp:221

221 MOZ_ASSERT (*stackLimitThreadId_ == GetTid ());
[Current thread is 1 (Thread 0x7fe7015ff640 (LWP 1382))]

(gdb) bt
#0 js::FrontendContext::assertNativeStackLimitThread (this=<optimized out>) at mozilla-unified/js/src/frontend/FrontendContext.cpp:221
#1 0x00007fe7042a903c in JS::CompileGlobalScriptToStencil (fc=0x0, options=…, srcBuf=…, compileStorage=…)
at mozilla-unified/js/src/frontend/CompileScript.cpp:170
#3 0x0000000000405783 in CompileJob::CompileScript (this=0x7fff8b04c7a0, cx=0x7fe6fc00a410,
script="\nfor (let i = 0; i < 1; i++) {\n print(`in worker thread, it is ${new Date()}`);\n sleep(100);\n}\n ", filename=0x40bf26 "noname", linenumber=1) at stencils.cpp:120

(gdb) frame 2
#2 0x0000000000405ab4 in CompileJob::compile_script at stencils.cpp:193
193 JS::CompileGlobalScriptToStencil(_fc, o, source, compileStorage);

(gdb) p _fc
$5 = (JS::FrontendContext *) 0xb72e80

gdb shows fc=0x0 which is incorrect.

arai · February 23, 2024, 9:21am

JS::SetNativeStackQuota ties the JS::FrontendContext to the calling thread, because the stack quota is calculated from that thread’s stack space.
So, you need to call it in the thread where you want to perform the compilation.

The documentation is added in the latest version:
https://searchfox.org/mozilla-central/rev/da49863c3d6f34038d00f5ba701b9a2ad9cbadba/js/public/experimental/CompileScript.h#37-42

// Set the size of the native stack that should not be exceed. To disable
// stack size checking pass 0.
//
// WARNING: When the stack size checking is enabled, the JS::FrontendContext
// can be used only in the thread where JS::SetNativeStackQuota is called.
JS_PUBLIC_API void SetNativeStackQuota(JS::FrontendContext* fc,
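
The thread affinity described in that comment can be modelled without SpiderMonkey. This is a sketch under stated assumptions: ToyFrontendContext and workerCanCompile are hypothetical names, and the check only imitates the stackLimitThreadId_ == GetTid() assertion from the backtrace above; it is not the real implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>

// Toy stand-in for JS::FrontendContext (hypothetical).
class ToyFrontendContext {
 public:
  // Models JS::SetNativeStackQuota: a non-zero quota binds the context to
  // the calling thread, and 0 disables the check.
  void setNativeStackQuota(std::size_t quota) {
    checkThread_ = (quota != 0);
    quotaThread_ = std::this_thread::get_id();
  }

  // Models the debug assertion hit during compilation.
  bool usableOnCurrentThread() const {
    return !checkThread_ || quotaThread_ == std::this_thread::get_id();
  }

 private:
  bool checkThread_ = false;
  std::thread::id quotaThread_;
};

// Creates a context on the calling thread, then tries to use it on a
// worker. The quota is set either on the worker or on the calling thread.
bool workerCanCompile(bool setQuotaOnWorker, std::size_t quota) {
  ToyFrontendContext fc;
  if (!setQuotaOnWorker) {
    fc.setNativeStackQuota(quota);  // bound to the creating thread
  }
  bool usable = false;
  std::thread worker([&] {
    if (setQuotaOnWorker) {
      fc.setNativeStackQuota(quota);  // bound to the compiling thread
    }
    usable = fc.usableOnCurrentThread();
  });
  worker.join();
  return usable;
}
```

In this model, setting a non-zero quota on the creating thread and compiling on the worker fails the check, while setting it on the worker, or passing 0, passes.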

dmitri · February 23, 2024, 9:46am

Thank you, that helps!
Do you think it will be useful if I add a Stencils example to spidermonkey-embedding-examples?

arai · February 23, 2024, 10:01am

Yeah, it will be great to have Stencil examples there.

bhuisgen · February 23, 2024, 11:16am

I have been using stencils for two days. Following the answers here, I will use a FrontendContext for the compilation thread. But could you explain why CompileOptions still requires a JSContext? The dependency between CompileOptions and InstantiationOptions looks strange to me too. Thanks for your help.

arai · February 23, 2024, 11:33am

JS::CompileOptions doesn’t require JSContext.
If you pass JSContext, compilation-related options are copied from JSContext to the JS::CompileOptions.

In ESR 115, you can pass ForFrontendContext to construct JS::CompileOptions without a JSContext, and then set the compilation-related options manually.
https://searchfox.org/mozilla-esr115/rev/2541acb6bd2b5d944f30b87951a1e9026426a757/js/public/CompileOptions.h#477-479

// Construct CompileOptions for FrontendContext-APIs.
struct ForFrontendContext {};
explicit CompileOptions(const ForFrontendContext&)

example: https://searchfox.org/mozilla-esr115/rev/2541acb6bd2b5d944f30b87951a1e9026426a757/js/src/jsapi-tests/testFrontendCompileStencil.cpp#35

JS::CompileOptions options((JS::CompileOptions::ForFrontendContext()));

In the latest central, you can pass PrefableCompileOptions instead, which encapsulates the same set of options as JSContext.

https://searchfox.org/mozilla-central/rev/da49863c3d6f34038d00f5ba701b9a2ad9cbadba/js/public/CompileOptions.h#598-600

// Construct a CompileOption in the context where JSContext is not available.
// prefableOptions should reflect the compilation-specific user prefs.
explicit CompileOptions(const PrefableCompileOptions& prefableOptions)

https://searchfox.org/mozilla-central/rev/da49863c3d6f34038d00f5ba701b9a2ad9cbadba/js/public/CompileOptions.h#121-123

// Compilation-specific part of JS::ContextOptions which is supposed to be
// configured by user prefs.
class JS_PUBLIC_API PrefableCompileOptions {

example: https://searchfox.org/mozilla-central/rev/da49863c3d6f34038d00f5ba701b9a2ad9cbadba/js/src/jsapi-tests/testFrontendCompileStencil.cpp#35-36

JS::PrefableCompileOptions prefableOptions;
JS::CompileOptions options(prefableOptions);

JS::InstantiateOptions is a subset of JS::CompileOptions which is used when instantiating the stencil to a JSScript.
If you have the same data for both compilation and instantiation, you can create JS::CompileOptions and convert it to JS::InstantiateOptions.
If you don’t keep the entire compile options until you instantiate, you need to keep the values of the JS::InstantiateOptions fields (hideScriptFromDebugger etc.) and construct the JS::InstantiateOptions directly from them.

dmitri · February 29, 2024, 10:19am

Hi Arai,

I wrote an example of compiling and caching stencils and integrated it with spidermonkey embedding examples.

However, it won’t compile with the current Spidermonkey code because of bug 1881682.

Also, I don’t see how to report JS compilation errors (JS::CompileGlobalScriptToStencil), as FrontendContext.h is not available to an external embedding and, as a result, FrontendContext and FrontendErrors are opaque classes, at least in ESR 115.8.

So my example is not useful without mods to Spidermonkey.

Any suggestions on whether and how I should proceed?

arai · February 29, 2024, 11:15am

For the bug, if you’re willing to post a patch, I’m happy to mentor and review.
This document explains how to set up the environment and submit a patch: https://firefox-source-docs.mozilla.org/setup/index.html

For error handling, the latest version has the following API to handle errors:

JS::HadFrontendErrors to check whether there was an error
JS::ConvertFrontendErrorsToRuntimeErrors to report FrontendContext errors to a JSContext
JS::GetFrontendErrorReport to get the raw error report
JS::HadFrontendOverRecursed, JS::HadFrontendOutOfMemory, and JS::HadFrontendAllocationOverflow for checking special errors which don’t have an error report
JS::ClearFrontendErrors for resetting the error state
JS::GetFrontendWarningCount and JS::GetFrontendWarningAt to get the raw error reports for warnings

So, an example that also handles errors will have to target the next ESR (128).