Use cases for self hosting

gfodor · July 6, 2018, 5:40pm

This thread may turn into more of a train-of-thought rambling but I am going to start by writing down the requirements we have that we’d like to satisfy as we reconfigure the way we deal with JS/reticulum integration/packaging/deploys.

First, the things we currently have in place that should continue to be true on the other side of these changes:

The current flow and URL structure: hubs.mozilla.com landing page, slugged URLs for rooms
Smoke testing in prod before any code changes are rolled out to the public
META tag inlining at server render time for Slack unfurls, etc.
No-configure experience for running yarn start locally and hitting home.html, with the full flow working properly
Proper workflow for running a copy of hubs on your PC and hitting it from another IP address on the LAN, while using our dev servers
Ability to transactionally deploy a new version of the client from Slack, bust the CDN cache properly, etc.
All assets (other than the root document) on hubs.mozilla.com are CDN cached, gzipped, etc
Ideally minimal complexity in the CI step for doing a deploy, to reduce failure rate. Right now it’s eminently simple: it just does a habitat package promotion
CSP and CORS headers that allow least privilege and limit the blast radius of potential exploits against users.
The habitat system for running services in prod (habitat supervisor, habitat config deploys, habitat packages, etc.). I realize we may be revisiting some of this in time but I think it is out of scope for this body of work.

Things we want that are not currently possible or easy to do:

I want to take my own locally hacked copy of Hubs, and push it up on the internet somewhere and have it work. The full room creation flow should work, and it should not require me running any servers of my own.
If I do want to decouple myself from Mozilla’s servers, it should be possible for me to do so in a variety of ways:
- I should be able to do a one-click deploy when possible on common hosting providers like AWS, and then be able to point my version of hubs to that IP or domain
- I should be able to run a docker image on my own VPC similarly
- I should be able to run a small number of commands on my local computer to get the “full stack” working against my local copy. If my local computer is accessible from the internet, I should be able to use it as a server.
We’d like to make it so it’s not necessary to do a full reticulum build + deploy in order to smoke test + roll out an update to the client. This would reduce the time and complexity of client deploys, and also minimize downtime, since reticulum deploys currently create a short outage. This is particularly useful because the client changes more rapidly than reticulum.
If I have my own GLTF scene I’d like to use for environments that is already accessible on the internet, it should be straightforward to incorporate it for the cases above.
Stretch goal: I should be able to easily stand up similar infrastructure on AWS as we do at Mozilla using our terraform scripts, to enable a horizontally scalable Hubs backend environment

brianpeiris1 · July 6, 2018, 5:51pm

Are you referring to the GitHub Pages scenario, where a dev has just modified the client, hosts it statically, and is using our backend for everything else?

As an addendum to this scenario, I’d add that API versioning or guarantees are out of scope (for now). i.e. If you’ve hosted your own client pointing at our servers, you understand that it may break at any time and it is up to you to keep it up to date if you wish to.

Also possibly out of scope is our ability to control who can connect to our Janus server. Or perhaps we need to have control of this if we want to endorse the self-hosted client scenario. This applies to farspark now as well I suppose.

gfodor · July 6, 2018, 5:54pm

Yep I am referring to the GitHub pages scenario. I agree with your other points – I think as long as the “public server” infrastructure is isolated sufficiently from the Mozilla sanctioned hubs.mozilla.com infrastructure we can, in the worst case, shut things down if we need to without affecting Hubs users’ experiences. (This includes not just boxing out servers, but also generating separate 3rd party API keys, credentials, etc.)

gfodor · July 7, 2018, 12:49am

Ok here’s a rough plan:

Building Hubs using yarn will now include a manifest file that includes a list of all the files that were generated.
The deploy process for Hubs proper will now have a slack-driven “client push”, which will take HEAD, yarn build it, and publish everything to S3.
Instead of packing the JS into reticulum as part of the package, reticulum is responsible for fetching and serving the latest version of the client from the web. So, Reticulum can be optionally configured with a URL pointing to a Hubs build manifest file on the Internet, and if that config is set, Reticulum enables the page controller and static asset proxying. The invariant is that whatever files are in that manifest are what it will serve.
URLs accessed under assets/ as well as the root page controller will be fronted by an in-memory cache for the files, and will use the manifest to resolve them. Cachex cache warmers can be used to ensure users are never exposed to cache misses from client deploys (except on reticulum startup, which can eventually be resolved by dumping the cache on shutdown and reloading it on startup). Reticulum will expose a tiny API that relies upon a shared secret to force a warm of the cache for a new version before flipping over to it, which will be hit by CI during a client deploy.
CSP and CORS headers will be broken out into configs for reticulum and farspark. For now, we will repurpose dev to be a “public” server that people will implicitly get for free for Hubs forks. We’ll want to ensure there are no shared secrets between dev and prod. If things become problematic we can split out a new environment for the public “sandbox”. Unlike prod, dev will have open CORS headers and CSP for both reticulum and farspark, and will not be configured with a Hubs manifest file (so the page controller and static assets will not be served from dev.)
Once that is done, we can write docs on how to publish your Hubs fork to the web. It should, at that point, “Just work.” You will need to hit /home.html to get the landing page, it will use the dev reticulum API, and you’ll be redirected to /hub.html?hub_sid=abcd. (This already works in dev as-is, we shouldn’t need to make any changes.)
Once this is done, the Habitat packages will need to be cleaned up so the full stack can be run with a few hab svc load commands. The biggest missing piece will be to have some basic tooling to land smart default user.toml files and documentation on how to edit those files. We should assume most users will be running things on a single box and will not use the habitat supervisor ring for configuration, so user.toml focused documentation seems like the right move. Advanced users who want to learn habitat will be able to easily figure out how to run the services in service groups.
Once the habitat packages are cleaned up, we can explore the potential paths for easily creating VM/Docker images for running hosted copies of the service stack. Due to the fact that there isn’t much valuable durable state, it seems OK to just run everything, including the database, on a single node. If the node dies, the database dies, but the database doesn’t include much useful information anyway. If you want to run your own copy of Hubs, it boils down to getting the service stack running, deploying your client code somewhere on the internet, and pointing reticulum to the manifest file for your client. We’ll also need to provide instructions on how to bust the cache, or we can have a default short TTL or something.
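
To make the manifest and cache-warming steps in the plan above more concrete, here is a minimal sketch of the "warm, then flip" idea. It is written in Python purely for illustration and is not Reticulum's or Cachex's actual API. The class name `ManifestCache`, the `fetch` callback, the manifest shape (a plain list of relative paths), and the secret check are all assumptions made for the example.

```python
import hmac
from typing import Callable, Dict, List, Optional


class ManifestCache:
    """Serves only the files listed in the active build manifest, from memory."""

    def __init__(self, fetch: Callable[[str], bytes], shared_secret: str):
        self._fetch = fetch            # hypothetical: downloads one URL and returns its bytes
        self._secret = shared_secret   # hypothetical: secret shared with the CI job
        self._active: Dict[str, bytes] = {}  # path -> body for the live client version

    def warm_and_flip(self, secret: str, base_url: str, manifest: List[str]) -> bool:
        """Fetch every file in the manifest, then switch over to it in one step."""
        # Constant-time comparison so the check does not leak the secret via timing.
        if not hmac.compare_digest(secret.encode(), self._secret.encode()):
            return False
        # Build the new version off to the side; requests still see the old one.
        # If any fetch raises, the active version is left untouched.
        staged = {path: self._fetch(base_url + path) for path in manifest}
        # Single reference swap: a request sees the old build or the new one, never a mix.
        self._active = staged
        return True

    def get(self, path: str) -> Optional[bytes]:
        """Return the cached body, or None if the path is not in the active manifest."""
        return self._active.get(path)
```

In this sketch a CI job would call `warm_and_flip` with the secret, the base URL of the published build, and the manifest contents. Because the new files are staged before the swap, a failed download leaves the previous client version in place.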

brianpeiris1 · July 7, 2018, 12:48am

This sounds wonderful!

I guess the only downside is that we’d have to shut down dev if we notice abuse, which would disrupt our own development?

gfodor · July 7, 2018, 12:53am

Yep, it’s simple enough to fork it out to a separate environment and is a fixed cost, but doing it adds a time tax to every subsequent ops change, so I’d rather do it if/when we need to instead of pre-emptively.

mquander · July 11, 2018, 11:11pm

This all sounds nice.

You mentioned that the whole room creation stuff should work for people who are just running the client pointing at our infrastructure. I’m not 100% sure what I want, but my dream is that third parties don’t have to use our splash page and UI (which is largely optimized to be good marketing) to create rooms.

It seems to me like the best design would be something like having the React website and VR client (index.html vs. hub.html) as 98% separate codebases, with all of the actual work going on to create and configure rooms being part of the VR client, and the React website just invoking it with some query parameters that suggest what parameters you chose for your room.
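
As a rough illustration of that split, the website side could do nothing more than build a link to hub.html, with the client reading the choices back out of the URL. This is a sketch in Python for brevity. The option names (`name`, `scene_url`) and both helper functions are hypothetical and are not parameters Hubs is known to accept.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def room_url(client_base: str, options: dict) -> str:
    """Website side: build a link that hands room options to the VR client."""
    # Drop unset options so the client can fall back to its own defaults.
    chosen = {key: value for key, value in options.items() if value is not None}
    query = urlencode(chosen)
    return f"{client_base}/hub.html" + (f"?{query}" if query else "")


def read_room_options(url: str) -> dict:
    """Client side: recover the suggested options from the query string."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}
```

With this shape, a third party could skip the landing page entirely and link straight to the client with whatever options they want to suggest.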

siparisimnet · July 13, 2018, 7:50am

Thank you for sharing!