Get HTML of remote page with JS

Yes and yes. I typed the first part without my glasses and copied the second one. For reference I corrected it above.

Regarding host permission:
Yes, one does indeed need it (<all_urls> is the easiest way to test it). Here is why:

Is the answer behind your link, or did I misread it? So what should I do? I saw extensions somehow did get remote HTML.

The two bullet points after “Here is why” just explain why you need a host permission.

I’m not sure what you mean with the rest of your comment:

I saw extensions somehow did get remote HTML.

So it works? Great!

I meant that someone else’s extension did it: Group Speed Dial can get a page’s title and take a screenshot of it from a URL the user supplies. So it can be done, but the question is how.

The code to load the source of a page and get its title and body (as DOM element) from that is in my first comment.
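For readers who don’t have that earlier comment at hand, the idea can be sketched roughly as below. This is not necessarily the code from the first comment. It assumes the extension holds a host permission for the target URL; `fetchTitleAndBody` and its `deps` parameter are hypothetical names, and the defaults simply wrap the standard `fetch` and `DOMParser`.

```javascript
// Hypothetical helper: load a page's source and parse it into a detached document.
// Scripts are not executed and nothing is rendered, so content that the page
// adds via JS will be missing from the result.
async function fetchTitleAndBody(url, deps = {}) {
    // The dependencies are injectable only to make the sketch easy to try outside a browser.
    const fetchImpl = deps.fetchImpl || ((u) => fetch(u));
    const parseHtml = deps.parseHtml ||
        ((html) => new DOMParser().parseFromString(html, "text/html"));

    const response = await fetchImpl(url);
    if (!response.ok) {
        throw new Error("Request failed with status " + response.status);
    }
    const doc = parseHtml(await response.text());
    // doc.body is a DOM element that belongs to the parsed (not displayed) document.
    return { title: doc.title, body: doc.body };
}
```

In an extension page or background script it would be called as `fetchTitleAndBody("https://developer.mozilla.org/")`.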

So far, you didn’t say that you want a screenshot of the rendered page. That is only possible to get from open/loaded pages. I am quite sure that the linked extension just waits for the page to be loaded by the user and grabs the screenshot then.

An alternative would be to render the pages on a server (this can work with puppeteer).

With tab hiding (experimental in Firefox) you may also be able to just load the desired url in a hidden tab and take a screenshot there. Loading hidden tabs may have unforeseen consequences, though.

I found a library that generates a screenshot from a DOM element, so the discussion is relevant.
I hadn’t heard about hidden tabs; that may be interesting. As for the linked extension, it offers two options for taking a screenshot: take it directly, or take it by visiting the page.

I will try the hidden-tab trick a bit later. My first idea is to read all needed info with content script and send it to main script (wondering how to). Or can it be done more easily?

It really depends on what exactly you intend to achieve. If the page is already loaded, https://developer.mozilla.org/en-US/Add-ons/WebExtensions/API/tabs/captureTab seems the most straightforward solution.

My first idea is to read all needed info with content script and send it to main script (wondering how to).

I don’t know what “needed info” you refer to and how any information (except for the entire DOM serialized with evaluated inline styles) could ever let you render an external page in a background script.

Document title and body, as I said.

Document title is pretty clear, it’s a string, but “document body” in what form?

In a form that can be rendered to an image. A DOM element is suitable for now, unless my idea of getting it from the website with a content script and sending it to a script in my own HTML page is too hard to implement.

Ok. If your goal is to render arbitrary page bodies in the background the same way they are / would be rendered on the webpage, then that can’t be done (or is very difficult). The reason is mostly that modern web pages consist of more than just HTML. When you fetch or serialize the body element as a string, you are missing information such as external stylesheets and whatever scripts add at runtime. And reconstructing that in general, without actually executing and rendering the entire page, is very far from trivial.

So (as I said):
You need to take (maybe partial) screenshots of the actual running page. As I said, that can’t be done in the background. You can either do it in a browser tab (maybe already open, maybe hidden) or on a server.

Sad to hear. Your method is promising, though. I’ll try it one of these days and report back on how it goes.

Thanks, it works and it is much easier than I imagined:

browser.tabs.create({url: "https://developer.mozilla.org/", active: false}).then(function (tab) {
    console.log("Tab:", tab);
    setTimeout(function () {
        browser.tabs.captureTab(tab.id).then(function(base64img) {
            console.log("Title:", tab.title); 
            console.log("Favicon:", tab.favIconUrl);
            console.log("Base64img", base64img);
            browser.tabs.remove(tab.id);
        });
    }, 3000); //give page some time to load itself
});

But I wonder why tab.title is just the URL and tab.favIconUrl is undefined. The extension has the tabs and <all_urls> permissions.

Probably because you’re not actually waiting for the tab to load and instead just wait an arbitrary number of seconds.

True, but 3 seconds is enough for the page to load completely and to take a good screenshot.
Btw, I don’t see anything like a tabs.onLoaded event in the tabs API. What is the proper way?

Not really what I’m looking for. It fired 5 times (or fewer, throwing “Message manager disconnected” if the tab was closed), but I need it to fire only once, when the page has loaded.

You can check what changed, which is in your case the tab status that should change.
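As a rough sketch of that check: the listener below ignores every update except the one where the watched tab’s status becomes "complete", and then removes itself so it effectively fires once. `waitForTabComplete` and the `api` parameter are hypothetical names; `api` defaults to the global `browser` object. Rejecting when the tab is closed first is an addition of this sketch, not something stated above.

```javascript
// Hypothetical helper: resolves with the tab once it reports status "complete",
// rejects if the tab is closed before that happens.
function waitForTabComplete(tabId, api = globalThis.browser) {
    return new Promise((resolve, reject) => {
        function cleanup() {
            api.tabs.onUpdated.removeListener(onUpdated);
            api.tabs.onRemoved.removeListener(onRemoved);
        }
        function onUpdated(updatedTabId, changeInfo, tab) {
            // onUpdated fires for many kinds of changes on every tab; skip the rest.
            if (updatedTabId !== tabId || changeInfo.status !== "complete") {
                return;
            }
            cleanup();
            resolve(tab);
        }
        function onRemoved(removedTabId) {
            if (removedTabId !== tabId) {
                return;
            }
            cleanup();
            reject(new Error("Tab " + tabId + " was closed before it finished loading"));
        }
        api.tabs.onUpdated.addListener(onUpdated);
        api.tabs.onRemoved.addListener(onRemoved);
    });
}
```

It would replace the timer, e.g. `waitForTabComplete(tab.id).then(...)` right after `tabs.create()` resolves. If a page reports "complete" more than once (for example after a redirect), an extra check on `tab.url` may be needed.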

Oh, it could even have worked before; I just missed that the tab info has to be refreshed with tabs.get(). The 3-second timer was only a temporary placeholder anyway, so thanks.

In your code, you are waiting to capture, but you are still printing the old tab object. That object is never updated: every call to the tabs API gives you a new copy. If you need a fresh one, use tab = await browser.tabs.get(tab.id).
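A minimal sketch of that fix, assuming the tab has already finished loading by the time it is called. `captureWithFreshInfo` and the `api` parameter are hypothetical names; `api` defaults to the global `browser` object, and `tabs.captureTab` is the Firefox API used earlier in the thread.

```javascript
// Hypothetical helper: returns up-to-date tab info together with the screenshot.
async function captureWithFreshInfo(tabId, api = globalThis.browser) {
    // Tab objects are snapshots, so request a current one instead of
    // reusing the object that tabs.create() returned before the page loaded.
    const tab = await api.tabs.get(tabId);
    const image = await api.tabs.captureTab(tabId);
    return { title: tab.title, favIconUrl: tab.favIconUrl, image: image };
}
```

After the capture, the caller can still close the tab with `browser.tabs.remove(tabId)` as in the original snippet.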