Get HTML of remote page with JS

How can I get the HTML content of a remote page from a given URL? In particular, I need its title and whole body.

const html = (await (await fetch(url)).text()); // html as text
const doc = new DOMParser().parseFromString(html, 'text/html');
doc.title; doc.body;
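The same idea with an explicit HTTP status check, as a minimal sketch. The function name `fetchDocument` and the `deps` parameter are made up for illustration; `deps` only exists so the function can be tried outside a browser, and in an extension page the global `fetch` and `DOMParser` are used.

```javascript
// Fetch a page and parse it into a Document, returning its title and body element.
// `deps` is a hypothetical hook for passing in stand-ins for fetch and DOMParser.
async function fetchDocument(url, deps = {}) {
    const fetchFn = deps.fetch || ((u) => globalThis.fetch(u));
    const Parser = deps.DOMParser || globalThis.DOMParser;

    const response = await fetchFn(url);
    if (!response.ok) {
        // fetch() also resolves for 404/500 responses, so check explicitly.
        throw new Error("Request failed with HTTP status " + response.status);
    }
    const html = await response.text();
    const doc = new Parser().parseFromString(html, "text/html");
    return { title: doc.title, body: doc.body };
}
```

Note that scripts in the parsed document are not executed, so anything the page builds with JavaScript will be missing from `body`.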

You probably meant .text()

Also view is the global scope, I assume (i.e. window).

I tried this and the browser said the request was blocked because a CORS header is missing. And then

TypeError: NetworkError when attempting to fetch resource.

(Because the fetch failed, I guess?)

You do need a host permission for the page in most cases to circumvent CORS restrictions, yes.

Yes and yes. I typed the first part without my glasses and copied the second one. For reference I corrected it above.

Regarding host permission:
Yes, one does indeed need it (<all_urls> is the easiest way to test it). Here is why:

Is the answer behind your link, or did I misread it? So what should I do? I saw extensions somehow did get remote HTML.

The two bullet points after “Here is why” just explain why you need a host permission.

I’m not sure what you mean with the rest of your comment:

I saw extensions somehow did get remote HTML.

So it works? Great!

I meant that someone else’s extension did it: Group Speed Dial can get a page’s title and take a screenshot of it from a URL the user enters. So it can be done, but the question is how.

The code to load the source of a page and get its title and body (as DOM element) from that is in my first comment.

So far, you didn’t say that you want a screenshot of the rendered page. That is only possible to get from open/loaded pages. I am quite sure that the linked extension just waits for the page to be loaded by the user and grabs the screenshot then.

An alternative would be to render the pages on a server (this can work with puppeteer).

With tab hiding (experimental in Firefox) you may also be able to just load the desired url in a hidden tab and take a screenshot there. Loading hidden tabs may have unforeseen consequences, though.

I found a lib that generates a screenshot from a DOM element, so the discussion is relevant.
I hadn’t heard about hidden tabs; that may be interesting. As for the linked extension, it has two options for taking a screenshot: just take it, or take it via visiting.

I will try the hidden tab trick a bit later. My first idea is to read all needed info with content script and send it to main script (wondering how to). Or can it be done more easily?
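For reference, passing data from a content script to the background script is usually done with `runtime.sendMessage` and `runtime.onMessage`. A minimal sketch follows. The function names and the `"page-info"` message type are made up for illustration, and `api` stands for the `browser` object, passed in as a parameter only so the functions can be tried without an extension. In a real extension the first two functions would live in the content script and the third in the background script.

```javascript
// --- content script side ---
// Collect the title and the serialized body markup of the current page.
function collectPageInfo(doc) {
    return { title: doc.title, bodyHtml: doc.body.outerHTML };
}

// Send the collected info to the rest of the extension.
// Messages must be serializable, so the body goes over as a string, not as a DOM element.
function sendPageInfo(api, doc) {
    return api.runtime.sendMessage({ type: "page-info", info: collectPageInfo(doc) });
}

// --- background script side ---
// Call onInfo(info, tab) for every "page-info" message received from a content script.
function listenForPageInfo(api, onInfo) {
    api.runtime.onMessage.addListener(function (message, sender) {
        if (message && message.type === "page-info") {
            onInfo(message.info, sender.tab);
        }
    });
}
```

In the extension itself this would be called as `sendPageInfo(browser, document)` and `listenForPageInfo(browser, handler)`. This only transfers markup as text; styles and script-generated state do not come along.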

It really depends on what exactly you intend to achieve. If the page is already loaded, https://developer.mozilla.org/en-US/Add-ons/WebExtensions/API/tabs/captureTab seems the most straight-forward solution.

My first idea is to read all needed info with content script and send it to main script (wondering how to).

I don’t know what “needed info” you refer to and how any information (except for the entire DOM serialized with evaluated inline styles) could ever let you render an external page in a background script.

Document title and body as I said

Document title is pretty clear, it’s a string, but “document body” in what form?

In a form that can be rendered to an image. A DOM element is suitable at the moment, unless my idea of getting it from the website and sending it from a content script to a script inserted into my own HTML page is too hard to implement.

Ok. If your goal is to render arbitrary page bodies in the background the same way they are / would be rendered on the webpage, then that can’t be done (or is very difficult). The reason is mostly that modern web pages do not only consist of HTML. When you fetch or serialize the body element as a string, you are missing information. And reconstructing that in general, without actually executing and rendering the entire page, is very far from trivial.

So (as I said):
You need to take (maybe partial) screenshots of the actual running page. As I said, that can’t be done in the background. You can either do it in a browser tab (maybe already open, maybe hidden) or on a server.

Sad to know. Your method is promising though. I’ll try it one of these days and report back on how it goes.

Thanks, it works and it is much easier than I imagined:

browser.tabs.create({url: "https://developer.mozilla.org/", active: false}).then(function (tab) {
    console.log("Tab:", tab);
    setTimeout(function () {
        browser.tabs.captureTab(tab.id).then(function(base64img) {
            console.log("Title:", tab.title); 
            console.log("Favicon:", tab.favIconUrl);
            console.log("Base64img", base64img);
            browser.tabs.remove(tab.id);
        });
    }, 3000); //give page some time to load itself
});

But I wonder why tab.title is just the URL and tab.favIconUrl is undefined. The extension has the tabs and <all_urls> permissions.

Probably because you’re not actually waiting for the tab to load and instead just wait an arbitrary number of seconds.

True, I am just waiting a fixed time, but it’s enough for the page to load completely and to take a good screenshot.
Btw, I don’t see anything like a tabs.onLoaded event in the tabs API. What is the right way to wait?
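One common approach is to listen to `tabs.onUpdated` and wait until the tab reports `status: "complete"`, then read the title from the tab object delivered with that event rather than from the one returned by `tabs.create`. A minimal sketch follows. The names `waitForTabLoad` and `captureUrl` are made up, `api` stands for the `browser` object (passed in only so the code can be tried with a stub), and the `about:blank` check assumes the new tab may briefly report a blank page before navigating. There is no timeout here, so if the page finished loading before the listener was attached this would wait forever.

```javascript
// Resolve with the tab object once the given tab has finished loading a real page.
function waitForTabLoad(api, tabId) {
    return new Promise(function (resolve) {
        function listener(updatedTabId, changeInfo, tab) {
            const done = updatedTabId === tabId
                && changeInfo.status === "complete"
                && tab.url !== "about:blank"; // reading tab.url needs the tabs or a host permission
            if (done) {
                api.tabs.onUpdated.removeListener(listener);
                resolve(tab);
            }
        }
        api.tabs.onUpdated.addListener(listener);
    });
}

// Open the URL in a background tab, wait for it to load, capture it, then close it.
async function captureUrl(api, url) {
    const created = await api.tabs.create({ url: url, active: false });
    const loaded = await waitForTabLoad(api, created.id);
    const image = await api.tabs.captureTab(loaded.id);
    await api.tabs.remove(loaded.id);
    // Title comes from the loaded tab, not from the tab object returned by create().
    return { title: loaded.title, favIconUrl: loaded.favIconUrl, image: image };
}
```

In the extension this would be called as `captureUrl(browser, "https://developer.mozilla.org/")`. The favicon can arrive in a later update than the `"complete"` status, so `favIconUrl` may still be undefined at this point.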