How to remove span elements from a webpage

tvorba.webu.havirov.karvina · June 27, 2018, 12:12am

Hi, I would like to write a simple add-on for my personal use that removes all spans from a web page, and possibly the ul and li tags as well. How can I do it?

Mittineague · June 27, 2018, 4:45am

It depends. Do you want to change something like:

pre <span class="maybe">text</span> post

to
pre text post
or
pre post ?

http://mdn.beonex.com/en/JavaScript/Guide/Regular_Expressions.html
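
To make the two outcomes concrete, here is a minimal sketch in plain JavaScript using regular expressions. The helper names `unwrapSpans` and `removeSpans` are made up for illustration, and the patterns assume simple, non-nested spans; regex-based tag stripping is fragile on real-world HTML, so treat this as an illustration rather than a robust solution.

```javascript
// Hypothetical helper: drop the <span> tags but keep the text inside them.
function unwrapSpans(html) {
  return html.replace(/<\/?span\b[^>]*>/gi, '');
}

// Hypothetical helper: drop each <span> together with its contents.
// Assumes spans are not nested inside one another.
function removeSpans(html) {
  return html.replace(/<span\b[^>]*>[\s\S]*?<\/span>/gi, '');
}

const sample = 'pre <span class="maybe">text</span> post';
console.log(unwrapSpans(sample)); // "pre text post"
console.log(removeSpans(sample)); // "pre  post" (both surrounding spaces remain)
```

On a live page, working on the DOM (for example with jQuery's `.remove()` or `.unwrap()`) is usually safer than editing the markup as a string.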

tvorba.webu.havirov.karvina · June 27, 2018, 9:03am

This is the page I am working with:
https://slovnik.seznam.cz/en-cz/?q=run
https://slovnik.seznam.cz/en-cz/?q=mister

What I need to get from the web page is the headers and the links, along with their text.

My goal is to copy the words, but I am only interested in the ones in the links or in the headers.

Maybe you have an idea how to extract the words to the clipboard, or just how to remove the redundant text.

Anyway, I do not know how to collect the spans or how to process them in a loop. Help needed.

Edit:
I will try to follow this:

tvorba.webu.havirov.karvina · June 27, 2018, 3:03pm

I used jQuery:

$('body>iframe').remove();
$('body>script').remove();
$('body>div#adFox').remove();
$('body>div#page>div#head').remove();
$('body>div#page>hr').remove();
$('body>div#page>div#container>div#menu').remove();
$('body>div#page>div#skyStopper').remove();
$('body>div#page>div#foot').remove();
$('body>div#page>div#keyboardContent').remove();
$('body>div#page>div#container>div#content>div#results>div#fastTrans>div>iframe').remove();
var rh = $('body>div#page>div#container>div#content>div#results>div.hgroup');
rh.find('p.pron').remove();
rh.find('p.audio').remove();
rh.find('div').remove();

$('ul#fulltext').remove(); // phrases
$('ol.topdef>li>dl>dt>span').find(`[lang="en"]`).remove(); // lists of phrases
$('div.other-meanning').remove(); // phrasal verbs
$('h3').each(function () {
    // "Základní fráze" is the page's "Basic phrases" heading
    if ($(this).text() == "Základní fráze") {
        $(this).remove();
    }
});

$('p.longView').remove();
$('body>div>p').remove();
$('span.w').remove();
$("a").find(`[lang="en"]`).remove();
$("span").find(`[lang="en"]`).remove();

Mittineague · June 27, 2018, 6:36pm

I see the post was marked as “solution”, but I respectfully disagree.

The approach is analogous to “removing the farm so only the vegetables remain”.

I think the problem is that you are thinking in terms of “remove” when what you really want is to “get”. True, the end result may be similar, but the approach and the efficiency are vastly different.

IMHO, it would be better to determine the closest parent element that contains the spans of interest, get the spans, and then get their text content. I haven’t worked with the clipboard API since the SDK, but AFAIK it should still work, even though there have been some security changes.

I usually work with native JavaScript functions more than I do with jQuery, but something like this pseudocode should work.

let found_text_content = '';
$( 'selector_for_closest_parent' ).find( 'span' ).each( function () {
  found_text_content += $(this).text();  // maybe with a + "\r\n" ?
});

If that gets more than you want you may need to wrap the appending inside conditionals. Once “found_text_content” is what you want it to be, then write it to the clipboard.
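
The same idea can be sketched with native DOM calls. The helper names `collectSpanText` and `copyToClipboard` and the `keep` filter are hypothetical, made up for this example. The sketch assumes the browser exposes the asynchronous Clipboard API (`navigator.clipboard.writeText`), which generally needs a secure context and either a user gesture or the appropriate extension permission.

```javascript
// Hypothetical helper: gather the text of every <span> under `root`,
// keeping only the spans for which `keep` returns true.
// `root` is anything with querySelectorAll, e.g. an Element or document.
function collectSpanText(root, keep = () => true) {
  const lines = [];
  for (const span of root.querySelectorAll('span')) {
    const text = span.textContent.trim();
    // Skip empty spans and anything the caller filters out.
    if (text !== '' && keep(span)) {
      lines.push(text);
    }
  }
  return lines.join('\n');
}

// Hypothetical helper: write `text` to the clipboard when the async
// Clipboard API is available; resolves to false when it is not.
async function copyToClipboard(text) {
  if (typeof navigator !== 'undefined' && navigator.clipboard) {
    await navigator.clipboard.writeText(text);
    return true;
  }
  return false;
}
```

On the page this might be called as `copyToClipboard(collectSpanText(parent, (s) => s.lang !== 'en'))`, where `parent` is the element found with the placeholder selector above and the filter skips the English-language spans.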

tvorba.webu.havirov.karvina · July 10, 2018, 12:05pm

Nice analogy.