How to remove span elements from a webpage


(h_four_ever) #1

Hi, I would like to write a simple add-on for my personal use that removes all spans from a web page, and possibly the ul and li tags as well. How can I do it?


(Mittineague) #2

It depends. Do you want to change something like:

pre <span class="maybe">text</span> post

to
pre text post
or
pre post ?

http://mdn.beonex.com/en/JavaScript/Guide/Regular_Expressions.html
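
To make the difference concrete, here is a minimal sketch of both outcomes applied to the example string with regular expressions. The variable names are illustrative only, and the patterns assume simple, non-nested spans; on a live page, DOM methods are usually more robust than running a regex over HTML.

```javascript
const html = 'pre <span class="maybe">text</span> post';

// Outcome 1: drop the span tags but keep the text inside them.
const unwrapped = html.replace(/<span\b[^>]*>(.*?)<\/span>/g, "$1");

// Outcome 2: drop the span together with its contents
// (and any whitespace that directly follows it).
const removed = html.replace(/<span\b[^>]*>.*?<\/span>\s*/g, "");

console.log(unwrapped); // pre text post
console.log(removed);   // pre post
```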


(h_four_ever) #3

This is the page I am working with:
https://slovnik.seznam.cz/en-cz/?q=run
https://slovnik.seznam.cz/en-cz/?q=mister

What I need to get are the headers and the links with text from the webpage…

My goal is to copy the words, but I am only interested in the ones in the links or in the headers.

Maybe you have an idea how to extract the words to the clipboard, or just how to remove the redundant text.

Anyway, I do not know how to collect the spans and process them in a loop. Help needed :slight_smile:

Edit:
I will try to follow this:


(h_four_ever) #4

I used jQuery:

$('body>iframe').remove();
$('body>script').remove();
$('body>div#adFox').remove();
$('body>div#page>div#head').remove();
$('body>div#page>hr').remove();
$('body>div#page>div#container>div#menu').remove();
$('body>div#page>div#skyStopper').remove();
$('body>div#page>div#foot').remove();
$('body>div#page>div#keyboardContent').remove();
$('body>div#page>div#container>div#content>div#results>div#fastTrans>div>iframe').remove();
var rh = $('body>div#page>div#container>div#content>div#results>div.hgroup');
rh.find('p.pron').remove();
rh.find('p.audio').remove();
rh.find('div').remove();

$('ul#fulltext').remove(); // phrases
$('ol.topdef>li>dl>dt>span').find(`[lang="en"]`).remove(); // lists of phrases
$('div.other-meanning').remove(); // phrasal verbs
// Remove the heading whose text is "Základní fráze" ("Basic phrases")
$('h3').each(function () {
    if ($(this).text() == "Základní fráze") {
        $(this).remove();
    }
});

$('p.longView').remove();
$('body>div>p').remove();
$('span.w').remove();
$("a").find(`[lang="en"]`).remove();
$("span").find(`[lang="en"]`).remove();

(Mittineague) #5

I see the post was marked as “solution”, but I respectfully disagree.

The approach is analogous to “removing the farm so only the vegetables remain”.

I think the problem is that you are thinking in terms of “remove” when what you really want is to “get”. True, the end result may be similar, but the approach and efficiency are vastly different.

IMHO, it would be better to determine the closest parent element that contains the spans of interest, get the spans, and then get their text content. I haven’t worked with the clipboard API since the SDK, but AFAIK, even though there have been some security changes, it should still work.

I usually work with native JavaScript functions more than I do with jQuery, but something like this pseudocode should work:

let found_text_content = '';
$( 'selector_for_closest_parent' ).find( 'span' ).each( function () {
  found_text_content += $( this ).text();  // maybe with a + "\r\n" ?
} );

If that gets more than you want, you may need to wrap the appending inside conditionals. Once “found_text_content” is what you want it to be, write it to the clipboard.
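
For illustration, the same idea in native JavaScript might look like the sketch below. The selector strings are placeholders, and `collectText` and `copyCollected` are hypothetical helper names, not part of any library. The only platform call assumed is the asynchronous `navigator.clipboard.writeText`, which generally needs a secure context and may require a user gesture or an add-on permission.

```javascript
// Hypothetical helper: gather the trimmed text of every element under
// `root` that matches `selector`, one entry per line.
function collectText(root, selector) {
  const parts = [];
  for (const el of root.querySelectorAll(selector)) {
    const text = el.textContent.trim();
    if (text !== "") {          // skip empty or whitespace-only elements
      parts.push(text);
    }
  }
  return parts.join("\n");
}

// Hypothetical helper: copy the collected text to the clipboard.
// Returns the Promise produced by navigator.clipboard.writeText().
function copyCollected(root, selector) {
  return navigator.clipboard.writeText(collectText(root, selector));
}

// Usage in a page or content script (placeholder selector):
// const parent = document.querySelector("selector_for_closest_parent");
// copyCollected(parent, "span").catch(console.error);
```

Building an array and joining it once keeps the line separator in one place, so switching from "\n" to "\r\n" is a one-character change.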


(h_four_ever) #6

:grinning: Nice analogy.