I have an HTML document that is heavily polluted with style attributes on almost every tag, plus unnecessary <font> and <span> tags.
How can I use BeautifulSoup to remove only the style attribute from the <p> tags,
and to remove the <font> and <span> tags themselves
without removing their contents, while preserving the other parent and child tags?
I need to walk through all the elements of the HTML automatically and have not managed to do that yet, but I have already been able to remove the attribute from a single element with the code below:
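from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")  # html: string holding the document below (assumed name)
print(soup.span)        # the first <span> in the document
print(soup.span.name)   # its tag name, "span"
del soup.span['style']  # removes style from that one <span> only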
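Extending the snippet above to every element might look like the following. This is only a minimal sketch, assuming the goal is exactly what is described: drop style from every <p>, and replace every <font> and <span> with its own contents. The function name clean_html and the sample string are hypothetical and used only for illustration; find_all() and unwrap() are standard BeautifulSoup calls.

```python
from bs4 import BeautifulSoup


def clean_html(markup: str) -> str:
    """Drop style attributes from <p> tags and unwrap <font>/<span> tags."""
    soup = BeautifulSoup(markup, "html.parser")

    # Remove only the style attribute from every <p>; other attributes stay.
    for p in soup.find_all("p"):
        if p.has_attr("style"):
            del p["style"]

    # unwrap() replaces a tag with its own children, so the text and any
    # nested tags (<a>, <b>, <s>, ...) remain in place.
    for tag in soup.find_all(["font", "span"]):
        tag.unwrap()

    return str(soup)


# Hypothetical sample, modelled on a fragment of the document below.
sample = (
    '<p align="justify" style="text-indent:1.0cm">'
    '<font face="Arial"><span style="color: black">Art. 1<s>º</s> '
    '<a href="#art2">Vigência</a></span></font></p>'
)
cleaned = clean_html(sample)
print(cleaned)
```

Note that unwrapping a <span> also discards its class, so any styling that depends on the span.* rules in the <style> block would be lost. With markup as irregular as this document, the choice of parser may also change how unclosed tags are repaired.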
The HTML follows:
<html xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta name="GENERATOR" content="Microsoft FrontPage 6.0">
<style>
<!--
h1
{margin-top:6.0pt;
margin-right:0cm;
margin-bottom:0cm;
margin-left:0cm;
margin-bottom:.0001pt;
text-align:justify;
text-indent:76.85pt;
page-break-after:avoid;
tab-stops:99.75pt;
font-size:14.0pt;
font-family:"Times New Roman";
}
table.MsoNormalTable
{mso-style-parent:"";
font-size:10.0pt;
font-family:"Times New Roman"}
div.Section1
{page:Section1;}
h4
{margin-bottom:.0001pt;
text-align:center;
page-break-after:avoid;
tab-stops:70.9pt;
font-size:12.0pt;
font-family:"Times New Roman";
font-weight:bold; margin-left:0cm; margin-right:0cm; margin-top:0cm}
h2
{margin-bottom:.0001pt;
text-align:center;
page-break-after:avoid;
font-size:10.0pt;
font-family:Arial;
font-weight:bold;
margin-left:0cm; margin-right:0cm; margin-top:0cm}
h3
{margin-bottom:.0001pt;
text-align:center;
text-indent:17.85pt;
page-break-after:avoid;
font-size:10.0pt;
font-family:Arial;
font-weight:bold; margin-left:0cm; margin-right:0cm; margin-top:0cm}
div.Section2
{page:Section2;}
div.Section3
{page:Section3;}
h6
{margin-bottom:.0001pt;
text-align:center;
page-break-after:avoid;
font-size:12.0pt;
font-family:"CG Times";
margin-left:0pt; margin-right:0pt; margin-top:0pt}
h5
{margin-bottom:.0001pt;
page-break-after:avoid;
font-size:12.0pt;
font-family:"Times New Roman";
font-weight:normal; margin-left:0cm; margin-right:0cm; margin-top:0cm}
span.msoDel
{mso-style-name:"";
text-decoration:line-through;
color:red}
span.msoIns
{mso-style-name:"";
text-decoration:underline;
text-underline:single}
span.Ttulo1Car
{font-family:Arial;
font-weight:bold;
}
span.CaracteresdeNotadeRodap
{vertical-align:super}
span.WW-Refdenotaderodap1234
{mso-style-parent:"";
vertical-align:super}
span.msoins0
{}
span.MsoFootnoteReference
{vertical-align:super;}
div.Section4
{page:Section4;}
span.MsoCommentReference
{}
span.Hiperlink
{mso-style-parent:"";
color:blue;
text-decoration:underline;
text-underline:single}
span.txtterm1
{font-family:"Times New Roman";
color:black;
font-weight:bold}
span.Absatz-Standardschriftart
{}
span.apple-style-span
{}
span.apple-converted-space
{}
div.Section5
{page:Section5;}
div.Section6
{page:Section6;}
div.Section7
{page:Section7;}
div.Section8
{page:Section8;}
div.section1
{margin-right:0cm;
margin-left:0cm;
font-size:8.0pt;
font-family:"Arial Unicode MS";
}
span.msoChangeProp
{mso-style-name:"";
color:black}
span.texto
{}
span.highlightedsearchterm
{}
span.MsoHyperlink
{color:blue;
text-decoration:underline;
text-underline:single;}
div.Section9
{page:Section9;}
span.MsoHyperlinkFollowed
{mso-style-parent:"";
color:purple;
text-decoration:underline;
text-underline:single;}
span.WW-Fontepargpadro
{}
span.Fontepargpadro2
{}
span.Internetlink
{mso-style-parent:"";
color:navy;
text-underline:#000000;
text-decoration:underline;
text-underline:single}
span.Refdecomentrio1
{}
span.Refdecomentrio2
{}
span.font0020style31char
{font-family:"Times New Roman","serif";
}
span.style10char
{font-family:"Times New Roman","serif";
}
span.centralizadochar
{font-family:"Times New Roman","serif";
}
span.texto0020normalchar
{font-family:"Times New Roman","serif";
}
span.normalchar
{font-family:"Times New Roman","serif";
}
span.estilochar
{font-family:"Times New Roman","serif";
}
span.style21char
{font-family:"Times New Roman","serif";
}
span.style18char
{font-family:"Times New Roman","serif";
}
span.style15char
{font-family:"Times New Roman","serif";
}
span.mw-headline
{}
span.hlhilite
{}
span.field1
{mso-style-parent:"";
font-family:"Verdana","sans-serif";
color:black;
border:1.0pt solid windowtext;
padding:0cm;
background:white}
span.hps
{font-family:"Times New Roman","serif";
}
span.themebody
{font-family:"Times New Roman","serif";
}
span.FootnoteSymbol
{font-family:"Times New Roman","serif";
position:relative;
top:0pt;
vertical-align:super}
span.nfase1
{mso-style-parent:"";
font-family:"Lucida Grande","serif";
color:black;
}
span.Forte1
{mso-style-parent:"";
font-family:"Lucida Grande","serif";
color:black;
font-weight:bold}
span.texto8
{font-family:"Times New Roman","serif";
}
span.bumpedfont15
{mso-style-parent:"";
color:black;
}
span.Heading4Char
{mso-style-parent:"";
font-family:"Verdana","sans-serif";
font-weight:bold}
span.atn
{mso-style-parent:"";
font-family:"Times New Roman","serif";
}
span.longtext
{mso-style-parent:"";
font-family:"Times New Roman","serif";
}
span.linkdestaque
{}
span.Ttulo1Char
{font-family:"Cambria","serif";
font-weight:bold}
span.Fontepargpadro1
{}
span.MsoBookTitle
{font-variant:small-caps;
letter-spacing:.25pt;
font-weight:bold;
}
span.doltraduztrad
{font-family:"Times New Roman","serif";
}
span.MquinadeescribirHTML
{mso-style-parent:"";
font-family:"Courier New";
}
table.MsoTableGrid
{border:1.0pt solid windowtext;
font-size:10.0pt;
font-family:"Calibri","sans-serif";
}
span.MsoSubtleEmphasis
{font-family:"Times New Roman","serif";
color:#404040;
font-style:italic}
span.nfaseSutil1
{mso-style-parent:"";
color:gray;
font-style:italic;
}
div.WordSection1
{page:WordSection1;}
span.CharacterStyle4
{}
span.highlight
{}
span.StrongEmphasis
{mso-style-parent:"";
font-weight:bold;
}
span.scayt-misspell-word
{mso-style-parent:"";
font-family:"Times New Roman","serif";
}
div.WordSection2
{page:WordSection2;}
div.WordSection3
{page:WordSection3;}
div.WordSection4
{page:WordSection4;}
div.WordSection5
{page:WordSection5;}
div.WordSection6
{page:WordSection6;}
div.WordSection7
{page:WordSection7;}
div.WordSection8
{page:WordSection8;}
div.WordSection9
{page:WordSection9;}
-->
</style>
<title>D9255</title>
</head>
<body id='view' style="text-align: center">
<div align="center"><center>
<table border="0" cellpadding="0" cellspacing="0" width="70%">
<tr>
<td width="14%">
<p align="center" style="margin-top: 13px; margin-bottom: 13px">
<font SIZE="2" face="Arial">
<img SRC="../../../_Ato2007-2010/2008/Decreto/Image4.gif" WIDTH="76" HEIGHT="82"></font></td>
<td width="86%">
<p align="center" style="margin-top: 13px; margin-bottom: 13px"><font color="#808000" face="Arial"><strong><big><big>
Presidência da República</big></big><br>
<big>Casa Civil<br>
</big>Subchefia para Assuntos Jurídicos</strong></font></td>
</tr>
</table>
</center></div>
<blockquote>
<p class='epigrafe' style="margin-top: 20px; margin-bottom: 20px">
<font
face="Arial" color="#000080"><small><strong>
<a href="http://legislacao.planalto.gov.br/legisla/legislacao.nsf/Viw_Identificacao/DEC%209.255-2017?OpenDocument">
<font color="#000080">
DECRETO Nº 9.255, DE 29 DE DEZEMBRO DE 2017</font></a></strong></small></font></p>
</blockquote>
<table border="0" cellpadding="0" cellspacing="0" width="100%">
<tr>
<td width="51%">
<font face="Arial" size="2"><span style="color: black"><a href="#art2">
Vigência</a></span></font></td>
<td width="49%">
<p align="justify">
<font FACE="Arial" SIZE="2">
<span style="color: #800000">Regulamenta a <a href="../../2015/Lei/L13152.htm">
<font color="#800000">Lei n<s>º</s> 13.152, de 29 de julho
de 2015</font></a>, que dispõe sobre o valor do salário mínimo e a sua política de
valorização de longo prazo.</span></font></td>
</tr>
</table>
<font FACE="Arial" SIZE="2">
<p style="margin-bottom:0cm;margin-bottom:.0001pt;text-align:
justify;text-indent:1.0cm"><b><span style="color: black">O PRESIDENTE DA
REPÚBLICA</span></b><span style="color: black">, no uso da atribuição que lhe
confere o art. 84, <b>caput</b>, inciso IV, da Constituição, e tendo em vista o
disposto no art. 2<s>º</s> da Lei n<s>º</s> 13.152, de 29 de julho de 2015,</span></p>
<p style="margin-bottom:0cm;margin-bottom:.0001pt;text-align:
justify;text-indent:1.0cm"><span style="color: black"> </span><b><span style="color: black">DECRETA</span></b><span style="color: black">:</span></p>
<p style="margin-bottom:0cm;margin-bottom:.0001pt;text-align:
justify;text-indent:1.0cm"><span style="color: black"> <a name="art1"></a>Art. 1<s>º</s> A partir
de 1<s>º</s> de janeiro de 2018, o salário mínimo será de R$ 954,00 (novecentos
e cinquenta e quatro reais).</span></p>
<p style="margin-bottom:0cm;margin-bottom:.0001pt;text-align:
justify;text-indent:1.0cm"><span style="color: black"> Parágrafo único. Em
virtude do disposto no <b>caput</b>, o valor diário do salário mínimo
corresponderá a R$ 31,80 (trinta e um reais e oitenta centavos) e o valor
horário, a R$ 4,34 (quatro reais e trinta e quatro centavos).</span></p>
<p style="margin-bottom:0cm;margin-bottom:.0001pt;text-align:
justify;text-indent:1.0cm"><span style="color: black"> <a name="art2"></a>Art. 2<s>º</s> Este
Decreto entra em vigor em 1<s>º</s> de janeiro de 2018.</span></p>
<p style="margin-bottom:0cm;margin-bottom:.0001pt;text-align:
justify;text-indent:1.0cm"><span style="color: black"> Brasília, 29 de dezembro
de 2017; 196<s>º</s> da Independência e 129<s>º</s> da República.</span></p>
<p style="margin-bottom:0cm;margin-bottom:.0001pt;text-align:
justify"><span style="color: black">MICHEL TEMER<br>
</span><i><span style="color: black">Eduardo Refinetti Guardia<br>
Esteves Pedro Colnago Junior<br>
Helton Yomura</span></i></p>
<p style="margin-bottom:0cm;margin-bottom:.0001pt;text-align:
justify"><font color="#FF0000">Este texto não substitui o publicado no DOU de
29.12.2017 - Edição extra "D"</font></p>
<p><font color="#FF0000" face="Arial" size="2">*</font></p>
</font>
<p> </p>
<p> </p>
<p> </p>
<p> </p>
<p> </p>
</body>
</html>