Python: Cleaning html code

Question


Asked 7 years, 4 months ago

Viewed 168 times


Using Python, what is an easy way to strip the tag attributes (such as inline styles) that Microsoft tools add to HTML?

So far I have been trying to do this with Beautiful Soup, but I'm open to any suggestion! :D

From this:

<p style="text-decoration: underline;">Hello <strong>World!</strong></p>
<p style="color: #228;">How are you today?</p>
<table style="width: 300px; text-align: center;" border="1" cellpadding="5">
<tr>
<th width="75"><strong><em>Name</em></strong></th>
<th colspan="2"><span style="font-weight: bold;">Telephone</span></th>
</tr>
<tr>
<td>John</td>
<td><a style="color: #F00; font-weight: bold;" href="tel:0123456785">0123 456 785</a></td>
<td><img width="25" height="30" src="images/check.gif" alt="checked" /></td>
</tr>
</table>

To this:

<p>Hello <strong>World!</strong></p>
<p>How are you today?</p>
<table border="1" cellpadding="5">
<tr>
<th width="75"><strong><em>Name</em></strong></th>
<th colspan="2"><span>Telephone</span></th>
</tr>
<tr>
<td>John</td>
<td><a href="tel:0123456785">0123 456 785</a></td>
<td><img width="25" height="30" src="images/check.gif" alt="checked" /></td>
</tr>
</table>

Could you post your attempt with Beautiful Soup? Also, is what you need basically just to remove the style attributes?

– Woss

2018/02/22 at 09:55

Yes. Remove all of them.

– britodfbr

2018/02/22 at 10:53

1 answer



by Pagotti • **3,042** points · Answer 1 · 2018-02-22T11:14:37+00:00

You can use `re.sub()`.

Example that removes the style attributes:

import re

html_string = "[put your HTML here]"
# Remove every double-quoted style attribute, including empty ones (style="")
html_no_style = re.sub(r' style="[^"]*"', '', html_string)

It is important to test this against several different HTML files to find out whether the regex needs to be refined.
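As an illustration of the kind of refinement that testing may call for, here is a minimal sketch of a more tolerant pattern. The names `STYLE_ATTR` and `strip_style` are hypothetical, and the sample strings are made up for the example; it assumes the style values never contain the same quote character that delimits them.

```python
import re

# Hypothetical pattern (not from the original answer): matches a style
# attribute quoted with double or single quotes, allows whitespace
# around "=", and ignores letter case.
STYLE_ATTR = re.compile(r'''\s+style\s*=\s*(?:"[^"]*"|'[^']*')''', re.IGNORECASE)


def strip_style(html: str) -> str:
    """Return html with inline style attributes removed."""
    return STYLE_ATTR.sub('', html)


# Made-up inputs covering variations the simpler regex would miss.
samples = [
    '<p style="color: #228;">How are you today?</p>',
    "<p STYLE='color: red'>single quotes, upper case</p>",
    '<span style = "">empty value, spaces around the equals sign</span>',
]
for sample in samples:
    print(strip_style(sample))
```

A regex like this cannot tell an attribute from ordinary text, so a string such as ` style="x"` appearing in the page content would be stripped too. If that matters for your files, an HTML parser such as the Beautiful Soup mentioned in the question is likely the safer route.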
