Extract a line from XML files using Python and create a TXT file with the result

Question

Asked 4 years, 7 months ago

Viewed 234 times

I have several XML files in a local Windows folder called FILES (ARQUIVOS in the path used in my code).

All XML files follow the same structure as below:

<catalog>
<product description="Cardigan Sweater" product_image="cardigan.jpg">
<catalog_item gender="Men's">
***<item_number>QWZ5671</item_number>***
<price>39.95</price>
<size description="Medium">
<color_swatch image="red_cardigan.jpg">Red</color_swatch>
<color_swatch image="burgundy_cardigan.jpg">Burgundy</color_swatch>
</size>
<size description="Large">

I would like to extract this information from the XML:

<price>39.95</price>

That is, the value 39.95 between <price> and </price>.

The result should then be written to another file in CSV or TXT format, automatically, for every file in the folder.

I tried to create the following code:

search = '<price>'

def check():
    # Scan a single file and report whether it contains a <price> line
    found = False
    with open(r'C:\ARQUIVOS\example.xml') as datafile:
        for line in datafile:
            if search in line:
                found = True
                print(line)
                break
    return found


check()

I couldn’t get past this point, and I don’t know how to finish it. Could someone please help me? Keep in mind that this needs to work for several XML files inside a folder!

Will there be only one <price> tag per file, or can a file contain more than one? And what is the structure of the output file?

– Augusto Vasques

2020/12/08 at 19:35

1 answer

by Paulo Marques • **3,739** points · Answer 1 · 2020-12-08T20:14:10+00:00

Solution using BeautifulSoup

texto = """
        <catalog>
        <product description="Cardigan Sweater" product_image="cardigan.jpg">
        <catalog_item gender="Men's">
        ***<item_number>QWZ5671</item_number>***
        <price>39.95</price>
        <size description="Medium">
            <color_swatch image="red_cardigan.jpg">Red</color_swatch>
            <color_swatch image="burgundy_cardigan.jpg">Burgundy</color_swatch>
        </size>
        <size description="Large">
        """

Import BeautifulSoup (the 'xml' parser used below also needs the lxml package installed)

from bs4 import BeautifulSoup

Create the "soup"

soup = BeautifulSoup(texto, 'xml')

Look for what you want

>>> preco = soup.find("price")
>>> preco
<price>39.95</price>

If you only want the value, use:

>>> preco = soup.find("price").text
>>> preco
'39.95'
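The same lookup can be applied to a file on disk, with the value saved to a TXT file. The following is only a minimal sketch under some assumptions: it swaps BeautifulSoup for the standard library's xml.etree.ElementTree to avoid extra installs, it assumes each file is complete, well-formed XML (ElementTree is strict and would reject a truncated sample like the one above), and the function name price_to_txt is hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def price_to_txt(xml_path):
    """Write the first <price> value of xml_path to a .txt file beside it.

    Hypothetical helper; assumes the file is well-formed XML.
    """
    xml_path = Path(xml_path)
    root = ET.parse(xml_path).getroot()
    # ".//price" searches all descendants, similar to soup.find("price")
    price = root.find(".//price")
    if price is None or price.text is None:
        return None  # no usable <price> element in this file
    txt_path = xml_path.with_suffix(".txt")
    txt_path.write_text(price.text.strip() + "\n", encoding="utf-8")
    return txt_path
```

Calling price_to_txt(r'C:\ARQUIVOS\example.xml') would, under these assumptions, create example.txt in the same folder with the price on a single line.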

Another example

Suppose you have a larger structure with several prices in it, as below:

<items>
    <item>
        <nome>Carro</nome>
        <preco>55000.00</preco>
    </item>
    <item>
        <nome>Moto</nome>
        <preco>25000.00</preco>
    </item>
</items>

To get all the prices at once, load this XML into texto and use find_all (findAll is its older alias):

>>> soup = BeautifulSoup(texto, 'xml')
>>> precos = soup.find_all("preco")
>>> precos
[<preco>55000.00</preco>, <preco>25000.00</preco>]

Then just iterate over the list

>>> for preco in precos:
...     print(float(preco.text))

Note: I converted the string value to float here, but if the goal is only to save it to disk, the conversion is not necessary.

The result will be:

55000.0
25000.0
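Since the question asks for this to run over every XML file in a folder, the loop above could be extended roughly as follows. This is a sketch, not a definitive solution: it uses the standard library (xml.etree.ElementTree, csv, pathlib) rather than BeautifulSoup, it assumes every file is well-formed XML, and the function name prices_to_csv and the output layout (one row per price, with the file name in the first column) are assumptions, since the question does not specify the output structure.

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path


def prices_to_csv(folder, csv_path, tag="price"):
    """Collect every <tag> value from each *.xml file in folder into one CSV.

    Hypothetical helper; the "file,<tag>" column layout is an assumption.
    """
    rows = []
    for xml_file in sorted(Path(folder).glob("*.xml")):
        root = ET.parse(xml_file).getroot()
        # iter(tag) walks the whole tree, similar to soup.find_all(tag)
        for element in root.iter(tag):
            if element.text:
                rows.append((xml_file.name, element.text.strip()))
    # newline="" lets the csv module control line endings itself
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["file", tag])
        writer.writerows(rows)
    return rows
```

For the folder in the question, a call might look like prices_to_csv(r'C:\ARQUIVOS', r'C:\ARQUIVOS\prices.csv'), where prices.csv is a made-up output name. The values are written as strings, so no float conversion is needed.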

I hope this helps.