I have several HTML files and need to extract the data inside their tables so I can load it into a database, but I can't navigate the HTML tree to find the tags that hold the cells. The HTML looks like this:
<div class="details">
<div class="title-table"><h2> BEAUNE</h2>
<div class="table-responsive">
<div class="table-towers">
<div id="table472dc5e9b46304cf95865f7db6c459aa" class="collapse in table-content">
<div class="table-towers">
<div class="table-row">
<div class="table-cell build_type">Apartamento</div>
<div class="table-cell area_useful">220m²</div>
<div class="table-cell rooms">3</div>
<div class="table-cell garage">4</div>
<div class="table-cell bird_estimate_average">R$ 2.816.344,33*
<p><small>(R$ 2.393.892,68 a R$ 3.238.795,98)</small></p>
</div>
<div class="table-row">
<div class="table-cell build_type">Cobertura</div>
<div class="table-cell area_useful">396m²</div>
<div class="table-cell rooms">3</div>
<div class="table-cell garage">5</div>
<div class="table-cell bird_estimate_average">R$ 5.069.419,80*
<p><small>(R$ 4.309.006,83 a R$ 5.829.832,77)</small></p>
</div>
<div class="title-table"><h2>BERGERAC</h2>
<div class="table-responsive">
<div class="table-towers">
<div id="table0b60c9a0a450b921186c91102da447d9" class="collapse table-content">
<div class="table-towers">
<div class="table-row">
<div class="table-cell build_type">Apartamento</div>
<div class="table-cell area_useful">220m²</div>
<div class="table-cell rooms">3</div>
<div class="table-cell garage">4</div>
<div class="table-cell bird_estimate_average">R$ 2.816.344,33*
<p><small>(R$ 2.393.892,68 a R$ 3.238.795,98)</small></p>
</div>
<!-- asdasd -->
</div>
</div>
There are 10 more tables in the same HTML file, all following the same structure, so I thought of using a "for" loop to fetch each "title-table" element, which holds the name of the table, like this:
for id_torre in soup.find("div",{"class":"details"}).findAll("div",{"class":"title-table"}):#.findAll("h2"):
    nm = id_torre.find("h2")
    print(nm)
With the list of table titles, I thought of using a "while" loop to find the table that belongs to each title, capture the cell data in each row, and then load it into the database:
while len(id_torre) >0:
    nm = id_torre
    print(nm)
    tipo = soup.find("div",{"class":id_torre}).find("div",{"class":"table-cell build_type"})
    print(tipo)
    m2_util = soup.find("div",{"class":id_torre}).find("div",{"class":"table-cell area_useful"})
    print(m2_util)
    dt = soup.find("div",{"class":id_torre}).find("div",{"class":"table-cell rooms"})
    print(dt)
But it returns "None" for every field and keeps looping endlessly. What's wrong with the code? I'm new to programming, and Python is the first language I'm learning.
Do all tables have their title in an <h2> inside the title-table class? In this case it is BEAUNE, right?
– Miguel
Yes, they all have the title inside an <h2>. In this case there are two tables, one named BEAUNE and the other BERGERAC, but each table also has its own (<div id="table...).
– Jeu Domingos
And you want the text inside the elements whose class is table-cell, correctly matched to the table each one belongs to?
– Miguel
Correct, I need the class=table-cell elements of each table, linked to the table name, so I can load them into the database.
– Jeu Domingos
I have the complete HTML file; I only posted part of it here. Do you want me to send it to you?
– Jeu Domingos
No need, I'll work from what you posted here, no problem.
– Miguel
'\nBERGERAC': ['\nApartamento\n', '\n220m²\n', '\n3', '\n4', '\nR$ 2.816.344,33*\n\n\n\n(R$ 2.393.892,68 a R$ 3.238.795,98)\n\n\n', '\nCobertura\n', '\n396m²\n', '\n3', '\n5', '\nR$ 5.069.419,80*\n\n\n\n\n\n(R$ 4.309.006,83 a R$ 5.829.832,77)\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n'],
– Jeu Domingos
It returned the data, but how do I remove the "\n" and strip the extra spaces from the text?
– Jeu Domingos
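One possible way to clean text like that, sketched in plain Python: `str.split()` with no arguments splits on any run of whitespace (spaces, tabs, newlines), so joining the pieces back with a single space removes every "\n" and trims both ends. The helper name `clean` and the `raw` sample dictionary are made up for illustration; the sample only imitates the shape of the output posted above.

```python
def clean(text: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space."""
    return " ".join(text.split())


# Hypothetical sample shaped like the posted output: title -> list of raw cell texts.
raw = {
    "\nBERGERAC": [
        "\nApartamento\n",
        "\n220m²\n",
        "\n3",
        "\n4",
        "\nR$ 2.816.344,33*\n\n\n\n(R$ 2.393.892,68 a R$ 3.238.795,98)\n\n\n",
    ],
}

# Clean both the keys (table titles) and every cell value.
cleaned = {clean(title): [clean(cell) for cell in cells] for title, cells in raw.items()}
print(cleaned)
```

If the text is still being read from BeautifulSoup elements, calling `get_text(" ", strip=True)` on each cell should give a similar result at extraction time, so a separate cleaning pass may not be needed.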
I'm still working on a solution; see if the answer below helps.
– Miguel
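For reference, a minimal sketch of one way to tie each row to its table title, assuming BeautifulSoup 4 is installed. It is not necessarily the approach used in the answer; it walks the `<h2>` and `<div>` elements in document order and attaches every `table-row` to the most recent title, so it does not depend on how the divs are nested. The `html` sample is a shortened, hypothetical version of the page in which every `<div>` is properly closed, and the names `tables` and `current` are illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical, well-formed sample (assumes every <div> is closed properly).
html = """
<div class="details">
  <div class="title-table"><h2> BEAUNE</h2></div>
  <div class="table-towers">
    <div class="table-row">
      <div class="table-cell build_type">Apartamento</div>
      <div class="table-cell area_useful">220m²</div>
      <div class="table-cell rooms">3</div>
    </div>
    <div class="table-row">
      <div class="table-cell build_type">Cobertura</div>
      <div class="table-cell area_useful">396m²</div>
      <div class="table-cell rooms">3</div>
    </div>
  </div>
  <div class="title-table"><h2>BERGERAC</h2></div>
  <div class="table-towers">
    <div class="table-row">
      <div class="table-cell build_type">Apartamento</div>
      <div class="table-cell area_useful">220m²</div>
      <div class="table-cell bird_estimate_average">R$ 2.816.344,33*
        <p><small>(R$ 2.393.892,68 a R$ 3.238.795,98)</small></p>
      </div>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

tables = {}     # table title -> list of rows, each row a list of cell texts
current = None  # title of the table currently being read

# find_all with a list of names yields matching tags in document order.
for el in soup.find("div", class_="details").find_all(["h2", "div"]):
    if el.name == "h2":
        current = el.get_text(strip=True)
        tables[current] = []
    elif "table-row" in el.get("class", []) and current is not None:
        cells = el.find_all("div", class_="table-cell")
        # get_text(" ", strip=True) trims each text piece and joins them with a space.
        tables[current].append([c.get_text(" ", strip=True) for c in cells])

for title, rows in tables.items():
    print(title, rows)
```

Each inner list is one row, so a database insert could loop over `tables.items()` and write one record per row together with its title.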