Text.strip() Python

I am trying to write code that extracts some information from a page. The page contains rows in the following format:

<tr class="impar">
    <td class="id">
        <a href="/judge/en/runs/code/5046623">5046623</a>
    </td>
    <td></td>
    <td class="tiny">
        <a href="/judge/en/problems/view/2173">2173</a>
    </td>
    <td class="wide">
        <a href="/judge/en/problems/view/2173">Slush Fund</a>
    </td>
    <td class="semi-wide answer a-1">
        <a href="/judge/en/runs/code/5046623">Accepted</a>
    </td>
    <td class="center">C++</td>
    <td class="tiny">0.084</td>
    <td class="center">8/27/16, 10:10:39 PM</td>
</tr>

I wrote Python code to extract this information: id = 2173, name = Slush Fund, language = C++. However, I get the following error:

File "C:/Users/diego/Desktop/main.py", line 35, in extractProblems
 "language": td[4].text.strip()

AttributeError: 'NoneType' object has no attribute 'text'

My code:

def extractProblems(self, soup, problems):
    itens = soup.find_all('table')[0].find_all('tbody')[0].find_all('tr')
    for tr in itens:
        td = tr.find_all('td')
        if len(td) != 8:
            return 
        else:                
            problems[td[1].find('a').text.strip()] = {
                "id_code": td[2].find('a').text.strip(),
                "name": td[3].find('a').text.strip(),
                "language": td[4].text.strip()
            }

Could someone help me?

  • Most likely one of your td[i].find('a') calls is not finding anything, in which case it returns None. As the error says, None does not have an attribute called text.

  • Did that solve the problem?

1 answer

This is your td[1], the cell on which you call td[1].find('a').text.strip() to get the run code that becomes the key in the problems dictionary:

<td>
</td>

That is, the cell is empty. Searching it for an <a> tag returns None, and reading .text from None raises the error. The data you want is in td[0]:

<td class="id">
    <a href="/judge/en/runs/code/5046623">5046623</a>
</td>
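The difference between the two cells can be reproduced with a minimal sketch. It assumes BeautifulSoup 4 with the built-in html.parser backend; the html string is a cut-down copy of the first two cells of the row from the question, wrapped in a table only for illustration:

```python
from bs4 import BeautifulSoup

# Two cells copied from the row in the question: the id cell and the empty one.
html = """
<table><tr>
    <td class="id"><a href="/judge/en/runs/code/5046623">5046623</a></td>
    <td></td>
</tr></table>
"""

td = BeautifulSoup(html, "html.parser").find_all("td")

first_link = td[0].find("a")   # a Tag, because this cell contains an <a>
second_link = td[1].find("a")  # None, because this cell is empty

print(first_link.text.strip())  # 5046623
print(second_link)              # None
```

Calling second_link.text here would raise the same AttributeError as in the question.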

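Change td[1] to td[0]. Note also that in the row you posted, td[4] is the verdict cell ("Accepted"); the language ("C++") is in td[5]:

problems[td[0].find('a').text.strip()] = {
    "id_code": td[2].find('a').text.strip(),
    "name": td[3].find('a').text.strip(),
    # td[4] holds the verdict; the language is the next cell
    "language": td[5].text.strip()
}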
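If some rows may lack a link, a more defensive variant of the loop could check each find('a') result before reading .text. The sketch below is only one way to do it, under a few assumptions: extract_problems is written as a standalone function rather than the method from the question, it scans every tr in the document instead of only the first table's tbody, and it reads the language from td[5], which is where "C++" sits in the posted row (td[4] is the "Accepted" cell). The first row in the sample HTML is hypothetical and is there only to show a short row being skipped.

```python
from bs4 import BeautifulSoup


def extract_problems(soup):
    """Collect one entry per complete result row, keyed by run id."""
    problems = {}
    for tr in soup.find_all("tr"):
        td = tr.find_all("td")
        if len(td) != 8:
            continue  # skip short rows and keep scanning the rest

        # find() returns None when a cell has no <a>, so check before using .text
        run, problem_id, name = (td[i].find("a") for i in (0, 2, 3))
        if run is None or problem_id is None or name is None:
            continue

        problems[run.text.strip()] = {
            "id_code": problem_id.text.strip(),
            "name": name.text.strip(),
            "language": td[5].text.strip(),  # td[4] is the verdict cell
        }
    return problems


# Sample input: a hypothetical short row, then the row from the question.
html = """
<table><tbody>
<tr><td colspan="8">short row, skipped</td></tr>
<tr class="impar">
    <td class="id"><a href="/judge/en/runs/code/5046623">5046623</a></td>
    <td></td>
    <td class="tiny"><a href="/judge/en/problems/view/2173">2173</a></td>
    <td class="wide"><a href="/judge/en/problems/view/2173">Slush Fund</a></td>
    <td class="semi-wide answer a-1"><a href="/judge/en/runs/code/5046623">Accepted</a></td>
    <td class="center">C++</td>
    <td class="tiny">0.084</td>
    <td class="center">8/27/16, 10:10:39 PM</td>
</tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")
print(extract_problems(soup))
```

Using continue instead of return matters here: with return, as in the question's code, the function would stop at the first row that does not have eight cells and never reach the rows after it.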
