Getting HTML attributes with python

Question

Asked 5 years, 4 months ago

Viewed 97 times

-2

I want to get the values of the aria-label, href and title attributes from the <a> tag below:

    <a aria-label="AS MAIS TOCADAS NO BAILE FUNK 2019 #1 - SET DE FUNK by Funk 24por48 10 months ago 39 minutes 3,186,126 views" class="yt-simple-endpoint style-scope ytd-video-renderer" href="/watch?v=vTakYj4802U" id="video-title" title="AS MAIS TOCADAS NO BAILE FUNK 2019 #1 - SET DE FUNK">
        <yt-formatted-string aria-label="AS MAIS TOCADAS NO BAILE FUNK 2019 #1 - SET DE FUNK by Funk 24por48 10 months ago 39 minutes 3,186,126 views" class="style-scope ytd-video-renderer">AS MAIS TOCADAS NO BAILE FUNK 2019 #1 - SET DE FUNK</yt-formatted-string>
    </a>

I got this HTML snippet using Selenium and BeautifulSoup (code below):

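    # Inside a class method; assumes `from selenium import webdriver` and `from bs4 import BeautifulSoup`
    self.driver = webdriver.Firefox(options=self.options)
    self.driver.get('https://www.youtube.com/results?search_query=funk+baile')
    # Grab the outer HTML of the first element with id="contents" (Selenium 3 API)
    self.html = self.driver.find_elements_by_xpath('//*[@id="contents"]')[0].get_attribute('outerHTML')
    # Not shown in the original post: self.soap is presumably built from self.html like this
    self.soap = BeautifulSoup(self.html, 'html.parser')
    self.html_musicas = self.soap.findAll(id="video-title", href=True)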

How can I get the values of the attributes mentioned above (aria-label, href and title)?

1 answer

by Albuquerquess • 1 point · Answer 1 · 2020-03-05T02:52:09+00:00

I only managed it by converting each tag stored in self.html_musicas to a string, splitting it on double quotes, and storing the pieces in a list:

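    self.dados_musicas[self.generos] = []
    for dados_tag in self.html_musicas:
        # Splitting the tag's string form on '"' puts each attribute value at an odd index
        self.dados = str(dados_tag).split('"')
        # For the tag shown in the question, index 9 is the title and index 5 is the href
        self.dados_musicas[self.generos].append({self.dados[9]: self.dados[5]})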

The output was:

[{'MC João - Baile de Favela (KondZilla)': '/watch?v=kzOkza_u3Z8'},]
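
Splitting on quotes depends on the exact attribute order, so it may break if YouTube changes the markup. As a less fragile sketch: BeautifulSoup Tag objects can be read like dictionaries, so each attribute can be fetched by name with tag.get(...). The HTML string below is a shortened copy of the tag from the question, and the names extract_attributes and videos are made up for this example; they are not part of the original code.

```python
from bs4 import BeautifulSoup

# Shortened copy of the <a> tag from the question (aria-label trimmed for brevity).
html = '''
<a aria-label="AS MAIS TOCADAS NO BAILE FUNK 2019 #1 - SET DE FUNK by Funk 24por48"
   class="yt-simple-endpoint style-scope ytd-video-renderer"
   href="/watch?v=vTakYj4802U" id="video-title"
   title="AS MAIS TOCADAS NO BAILE FUNK 2019 #1 - SET DE FUNK">
</a>
'''

soup = BeautifulSoup(html, "html.parser")


def extract_attributes(tag):
    """Return the three attributes of interest as a dict (None if one is missing)."""
    return {
        "aria-label": tag.get("aria-label"),
        "href": tag.get("href"),
        "title": tag.get("title"),
    }


# Same filter as in the question: elements with id="video-title" that have an href.
videos = [extract_attributes(tag) for tag in soup.find_all(id="video-title", href=True)]

for video in videos:
    print(video["title"], video["href"])
```

In the question's code, the same idea would mean looping over self.html_musicas and calling dados_tag.get('href'), dados_tag.get('title') and dados_tag.get('aria-label') instead of splitting the string. Using tag.get(name) rather than tag[name] avoids a KeyError when an attribute is absent.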