0
I’m trying to make a web scraping, but if you view the site you notice that certain titles are on certain columns. What my program does is take the table, create two full columns of Nan and assign them a column title from the first effective column of the site. What could be wrong??? It seems that it jumps 2 columns, fills with the data and only then starts in the right way.
I need the first title to go to the third column, the second to the fourth, the third to the fifth and so on.
How does it look:
import pandas as pd
import lxml
import html5lib
from bs4 import BeautifulSoup
from pandas import DataFrame
import numpy as np
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
url = "https://www.sunoresearch.com.br/acoes/itsa4/"
option = Options()
option.headless = True
driver = webdriver.Firefox()
driver.get(url)
time.sleep(10)
driver.find_element_by_xpath('//*[@id="demonstratives"]/div[2]/div[1]/div[1]/ng-select/div').click()
driver.find_element_by_xpath('//*[@id="demonstratives"]/div[2]/div[1]/div[1]/ng-select/div/ul/li[2]').click()
driver.find_element_by_xpath('//*[@id="demonstratives"]/div[2]/div[1]/div[2]/ng-select/div').click()
driver.find_element_by_xpath('//*[@id="demonstratives"]/div[2]/div[1]/div[2]/ng-select/div/ul/li[4]').click()
driver.find_element_by_xpath('//*[@id="demonstratives"]/div[2]/div[2]/div/button[2]').click()
element = driver.find_element_by_xpath('//*[@id="demonstratives"]/div[3]/div[2]')
html_content = element.get_attribute('outerHTML')
soup = BeautifulSoup(html_content, 'html.parser')
table = soup.find(name='table')
df_full = pd.read_html(str(table))[0]
pd.set_option('display.max_columns', None)
print(df_full)
driver.quit()