How to move the title from one column to another? (web scraping, Python)

Question

Asked 5 years ago

I'm trying to scrape a table from a website, but the column titles end up over the wrong columns. My program reads the table, creates two columns filled entirely with NaN, and gives them the titles that belong to the first real columns on the site. What could be wrong? It seems to skip two columns, fill them with NaN, and only then start placing the data correctly.

I need the first title to go to the third column, the second to the fourth, the third to the fifth and so on.

How it should look:

How it currently looks:

import time
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options

# pd.read_html needs a parser such as lxml (or bs4 + html5lib) installed,
# but those packages do not have to be imported here.

url = "https://www.sunoresearch.com.br/acoes/itsa4/"

option = Options()
option.add_argument("-headless")  # run Firefox without a visible window
driver = webdriver.Firefox(options=option)  # pass the options so headless mode takes effect

driver.get(url)
time.sleep(10)  # crude wait for the page's JavaScript to render

# Open each dropdown, pick an entry, then press the button that loads the table.
driver.find_element(By.XPATH, '//*[@id="demonstratives"]/div[2]/div[1]/div[1]/ng-select/div').click()
driver.find_element(By.XPATH, '//*[@id="demonstratives"]/div[2]/div[1]/div[1]/ng-select/div/ul/li[2]').click()
driver.find_element(By.XPATH, '//*[@id="demonstratives"]/div[2]/div[1]/div[2]/ng-select/div').click()
driver.find_element(By.XPATH, '//*[@id="demonstratives"]/div[2]/div[1]/div[2]/ng-select/div/ul/li[4]').click()
driver.find_element(By.XPATH, '//*[@id="demonstratives"]/div[2]/div[2]/div/button[2]').click()

# Grab the rendered HTML of the container that holds the table.
element = driver.find_element(By.XPATH, '//*[@id="demonstratives"]/div[3]/div[2]')
html_content = element.get_attribute('outerHTML')

soup = BeautifulSoup(html_content, 'html.parser')
table = soup.find(name='table')

# Wrap the markup in StringIO so read_html treats it as a file-like object.
df_full = pd.read_html(StringIO(str(table)))[0]
pd.set_option('display.max_columns', None)
print(df_full)

driver.quit()

1 answer

by gabriel pereira • 1 point · Answer 1 · 2020-08-09T02:17:52+00:00

I would recommend dropping these columns. Since df_full is a pandas DataFrame, you can use:

df_full = df_full.drop(labels=["Descrição", "1T2020"], axis=1)
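
To see what that drop does, here is a minimal sketch on a toy frame. The values and the `2T2020` column are invented for illustration; only the labels `Descrição` and `1T2020` come from the line above, and the real scraped table may look different.

```python
import pandas as pd

# Toy stand-in for df_full; the values and "2T2020" are hypothetical.
df_full = pd.DataFrame({
    "Descrição": [None, None],
    "1T2020": [None, None],
    "2T2020": [10.0, 20.0],
})

# Drop the two unwanted columns by label (axis=1 means "columns").
trimmed = df_full.drop(labels=["Descrição", "1T2020"], axis=1)
print(list(trimmed.columns))
```

Note that `drop` returns a new DataFrame, which is why the result has to be assigned back to a name.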

Then map each old column name (old_column_name) to its new name (new_column_name), as in the example below:

df_full = df_full.rename(columns={"old_column_name1": "new_column_name1", "old_column_name2": "new_column_name2"})

That is, if the problem is just renaming the columns.
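
If the headers are offset from the data rather than simply misnamed, the labels can be reassigned by position instead. The sketch below is only one possible approach, and it assumes the symptom described in the question: the leading columns are entirely NaN and each title sits two columns to the left of its data. The toy frame, its values, and the `Unnamed` labels are invented, so the real table may need a different offset.

```python
import pandas as pd

nan = float("nan")

# Toy stand-in for df_full: two all-NaN columns, then the real data
# sitting under the wrong (or missing) titles. All values are hypothetical.
df_full = pd.DataFrame({
    "Descrição": [nan, nan],
    "1T2020": [nan, nan],
    "4T2019": ["Receita", "Lucro"],
    "Unnamed: 3": [100, 20],
    "Unnamed: 4": [90, 18],
})

# Remove columns that contain nothing but NaN.
aligned = df_full.dropna(axis=1, how="all")

# Reuse the original titles, in order, for the columns that remain.
aligned.columns = df_full.columns[:aligned.shape[1]]
print(aligned)
```

Dropping the empty columns first and then relabeling keeps the data intact; only the header row changes.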