move data left in pandas

Question

Asked 4 years, 5 months ago

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from unicodedata import normalize 

tabelas = pd.read_html('https://www.msn.com/pt-br/esportes/basquete/nba/estatisticas-de-times')
for i in tabelas:
  print(i)

tabelas[0].head()

The data differs from what the site shows: the values in each row are shifted out of their correct columns. I need to move each row's values to the left so that they line up with the headers.

    Posição  TIME         TIME.1   GP  ...  LLC  TDAL  %LL  Unnamed: 14
0         1   NaN           Nets  Bkn  ...  409   576  705          817
1         2   NaN          Bucks  Mil  ...  395   444  608          730
2         3   NaN  Trail Blazers  Por  ...  389   486  591          822
3         4   NaN           Jazz  Uta  ...  393   499  643          776
4         5   NaN       Clippers  LAC  ...  422   525  624          841
5         6   NaN        Nuggets  Den  ...  378   451  581          776
6         7   NaN       Warriors   GS  ...  374   509  655          777
7         8   NaN          76ers  Phi  ...  362   611  779          784
8         9   NaN          Bulls  Chi  ...  380   443  558          794
9        10   NaN       Pelicans   NO  ...  368   509  694          733
10       11   NaN        Raptors  Tor  ...  385   524  635          825
11       12   NaN        Wizards  Was  ...  342   530  695          763
12       13   NaN          Kings  Sac  ...  368   467  651          717
13       14   NaN         Pacers  Ind  ...  371   463  598          774
14       15   NaN          Hawks  Atl  ...  354   608  741          821
15       16   NaN      Mavericks  Dal  ...  350   515  656          785
16       17   NaN         Lakers  LAL  ...  358   518  691          750
17       18   NaN      Grizzlies  Mem  ...  371   364  462          788
18       19   NaN        Hornets  Cha  ...  379   444  579          767
19       20   NaN           Suns  Pho  ...  362   414  496          835
20       21   NaN          Spurs   SA  ...  358   460  580          793
21       22   NaN        Celtics  Bos  ...  376   471  619          761
22       23   NaN        Rockets  Hou  ...  340   466  621          750
23       24   NaN   Timberwolves  Min  ...  357   435  579          751
24       25   NaN        Pistons  Det  ...  354   532  679          784
25       26   NaN        Thunder  OKC  ...  343   433  587          738
26       27   NaN           Heat  Mia  ...  352   487  618          788
27       28   NaN          Magic  Orl  ...  350   454  581          781
28       29   NaN      Cavaliers  Cle  ...  337   464  646          718
29       30   NaN         Knicks   NY  ...  357   493  644          766

[30 rows x 15 columns]

"The rows are 'out' of the correct columns": could you point out exactly what is wrong?

– Woss

2021/02/19 at 15:02
The data is out of order: column GP should hold the value 31, column PPG the value 121.3, and so on, the same as on the website I exported the data from: https://www.msn.com/pt-br/esportes/basketball/nba/statisticss-de-times/sp-s-pts

– Thiago Ramos De Oliveira

2021/02/19 at 15:07
I believe the column values are wrong because a new column, TIME.1, appeared and the correct values ended up in the next column. E.g., 817 should be in column %LL. Is that it?

– Bernardo Lopes

2021/02/19 at 15:09
Correct. I deleted these columns and even then the data remains out of order.

– Thiago Ramos De Oliveira

2021/02/19 at 15:11
Have you already tried deleting the TIME column and renaming the other columns?

– Bernardo Lopes

2021/02/19 at 15:13

2 answers

Hello, my solution was to rename the columns, as follows:

I assigned your table, tabelas[0], to the name df.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from unicodedata import normalize 

tabelas = pd.read_html('https://www.msn.com/pt-br/esportes/basquete/nba/estatisticas-de-times')
for i in tabelas:
  print(i)

df = tabelas[0]
df.head()

Then I renamed the columns

df = df.drop(columns = ['TIME'])
df.rename(columns = {'TIME.1':'TIME', 'GP':'Abrev_Time', 'PPG':'GP','AC':'PPG', 'TDA':'AC', '%AC':'TDA', '3PM':'%AC', '3PA':'3PM', '3P%':'3PA', 'LLC':'3P%', 'TDALL':'LLC', '%LL':'TDALL', 'Unnamed: 14':'%LL'}, inplace = True)
df

P.S.: You need to divide the %AC column by 10, because the decimal comma used on the site is not read as a decimal separator.
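The rename step and the P.S. can be tried offline with a small stand-in table. This is only a sketch: the toy frame below is an assumption (it has fewer columns than the real `tabelas[0]`, so its last column is `Unnamed: 6` rather than `Unnamed: 14`), and the mapping is built from the header instead of being typed out by hand.

```python
import numpy as np
import pandas as pd

# Toy stand-in for tabelas[0] (assumption: fewer columns than the real table).
# As in the question, TIME is empty and every later value sits one column too far right.
df = pd.DataFrame({
    "Posição": [1, 2],
    "TIME": [np.nan, np.nan],
    "TIME.1": ["Nets", "Bucks"],
    "GP": ["Bkn", "Mil"],
    "PPG": [31, 29],
    "%AC": [121.3, 119.6],
    "Unnamed: 6": [502, 488],
})

# Drop the empty column, then remember the remaining labels in order.
df = df.drop(columns=["TIME"])
old = list(df.columns)

# The first two labels are chosen by hand, as in the answer.
mapping = {"TIME.1": "TIME", "GP": "Abrev_Time"}
# Every later column takes the label of its left neighbour.
mapping.update(zip(old[3:], old[2:-1]))
df = df.rename(columns=mapping)

# The P.S.: 502 stands for 50.2, so scale the percentage column.
df["%AC"] = df["%AC"] / 10

print(df)
```

A likely cause of the missing decimal is that `pd.read_html` treats `,` as the thousands separator by default. Passing `decimal=','` and `thousands='.'` to `read_html` might make the division unnecessary, but that has not been checked against this page.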

Smart solution at extremely low processing cost.

– Augusto Vasques

2021/02/19 at 15:31

Thank you very much for the help; this approach is interesting too.

– Thiago Ramos De Oliveira

2021/02/19 at 15:36

by lmonferrari • **3,550** points · Answer 1 · 2021-02-19T15:19:36+00:00

Here is a possible solution: slice the columns, then use shift to move them to the left.

import pandas as pd 
import numpy as np 

tabelas = pd.read_html('https://www.msn.com/pt-br/esportes/basquete/nba/estatisticas-de-times') 

df = tabelas[0].copy()

df.loc[:,'TIME':'%LL'] = df.loc[:, 'TIME':'Unnamed: 14'].shift(-1, axis=1).drop('Unnamed: 14', axis=1)
df.drop('Unnamed: 14', axis=1, inplace=True)

Output

Posição	TIME	TIME.1	GP	PPG	AC	TDA	%AC	3PM	3PA	3P%	LLC	TDAL	%LL
1	Nets	Bkn	31	121.3	1.356	2.702	502	471	1.151	409	576	705	817
2	Bucks	Mil	29	119.6	1.295	2.652	488	435	1.102	395	444	608	730
3	Trail Blazers	Por	28	115.8	1.15	2.574	447	457	1.176	389	486	591	822
4	Jazz	Uta	29	115.6	1.186	2.543	466	482	1.227	393	499	643	776
5	Clippers	LAC	30	115.4	1.254	2.584	485	430	1.019	422	525	624	841
6	Nuggets	Den	28	115.4	1.204	2.51	480	371	981.0	378	451	581	776
7	Warriors	GS	29	114.7	1.199	2.588	463	418	1.119	374	509	655	777
8	76ers	Phi	29	114.6	1.205	2.507	481	302	834.0	362	611	779	784
9	Bulls	Chi	27	114.6	1.143	2.389	478	364	958.0	380	443	558	794
10	Pelicans	NO	28	114.6	1.183	2.459	481	335	911.0	368	509	694	733

...
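The shift-based approach can also be tried without fetching the page. The sketch below uses a toy frame, which is an assumption (fewer columns than the real table, so the trailing column is `Unnamed: 5`), and assumes a reasonably recent pandas version. It rebuilds the frame with `pd.concat` instead of assigning back through `.loc`, because writing strings into the all-NaN `TIME` column in place may trigger dtype warnings in newer pandas releases.

```python
import numpy as np
import pandas as pd

# Toy stand-in for tabelas[0] (assumption: fewer columns than the real table).
raw = pd.DataFrame({
    "Posição": [1, 2],
    "TIME": [np.nan, np.nan],
    "TIME.1": ["Nets", "Bucks"],
    "GP": ["Bkn", "Mil"],
    "PPG": [31, 29],
    "Unnamed: 5": [121.3, 119.6],
})

# Shift everything from TIME onward one column to the left.
shifted = raw.loc[:, "TIME":].shift(-1, axis=1)

# After the shift the last column holds only NaN, so leave it out
# when putting the untouched first column and the shifted block together.
fixed = pd.concat([raw[["Posição"]], shifted.iloc[:, :-1]], axis=1)

print(fixed)
```

Building a new frame keeps `raw` unchanged, which makes it easy to compare the table before and after the shift.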