Move data one column to the left in pandas

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from unicodedata import normalize 

tabelas = pd.read_html('https://www.msn.com/pt-br/esportes/basquete/nba/estatisticas-de-times')
for i in tabelas:
  print(i)

tabelas[0].head()

The data does not match the website: the values are shifted out of their correct columns. I need to move each row's values one column to the left so that the data lines up with the right headers.

    Posição  TIME         TIME.1   GP  ...  LLC  TDAL  %LL  Unnamed: 14
0         1   NaN           Nets  Bkn  ...  409   576  705          817
1         2   NaN          Bucks  Mil  ...  395   444  608          730
2         3   NaN  Trail Blazers  Por  ...  389   486  591          822
3         4   NaN           Jazz  Uta  ...  393   499  643          776
4         5   NaN       Clippers  LAC  ...  422   525  624          841
5         6   NaN        Nuggets  Den  ...  378   451  581          776
6         7   NaN       Warriors   GS  ...  374   509  655          777
7         8   NaN          76ers  Phi  ...  362   611  779          784
8         9   NaN          Bulls  Chi  ...  380   443  558          794
9        10   NaN       Pelicans   NO  ...  368   509  694          733
10       11   NaN        Raptors  Tor  ...  385   524  635          825
11       12   NaN        Wizards  Was  ...  342   530  695          763
12       13   NaN          Kings  Sac  ...  368   467  651          717
13       14   NaN         Pacers  Ind  ...  371   463  598          774
14       15   NaN          Hawks  Atl  ...  354   608  741          821
15       16   NaN      Mavericks  Dal  ...  350   515  656          785
16       17   NaN         Lakers  LAL  ...  358   518  691          750
17       18   NaN      Grizzlies  Mem  ...  371   364  462          788
18       19   NaN        Hornets  Cha  ...  379   444  579          767
19       20   NaN           Suns  Pho  ...  362   414  496          835
20       21   NaN          Spurs   SA  ...  358   460  580          793
21       22   NaN        Celtics  Bos  ...  376   471  619          761
22       23   NaN        Rockets  Hou  ...  340   466  621          750
23       24   NaN   Timberwolves  Min  ...  357   435  579          751
24       25   NaN        Pistons  Det  ...  354   532  679          784
25       26   NaN        Thunder  OKC  ...  343   433  587          738
26       27   NaN           Heat  Mia  ...  352   487  618          788
27       28   NaN          Magic  Orl  ...  350   454  581          781
28       29   NaN      Cavaliers  Cle  ...  337   464  646          718
29       30   NaN         Knicks   NY  ...  357   493  644          766

[30 rows x 15 columns]
  • You say the rows are "out" of the correct columns; could you point out exactly what is wrong?

  • The data is out of place: the GP column should hold the value 31, the PPG column the value 121.3, and so on, as on the website I exported the data from: https://www.msn.com/pt-br/esportes/basketball/nba/statisticss-de-times/sp-s-pts

  • I believe the column values are wrong because an extra column, TIME.1, appeared, so each correct value ended up in the next column. For example, 817 should be in the %LL column. Is that it?

  • Correct. I deleted these columns, and even then the data remains out of order.

  • I have already tried deleting the TIME column and renaming the other columns.
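The diagnosis in these comments can be checked in code before changing anything: if `TIME` is entirely empty while the trailing `Unnamed: 14` column is filled, every value sits one column to the right of its header. A minimal sketch, using a hypothetical `sample` frame built from two rows of the printed output above instead of downloading the page (only a few of the columns are kept):

```python
import numpy as np
import pandas as pd

# Hypothetical two-row stand-in for tabelas[0], copied from the printed output;
# only the first few and the last two columns are kept to stay short.
sample = pd.DataFrame({
    'Posição': [1, 2],
    'TIME': [np.nan, np.nan],
    'TIME.1': ['Nets', 'Bucks'],
    'GP': ['Bkn', 'Mil'],
    '%LL': [705, 608],
    'Unnamed: 14': [817, 730],
})

# An all-NaN TIME column plus a filled, unnamed last column means the data
# is shifted one position to the right of its headers.
time_is_empty = sample['TIME'].isna().all()
tail_has_data = sample['Unnamed: 14'].notna().all()
print(time_is_empty, tail_has_data)
```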

2 answers

Here is a possible solution: slice the misaligned columns, then use shift to move their values one column to the left.

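import pandas as pd

tabelas = pd.read_html('https://www.msn.com/pt-br/esportes/basquete/nba/estatisticas-de-times')

# Work on a copy of the first table
df = tabelas[0].copy()

# Move every value from TIME through Unnamed: 14 one column to the left,
# drop the now-empty last column, and write the result back into TIME..%LL
df.loc[:,'TIME':'%LL'] = df.loc[:, 'TIME':'Unnamed: 14'].shift(-1, axis=1).drop('Unnamed: 14', axis=1)
# The trailing column is no longer needed
df.drop('Unnamed: 14', axis=1, inplace=True)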

Output

Posição TIME TIME.1 GP PPG AC TDA %AC 3PM 3PA 3P% LLC TDAL %LL
1 Nets Bkn 31 121.3 1.356 2.702 502 471 1.151 409 576 705 817
2 Bucks Mil 29 119.6 1.295 2.652 488 435 1.102 395 444 608 730
3 Trail Blazers Por 28 115.8 1.15 2.574 447 457 1.176 389 486 591 822
4 Jazz Uta 29 115.6 1.186 2.543 466 482 1.227 393 499 643 776
5 Clippers LAC 30 115.4 1.254 2.584 485 430 1.019 422 525 624 841
6 Nuggets Den 28 115.4 1.204 2.51 480 371 981.0 378 451 581 776
7 Warriors GS 29 114.7 1.199 2.588 463 418 1.119 374 509 655 777
8 76ers Phi 29 114.6 1.205 2.507 481 302 834.0 362 611 779 784
9 Bulls Chi 27 114.6 1.143 2.389 478 364 958.0 380 443 558 794
10 Pelicans NO 28 114.6 1.183 2.459 481 335 911.0 368 509 694 733

...
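To see what the `shift(-1, axis=1)` step does without downloading the page, here is a minimal sketch on a hypothetical six-column excerpt of the table (values taken from the Nets and Bucks rows). As a variation on the answer, it rebuilds the frame with `pd.concat` instead of assigning back through `df.loc`; recent pandas versions may warn when `.loc` writes strings into the float `TIME` column, and building a new frame sidesteps that.

```python
import numpy as np
import pandas as pd

# Hypothetical excerpt of tabelas[0]: each value sits one column to the right
# of its header, TIME is empty and 'Unnamed: 14' holds the last value.
df = pd.DataFrame({
    'Posição': [1, 2],
    'TIME': [np.nan, np.nan],
    'TIME.1': ['Nets', 'Bucks'],
    'GP': ['Bkn', 'Mil'],
    'PPG': [31, 29],
    'Unnamed: 14': [121.3, 119.6],
})

# Shift the block TIME..Unnamed: 14 one column to the left; the last column
# becomes all NaN, so drop it.
shifted = df.loc[:, 'TIME':'Unnamed: 14'].shift(-1, axis=1).drop(columns='Unnamed: 14')

# Reattach the untouched Posição column instead of writing back through .loc
fixed = pd.concat([df[['Posição']], shifted], axis=1)
print(fixed)
```

On the full table, the slice simply covers all columns from `TIME` to `Unnamed: 14`, exactly as in the answer.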

  • If I could, I would give another +1 just for the effort of putting this table together.

  • Thank you so much for your help; you solved my problem. Now on to the analysis.

Hello, my solution was to rename the columns, as follows:

I assigned the first table from your list "tabelas" to the name df.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from unicodedata import normalize 

tabelas = pd.read_html('https://www.msn.com/pt-br/esportes/basquete/nba/estatisticas-de-times')
for i in tabelas:
  print(i)

df = tabelas[0]
df.head()

Then I dropped the empty TIME column and renamed the remaining columns:

# Drop the all-NaN TIME column, then give each column the header that matches its data
df = df.drop(columns=['TIME'])
df.rename(columns={'TIME.1': 'TIME', 'GP': 'Abrev_Time', 'PPG': 'GP', 'AC': 'PPG', 'TDA': 'AC', '%AC': 'TDA',
                   '3PM': '%AC', '3PA': '3PM', '3P%': '3PA', 'LLC': '3P%', 'TDAL': 'LLC', '%LL': 'TDAL',
                   'Unnamed: 14': '%LL'}, inplace=True)
df
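The same renaming can be built from the existing headers instead of typing each pair by hand, which also rules out typos in column names. A minimal sketch on a hypothetical excerpt of the table; `Abrev_Time` is the name this answer uses for the team abbreviation column, and on the full table `old[2:-1]` would cover `GP` through `%LL`.

```python
import numpy as np
import pandas as pd

# Hypothetical excerpt of tabelas[0] with the same misalignment as the question
df = pd.DataFrame({
    'Posição': [1, 2],
    'TIME': [np.nan, np.nan],
    'TIME.1': ['Nets', 'Bucks'],
    'GP': ['Bkn', 'Mil'],
    'PPG': [31, 29],
    'Unnamed: 14': [121.3, 119.6],
})

# Drop the all-NaN TIME column
df = df.drop(columns=['TIME'])
old = list(df.columns)

# 'TIME.1' becomes 'TIME', 'GP' (the abbreviations) becomes 'Abrev_Time', and
# every later column takes the header of the column before it.
new = ['Posição', 'TIME', 'Abrev_Time'] + old[2:-1]
mapping = dict(zip(old, new))
df = df.rename(columns=mapping)
print(mapping)
print(df)
```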

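P.S.: You need to divide the %AC column by 10 (the same applies to the other percentage columns, 3P% and %LL), because the site uses a decimal comma and pandas does not read it as a decimal point, so a value such as 50,2 ends up as 502.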
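A minimal sketch of that correction, assuming the columns have already been renamed as above and using hypothetical values copied from the Nets and Bucks rows. An alternative worth trying, not tested against this page, is to pass `decimal=','` and `thousands='.'` to `pd.read_html` so the Brazilian number format is parsed correctly in the first place.

```python
import pandas as pd

# Hypothetical values after the rename: the percentages arrived multiplied by 10
# because the site's decimal comma was read as a thousands separator.
df = pd.DataFrame({
    'TIME': ['Nets', 'Bucks'],
    '%AC': [502, 488],
    '3P%': [409, 395],
    '%LL': [817, 730],
})

# Scale every percentage column back in one step
pct_cols = ['%AC', '3P%', '%LL']
df[pct_cols] = df[pct_cols] / 10
print(df)
```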

  • A smart solution with a very low processing cost.

  • Thank you very much for the help; this approach is interesting too.
