I can't understand my mistake with a for loop and a DataFrame

I have a problem with my for loop: every row of the Densidade column in the DataFrame comes out with the same value, and I can't see where the error is. Could someone help me?

import pandas as pd

BRICS = pd.DataFrame({
    'País': ["Brasil", "Russia", "India", "China", "Africa do Sul"],
    'Capital': ['Brasilia', 'Moscou', 'Nova Deli', 'Beijing', 'Pretoria'],
    'Area': [8.516, 17.100, 3.286, 9.597, 1.221],
    'População': [200.40, 143.50, 1252.00, 1357.00, 52.98],
})

print(BRICS,'\n')

for i in range(len(BRICS)):
  BRICS=BRICS.assign(Densidade=BRICS['População'].values[i]/BRICS['Area'].values[i])

print(BRICS)
  • Have you tried BRICS['Densidade'] = BRICS['População'] / BRICS['Area'], or BRICS.assign(Densidade=BRICS['População'] / BRICS['Area']), without the for loop?

2 answers

The most performant solution would be:

BRICS["Densidade"] = BRICS['População'] / BRICS['Area']

Using a for loop, apply, or assign with a function (named or lambda) is slower.

For more details see here
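A minimal sketch (assuming a reasonably recent pandas) comparing the vectorized one-liner with a loop that does fill the column correctly. The `.loc` loop is only an illustration for comparison, not part of the answer; it gives the same numbers but performs one Python-level step per row instead of a single operation over whole columns.

```python
import pandas as pd

# Same data as in the question
BRICS = pd.DataFrame({
    'País': ["Brasil", "Russia", "India", "China", "Africa do Sul"],
    'Capital': ['Brasilia', 'Moscou', 'Nova Deli', 'Beijing', 'Pretoria'],
    'Area': [8.516, 17.100, 3.286, 9.597, 1.221],
    'População': [200.40, 143.50, 1252.00, 1357.00, 52.98],
})

# Vectorized: one division over the whole columns at once
vectorized = BRICS['População'] / BRICS['Area']

# A loop that does work: create the column first, then fill one cell per row
looped = BRICS.copy()
looped['Densidade'] = 0.0
for i in looped.index:
    looped.loc[i, 'Densidade'] = looped.loc[i, 'População'] / looped.loc[i, 'Area']

# Both approaches produce the same densities
same = (vectorized.round(9) == looped['Densidade'].round(9)).all()
print(same)
```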

  • In fact, your solution seems to make sense, dear @Paulo Marques. I appreciate your suggestion. But in a data frame as small as the one presented, this may not have much impact.

  • @Hiltonfernandes, I agree with you regarding the example, but questions are meant to provide a Minimal, Complete and Verifiable example of both code and data. Our answers should answer the question, but also be useful for other cases.

  • Agreed, @Paulomarques. Your answer, very well chosen, takes advantage of the vectorization that NumPy provides. Still, it is always worthwhile to be able to compare several alternatives. Rarely in technology does a single solution solve every problem.

Apparently, on every iteration of the for loop you fill the column Densidade with a single value: the density of element i only. Since that value is a scalar, pandas replicates it to all rows of the data frame, and each iteration overwrites what the previous one wrote.

The value 43.390663 that your code puts in every row is simply the Densidade of the last country in the data frame (Africa do Sul: 52.98 / 1.221).
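To make the mechanism visible, here is a small sketch that reruns the question's loop on the two numeric columns only. `assign` receives a scalar each time, broadcasts it to every row, and the column is replaced on the next pass, so only the last country's value survives.

```python
import pandas as pd

# Only the two numeric columns from the question are needed here
df = pd.DataFrame({
    'Area': [8.516, 17.100, 3.286, 9.597, 1.221],
    'População': [200.40, 143.50, 1252.00, 1357.00, 52.98],
})

# The question's loop: each pass assigns ONE scalar, which pandas
# broadcasts to every row, replacing the column written in the previous pass
for i in range(len(df)):
    df = df.assign(Densidade=df['População'].values[i] / df['Area'].values[i])

print(df['Densidade'].nunique())          # 1 distinct value left
print(round(df['Densidade'].iloc[0], 6))  # the last country's density, 43.390663
```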

A suggested fix would be:

import pandas as pd

BRICS = pd.DataFrame({
    'País': ["Brasil", "Russia", "India", "China", "Africa do Sul"],
    'Capital': ['Brasilia', 'Moscou', 'Nova Deli', 'Beijing', 'Pretoria'],
    'Area': [8.516, 17.100, 3.286, 9.597, 1.221],
    'População': [200.40, 143.50, 1252.00, 1357.00, 52.98],
})

print(BRICS,'\n')

BRICS = BRICS.assign(Densidade = lambda x: BRICS['População'] / BRICS['Area'])

print(BRICS)

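That is: replace your for loop with a single expression over the whole columns. assign calls the lambda once, passing it the whole data frame, and the division is then carried out on all elements of the columns at once.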
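A quick way to check how `assign` uses a callable is to record its calls. This sketch swaps the lambda for a named function (`densidade`, a hypothetical name used only for illustration) that logs the type of its argument; under a standard pandas install it is called once per keyword, with a DataFrame, not once per element.

```python
import pandas as pd

BRICS = pd.DataFrame({
    'País': ["Brasil", "Russia", "India", "China", "Africa do Sul"],
    'Capital': ['Brasilia', 'Moscou', 'Nova Deli', 'Beijing', 'Pretoria'],
    'Area': [8.516, 17.100, 3.286, 9.597, 1.221],
    'População': [200.40, 143.50, 1252.00, 1357.00, 52.98],
})

calls = []  # records what assign() passes to the callable

def densidade(frame):
    # assign() hands over the whole DataFrame, not a single row or element
    calls.append(type(frame).__name__)
    return frame['População'] / frame['Area']

BRICS = BRICS.assign(Densidade=densidade)
print(calls)  # ['DataFrame']: a single call
print(BRICS['Densidade'].round(2).tolist())
```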

This solution also opens the door to optimizations: the whole column is computed in one operation that pandas (through NumPy) can execute efficiently, instead of one Python-level step per row.

  • It seems that in the example shown, the x in lambda x: is not used. Perhaps it should be changed to: BRICS.assign(Densidade = lambda x: x['População'] / x['Area'])

  • You have a point, dear Icaro. Apparently, whoever edited the post did not notice this.

  • But note that even though x is not used in the body of the lambda expression, everything still refers to the data frame BRICS. That's why the example worked correctly when I ran it.

  • Just noting that, judging by the answer's revision history, the editors did not change the code. That is, the answer was posted with this code (and so was the question).

  • Indeed: I also did not understand what revision was made, or what it was made for.

  • The purpose of the edit is to adapt the post to the site's format, not to correct the code. Code corrections are the responsibility of the answer's author.
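Where using x instead of BRICS inside the lambda does make a difference is in method chaining: the callable receives the intermediate frame produced earlier in the chain, while the name BRICS still points to the original. The sketch below assumes, from the values, that Area is in millions of km² and População in millions of inhabitants; the helper columns `Area_km2` and `Pop_hab` are hypothetical and used only for illustration.

```python
import pandas as pd

BRICS = pd.DataFrame({
    'País': ["Brasil", "Russia", "India", "China", "Africa do Sul"],
    'Capital': ['Brasilia', 'Moscou', 'Nova Deli', 'Beijing', 'Pretoria'],
    'Area': [8.516, 17.100, 3.286, 9.597, 1.221],
    'População': [200.40, 143.50, 1252.00, 1357.00, 52.98],
})

# Using x: each lambda sees the frame produced by the previous step
chained = (
    BRICS
    .assign(Area_km2=lambda x: x['Area'] * 1e6,       # hypothetical helper columns
            Pop_hab=lambda x: x['População'] * 1e6)
    .assign(Densidade=lambda x: x['Pop_hab'] / x['Area_km2'])
)
print(chained[['País', 'Densidade']])

# Using BRICS inside the lambda fails: the original frame never got Pop_hab
failed = False
try:
    (BRICS
     .assign(Pop_hab=lambda x: x['População'] * 1e6)
     .assign(Densidade=lambda x: BRICS['Pop_hab'] / BRICS['Area']))
except KeyError as err:
    failed = True
    print('KeyError:', err)
```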
