I can’t understand my mistake with FOR and Dataframe

Question


Asked 3 years, 12 months ago


I have a problem with my for loop: all the results come out as the same value in the DataFrame, but I can't understand the error. Could someone help me?

import pandas as pd

BRICS=pd.DataFrame({'País':["Brasil","Russia","India","China","Africa do Sul"],'Capital':['Brasilia','Moscou','Nova Deli','Beijing','Pretoria'],'Area':[8.516,17.100,3.286,9.597,1.221],'População':[200.40,143.50,1252.00,1357.00,52.98]})

print(BRICS,'\n')

for i in range(len(BRICS)):
  BRICS=BRICS.assign(Densidade=BRICS['População'].values[i]/BRICS['Area'].values[i])

print(BRICS)

Have you tried BRICS['Densidade'] = BRICS['População'] / BRICS['Area'] or BRICS.assign(Densidade=BRICS['População'] / BRICS['Area']), without the for loop?

– Icaro Martins

2021/07/08 at 15:40

2 answers


Apparently, to fill the Densidade column, on every iteration of the for loop you create a value that holds only the density of element i. Since that value is a single number, pandas replicates it across all rows of the data frame, and each iteration overwrites the previous one.

The value 43.390663 that your code writes to every row is simply the Densidade of the last country in the data frame (Africa do Sul: 52.98 / 1.221).
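A minimal sketch that reproduces this behaviour with the question's data (the Capital column is left out because it plays no part). After each iteration it prints how many distinct values the Densidade column holds. The second loop is one possible way to keep a for loop and still get one value per row: writing each result into its own cell with DataFrame.loc. It assumes the default RangeIndex 0..4, so the loop counter i is also the row label, and it is shown only for comparison; the vectorized forms discussed in this thread are preferable.

```python
import pandas as pd

# Data from the question (the Capital column is left out: it is not used here)
BRICS = pd.DataFrame({
    'País': ["Brasil", "Russia", "India", "China", "Africa do Sul"],
    'Area': [8.516, 17.100, 3.286, 9.597, 1.221],
    'População': [200.40, 143.50, 1252.00, 1357.00, 52.98],
})

# The original loop: each iteration passes assign() a single float,
# which pandas broadcasts to every row, overwriting the whole column
buggy = BRICS.copy()
for i in range(len(buggy)):
    dens_i = buggy['População'].values[i] / buggy['Area'].values[i]
    buggy = buggy.assign(Densidade=dens_i)
    print(i, buggy['Densidade'].nunique())  # always 1 distinct value

# A loop that works: write each result into its own cell
# (assumes the default RangeIndex, so i is also the row label)
fixed = BRICS.copy()
for i in range(len(fixed)):
    fixed.loc[i, 'Densidade'] = fixed.loc[i, 'População'] / fixed.loc[i, 'Area']

print(buggy['Densidade'].round(6).tolist())
print(fixed['Densidade'].round(6).tolist())
```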

One possible fix would be:

import pandas as pd

BRICS=pd.DataFrame({'País':["Brasil","Russia","India","China","Africa do Sul"],'Capital':['Brasilia','Moscou','Nova Deli','Beijing','Pretoria'],'Area':[8.516,17.100,3.286,9.597,1.221],'População':[200.40,143.50,1252.00,1357.00,52.98]})

print(BRICS,'\n')

BRICS = BRICS.assign(Densidade = lambda x: BRICS['População'] / BRICS['Area'])

print(BRICS)

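That is: replace your for loop with a lambda expression passed to assign. Pandas calls it once with the whole data frame, so the division inside it is applied to every element of the columns at once.

Any performance gain comes from that division being vectorized (pandas delegates it to NumPy), not from the lambda itself: pandas does not distribute lambda invocations across multiple processors.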
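One way to check how assign treats a callable is to count its invocations. In this sketch, density is a hypothetical helper introduced only for the experiment; it records the type of the argument on every call. The count comes out as one, with a DataFrame as argument: the function is evaluated once on the whole frame, and the division inside it is vectorized over the columns.

```python
import pandas as pd

BRICS = pd.DataFrame({
    'País': ["Brasil", "Russia", "India", "China", "Africa do Sul"],
    'Area': [8.516, 17.100, 3.286, 9.597, 1.221],
    'População': [200.40, 143.50, 1252.00, 1357.00, 52.98],
})

calls = []  # records the type of the argument received on each call

def density(df):
    """Hypothetical helper: compute Densidade for the frame it is given."""
    calls.append(type(df).__name__)
    # Column-wise division: one vectorized operation over all rows
    return df['População'] / df['Area']

result = BRICS.assign(Densidade=density)

print(calls)  # ['DataFrame']: one call, with the whole frame
print(result[['País', 'Densidade']])
```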

It seems that in the example shown, the x in lambda x: is not being used; perhaps it should be changed to: BRICS.assign(Densidade = lambda x: x['População'] / x['Area'])

– Icaro Martins

2021/07/08 at 20:01
You have a point, dear Icaro. Apparently, whoever edited the question did not notice this.

– Hilton Fernandes

2021/07/08 at 20:11

But note that even though x is not used in the body of the lambda expression, everything refers to the data frame BRICS. That's why the example worked correctly when I ran it.

– Hilton Fernandes

2021/07/08 at 20:12
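For the code as posted this holds: the lambda ignores x but reads the global BRICS, which happens to be the same frame. The difference shows up when assign sits at the end of a method chain, where the frame passed as x is no longer BRICS. In this minimal sketch, the rename to Area_km2 is a hypothetical step added only to create that situation: the lambda that uses x sees the renamed column, while the one that reads BRICS raises a KeyError.

```python
import pandas as pd

BRICS = pd.DataFrame({
    'País': ["Brasil", "Russia", "India", "China", "Africa do Sul"],
    'Area': [8.516, 17.100, 3.286, 9.597, 1.221],
    'População': [200.40, 143.50, 1252.00, 1357.00, 52.98],
})

# Using x: the lambda receives the intermediate, renamed frame
ok = (
    BRICS
    .rename(columns={'Area': 'Area_km2'})  # hypothetical intermediate step
    .assign(Densidade=lambda x: x['População'] / x['Area_km2'])
)
print(ok[['País', 'Densidade']])

# Reading the global BRICS instead: it has no 'Area_km2' column
error = None
try:
    (BRICS
     .rename(columns={'Area': 'Area_km2'})
     .assign(Densidade=lambda x: BRICS['População'] / BRICS['Area_km2']))
except KeyError as err:
    error = err
    print('KeyError:', err)
```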
Just noting that, according to the answer's revision history, the editors did not change the code. That is, the answer was posted with this code (and so was the question).

– fernandosavio

2021/07/08 at 20:35
Indeed: I also did not understand what revision was made, or what it was made for.

– Hilton Fernandes

2021/07/08 at 20:45
The purpose of the edit is to adapt the question to the site's format, not to correct the code. Correcting the code is the responsibility of the answer's author.

– Augusto Vasques

2021/07/09 at 06:43


by Paulo Marques • **3,739** points · Answer 1 · 2021-07-08T19:52:09+00:00


The most performant solution would be:

BRICS["Densidade"] = BRICS['População'] / BRICS['Area']
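Applied to the question's data, this single line gives each country its own density, because the division is performed element by element between two Series aligned on the row index. A minimal sketch for checking it, which also confirms that the result matches the assign form (used here with x, as suggested in the comments above):

```python
import pandas as pd

BRICS = pd.DataFrame({
    'País': ["Brasil", "Russia", "India", "China", "Africa do Sul"],
    'Capital': ['Brasilia', 'Moscou', 'Nova Deli', 'Beijing', 'Pretoria'],
    'Area': [8.516, 17.100, 3.286, 9.597, 1.221],
    'População': [200.40, 143.50, 1252.00, 1357.00, 52.98],
})

# Element-wise division of two Series aligned on the index;
# the result is stored as a new column of BRICS itself (in place)
BRICS["Densidade"] = BRICS['População'] / BRICS['Area']

# The assign form returns a new DataFrame instead of modifying its input
via_assign = BRICS.drop(columns='Densidade').assign(
    Densidade=lambda x: x['População'] / x['Area'])

print(BRICS[['País', 'Densidade']])
print(BRICS['Densidade'].equals(via_assign['Densidade']))  # True
```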

Using for, apply, or assign with a function (named or lambda) is slower.
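The relative cost of these approaches can be measured with the standard timeit module. This sketch uses a synthetic frame (20,000 random rows; the size and the random data are assumptions, not taken from the thread) and compares the plain column division, assign with a lambda, and a row-wise apply. Timings depend on the machine and the pandas version, so no figures are given; on a frame as small as the BRICS example the differences are usually negligible.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data (assumption): 20_000 rows of positive values
rng = np.random.default_rng(0)
n = 20_000
df = pd.DataFrame({
    'Area': rng.uniform(1, 20, n),
    'População': rng.uniform(10, 1500, n),
})

def vectorized():
    # Copy so that every approach leaves df untouched, as assign does
    out = df.copy()
    out['Densidade'] = out['População'] / out['Area']
    return out

def with_assign():
    # The lambda is called once; the division inside it is vectorized
    return df.assign(Densidade=lambda x: x['População'] / x['Area'])

def with_apply():
    # The lambda is called once per row, in Python
    out = df.copy()
    out['Densidade'] = out.apply(lambda row: row['População'] / row['Area'], axis=1)
    return out

for fn in (vectorized, with_assign, with_apply):
    seconds = timeit.timeit(fn, number=3) / 3
    print(f'{fn.__name__:12s} {seconds * 1000:8.2f} ms per run')
```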

For more details see here

Indeed, your solution makes sense, dear @Paulo Marques. Thank you for the suggestion. But in a data frame as small as the one presented, this may not have much impact.

– Hilton Fernandes

2021/07/08 at 20:15
@Hiltonfernandes, I agree with you regarding the example, but questions are meant to provide a Minimal, Complete and Verifiable example, of both code and data. I understand that our answers should answer the question, but also be useful in other cases.

– Paulo Marques

2021/07/08 at 20:46

Agreed, @Paulomarques. Your answer, very well chosen, takes advantage of the vectorization that NumPy provides. Still, it is always worthwhile to be able to compare several alternatives. In technology, a single solution rarely solves every problem.

– Hilton Fernandes

2021/07/08 at 20:50