Filter column by a specific string

0

I’m trying to filter my dataframe (df_movies_usa) to remove every row whose currency is not "$". The currency is in the column df_movies_usa["budget"].

I’m using the following code:

import pandas as pd
df_movies = pd.read_csv("IMDb movies.csv", sep = ",")
df_movies["country"] = df_movies["country"].str.replace("UK, USA", "USA")
df_movies["country"] = df_movies["country"].str.replace("USA, Canada", "USA")
df_movies["country"] = df_movies["country"].str.replace("Canada, USA", "USA")
df_movies["country"] = df_movies["country"].str.replace("USA, Germany", "USA")
df_movies_usa = df_movies[df_movies['country'] == "USA"]
df_movies_usa = df_movies_usa[df_movies_usa["budget"].str.contains("$")]

But the dataframe still contains all the rows with other currency types (e.g., RUR).

  • Can you post the code so we can do the proper tests and try to help you?

  • Modified code!

  • There was no code change; I only added the code formatting (BB Code)...

  • I’m new here. When you say to post the code, do you want me to attach the notebook here as an .ipynb file?

  • You have more code than what you posted, right? We need as much information as possible so that we can reproduce your error and try to help. With only the part that fails, and not the rest of the code, we can’t help you.

  • I edited the question to include all the code.

  • Can you do a var_dump(df_movies_usa[df_movies_usa["budget"]]) and post the result?

  • Take a look at the documentation; I think it can help you a lot.

  • KeyError: "None of [Index([' 45000', ' 5700', ' 23500', ' 40000', ' 25000', ' 20000', ' 10000',\n ' 50000', ' 17022', ' 50000',\n ...\n ' 5000000', ' 1500000', ' 130000', ' 95000', ' 1000', ' 100000',\n ' 1500000', ' 3000000', ' 7000', ' 500000'],\n dtype='object', length=11132)] are in the [columns]"

2 answers

1

You are calling the contains method with the symbol $, which pandas interprets as a regex by default. In a regex, $ matches the end of the string, so every row matches. Try adding regex=False:

df_movies_usa = df_movies_usa[df_movies_usa["budget"].str.contains("$", regex=False)]
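A minimal sketch of the difference, using a small made-up Series in place of the real budget column (the values below are assumptions for illustration, not taken from the CSV):

```python
import pandas as pd

# Hypothetical budget values; the real column comes from "IMDb movies.csv".
budget = pd.Series(["$ 45000", "RUR 5700", "GBP 23500", "$ 40000"])

# Default regex=True: "$" is the end-of-string anchor, so every string matches.
regex_mask = budget.str.contains("$")

# regex=False: "$" is treated as a literal character.
literal_mask = budget.str.contains("$", regex=False)

# Escaping the character keeps regex mode but matches it literally.
escaped_mask = budget.str.contains(r"\$")

# Keep only the rows whose budget contains a literal dollar sign.
only_dollars = budget[literal_mask]
```

If the real column has missing values, str.contains also accepts na=False, so that those rows count as non-matches instead of producing NaN in the mask.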

0

Note that contains returns every row whose value contains the pattern anywhere in the string. In your case that means every occurrence of $: it may be just $, or R$, or any other currency written with a dollar sign.

You can filter with the equality method (eq) instead, which keeps only the rows whose value is exactly equal to the one you pass.

Take a look at this documentation:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.eq.html
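A rough sketch of how these checks differ, on made-up values (the Series below is an assumption, not data from the question). It also shows str.startswith, which is not mentioned in this answer but may be closer to what is needed when the cell holds a symbol followed by an amount:

```python
import pandas as pd

# Hypothetical values, including a currency whose code ends in "$".
budget = pd.Series(["$ 45000", "R$ 5700", "$", "RUR 23500"])

# contains: True wherever "$" appears anywhere in the string.
contains_mask = budget.str.contains("$", regex=False)

# eq: True only where the whole cell is exactly "$".
eq_mask = budget.eq("$")

# startswith (an alternative not named in the answer): True only where
# the string begins with "$", which excludes "R$ 5700".
starts_mask = budget.str.startswith("$")
```

With cells such as "$ 45000", eq("$") matches nothing, because it compares the entire string; it only helps if the currency symbol is stored in a column of its own.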
