Working with two different Datasets - Filter the data

Question


Dataset - Players

Dataset - Results

Good evening,

I'm stuck on a data-cleaning step. For my final course project, I am working on predicting the winner of a given Counter-Strike: Global Offensive match.

But the datasets do not line up: the Players dataset has values in its "match_id" column that are not present in the "match_id" column of the Results dataset, and vice versa.
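A quick way to see which "match_id" values diverge is to compare them as Python sets. The sketch below assumes two small made-up frames, players and results, as hypothetical stand-ins for the Kaggle data, each reduced to a "match_id" column.

```python
import pandas as pd

# Hypothetical toy data standing in for the Players and Results datasets
players = pd.DataFrame({"match_id": [1, 1, 2, 3, 5]})
results = pd.DataFrame({"match_id": [1, 2, 4, 5]})

player_ids = set(players["match_id"])
result_ids = set(results["match_id"])

only_in_players = sorted(player_ids - result_ids)  # matches with no result row
only_in_results = sorted(result_ids - player_ids)  # matches with no player rows
shared_ids = sorted(player_ids & result_ids)       # matches present in both

print("only in players:", only_in_players)
print("only in results:", only_in_results)
print("shared:", shared_ids)
```

The shared ids are the only matches that can be used for training, since they are the only ones with both player data and a result.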

I'm having trouble manipulating the data: how do I make both datasets contain the same "match_id" values, given that each one has "match_id" values the other lacks?

P.S. I took the datasets from this Kaggle topic:

https://www.kaggle.com/mateusdmachado/cs-go-professional-matches-analysis

Q2 - I tried the following solution without success, in addition to other attempts that also failed:

The result was as follows:


by renatomt • **194** points · Answer 1 · 2020-07-20T00:17:21+00:00


You can use the pandas isin method to check which values of a column appear in another Series or array (documentation here).

For example, to get all the games in the loc_results dataframe that also appear in loc_players, build a boolean Series with loc_results["match_id"].isin(loc_players["match_id"]), then use that Series to filter the dataframe:

# True for each row of loc_results whose match_id also appears in loc_players
mask = loc_results["match_id"].isin(loc_players["match_id"])
loc_results_filtered = loc_results[mask]
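To make this concrete, here is a runnable sketch on tiny made-up frames. The column values and the extra "player" and "winner" columns are hypothetical; only "match_id" comes from the question.

```python
import pandas as pd

# Hypothetical toy frames; only the match_id column matters for the filter
loc_players = pd.DataFrame({"match_id": [10, 10, 20, 30], "player": ["a", "b", "c", "d"]})
loc_results = pd.DataFrame({"match_id": [10, 20, 40], "winner": ["t1", "t2", "t3"]})

# One boolean per results row: does its match_id occur anywhere in loc_players?
mask = loc_results["match_id"].isin(loc_players["match_id"])
loc_results_filtered = loc_results[mask]

print(mask.tolist())
print(loc_results_filtered)
```

Match 40 has no player rows, so it is dropped; duplicates in loc_players (match 10 appears twice) do not affect the result.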

Thank you @renatomt. I tried this:

matches_results = pd.DataFrame()
for i in range(0, loc_results.shape[0]):
    if results[["match_id"]].isin(players[["match_id"]]) == True:
        matches_results = results.append(results.iloc[[i]])

I believe you have brought me closer to what I want to do, but running this raised another error: "The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."
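The error comes from using a whole DataFrame as an if condition: with double brackets, results[["match_id"]] is a DataFrame, so isin returns a DataFrame of booleans, and comparing it with True still gives a DataFrame, which Python cannot turn into a single True or False. (Also note that when isin receives a DataFrame, as with players[["match_id"]], pandas matches on index and column labels too, not just on values.) A minimal reproduction on made-up data:

```python
import pandas as pd

# Hypothetical toy data
results = pd.DataFrame({"match_id": [1, 2, 4]})
players = pd.DataFrame({"match_id": [1, 2, 3]})

# Double brackets: isin returns a DataFrame of booleans, not a single bool
bool_frame = results[["match_id"]].isin(players[["match_id"]])

error_message = None
try:
    if bool_frame == True:  # still a DataFrame, so this raises ValueError
        pass
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# Single brackets: a boolean Series; reduce it explicitly if one bool is needed
row_mask = results["match_id"].isin(players["match_id"])
print(row_mask.any(), row_mask.all())
```

For filtering rows, though, no reduction is needed at all: the boolean Series row_mask can be used directly as results[row_mask].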

– Luis Henrique Batista

2020/07/20 at 01:10

You don't need a for loop to filter the dataframe. Suppose you have a Series of True and False values called serie_bool: df[serie_bool] returns df with only the rows where serie_bool is True.
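As a sketch of why the loop is unnecessary, the example below (on hypothetical toy data) builds the same filtered frame twice: once row by row, roughly as the loop in the previous comment intended, and once with a boolean Series. It uses pd.concat for the loop version because DataFrame.append was removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical toy data
results = pd.DataFrame({"match_id": [1, 2, 4, 5], "winner": ["a", "b", "c", "d"]})
players = pd.DataFrame({"match_id": [1, 2, 3, 5]})

known_ids = set(players["match_id"])

# Row-by-row version of the loop idea (slow; shown only for comparison)
kept_rows = [
    results.iloc[[i]]
    for i in range(results.shape[0])
    if results["match_id"].iloc[i] in known_ids
]
loop_result = pd.concat(kept_rows)

# Vectorized version: the boolean Series is used directly as the row selector
serie_bool = results["match_id"].isin(players["match_id"])
mask_result = results[serie_bool]

print(loop_result.equals(mask_result))
```

Both versions keep the original index labels (0, 1, 3 here); call reset_index(drop=True) afterwards if a fresh 0..n-1 index is preferred.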

– renatomt

2020/07/20 at 01:38
With the approach you gave me, the result was a comparison of the dataframe returning True or False. I need to remove the divergent items from both dataframes, so I used this for-loop condition, which is probably wrong.

– Luis Henrique Batista

2020/07/20 at 01:41


filter = df["series"].isin(df2["series"])
df3 = df[filter]
In this case df3 is the filtered df, keeping only the rows whose "series" value also appears in df2. If you do this for both dataframes, you get two dataframes containing only the matches present in both.
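Applying the filter in both directions can be sketched as follows, on hypothetical toy data. Each frame is filtered against the original version of the other, so the order of the two lines does not matter and both end up with exactly the shared match ids.

```python
import pandas as pd

# Hypothetical toy data; only match_id is needed to align the two frames
players = pd.DataFrame({"match_id": [1, 1, 2, 3, 5], "player": list("vwxyz")})
results = pd.DataFrame({"match_id": [1, 2, 4, 5], "winner": list("abcd")})

# Keep only player rows whose match has a result, and vice versa
matches_players = players[players["match_id"].isin(results["match_id"])]
matches_results = results[results["match_id"].isin(players["match_id"])]

print(sorted(set(matches_players["match_id"])))
print(sorted(set(matches_results["match_id"])))
```

Match 3 (players only) and match 4 (results only) are removed; matches_players can still hold several rows per match, one per player.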

– renatomt

2020/07/20 at 01:52
filter = players["match_id"].isin(Results["match_id"])
matches_players = pd.DataFrame()
matches_players = players[filter]
matches_players

Thank you! It worked!

– Luis Henrique Batista

2020/07/20 at 02:03