Separating a DataFrame by some criterion in Python pandas


Viewed 540 times

2

I have a dataset with 789 reviews of a particular product, with the columns reviews and stars. I converted the stars to labels: positive (stars >= 3) is 1 and negative is 0.

outputs = data_frame['estrelas']

rotulo = list()

for output in outputs:
  if output >= 3:
    rotulo.append(1)
  else:
    rotulo.append(0)

Then I counted the positives and negatives in the dataset: it has 738 positives and 51 negatives. What I need is a balanced set with 51 negatives and 51 positives, in other words, 102 records. I’m using Python and pandas.

  • I’m not sure I understand the problem. The DataFrame has 789 rows, of which 738 have the column estrelas with value >= 3 and 51 with value < 3. Is the goal to pick only 51 of those 738? Is there any criterion for choosing these 51?

  • That’s right! No, it just needs to be >= 3.

1 answer

1


One way is to take the index labels of the positive rows, keep only 51 of them, join them with the index labels of the negative rows, and keep only the selected rows:

import numpy

# Get the index labels of the rows with positive and negative stars
ids_positivos = df[df['estrelas'] >= 3].index.values
ids_negativos = df[df['estrelas'] < 3].index.values

# Optionally, shuffle the ids to pick random rows
#numpy.random.shuffle(ids_positivos)

# Select the first 51 values
ids_positivos = ids_positivos[:51]
ids_negativos = ids_negativos[:51] # In this case, this selects all the negatives

# Concatenate all the ids into a single array
ids_para_manter = numpy.concatenate((ids_positivos, ids_negativos))

# Create a new DataFrame with the selected ids
# (.loc, because these are index labels, not positions)
novo_df = df.loc[ids_para_manter]

More concisely:

ids_positivos = df[df['estrelas'] >= 3].index.values[:51]
#ids_positivos = numpy.random.permutation(df[df['estrelas'] >= 3].index.values)[:51]
ids_negativos = df[df['estrelas'] < 3].index.values[:51]

novo_df = df.loc[numpy.concatenate((ids_positivos, ids_negativos))].reset_index()
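If random rows are wanted, the same balancing can probably also be done with pandas alone, using DataFrame.sample. The sketch below is only an illustration: the small DataFrame is made-up stand-in data (not the 789 reviews from the question), the review column name is an assumption, and the size of the smaller class is computed instead of hardcoding 51.

```python
import pandas as pd

# Hypothetical stand-in data: 8 positive and 3 negative reviews.
df = pd.DataFrame({
    "review": [f"review {i}" for i in range(11)],
    "estrelas": [5, 4, 3, 5, 4, 3, 5, 4, 1, 2, 1],
})

# Vectorized label: 1 for stars >= 3, otherwise 0.
df["rotulo"] = (df["estrelas"] >= 3).astype(int)

# Size of the smaller class (51 in the question's data, 3 here).
n = df["rotulo"].value_counts().min()

# Draw n random rows from each class; random_state makes the draw repeatable.
positivos = df[df["rotulo"] == 1].sample(n=n, random_state=0)
negativos = df[df["rotulo"] == 0].sample(n=n, random_state=0)

# Stack both samples and renumber the rows from 0.
novo_df = pd.concat([positivos, negativos]).reset_index(drop=True)
print(novo_df["rotulo"].value_counts())
```

Sampling n rows from the smaller class just returns all of its rows in random order, so no negatives are lost.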
  • Thank you so much! That’s right
