Empty table ("Empty Dataframe") and ambiguity of comparison in Pandas

Asked

Viewed 24 times

0

Hello, I’ve been trying for a few days to set up a program, the bulk of it is working but there are two demands that do not come true at all. who are they:

  • Amount and total amount of tests involving Doppler and Doppler that were performed;
  • Number of tests with unit values ("vl_unit") 30% above or below the reference value ("vl_ref").

In the first I already tried to use all type of command to perform the search of the words "Doppler" and "Ecodoppler", but it only returns me Empty Dataframe; if I put in the search for the service column >= Doppler it returns me a list of words that even contain the Doppler but also return words that do not contain at the same time, but if I put == it gives an empty table and if I put <= it works also put in the same scheme of >=.

And in the second I’ve tried to use every kind of separation for the condition if compare but he always says that the term is ambiguous, I don’t know what else to do to check if there is any number in the column "vl_unit" that is 30% higher, or lower, in "vl_ref".

It follows the Code: import pandas as pd

accounts= pd.read_csv('.csv accounts', Sep = ';')

#Amount and total amount of tests involving Doppler and Doppler were performed Doppler = .query accounts("servico == 'Doppler'") dopplerMaior = .query accounts("servico >= 'Doppler'") print(" n nTable empty: n",Doppler) print(" n nTable showing dopplerMaior (which is not what I want): n",dopplerMaior['servico']) #Amount of examinations with unit values 30% above or below the reference value Maiorquetrinta = [] if ((accounts['vl_unit']) == (accounts['vl_ref']*(1.3)): Majororquetrinta.append(accounts['vl_unit']) print(Majororquetrinta)

I would just like to know what is going on and what is going wrong. To malhor help I will put a treixo of the CSV file to test:

matric;sexo; idade ;atend;tuss;servico;plano;vl_unit;vl_ref;qtde
14993;F; 60 ;06/12/2017;20201010;ACOMPANHAMENTO CLINICO DE TRANSPLANTE RENAL NO PERIODO DE INTERNACAO DO RECEPTOR;1077 ;210.00;202.01;1
10258;F; 27 ;14/03/2016;31602037;ANESTESIA GERAL OU CONDUTIVA PARA REALIZACAO DE BLOQUEIO NEUROLITICO;1145 ;492.99;316.02;1
12343;M; 86 ;14/02/2016;16020065;ANESTESIA P/ANGIOGRAFIA CAROTIDA BILAT;1132 ;338.02;160.20;2
10535;M; 50 ;14/07/2016;16020065;ANESTESIA P/ANGIOGRAFIA CAROTIDA BILAT;1091 ;325.21;160.20;11
11500;F; 38 ;31/08/2016;16020065;ANESTESIA P/ANGIOGRAFIA CAROTIDA BILAT;1112 ;257.92;160.20;4
10352;F; 14 ;05/09/2017;10101020;CONSULTA EM DOMICILIO;1127 ;107.07;101.01;1
11096;F; 62 ;08/09/2017;10101020;CONSULTA EM DOMICILIO;1190 ;208.08;101.01;1
10616;F; 45 ;01/10/2016;40901360;DOPPLER COLORIDO DE VASOS CERVICAIS ARTERIAIS BILATERAL (CAROTIDAS E VERTEBRAIS);1069 ;625.79;409.01;1
11861;M; 57 ;02/11/2016;40901360;DOPPLER COLORIDO DE VASOS CERVICAIS ARTERIAIS BILATERAL (CAROTIDAS E VERTEBRAIS);1075 ;392.65;409.01;1
13610;F; 58 ;07/03/2016;40901076;ECODOPPLERCARDIOGRAMA COM ESTRESSE FARMACOLOGICO;1085 ;797.57;409.01;2
10191;M; 29 ;24/10/2016;40901076;ECODOPPLERCARDIOGRAMA COM ESTRESSE FARMACOLOGICO;1065 ;813.93;409.01;1
14771;M; 56 ;27/09/2017;31602614;PORTE ANESTESICO 2;1011 ;445.60;316.03;4
10023;M; 59 ;28/09/2017;31602614;PORTE ANESTESICO 2;1003 ;347.63;316.03;1

Please, I really need just these two conditions. inserir a descrição da imagem aqui

1 answer

0

You can search for entries in the Dataframe as follows:

df_with_filter = df[ df['coluna'] == 'condicao' ]

In the matter of checking two values of the same line that satisfy a condition, you need to iterate over each value within a dataframe.

To get the values of a Dataframe, just use df.values.

The code below uses these methods I mentioned earlier. I believe it is enough to solve your problem.

import numpy as np
import pandas as pd

contas= pd.read_csv('contas.csv', sep = ';')

# Obtém todas as entradas que contém Doppler ou EcoDoppler na coluna "servico"
df_doppler = contas[(contas['servico'] == 'Doppler') |  contas['servico'] == 'EcoDoppler']

# Quantidade de exames com valores unitários("vl_unit") 30% acima ou abaixo do valor de referência("vl_ref").
# Obtém os valores do dataframe em formado de array (numpy)
values = contas.values
# Obtém o nome das colunas
column_names = contas.column


# Obtém os índices das colunas val_unit e val_ref
val_unit_idx = np.where(column_names == 'val_unit')
val_ref_idx = np.where(column_names == 'val_ref')
# Itera os dados presentes do dataframe para buscar valores que satisfaçam a condição
val_match = []
for entry in values:
  if( (entry[val_unit_idx] == entry[val_ref_idx] * 1.3) or  (entry[val_unit_idx] == entry[val_ref_idx] * 0.7) ):
    # Adiciona a linha do dataframe (em formato de array) na lista val_match
    val_match.append(entry)

Browser other questions tagged

You are not signed in. Login or sign up in order to post.