Posts by Terry • 889 points
42 posts
-
1 vote • 1 answer • 48 views
-
0 votes • 2 answers • 41 views
A: Compare rows of a Pandas DataFrame
I am not sure I understood the expected output correctly, but I believe that to calculate the difference between the earliest date for each number and the other dates, you will need to use the…
-
3 votes • 1 answer • 40 views
A: How to turn every two records of a DataFrame into a single one in Python based on two columns
One way would be to split the DataFrame in two, home teams and visitors, rename the columns of each one as needed, and join them again with merge, like this: mask = df['mandante'] ==…
-
0 votes • 1 answer • 70 views
A: How to join three or more CSV files with something like PROCV (VLOOKUP) and concatenating certain columns
My proposed solution uses the functions concat, groupby, map and drop_duplicates. First combine all your files with concat; then, with groupby on Code, concatenate the strings of the columns Year and…
-
2 votes • 1 answer • 97 views
A: How to compare two string values with pandas?
Try using the function isin, like this: mask = cursos['codigo_unidade_ensino'].isin(unidade_ensino['cod_unidade_ensino']) cursos = cursos[mask].copy()…
-
3 votes • 2 answers • 88 views
A: Replacing NaN values with the subsequent non-NaN value of another column
You can create a temporary Series holding the values of the column groundwork only where values is not null, using .mask, .isna and .bfill. With this Series in a variable it is possible to pass it…
-
1 vote • 2 answers • 82 views
A: Group closer values in PostgreSQL
I believe what you are looking for is possible with .merge_asof() in pandas. It lets you join DataFrames on approximate key matches (in your case, on the nearest dates).…
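To illustrate the approximate join described above, here is a minimal sketch. The frames, column names and dates are hypothetical, and `direction="nearest"` is an assumption about what "approximate dates" means for the question; `merge_asof` defaults to `"backward"`.

```python
import pandas as pd

# Hypothetical data: events on the left, readings on the right.
left = pd.DataFrame({
    "data": pd.to_datetime(["2021-01-05", "2021-01-20"]),
    "evento": ["x", "y"],
})
right = pd.DataFrame({
    "data": pd.to_datetime(["2021-01-01", "2021-01-18", "2021-01-25"]),
    "leitura": [1, 2, 3],
})

# merge_asof requires both frames to be sorted by the join key.
left = left.sort_values("data")
right = right.sort_values("data")

# Each left row is matched with the right row whose date is closest.
resultado = pd.merge_asof(left, right, on="data", direction="nearest")
print(resultado)
```

A `tolerance=pd.Timedelta(...)` argument can additionally cap how far apart two dates may be and still match.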
-
2 votes • 2 answers • 312 views
A: How to fill a column of a Pandas DF using, as a comparison, a specific column between this and another DF?
My suggestion is to do a LEFT JOIN using the function merge(). For it to work properly, it is not necessary to create the column codproject filled with zeros in the DataFrame 'DF'. # here I will drop the…
-
2 votes • 2 answers • 150 views
A: Cross-reference two different DataFrames with different numbers of rows
You can solve this using the function .map(), passing it a Series, like this: dataset_original = pd.DataFrame({'Grau de instrucao': ['1','2','3','4','5','6','7','8','9','10','11','-1']}) s =…
-
0 votes • 2 answers • 509 views
A: Filter data from a pandas DataFrame by a specific column and the last four dates of a set of dates
First you will need shift() and fillna to fill in the audience of programs that have only one occurrence. To calculate the average for these programs, use the function rolling…
-
2 votes • 2 answers • 50 views
A: Deleting rows with repeated labels in a DataFrame
Try using .drop_duplicates(), like this: df = df.drop_duplicates(subset=['B'])…
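A small sketch of that call, on made-up data (only the column name B comes from the answer). By default the first row of each repeated label is kept.

```python
import pandas as pd

# Hypothetical frame with a repeated label in column B.
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": ["x", "y", "x", "z"]})

# keep="first" is the default: later rows with an already-seen B are dropped.
sem_repetidos = df.drop_duplicates(subset=["B"])
print(sem_repetidos)
```

Passing `keep="last"` or `keep=False` changes which of the repeated rows survive, if any.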
-
0 votes • 1 answer • 381 views
A: Frequency Table with two variables
To do this you will need to use groupby on the columns Region of Origin and Instruction Degree, then use size to get the size of each group. After this, it is possible to remove…
-
0 votes • 1 answer • 83 views
A: Complete values in a table, with values of the table itself?
You can make this substitution by grouping by active with groupby together with ffill, like this: df['data'] = pd.to_datetime(df['data'], dayfirst=True)…
-
0 votes • 2 answers • 280 views
A: Replace use of dataframe with pandas .apply
One way to do this would be to use the pandas function .isin(). It returns a boolean Series marking which values are found in the array cargos_to_display_photo: df['display_foto'] =…
-
0 votes • 1 answer • 282 views
A: How to make a cumulative sum in BigQuery?
I think you can split your problem into two parts: first use GROUP BY to find the ids in each month, then use OVER to compute the cumulative sum. Here is an example: with t1 as ( select…
-
0 votes • 2 answers • 76 views
A: How do I remove alphabetic characters from a column of a pd.Series? (Python)
You can do this by using the function .split() on the space and selecting the first position of the resulting array, then applying value_counts(). data = {'Idade': ['80 ANOS', '80 ANOS', '80 ANOS',…
-
0 votes • 1 answer • 531 views
A: How to delete rows from a DataFrame based on a Python list?
You can do this using .isin(), like this: df_saida = df.loc[~df['cod'].isin(lista)] #output: cod letra 0 101 a 2 303 c 3 404 d The .isin() here returns a Boolean Series showing which values are found in the…
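For reference, a runnable sketch of the same filter. The input frame and the list are assumptions, reconstructed so that the result matches the output shown in the answer; the original data is not included in the excerpt.

```python
import pandas as pd

# Assumed input: four rows, one of which (cod 202) is in the removal list.
df = pd.DataFrame({"cod": [101, 202, 303, 404], "letra": ["a", "b", "c", "d"]})
lista = [202]

# isin marks rows whose cod is in the list; ~ inverts the mask to keep the rest.
df_saida = df.loc[~df["cod"].isin(lista)]
print(df_saida)
```

The original index labels (0, 2, 3) are preserved; add `.reset_index(drop=True)` if a fresh index is wanted.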
-
0 votes • 1 answer • 136 views
A: Remove the less frequent rows from a pandas.DataFrame
Combine value_counts with head(3).index to build a mask of the elements that appear most often in the DataFrame. Then select them with isin. mask = df['variedade'].value_counts().head(3).index…
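The two steps can be sketched end to end as follows. The data is invented (only the column name variedade comes from the answer), and the counts are chosen to be distinct so the top three are unambiguous.

```python
import pandas as pd

# Hypothetical data: 'a' appears 4 times, 'b' 3, 'c' 2 and 'd' once.
df = pd.DataFrame({"variedade": list("aaaabbbccd")})

# value_counts sorts by frequency (descending); the index holds the labels.
mask = df["variedade"].value_counts().head(3).index

# Keep only the rows whose label is among the three most frequent.
df_top = df[df["variedade"].isin(mask)]
print(df_top["variedade"].value_counts())
```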
-
0 votes • 2 answers • 301 views
A: Python np.where with two conditions
You could also have used .any(axis=1), which checks for each row of the DataFrame whether any of the values is True, like this: x['D'] = (x[['A','B']] > 0).any(axis=1) A B D 0 1 5 True 1 2 0 True 2 3 0…
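A minimal sketch of the row-wise check, on hypothetical values. The axis is passed by keyword (`axis=1`), which works across pandas versions.

```python
import pandas as pd

# Hypothetical data; the column names follow the answer.
x = pd.DataFrame({"A": [1, 0, -2], "B": [5, 0, 3]})

# Compare both columns to 0, then test each row for at least one True.
x["D"] = (x[["A", "B"]] > 0).any(axis=1)
print(x)
```

Swapping `.any` for `.all` would instead require every selected column to satisfy the condition.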
-
2 votes • 2 answers • 343 views
A: Getting maximum value of each grouping with groupby pandas
You can do this using groupby with idxmax. The idea is to select the indices where the largest population of each country is found. df.iloc[df.groupby('pais')['populacao'].idxmax()] #output pais cidade…
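Here is a sketch of that idea on invented data (the column names come from the answer, the rows do not). It uses `.loc` rather than `.iloc`, because `idxmax` returns index labels; with a default integer index the two give the same rows.

```python
import pandas as pd

# Hypothetical data, populations in thousands.
df = pd.DataFrame({
    "pais": ["Brasil", "Brasil", "Portugal", "Portugal"],
    "cidade": ["São Paulo", "Rio de Janeiro", "Lisboa", "Porto"],
    "populacao": [12000, 6000, 500, 200],
})

# idxmax gives, per country, the index label of the row with the largest population.
idx = df.groupby("pais")["populacao"].idxmax()

# Select those rows in full.
maiores = df.loc[idx]
print(maiores)
```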
-
0 votes • 2 answers • 93 views
A: How to create multiple columns using the values of one in pandas?
You can use NumPy's reshape together with a transpose, without having to write a function by hand or call it several times. pd.DataFrame(df.values.reshape(-1, 10).T, columns=['A','B', 'C',…
-
0 votes • 2 answers • 665 views
A: Load and concatenate multiple CSV files at once in Python
You can do it using glob: import glob arquivos = glob.glob('arquivo*.csv') # 'arquivos' is now a list with the names of all .csv files that start with 'arquivo' array_df = [] for x in arquivos: temp_df…
-
9 votes • 6 answers • 19845 views
A: How to change the name of the pandas dataframe column?
According to the documentation of rename, the function returns a new DataFrame with the renamed column(s). To get the desired result, (1) simply assign the function's return value to a…
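The behaviour described above can be checked with a short sketch; the frame and the column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# rename returns a new DataFrame; df itself is left untouched.
renomeado = df.rename(columns={"a": "x"})

print(list(df.columns))         # original names
print(list(renomeado.columns))  # renamed copy
```

Assigning the result back (`df = df.rename(...)`) is what makes the change stick.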
-
1 vote • 1 answer • 370 views
A: Group by with Python [NumPy or Pandas] - Get the first and last rows by date
The function first belongs to pandas, not to numpy. Just replace your np.first with "first" (as a string) and it will work :) df2 = df[["DATA","MAXIMA","MINIMA"]] df2['maxDia'] =…
-
1 vote • 2 answers • 351 views
A: Compare Dataframes and show different information between them
If you need to select the values that differ between columns at the same index position, this can be done by (1) using .loc and (2) picking the 'not equal' values with .ne…
-
1 vote • 1 answer • 138 views
A: Pandas: Acceptance and rejection percentage
You can do this by creating two temporary DataFrames with functions such as groupby, value_counts and pivot: one will have the total requests per city, and the other will have the amount of…
-
0 votes • 1 answer • 234 views
A: How does "parse" work for handling dates in Python?
I prefer (personal opinion) to parse after loading the data, for two reasons: (1) it makes the code more readable and (2) with pd.to_datetime it is possible to handle errors that may occur…
-
0 votes • 2 answers • 434 views
A: Get the maximum value of each row in a grouped pandas dataframe
With groupby and transform it is possible to select the highest value of each class per state: df = dfAcidentesPorMunicipiosPorUF.copy() df.loc[df['Total'] ==…
-
0 votes • 1 answer • 746 views
A: Add and subtract according to a criterion in another column
If I understood correctly, you can do it by summing the whole column Valor and subtracting twice the sum of the rows where Código Rubrica equals 352. sub352 = f0519_grouped.loc[f0519_grouped['Código Rubrica']…
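The arithmetic behind this can be sketched as follows. The rows are invented, and since the answer's own definition of sub352 is cut off in the excerpt, the one below is an assumption. Subtracting the 352 sum twice removes it from the total once and then counts it as negative once.

```python
import pandas as pd

# Hypothetical data; only the column names and the code 352 come from the answer.
f0519_grouped = pd.DataFrame({
    "Código Rubrica": [100, 352, 200],
    "Valor": [10.0, 3.0, 5.0],
})

total = f0519_grouped["Valor"].sum()  # 10 + 3 + 5

# Assumed completion of the truncated line: sum of Valor where the code is 352.
sub352 = f0519_grouped.loc[f0519_grouped["Código Rubrica"] == 352, "Valor"].sum()

# Same as adding every other row and subtracting the 352 rows: 10 - 3 + 5.
resultado = total - 2 * sub352
print(resultado)
```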
-
1 vote • 3 answers • 1079 views
A: Transform list items into separate columns or extend dataframe to the end
A better (faster) alternative would be to create a new DataFrame by converting the column Inventory to a NumPy array with values, like this: df = pd.DataFrame(idf["Inventory"].values.tolist()) df.index…
-
0 votes • 1 answer • 112 views
A: How do I delete rows by specific cell content in pandas
You can do it using groupby with count: df = df.loc[df.groupby('role_name')['role_name'].transform('count') >= 100]
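To see how the filter behaves, here is a sketch on a tiny invented frame. The threshold is lowered from 100 to 2 (an assumption made only so the example fits in a few rows).

```python
import pandas as pd

# Hypothetical data: 'a' appears twice, 'b' once, 'c' three times.
df = pd.DataFrame({"role_name": ["a", "a", "b", "c", "c", "c"]})

# transform('count') returns, for every row, the size of that row's group,
# so the comparison yields a row-aligned boolean mask.
contagem = df.groupby("role_name")["role_name"].transform("count")
filtrado = df.loc[contagem >= 2]
print(filtrado)
```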
-
2 votes • 1 answer • 1184 views
A: Manipulating 3 GB Dataset with Pandas using Chunks
The idea of chunksize is that you can work on the data in 'blocks', using one of the usual loop constructs. My tip is to define your goals before reading the data in chunks, since it…
-
2 votes • 1 answer • 154 views
A: Validate date as holiday or not
You can do it with the following sequence: (1) merge the DataFrames with merge, passing indicator=True; (2) check which rows were in both DataFrames with np.where; (3) delete the extra column created by…
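The sequence above can be sketched like this. The frames, dates and the `feriado` column name are hypothetical, and the column dropped in the last step is assumed to be `_merge`, which is what `indicator=True` adds (the excerpt is cut off before naming it).

```python
import numpy as np
import pandas as pd

# Hypothetical data: dates to validate and a list of holidays.
df = pd.DataFrame({"data": pd.to_datetime(["2021-12-24", "2021-12-25", "2022-01-01"])})
feriados = pd.DataFrame({"data": pd.to_datetime(["2021-12-25", "2022-01-01"])})

# Step 1: left merge; indicator=True adds a '_merge' column
# with the values 'left_only', 'right_only' or 'both'.
merged = df.merge(feriados, on="data", how="left", indicator=True)

# Step 2: a row is a holiday when it was found in both frames.
merged["feriado"] = np.where(merged["_merge"] == "both", True, False)

# Step 3: drop the helper column (assumed to be '_merge').
merged = merged.drop(columns="_merge")
print(merged)
```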
-
1 vote • 2 answers • 173 views
A: Reading of multiple datasets
This can be done using the library glob: import glob arquivos = glob.glob('dataset/*.csv') # 'arquivos' is now a list with the names of all .csv files in the folder 'dataset' array_df = [] for x…
-
1 vote • 3 answers • 2125 views
A: Format Time in a Python data frame
You can convert the column Original Time to datetime using the function (1) to_datetime and extract only the hour, minute and second with (2) strftime: df['somente Horas'] = pd.to_datetime(df['Hora…
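A minimal sketch of the two steps. The column names `hora_original` and `somente_horas` and the sample timestamps are hypothetical, as is the `%H:%M:%S` format, since the answer's code is cut off before it.

```python
import pandas as pd

# Hypothetical data with full timestamps stored as strings.
df = pd.DataFrame({"hora_original": ["2021-03-01 14:05:09", "2021-03-02 08:30:00"]})

# (1) parse the strings to datetime, (2) format only the time part.
df["somente_horas"] = pd.to_datetime(df["hora_original"]).dt.strftime("%H:%M:%S")
print(df)
```

Note that `strftime` produces strings; use `.dt.time` instead if `datetime.time` objects are preferred.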
-
5 votes • 3 answers • 16188 views
A: Removing rows from a DataFrame that meet a certain condition
To contribute to the thread, I suggest a solution that uses a mask to select the desired data; the performance tests follow: Using loc and drop %%timeit df_remove =…
-
1 vote • 2 answers • 6875 views
A: Dataframe - Pandas. Assigning values in columns from comparing another column
You can solve this using NumPy's select function, passing an array of conditions, an array of results and a default value: condicao = [df['return_percentagem'] < 50, df['return_percentagem']…
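For illustration, a sketch of `np.select` with the three arguments mentioned above. Only the column name and the first threshold (50) come from the answer; the second threshold, the labels and the data are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"return_percentagem": [10, 60, 95]})

# Conditions are evaluated in order; the first one that is True wins.
condicao = [df["return_percentagem"] < 50, df["return_percentagem"] < 80]
resultado = ["baixo", "medio"]  # hypothetical labels, one per condition

# Rows matching no condition receive the default value.
df["faixa"] = np.select(condicao, resultado, default="alto")
print(df)
```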
-
1 vote • 1 answer • 42 views
A: Mathematical operation with CSV files
It is hard to suggest a solution when only an image of an Excel sheet is available and not a real sample of the data. But if what you are looking for is multiplying columns and saving it…
-
0 votes • 2 answers • 577 views
A: Group three commands into one
You can reduce the 3 commands to a single line by joining the first and third conditions inside a (1) loc, using (2) groupby with (3) transform, like this: f0219.loc[(f0219.Tiporubrica == 2) &…
-
0 votes • 2 answers • 3465 views
A: How to invert the order of columns of a Dataframe with Python
A very generic way of reversing the order of the columns is to select the columns back to front inside loc: df.loc[:,::-1]
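A quick sketch on a hypothetical three-column frame, showing that the slice `::-1` on the column axis reverses the order without naming any column.

```python
import pandas as pd

# Hypothetical frame.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

# First slice keeps all rows; second slice walks the columns backwards.
invertido = df.loc[:, ::-1]
print(invertido)
```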
-
0 votes • 2 answers • 105 views
Q: Transition in CSS sequentially
I’m developing a quiz where the transitions between divs are done via CSS. The problem is that the transition that adds the next div runs in parallel with the removal of the current one. I would like only at…
-
1 vote • 1 answer • 886 views
Q: Create a list of Python dicts
I have the following Python function: def playersID(self, listDetals): listPlayersID = [] tempDict = {} for x in listDetals: for y in x['result']['players']: tempDict.clear() tempDict['match_id'] =…