Posts by AlexCiuffa • 2,402 points
102 posts
-
1
votes3
answers84
viewsA: Dataframe Pandas - How to use a previous value other than NA for calculation
Use the diff() function to calculate the difference between rows. Since NaN values should be ignored, just remove them first with df[~df['C'].isnull()]: >>> df[~df['C'].isnull()]['C'].diff() 0 NaN…
-
3
votes1
answer51
viewsA: How to convert the result of a date subtraction in Python to int?
The column Days_Considered has dtype timedelta64[ns]. To access the number of days as an integer, use .dt.days: df2['Days_Considered'] = (df2['End_Date'] - df2['Start_Date']).dt.days…
-
0
votes1
answer715
viewsA: TypeError: __init__() missing 1 required positional argument: 'output_dim' when instantiating convolutional neural network object
The error points to the line: ---> self.embedding = layers.Embedding(vocab_size, emb_dim = 128) Checking the Keras documentation for the Embedding layer, we can see that this layer takes as mandatory…
-
0
votes0
answers22
viewsQ: POST request does not pass parameters with Volley’s StringRequest
When I make a POST request with my postRequest() function using Volley's StringRequest, I cannot access the parameters passed in getParams(). However, the method getRequest() works as…
-
1
votes1
answer874
viewsA: Select the top 10 values of a dataframe variable in python?
pandas has the .nlargest() function. It takes as parameters: n (int): number of rows to return; columns (list or str): column(s) used for sorting; keep (‘first’, ‘last’ or ‘all’): decides…
python • answered AlexCiuffa 2,402
-
3
votes1
answer1112
viewsA: Count number of unique records in a Data Frame
pandas has the .nunique() function, which returns the number of unique values in a DataFrame or a Series. In your case, just do: df['customer_id'].nunique()…
-
0
votes1
answer141
viewsA: Sklearn - Difference between preprocessing.scale() and preprocessing.StandardScaler()
1) What is the difference when using preprocessing.scale() vs preprocessing.StandardScaler()? Both StandardScaler().fit(X_train) and scale(X_train) perform the same operation, but the first…
-
2
votes1
answer67
viewsA: TensorFlow Convolutional Neural Networks
The .fit() function defaults to batch_size=32. This means that each weight update (backpropagation) uses 32 images at a time. As your dataset has 649 images, there will be 21…
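The arithmetic behind that figure can be checked with a short sketch. It assumes only the two numbers quoted in the answer (649 images and the default batch size of 32); the variable names are illustrative.

```python
import math

# Values taken from the answer: 649 images, Keras default batch_size of 32.
n_samples = 649
batch_size = 32

# The last batch is smaller than the others, so the division is rounded up.
steps_per_epoch = math.ceil(n_samples / batch_size)
last_batch_size = n_samples - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch)  # 21
print(last_batch_size)  # 9
```

Twenty full batches cover 640 images, and the remaining 9 images form the 21st batch.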
-
0
votes2
answers85
viewsA: sklearn library accuracy_score error
By calling accuracy_score(teste_y, teste_x), you are comparing the expected output (teste_y) with the model's input (teste_x). The correct approach is to compare the expected output with the model's response:…
-
1
votes1
answer164
viewsA: Transform column with NaN and string to integer
pandas won’t convert np.NaN to int, because it treats it as a float. But it can convert to Int64 (or Int16 and Int32). The NaN becomes <NA> (pd.NA), which is the null value for integers,…
-
5
votes3
answers289
viewsA: How to print the result of a Python operation padded with a zero on the left?
If you want print() to output two digits, for example 02, you can use the .format() function: >>> c = 2 >>> print('{:02}'.format(c)) 02 The problem pointed out by @Jeanextreme002 is…
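As a small sketch of the same idea, the snippet below shows three standard-library ways of producing the two-digit output. The f-string and zfill() variants are not part of the original answer and are included only for comparison.

```python
c = 2

# All three produce a two-character string, padded with a zero on the left.
with_format = '{:02}'.format(c)   # the approach quoted in the answer
with_fstring = f'{c:02}'          # same format spec inside an f-string
with_zfill = str(c).zfill(2)      # string method, works on the text itself

print(with_format, with_fstring, with_zfill)  # 02 02 02

# Values that already have two digits are left unchanged.
print('{:02}'.format(12))  # 12
```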
pythonanswered AlexCiuffa 2,402 -
4
votes1
answer95
viewsA: Get equivalent expression to list(zip(list, Heights)) using the map() function
The mistake is in zip(x, y). What zip() does is pair each element of list x with the element at the same position in list y in a tuple and add it to a…
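A minimal sketch of the equivalence the question asks about, assuming two hypothetical lists x and y (the original data is not shown in the excerpt): map() accepts several iterables and passes one element from each to the function on every step.

```python
# Hypothetical example data; the lists from the question are not shown.
x = [1, 2, 3]
y = ['a', 'b', 'c']

# zip() pairs the elements found at the same position.
zipped = list(zip(x, y))

# map() with two iterables calls the lambda with one element from each.
mapped = list(map(lambda a, b: (a, b), x, y))

print(zipped)            # [(1, 'a'), (2, 'b'), (3, 'c')]
print(zipped == mapped)  # True
```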
-
1
votes1
answer364
viewsA: Add midline with Seaborn - python
matplotlib has a function that draws a vertical line on the plot: .axvline(x=0, ymin=0, ymax=1, **kwargs), documented here or here. In your case, just calculate…
-
0
votes1
answer38
viewsA: Is it possible to use K-Means (or another clustering method) with point limits?
Maybe you should change your approach. If the goal is to have "values that are not close enough to achieve a 'vacancy' in the cluster", a density-based clustering approach seems more appropriate. I…
-
0
votes1
answer111
viewsA: Is there a relationship between data science and data analysis?
Although the knowledge and tools required by both are very similar (statistics, mathematics, programming and business knowledge), they are not the same thing. Data analysis means processing the…
terminology • answered AlexCiuffa 2,402
-
2
votes1
answer197
viewsA: I have a neural network, now what?
Once your model is trained, you can save it with the pickle module. To save an already trained model, just do: import pickle filename = 'modelo_final.pkl' with open(filename, 'wb') as file:…
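The save and load round trip can be sketched as follows. This is an illustration under stated assumptions: a plain dictionary stands in for the trained model (hypothetical), and the file is written to a temporary directory instead of the working directory.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model; any picklable object behaves the same way.
modelo = {'weights': [0.1, 0.2, 0.3], 'bias': 0.5}

# Written to a temporary directory here to avoid touching the working directory.
filename = os.path.join(tempfile.mkdtemp(), 'modelo_final.pkl')

# Save: open in binary write mode and dump the object.
with open(filename, 'wb') as file:
    pickle.dump(modelo, file)

# Load: open in binary read mode and rebuild the object.
with open(filename, 'rb') as file:
    modelo_carregado = pickle.load(file)

print(modelo_carregado == modelo)  # True
```

Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.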
-
2
votes2
answers53
viewsA: Regression curve with x-axis equal to 0
sklearn's LinearRegression() regressor has the attribute intercept_, which holds the y value where the fitted line crosses the Y axis, that is, at x = 0. In fact, in the example of the documentation…
-
6
votes1
answer572
viewsA: Why should we scale/standardize values of variables and how to reverse this transformation?
Should I scale my inputs? The answer is: it depends. The truth is that scaling your data will not worsen the result, so when in doubt, scale. Cases where you should scale: If the model is based on the…
r • answered AlexCiuffa 2,402
-
2
votes3
answers2843
viewsA: How does train_test_split work in Scikit Learn?
Why does the data need to be split? An ML algorithm is expected to learn from the training set, but then how do we know if the model is working? Whether it works with new data? How do we compare it with other…
-
3
votes3
answers2524
viewsA: Divide date (day, month, year) into new columns - Dataframe Pandas
This error happens because your column data is not of type str but of type datetime64. To see the column types of your DataFrame, just do >>> df.dtypes data datetime64[ns]…
-
3
votes1
answer101
viewsA: Data similarity with various pandas values
One approach is to compute the distance between a new data point and the rows of the DataFrame by calculating their dissimilarity. For this, I suggest using the Gower distance. It works as follows:…
-
5
votes1
answer1046
viewsA: How to filter lines that have a certain string?
The solution is to use the .str.contains() function: dados[dados.Value.str.contains("disease", regex=False)] It is worth noting that, by default, this function assumes that the string passed is a regular expression,…
-
2
votes1
answer213
viewsA: What is validation_data for in the Keras fit() function
The validation_data is used only as test samples; it is not used to train the model. That is, no backpropagation is done with this data. Its main function is to help find the point where…
-
1
votes2
answers1413
viewsA: How to turn list into array
Given a Data Frame, for example: import pandas as pd import numpy as np df = pd.DataFrame(data = {'listas':[[1,2],[3,4],[5,6]],'nomes':['nome 1','nome 2','nome 3']}) >>> df listas nomes 0…
-
0
votes1
answer37
viewsA: Passing a list like loss_weights, it should have one input per model output. Keras tells me that the model has 1 output, but I thought I had more
The confusion is between the loss_weights parameter of compile and the class_weight parameter of fit. The error says that the model has only one output because only one loss function was passed:…
-
2
votes1
answer401
viewsA: How to use a quadratic regression model?
Using only statsmodels: with statsmodels you can write the desired formula, for example: target ~ np.power(X1, 2) + X2 In this example, it means that we are searching for the…
-
2
votes1
answer221
viewsA: Setting steps_per_epoch is dramatically increasing training time
The defaults of Keras's .fit are: batch_size: if not specified, assumes 32; steps_per_epoch: number of samples divided by the batch size. In the first case, it does 78800 samples /…
-
0
votes1
answer638
viewsA: How to change only one color in Pillow (Python)
Based on this answer from the English Stack Overflow, you basically need to open the image, convert it to a numpy.array, take the RGB channels, set a condition (e.g., where the color is white) and replace…
-
1
votes1
answer540
viewsA: Separating a DataFrame by some criterion with Python pandas
One way is to take the index of the positive rows, select only 51 of them, join them with the index of the negative rows, and keep only the selected rows: # Get the ids of the rows with positive stars…
-
0
votes1
answer399
viewsA: Adding column with filter in Pandas
The first step is to filter the DataFrame where txtDescricaoEspecificacao is "CASCOL COMBUSTIVEIS PARA VEICULOS LTDA": df[df['txtDescricaoEspecificacao'] == 'CASCOL COMBUSTIVEIS PARA VEICULOS LTDA'] Then we take…
pandas • answered AlexCiuffa 2,402
-
0
votes1
answer194
viewsA: How to activate 2 if’s at the same time in Python
To press 2 keys together, just do pyautogui.hotkey('1', '2'). This presses the keys in order and releases them in reverse order, as described here. Now, just check which colors are active and move on to…
-
4
votes1
answer2027
viewsA: Python - Run two scripts at once
A Python library for running tasks in parallel is threading. Basically, you can declare and start a thread like this: # Defines a thread from a `função_a_ser_executada(arg1,…
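Since the snippet above is cut off, here is a self-contained sketch of declaring, starting and waiting for threads. The function name funcao_a_ser_executada, its arguments and the shared results list are hypothetical and chosen only for illustration.

```python
import threading

results = []
lock = threading.Lock()

def funcao_a_ser_executada(arg1, arg2):
    """Hypothetical task: stores the sum of its two arguments."""
    with lock:  # protect the shared list while appending
        results.append(arg1 + arg2)

# Declare one thread per task, passing the arguments through `args`.
threads = [
    threading.Thread(target=funcao_a_ser_executada, args=(i, 10))
    for i in range(3)
]

# start() launches each thread; join() waits for it to finish.
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # [10, 11, 12]
```

The results are sorted before printing because the order in which threads finish is not guaranteed. For CPU-bound work in standard CPython, the multiprocessing module is usually the better fit, since threads share the global interpreter lock.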
-
2
votes1
answer487
viewsA: Concatenating equal data from a column into rows in Python
One way is: df.groupby('Categoria').agg({'Descricao':lambda col: ', '.join(col)}).reset_index() This way you group the equal values of the column Categoria and, for the column…
-
1
votes1
answer61
viewsA: Question about the model's predict
First, the error occurs in treino, teste, classe_treino, classe_teste = train_test_split(vetor_tfidf2, resenha["classificacao"], random_state = 42) This is because vetor_tfidf2 has only 1 item, and…
-
1
votes3
answers69
viewsA: Transform a list into several lists inside the same list
With a list comprehension, just do: nova_lista = [[i] for i in X] Using numpy, we can change the shape of the array: import numpy X = [1,2,3,4,5] nova_lista = numpy.array(X).reshape(len(X),1)…
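The list comprehension variant can be run on its own, using the same X as in the excerpt:

```python
X = [1, 2, 3, 4, 5]

# Wrap every element in its own single-item list.
nova_lista = [[i] for i in X]

print(nova_lista)  # [[1], [2], [3], [4], [5]]
```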
-
0
votes1
answer41
viewsA: LinearRegression score
From the sklearn.linear_model.LogisticRegression example itself: >>> from sklearn.datasets import load_iris >>> from sklearn.linear_model import LogisticRegression >>> X,…
python • answered AlexCiuffa 2,402
-
0
votes1
answer665
viewsA: NameError: name 'mostrar_urls' is not defined
One way is to specify the function you want to use from the file arquivo_exemplo, like this: from arquivo_exemplo import função_que_quero_chamar. In this case, when calling the function, you do not need…
-
1
votes2
answers6875
viewsA: Dataframe - Pandas. Assigning values in columns from comparing another column
In pandas, when we do df['coluna'] = 'valor', every row of 'coluna' is filled with 'valor'. So when you do df['classification_roi'] = '', all rows in the column classification_roi…
-
0
votes3
answers46
viewsA: error in creating subplots
plt.subplots takes as parameters the number of rows and columns of the grid that holds the plots. For your code, fig_2, axes = plt.subplots(2,2, figsize=(20, 5)), the grid will have 4 plots…
-
3
votes1
answer73
viewsA: Single values average filter in dictionary list
Using pandas, a Python library that works with Dataframes, it is possible to solve this problem easily. import pandas as pd a = [ {'linha': 0, 'porcentagem': 1.0, 'id': 3, 'nome': 'bruno'},…
-
1
votes2
answers377
viewsA: Multiply elements of the first matrix by 3 and decrease elements of the second matrix by 3 in Python
Using numpy, operations can be performed directly between arrays and numbers: import numpy as np mt1 = input().split() # Matrix 1 mt1 = np.array(mt1, dtype='int') # converts the input to a numpy.array mt1…
python • answered AlexCiuffa 2,402
-
0
votes1
answer466
viewsA: A = A.astype('double') AttributeError: 'list' object has no attribute 'astype'
As the error says, the type list has no attribute astype. I think the variable A should be a numpy.ndarray, not a list: A = np.array(A) # converts the list to a numpy.ndarray A =…
-
1
votes2
answers3879
viewsA: How to separate the year from a date with Python and Pandas?
First, convert the column DT_INGRESSO to type datetime (instead of string). Note that the dates are in the format dd/mm/yyyy, so the format string is %d/%m/%Y: df['DT_INGRESSO'] =…
-
2
votes1
answer1848
viewsA: Recognize the color of an image in Python
One of the Python libraries for working with images is Pillow. With it, you can get the colors that appear in an image. from PIL import Image # Open the image img =…
-
2
votes1
answer41
viewsA: The chart is not stacking
I think the kind of chart you want is a histogram, which shows the number of ages per range. idades = [19, 20, 44, 55, 77, 88, 39] #faixaEtaria=["0-20","21-30","30-45","45-90"]…
python • answered AlexCiuffa 2,402
-
0
votes3
answers2910
viewsA: How to delete the first line in a python CSV file
With Python alone, you can read the file like this: import csv # Reads a CSV with the header with open("arquivo.csv") as f: reader = csv.reader(f) #next(reader) # skip header data = [r for r in…
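A runnable sketch of the header-skipping step, assuming hypothetical CSV content held in memory with io.StringIO in place of arquivo.csv: calling next(reader) once consumes the first line, so the loop only sees the data rows.

```python
import csv
import io

# Hypothetical CSV content standing in for arquivo.csv.
conteudo = "nome,idade\nana,30\nbruno,25\n"

with io.StringIO(conteudo) as f:
    reader = csv.reader(f)
    header = next(reader)           # consume (skip) the first line
    data = [row for row in reader]  # remaining rows only

print(header)  # ['nome', 'idade']
print(data)    # [['ana', '30'], ['bruno', '25']]
```

With a real file, the same code applies after replacing the io.StringIO call with open("arquivo.csv", newline="").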
-
0
votes1
answer292
viewsA: How to get the minimum, average and maximum time of a time series with pandas, based on a column where the value is boolean?
I am assuming that the values in your timestamp column are dates. If they are not, you can convert them like this: df['timestamp'] = pd.to_datetime(df['timestamp'], format='%Y-%m-%d %H:%M:%S') One solution is to create…
-
0
votes0
answers52
viewsQ: Calculate the weight of each class of an unbalanced multi-label dataset
I would like to calculate the weight of each class of a multi-label dataset to pass as the class_weight parameter to Keras's fit_generator. In the case of a single-label dataset, as my output is…
-
3
votes1
answer6569
viewsA: How to join the lines of two Dataframes with Python?
pandas has the .concat() function, which concatenates two DataFrames, but to use it here, the columns must have the same name. So we can rename them with .rename(), another pandas function.…
-
2
votes1
answer169
viewsA: Take the probability of belonging to each class
To find the probability of belonging to each class, use the .predict_proba() function. It is similar to .predict(), but returns the probabilities of belonging to each class as an array.…