Posts by Rafael Cunha • 4,954 points
101 posts
-
2 votes, 2 answers, 24 views
A: Include "." or "," from right to left of integers
Using the stringr package: str_c(str_sub(x, start = 1, end = -3), str_sub(x, start = -2), sep = "."). The function str_c concatenates strings with a given separator (here, "."). The function str_sub…
Tag: r, answered by Rafael Cunha (4,954 points)
-
3 votes, 1 answer, 90 views
A: Merge two worksheets in .csv format in R
The merge duplicates rows because the same person has different positions and levels in the two data sets. For example, in prof1, Fulano de Tal 1 has CARGO P3G and…
-
7 votes, 2 answers, 1438 views
A: What is wide/long data?
Difference between wide and long. Wide format: in wide format, the responses of the same individual are in a single row, and each response is in a separate column. For example,…
-
4 votes, 2 answers, 74 views
A: Problem in generating multiple charts
First I restructured your data into long format (columns with months, years and values) using the gather function from the tidyr package: library(tidyr) dados_long <- gather(dados, ano, valor,…
-
5 votes, 2 answers, 661 views
A: How to transform the class of a "factor" column into "date" within a data.frame?
When importing the csv file, include the argument stringsAsFactors = FALSE so that the column with the dates is read as character. Then just apply mutate as you are already doing. Just note that as the…
-
1 vote, 2 answers, 689 views
A: How to create Time Series using Start and End in R?
Using the zoo package with the data you provided: library(zoo) plot(zoo(dados_dia$QTDE_COMPRAS, seq(from = as.Date("2018-07-19"), to = as.Date("2018-09-06"), by = 1))) If you notice, in…
-
5 votes, 2 answers, 455 views
A: Filter data frame according to indexes (rows) stored in a vector
If I understand correctly, you want to select the rows of the data frame that hold the information about the tourist spots in the dynamic vector, correct? Let df be your data frame: df <- data.frame(X =…
-
2 votes, 2 answers, 75 views
A: Split a column into two others, Lat and Long
I couldn’t find a direct solution, so I used the separate function from the tidyverse twice and did some manipulation of the resulting variables to get the result you expect. With dados as your data set,…
-
4 votes, 1 answer, 50 views
A: Problem creating a data.frame
Since your example already has the variables that will make up the data.frame, in a very generic way you can do the following: data.frame(c(x,z,y), row.names = c(x1, z1, y1)) However, I think it’s smarter…
-
4 votes, 1 answer, 523 views
A: How to create dummy variables
The closest I got to the result you expect was with the dummyVars function from the caret package. The result is not identical because the example you gave does not have the number 1 in column X2, so it is…
-
2 votes, 2 answers, 121 views
A: Calculate a probability function in R
The way to solve this problem is to use the inverse cumulative distribution function of this variable. You know the probabilities that the variable X takes each of the possible values (from 1 to 6),…
-
1 vote, 1 answer, 122 views
A: How to return the most prevalent category associated with a group?
Using the dplyr package: library(dplyr) dataset %>% group_by(a, b) %>% summarise(count = n()) %>% mutate(percent = count/sum(count)) %>% filter(count == max(count))…
-
6 votes, 1 answer, 48 views
A: Mean of repeated rows
Using the dplyr package, you can use the functions group_by (which groups by genes) and summarise_all (which summarizes all columns with a given function). library(dplyr)…
-
3 votes, 1 answer, 577 views
A: How to validate a CPF using a function in R?
I made changes to the three functions created by @Rui Barradas in this post. I modified the algorithm that generates the check digits because, when I entered my CPF, the generated values did not match. The new…
-
3 votes, 2 answers, 48 views
A: Expression matrix
With dados being your matrix, ifelse(dados < 0, -1, ifelse(dados > 0, 1, 0)) will return what you expect. The ifelse function tests whether the values of dados are negative (dados < 0). If…
-
2 votes, 1 answer, 73 views
A: Error with lapply function
Without the data it is very hard to reproduce the problem you are encountering. Using the structure you have already posted in other questions: mylist [[1]] number group sexo 1 26.12186 a…
-
3 votes, 1 answer, 144 views
A: How to remove rows based on the values of another variable?
The filter function from the dplyr package does what you want: library(dplyr) data %>% filter(!is.na(a)) a b 1 1 1 2 3 NA 3 4 NA 4 5 4 5 6 6 6 6 NA In this case I kept the elements that are not NA…
-
4 votes, 1 answer, 328 views
A: Stopword does not work
The words são, até and é are turned into NA when the iconv function is applied. If you omit that line from your code, the end result will be what you expect. ## Cria lista de stop words para portugues…
-
3 votes, 1 answer, 47 views
A: Make the variables created with lapply be allocated in their respective dataframe within a list
Using the following example: lista <- list( A = data.frame(Sigla = sample(LETTERS, 20, rep = T), Município = sample(letters, 20, rep = T)), B = data.frame(Sigla = sample(LETTERS, 20, rep = T),…
-
1 vote, 1 answer, 160 views
A: Convert number to minutes and subtract those minutes from a column containing date and time in R
To make things easier, I created a data set based on the variables you described. dados <- data.frame( inicio = c("2018-06-01 12:00", "2018-06-01 11:00", "2018-06-01 15:20"), minuto1 = c(105,…
-
7 votes, 2 answers, 78 views
A: Deleting columns containing NAs in their last 5 rows
Using dplyr: df.final %>% select_if(colSums(is.na(tail(., 5))) == 0)
-
3 votes, 1 answer, 321 views
A: How to turn a time such as 1:30 into 90 minutes in R
Using the following vector as an example: horas <- c("1:00:00", "0:45:00", "0:30:00", "1:30:00") Here are two options: using the chron package: library(chron) ch <- times(horas) 60 * hours(ch) +…
-
1 vote, 1 answer, 373 views
A: Insert variable names in the first row of the dataframe
There is probably a more practical way, but here is code that returns the structure you expect: data[nrow(data)+1,] <- names(data) data <- data[c(nrow(data), 1:(nrow(data)-1)),]…
-
8 votes, 3 answers, 166 views
A: Find an expression in several elements of a list
Using the which function inside lapply: lapply(lista, function(x) which(x == "José da Silva")) [[1]] [1] 1 [[2]] integer(0) This is an option for searching for an exact term, as in your example the "José…
-
4 votes, 3 answers, 1376 views
A: Fill column of a data frame with data from another data frame in R
Ideally you would have made your data available (through the dput function), or at least part of it. With the example you gave, if those blank rows are NA, you can use the function FillIn…
-
3 votes, 1 answer, 143 views
Q: Optimization of R code
I wonder if anyone has suggestions for optimizing the code below. I took the idea from that website. You have a permutation of n cards, for example [2, 4, 1, 3] (where 2 is the top card).…
-
9 votes, 2 answers, 639 views
A: How to split the dataframes of a list based on a group variable common to all of them?
You can apply the filter function from the dplyr package inside lapply: lapply(mylist, dplyr::filter, group %in% c("a", "c")) lapply will apply filter with the specified arguments: select groups a…
-
4 votes, 2 answers, 711 views
A: Problem with the discriminant function constant - linear discriminant analysis [R]
Searching on SO, I found this thread in which the constant is calculated from the mathematical formula. The code below returns the value 4.437946, which differs from the…
-
4 votes, 1 answer, 126 views
A: Replace NA with data from another column
When I created your example, the variable TIPO came out as a factor. I had to convert it to character to assign a number to the empty position. dados <- data.frame(NOME = c("ABC", "ADD", "AFF", "DDD",…
-
5 votes, 2 answers, 1529 views
A: Remove NA in a Data Frame
In the functions write.table, write.csv and write.csv2 there is an option (na) that defines how missing data is exported. Try write.table(x, ..., na = "")…
-
6 votes, 2 answers, 817 views
A: Overlay charts in R with ggplot
I made some changes to your code. I renamed the variables of df2 to Q.1, Q.2, Q.3, Q.4 and Q.5. In long2 I named the variable gabarito because it represents the number of students who…
-
8 votes, 3 answers, 4047 views
A: How to count and sum the amount of a certain "factor" in the observations (rows) of a data.frame?
Using the dplyr package: library(dplyr) Base <- Base %>% mutate(Total_Sim = rowSums(. == "Sim"))
-
4 votes, 1 answer, 524 views
A: Reading a PDF in R
Using the tabulizer package, I extracted the information from the first page only, as a test: library(tabulizer) library(dplyr) library(stringi) url <-…
-
4 votes, 3 answers, 91 views
A: Is it possible to pair values of two dataframes with different numbers of observations?
Using the dplyr and tidyr packages: library(dplyr) library(tidyr) data2 <- data2 %>% gather(Sexo, Taxa, TaxaHomens:TaxaMulheres) data2$Sexo <- ifelse(data2$Sexo == "TaxaHomens", 1, 2)…
-
4 votes, 2 answers, 60 views
A: Changing variable value
You can create an index of the positions where cv1 is 0, ind <- which(cv1 == 0) and then just replace them with NA: cv2 <- cv1 cv2[ind] <- NA…
-
3 votes, 1 answer, 658 views
A: How to count repeated arguments in R
Take a look at the table function. It would also help to improve your question by providing your data set, or part of it (dput function). table(ColunadeInteresse)…
-
3 votes, 2 answers, 574 views
A: The label is not created on the ggplot2 chart
I managed to fix it. First I added the option fill = "v?" inside each aes of geom_area (one for each variable of interest). I removed the following line from your code…
-
4 votes, 1 answer, 198 views
A: Select data according to dates
First I create a two-column data.frame: coluna1 (containing random values) and coluna2 (containing some dates). dados <- data.frame(coluna1 = rnorm(10), coluna2 = as.Date(c("2016/12/15",…
-
4 votes, 1 answer, 83 views
A: Change chart fill orientation
I found the answer on SO. Just add the command scale_y_reverse().…
-
5 votes, 1 answer, 83 views
Q: Change chart fill orientation
I would like to know whether it is possible to change the direction in which the colors of the following chart are filled. What I want is for the colors to be filled from the outside in. Does…
-
9 votes, 1 answer, 181 views
A: Multiple join in R
You can specify the columns with the arguments by, by.x and by.y (the latter two if the variable names differ between the data.frames). Thus, merge(dados, dados_aux, by = c("CIDADE", "UF")) should give you…
-
4 votes, 1 answer, 86 views
A: Negative age in R
A solution using the chron package: library(chron) base.teste <- c("04/03/73", "10/09/67", "21/12/74", "17/04/76", "25/03/66", "11/03/73", "06/08/79") base.teste <- chron(base.teste, format =…
-
2 votes, 2 answers, 133 views
A: Indicator in R with more than one condition and duplicate values
As Rui said, your source data set is different from the data set with the expected result. Also, I understood it differently, because I think the municipality of RIOBOM would have the indicator 0…
-
2 votes, 2 answers, 144 views
A: Separate columns and remove the letter t
Try changing the separator. In your case, judging by the output, the comma is not the column separator. dados <- read.table("especies identificadas.txt", header = T, sep = "\t")
-
1 vote, 1 answer, 176 views
A: Convert char to time
Using the chron package you can convert your character vector to the times type, which lets you perform operations such as mean and median. cpc_call <- c("00:01:20", "00:01:46",…
-
1 vote, 1 answer, 146 views
A: How to insert the column mean in all NA values
Here is code to replace each NA with the mean of the column it is in: for(i in 1:nrow(df)){ for(j in 1:ncol(df)){ if(is.na(df[i,j])){ df[i,j] <- mean(df[,j], na.rm = T) } } }…
-
5 votes, 1 answer, 612 views
A: Import data from central bank to R
Try the rbcb package. Its title is R Interface to Brazilian Central Bank Web Services. You can find more details on the creator's GitHub.…
-
3 votes, 1 answer, 55 views
Q: Delete total rows
I have a data set with the following structure: MES EST.DET1 EST.DET2 EST.DET3 DIAS 2 Curso 1 Turma A Manha 5 2 Curso 1 Turma A Tarde 5 2 Curso 1 Turma B <NA> 5 2 Curso 1 <NA> <NA>…
-
4 votes, 1 answer, 755 views
A: Multi-line chart
Using the ggplot2 package: dados <- read.table(text = "Animal Dia Ganho 5 6 0.792598868 5 7 0.69531978 5 8 0.69249055 5 9 0.67807778 5 10 0.671494999 5 11 0.655610838 6 7 0.837702569 6 8…
-
3 votes, 1 answer, 94 views
A: Organize a time series
I used the dplyr and tidyr packages to solve your problem. dados <- data.frame( Animal = c(rep(5,6), rep(6,6)), Dia = c(2,9,16,23,30,37,5,10,17,24,33,38), Ganho =…