Posts by Rafael Cunha • 4,954 points

101 posts

2 votes, 2 answers, 24 views

A: Include "." or "," from right to left of integers

Using the package stringr: str_c(str_sub(x, start = 1, end = -3), str_sub(x, start = -2), sep = "."). The function str_c concatenates strings with a given separator (here, "."). The function str_sub…

r
answered 6 years, 3 months ago
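
The answer relies on stringr's negative positions in str_sub. For readers without stringr, here is a base R sketch of the same idea. substr() has no negative indices, so the positions are computed from nchar(). The function name insert_decimal_sep and the sample values are hypothetical, and the sketch assumes every value has at least three digits.

```r
# Base R sketch: insert a separator before the last two digits of each integer.
# Assumes every value has at least three digits.
insert_decimal_sep <- function(x, sep = ".") {
  s <- as.character(x)   # integer vectors convert without scientific notation
  n <- nchar(s)
  paste0(substr(s, 1, n - 2), sep, substr(s, n - 1, n))
}

x <- c(12345L, 600L, 98765L)
insert_decimal_sep(x)        # "123.45" "6.00" "987.65"
insert_decimal_sep(x, ",")   # "123,45" "6,00" "987,65"
```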

3 votes, 1 answer, 90 views

A: Merge two worksheets in .csv format in R

The merge is doubling the rows because the two databases contain different positions and levels for the same person. For example, in prof1, Fulano de Tal 1 has CARGO P3G and…

r dplyr merge tidyverse
answered 6 years, 5 months ago

7 votes, 2 answers, 1438 views

A: What is wide/long data?

Difference between wide and long. Wide format: in the wide format, the responses of the same individual are in a single row, and each response is in a separate column. For example,…

r
answered 6 years, 5 months ago
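
Here is a small sketch of the same data in both layouts, using base R's reshape() rather than a tidyr function. The columns id, y2017 and y2018 are made up for illustration.

```r
# Hypothetical wide data: one row per person, one column per year
wide <- data.frame(id = c(1, 2), y2017 = c(5, 6), y2018 = c(7, 8))

# Long format: one row per (id, year) pair, values stacked in one column
long <- reshape(wide, direction = "long",
                varying = c("y2017", "y2018"), v.names = "score",
                timevar = "year", times = c(2017, 2018), idvar = "id")
long
```

The wide table has 2 rows and 3 columns. The long table has 4 rows with the columns id, year and score.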

4 votes, 2 answers, 74 views

A: Problem generating multiple charts

First I restructured your data into long format (columns with months, years and values) using the function gather from the package tidyr: library(tidyr) dados_long <- gather(dados, ano, valor,…

r ggplot2
answered 6 years, 6 months ago

5 votes, 2 answers, 661 views

A: How to transform the class of a "factor" column into "date" within a data.frame?

When importing the csv file, include the argument stringsAsFactors = FALSE so that your column with the dates is read as character. Then just apply mutate as you are doing. Just note that as the…

r classes
answered 6 years, 6 months ago
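
Here is a minimal sketch of the suggested import. It assumes the dates are stored as dd/mm/yyyy text, since the excerpt does not show the real format. The csv content is hypothetical and is read from a string so the example is self-contained.

```r
# Hypothetical csv content with dates as dd/mm/yyyy text
csv_text <- "id,data\n1,01/02/2018\n2,15/03/2018"

# stringsAsFactors = FALSE keeps the column as character
# (this is already the default since R 4.0.0)
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Convert the character column to Date using the assumed format
df$data <- as.Date(df$data, format = "%d/%m/%Y")
str(df)
```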

1 vote, 2 answers, 689 views

A: How to create Time Series using Start and End in R?

Using the package zoo with the data you have made available: library(zoo) plot(zoo(dados_dia$QTDE_COMPRAS, seq(from = as.Date("2018-07-19"), to = as.Date("2018-09-06"), by = 1))) If you notice, in…

r time-series
answered 6 years, 6 months ago

5 votes, 2 answers, 455 views

A: Filter data frame according to indexes (rows) stored in a vector

If I understand correctly, you want to select the rows of the data frame with the information about the tourist spots in the dynamic vector, correct? Let df be your data frame: df <- data.frame(X =…

r
answered 6 years, 6 months ago

2 votes, 2 answers, 75 views

A: Split a column into two others (Lat, Long)

I couldn't find a direct solution, so I used the function separate from the tidyverse twice and made some manipulations on the resulting variables to get the result you expect. With dados as your dataset,…

string r
answered 6 years, 8 months ago

4 votes, 1 answer, 50 views

A: Problem creating a data.frame

Since your example already has the variables that will make up the data.frame, a very generic way is: data.frame(c(x,z,y), row.names = c(x1, z1, y1)) However, I think it's smarter…

r rstudio
answered 6 years, 9 months ago

4 votes, 1 answer, 523 views

A: How to create dummy variables

The closest I got to the result you expect was with the function dummyVars from the package caret. The result was not identical because the example you gave does not have the number 1 in the column X2, so it is…

r
answered 6 years, 9 months ago
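
Base R can build the same kind of 0/1 columns with model.matrix(). This is an alternative to caret's dummyVars, not the answer's code, and the column X2 and its levels are hypothetical. The second part shows what happens when a value is missing from the data: a level declared in factor() still gets its own column even when no row uses it.

```r
# Hypothetical factor column
df <- data.frame(X2 = factor(c("a", "b", "c", "a")))

# One 0/1 column per level; "- 1" drops the intercept so no level is omitted
dummies <- model.matrix(~ X2 - 1, data = df)
dummies

# Level "1" is declared but never observed: its column is all zeros
df2 <- data.frame(X2 = factor(c("2", "3"), levels = c("1", "2", "3")))
dummies2 <- model.matrix(~ X2 - 1, data = df2)
dummies2
```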

2 votes, 2 answers, 121 views

A: calculate a probability function in R

You can solve this problem by using the inverse cumulative distribution function of this variable. You know the probabilities of the variable X taking each of the possible values (from 1 to 6),…

r
answered 6 years, 9 months ago
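
Here is a sketch of inverse-CDF sampling for a discrete variable on 1 to 6. The probabilities p and the helper inverse_cdf() are hypothetical. Base R's sample(1:6, n, replace = TRUE, prob = p) draws from the same distribution directly.

```r
# Hypothetical probabilities for X = 1, ..., 6 (must sum to 1)
p <- c(0.10, 0.20, 0.30, 0.20, 0.10, 0.10)

# Inverse-CDF sampling: for u ~ Uniform(0, 1), X is the smallest k with F(k) > u
inverse_cdf <- function(u, p) {
  cdf <- cumsum(p)
  # findInterval counts how many of F(1), ..., F(5) are <= u,
  # so adding 1 gives the sampled value of X
  findInterval(u, cdf[-length(cdf)]) + 1
}

set.seed(1)
x <- inverse_cdf(runif(10000), p)
round(table(x) / length(x), 2)  # empirical frequencies, close to p
```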

1 vote, 1 answer, 122 views

A: How to return the most prevalent category associated with a group?

Using the package dplyr: library(dplyr) dataset %>% group_by(a, b) %>% summarise(count = n()) %>% mutate(percent = count/sum(count)) %>% filter(count == max(count))…

r
answered 6 years, 9 months ago

6 votes, 1 answer, 48 views

A: Mean of repeated rows

Using the package dplyr, you can use the functions group_by (which groups by gene) and summarise_all (which summarizes all columns with a given function). library(dplyr)…

r date
answered 6 years, 9 months ago
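
The same grouped mean can be computed in base R with aggregate() instead of dplyr. The gene names and the sample columns s1 and s2 are hypothetical.

```r
# Hypothetical expression data in which gene "A" appears twice
df <- data.frame(gene = c("A", "A", "B"),
                 s1 = c(1, 3, 5),
                 s2 = c(2, 4, 6))

# Base R: mean of every other column within each gene
res <- aggregate(. ~ gene, data = df, FUN = mean)
res
```

Note that the formula interface drops rows containing NA by default (na.action = na.omit).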

3 votes, 1 answer, 577 views

A: How to validate a CPF using a function in R?

I made changes to the three functions created by @Rui Barradas in this post. I modified the algorithm that generates the check digits because, when I entered my own CPF, the generated values did not match. The new…

r
answered 6 years, 9 months ago
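
For reference, here is a sketch of the commonly used CPF check-digit rule. The first check digit uses weights 10 to 2, the second uses weights 11 to 2, and remainders below 2 map to 0. This is not the code from the linked answer, and the function names are hypothetical. 111.444.777-35 satisfies the rule and is used here only as test input. Sequences of a single repeated digit pass the checksum but are usually treated as invalid, so the sketch rejects them explicitly.

```r
# One check digit from the preceding digits (weights n+1 down to 2)
cpf_check_digit <- function(digits) {
  weights <- seq(length(digits) + 1, 2)   # 10..2 for 9 digits, 11..2 for 10
  r <- sum(digits * weights) %% 11
  if (r < 2) 0 else 11 - r
}

# Validates a single CPF string; punctuation is ignored
is_valid_cpf <- function(cpf) {
  d <- as.integer(strsplit(gsub("[^0-9]", "", cpf), "")[[1]])
  if (length(d) != 11 || length(unique(d)) == 1) return(FALSE)
  dv1 <- cpf_check_digit(d[1:9])
  dv2 <- cpf_check_digit(c(d[1:9], dv1))
  d[10] == dv1 && d[11] == dv2
}

is_valid_cpf("111.444.777-35")  # TRUE
is_valid_cpf("111.444.777-36")  # FALSE
```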

3 votes, 2 answers, 48 views

A: Matrix of Expression

With dados as your matrix, ifelse(dados < 0, -1, ifelse(dados > 0, 1, 0)) will return what you expect. The function ifelse tests whether the values of dados are negative (dados < 0). If…

r matrix
answered 6 years, 9 months ago
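
The nested ifelse() only maps each value to its sign, so base R's sign() gives the same -1/0/1 matrix. The sketch compares both on a hypothetical matrix.

```r
# Hypothetical numeric matrix
dados <- matrix(c(-2, 0, 3, -0.5, 1.2, 0), nrow = 2)

# The answer's nested ifelse() keeps the matrix shape...
res_ifelse <- ifelse(dados < 0, -1, ifelse(dados > 0, 1, 0))

# ...and sign() returns the same -1/0/1 values directly
res_sign <- sign(dados)
all(res_ifelse == res_sign)  # TRUE
```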

2 votes, 1 answer, 73 views

A: Error with lapply function

Without the data it is very hard to reproduce the problem you are having. Using the structure you have already posted in other questions mylist [[1]] number group sexo 1 26.12186 a…

r
answered 6 years, 10 months ago

3 votes, 1 answer, 144 views

A: How to remove rows based on the values of another variable?

The function filter from the package dplyr does what you want: library(dplyr) data %>% filter(!is.na(a)) a b 1 1 1 2 3 NA 3 4 NA 4 5 4 5 6 6 6 6 NA In this case I selected the elements that are not NA…

r
answered 6 years, 10 months ago

4 votes, 1 answer, 328 views

A: Stopword does not work

The words são, até and é become NA when the function iconv is applied. If you omit this line from your code, the end result will be what you expect. ## Create a list of Portuguese stop words…

r filing-cabinet
answered 6 years, 10 months ago
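
Here is a minimal sketch of why the accented words disappear. It assumes the iconv() call converts UTF-8 text to ASCII, since the excerpt does not show the actual arguments. When a character cannot be represented in the target encoding, iconv() returns NA for the whole element.

```r
# Assumption: the pipeline converts UTF-8 text to plain ASCII
palavras <- c("casa", "s\u00e3o", "at\u00e9", "\u00e9")   # "são", "até", "é"

res <- iconv(palavras, from = "UTF-8", to = "ASCII")
res  # accented words become NA, so they no longer match the stop word list
```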

3 votes, 1 answer, 47 views

A: Make the variables created with lapply be stored in their respective data frames within a list

Using the following example: lista <- list( A = data.frame(Sigla = sample(LETTERS, 20, rep = T), Município = sample(letters, 20, rep = T)), B = data.frame(Sigla = sample(LETTERS, 20, rep = T),…

r
answered 6 years, 10 months ago

1 vote, 1 answer, 160 views

A: Convert a number to minutes and subtract those minutes from a column containing date and time in R

To make things easier, I created a data set with the example variables you reported. dados <- data.frame( inicio = c("2018-06-01 12:00", "2018-06-01 11:00", "2018-06-01 15:20"), minuto1 = c(105,…

r
answered 6 years, 10 months ago
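
Here is a base R sketch of the subtraction with POSIXct, where arithmetic is done in seconds. The inicio values and the first minuto1 value come from the excerpt. The other two minuto1 values are hypothetical because the excerpt is truncated, and the UTC time zone is an assumption.

```r
# minuto1 values after 105 are hypothetical (the excerpt is truncated)
dados <- data.frame(
  inicio  = c("2018-06-01 12:00", "2018-06-01 11:00", "2018-06-01 15:20"),
  minuto1 = c(105, 30, 45),
  stringsAsFactors = FALSE
)

# Parse the date-time text; UTC avoids daylight-saving surprises
dados$inicio <- as.POSIXct(dados$inicio, format = "%Y-%m-%d %H:%M", tz = "UTC")

# POSIXct arithmetic is in seconds, so convert minutes to seconds first
dados$resultado <- dados$inicio - dados$minuto1 * 60
format(dados$resultado, "%H:%M")  # "10:15" "10:30" "14:35"
```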

7 votes, 2 answers, 78 views

A: Deleting columns containing NAs in their last 5 rows

Using dplyr: df.final %>% select_if(colSums(is.na(tail(., 5))) == 0)

r
answered 6 years, 10 months ago
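
The same filter also works without dplyr, by indexing the columns with the logical vector. The data frame below is hypothetical: y has an NA in its last row, and z has an NA only in its first row, outside the last five.

```r
# Hypothetical data
df.final <- data.frame(x = 1:6,
                       y = c(1, 2, 3, 4, 5, NA),
                       z = c(NA, 2, 3, 4, 5, 6))

# TRUE for columns with no NA among their last 5 rows
keep <- colSums(is.na(tail(df.final, 5))) == 0
resultado <- df.final[, keep, drop = FALSE]
names(resultado)  # "x" "z"
```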

3 votes, 1 answer, 321 views

A: How to turn 1:30 hours into 90 minutes in R

Using the following vector as an example, horas <- c("1:00:00", "0:45:00", "0:30:00", "1:30:00"), here are two options. Using the package chron: library(chron) ch <- times(horas) 60 * hours(ch) +…

r
answered 6 years, 10 months ago
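
Here is a base R alternative to chron, which is not one of the two options in the answer. It splits each string at the colons and converts hours, minutes and seconds into minutes, using the vector from the excerpt.

```r
horas <- c("1:00:00", "0:45:00", "0:30:00", "1:30:00")

# One row per time: columns are hours, minutes, seconds
partes  <- do.call(rbind, lapply(strsplit(horas, ":"), as.numeric))
minutos <- 60 * partes[, 1] + partes[, 2] + partes[, 3] / 60
minutos  # 60 45 30 90
```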

1 vote, 1 answer, 373 views

A: Insert variable names in the first row of the data frame

There is probably a more practical way, but the following code returns the structure you expect: data[nrow(data)+1,] <- names(data) data <- data[c(nrow(data), 1:(nrow(data)-1)),]…

r
answered 6 years, 10 months ago

8 votes, 3 answers, 166 views

A: Find an expression in several elements of a list

Using the function which inside lapply: lapply(lista, function(x) which(x == "José da Silva")) [[1]] [1] 1 [[2]] integer(0) This is an option for searching an exact term, as in your example the "José…

r
answered 6 years, 10 months ago
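
Here is a sketch with a hypothetical list that reproduces the output above, plus a grepl() variant for partial matches. The partial-match search term "Silva" is an assumption.

```r
# Hypothetical list: the name appears only in the first element
lista <- list(c("José da Silva", "Maria Souza"), c("João Santos"))

# Exact match: positions of the term inside each element
exato <- lapply(lista, function(x) which(x == "José da Silva"))
exato

# Partial match: grepl() with fixed = TRUE treats the term literally
parcial <- lapply(lista, function(x) which(grepl("Silva", x, fixed = TRUE)))
parcial
```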

4 votes, 3 answers, 1376 views

A: Fill column of a data frame with data from another data frame in R

Ideally you would have shared your data (via the function dput), or at least part of it. With the example you gave, if these blank rows are NA, you can use the function FillIn…

r rstudio
answered 6 years, 10 months ago

3 votes, 1 answer, 143 views

Q: Optimization of R code

I wonder whether anyone has suggestions for optimizing the code below. I took the idea from this website. You have a permutation of n cards, for example [2, 4, 1, 3] (where 2 is the top card).…

r
asked 6 years, 10 months ago

9 votes, 2 answers, 639 views

A: How to split the data frames of a list based on a group variable common to all of them?

You can apply the function filter from the package dplyr within lapply: lapply(mylist, dplyr::filter, group %in% c("a", "c")) lapply will apply the function filter with specific arguments: select groups a…

r split
answered 6 years, 10 months ago

4 votes, 2 answers, 711 views

A: Issue with discriminant function constant - linear discriminant analysis [R]

Searching SO, I found this topic, in which the constant is calculated from the mathematical formula. The code below returns the value 4.437946, which differs from the…

r discriminating-function
answered 6 years, 11 months ago

4 votes, 1 answer, 126 views

A: Replace NA with data from another column

When creating your example, the variable TIPO came as a factor. I had to convert it to character to assign a number to the empty position. dados <- data.frame(NOME = c("ABC", "ADD", "AFF", "DDD",…

r rstudio
answered 6 years, 11 months ago

5 votes, 2 answers, 1529 views

A: Remove NA in a Data Frame

The functions write.table, write.csv and write.csv2 have an option (na) that defines how missing data is exported. Try write.table(x, ..., na = "")…

r rstudio
answered 6 years, 11 months ago
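
Here is a minimal sketch with a hypothetical data frame. It writes to a temporary file so the result can be inspected with readLines().

```r
x <- data.frame(a = c(1, NA), b = c("sim", NA))   # hypothetical data with NAs

arquivo <- tempfile(fileext = ".csv")
write.csv(x, arquivo, na = "", row.names = FALSE)  # NA written as an empty field
linhas <- readLines(arquivo)
linhas
```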

6 votes, 2 answers, 817 views

A: Overlay graphics in R with ggplot

I made some changes to your code. I renamed the variables of df2 to Q.1, Q.2, Q.3, Q.4 and Q.5. In long2 I named the variable gabarito because it represents the number of students who…

r ggplot2 graphic
answered 7 years, 2 months ago

8 votes, 3 answers, 4047 views

A: How to count and sum the occurrences of a certain "factor" in the observations (rows) of a data.frame?

Using the package dplyr: library(dplyr) Base <- Base %>% mutate(Total_Sim = rowSums(. == "Sim"))

r dplyr
answered 7 years, 4 months ago

4 votes, 1 answer, 524 views

A: Pdf reading via R

Using the package tabulizer, I extracted the information from the first page only, as a test: library(tabulizer) library(dplyr) library(stringi) url <-…

r pdf
answered 7 years, 4 months ago

4 votes, 3 answers, 91 views

A: Is it possible to pair values of two data frames with different numbers of observations?

Using the packages dplyr and tidyr: library(dplyr) library(tidyr) data2 <- data2 %>% gather(Sexo, Taxa, TaxaHomens:TaxaMulheres) data2$Sexo <- ifelse(data2$Sexo == "TaxaHomens", 1, 2)…

r
answered 7 years, 4 months ago

4 votes, 2 answers, 60 views

A: Changing variable value

You can create an index of the positions where cv1 has 0, ind <- which(cv1 == 0), then just replace them with NA: cv2 <- cv1 cv2[ind] <- NA…

r
answered 7 years, 4 months ago

3 votes, 1 answer, 658 views

A: How to count repeated arguments in R

Take a look at the function table. Also, it would help to improve your question by providing your data set, or part of it (function dput). table(ColunadeInteresse)…

array r matrix argument spreadsheets
answered 7 years, 4 months ago

3 votes, 2 answers, 574 views

A: The label is not created on the ggplot2 chart

I managed to fix it. First I added the option fill = "v?" inside each aes of geom_area (one for each variable of interest). I removed the following line from your code…

r ggplot2
answered 7 years, 5 months ago

4 votes, 1 answer, 198 views

A: Select data according to dates

First I create a two-column data.frame, with coluna1 (containing random values) and coluna2 (containing some dates). dados <- data.frame(coluna1 = rnorm(10), coluna2 = as.Date(c("2016/12/15",…

r rstudio
answered 7 years, 5 months ago

4 votes, 1 answer, 83 views

A: Change chart fill orientation

I got the answer on Stack Overflow. Just add the command scale_y_reverse().…

r ggplot2
answered 7 years, 5 months ago

5 votes, 1 answer, 83 views

Q: Change chart fill orientation

I would like to know whether it is possible to change the direction in which the colors of the following chart are filled. What I want is for the colors to be filled from the outside in. Does…

r ggplot2
asked 7 years, 5 months ago

9 votes, 1 answer, 181 views

A: Multiple join in R

You can specify the columns with the arguments by, by.x and by.y (the latter two when the variable names differ between the data.frames). Thus, merge(dados, dados_aux, by = c("CIDADE", "UF")) should give you…

r rstudio
answered 7 years, 6 months ago
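
Here is a sketch with hypothetical tables showing why the join needs both keys: the same city name can exist in more than one state. The last call uses by.x and by.y after renaming the key columns of the second table.

```r
# Hypothetical tables: same city name in two states
dados     <- data.frame(CIDADE = c("Bom Jesus", "Bom Jesus"), UF = c("PI", "RS"), x = 1:2)
dados_aux <- data.frame(CIDADE = c("Bom Jesus", "Bom Jesus"), UF = c("PI", "RS"), y = c(10, 20))

# Joining on both columns keeps PI and RS apart: 2 rows
por_dois <- merge(dados, dados_aux, by = c("CIDADE", "UF"))

# Joining on CIDADE alone pairs every row with every row of that city: 4 rows
por_um <- merge(dados, dados_aux, by = "CIDADE")

# Different key names in each table: by.x / by.y
aux2 <- setNames(dados_aux, c("cidade", "uf", "y"))
por_nomes <- merge(dados, aux2, by.x = c("CIDADE", "UF"), by.y = c("cidade", "uf"))
```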

4 votes, 1 answer, 86 views

A: Negative age in R

A solution using the package chron: library(chron) base.teste <- c("04/03/73", "10/09/67", "21/12/74", "17/04/76", "25/03/66", "11/03/73", "06/08/79") base.teste <- chron(base.teste, format =…

r lubridate
answered 7 years, 6 months ago
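
Two-digit years are a likely source of negative ages here. This is an assumption, since the question's code is not shown. When parsing with %y, R maps 00 to 68 to 20xx and 69 to 99 to 19xx, so 1967 and 1966 become 2067 and 2066. The base R sketch below uses the dates from the excerpt and assumes every birth year is in the 1900s.

```r
base.teste <- c("04/03/73", "10/09/67", "21/12/74", "17/04/76",
                "25/03/66", "11/03/73", "06/08/79")

# With %y, "67" and "66" are read as 2067 and 2066
as.Date(base.teste, format = "%d/%m/%y")[c(2, 5)]

# Assumption: all births are in the 1900s, so prefix the century explicitly
nasc <- as.Date(sub("([0-9]{2})$", "19\\1", base.teste), format = "%d/%m/%Y")
format(nasc, "%Y")
```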

2 votes, 2 answers, 133 views

A: Indicator in R with more than one condition and duplicate values

As Rui said, your source data differs from the data with the expected result. Also, I understood it differently, because I think the municipality of RIOBOM would have the indicator 0…

r dplyr
answered 7 years, 6 months ago

2 votes, 2 answers, 144 views

A: Separate columns and remove the letter t

Try changing the separator. In your case, judging by the output, the comma is not the column separator. dados<-read.table("especies identificadas.txt",header=T,sep="\t")

r path-separator
answered 7 years, 6 months ago

1 vote, 1 answer, 176 views

A: Convert char to time

Using the package chron, you can convert your character vector to the type times, which lets you perform operations such as mean and median. cpc_call <- c("00:01:20", "00:01:46",…

r
answered 7 years, 7 months ago
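
Here is a base R alternative to chron, which is not the answer's code. It stores the values as difftime durations, which also support mean(). Only the two values visible in the excerpt are used.

```r
cpc_call <- c("00:01:20", "00:01:46")   # first two values from the answer

# Durations relative to midnight, expressed in seconds
duracao <- as.difftime(cpc_call, format = "%H:%M:%S", units = "secs")
as.numeric(duracao)  # 80 106
mean(duracao)        # 93 seconds
```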

1 vote, 1 answer, 146 views

A: How to insert the column mean in all NA values

The following code replaces each NA with the mean of its column: for(i in 1:nrow(df)){ for(j in 1:ncol(df)){ if(is.na(df[i,j])){ df[i,j] <- mean(df[,j], na.rm = T) } } }…

r
answered 7 years, 7 months ago
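
The double loop works but touches one cell at a time. Here is a vectorised base R sketch on a hypothetical data frame that handles each column in one step. It assumes all columns are numeric.

```r
# Hypothetical numeric data frame with missing values
df <- data.frame(a = c(1, NA, 3), b = c(NA, 4, 8))

# Replace the NAs of each column with that column's mean
df[] <- lapply(df, function(col) {
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  col
})
df  # a: 1 2 3, b: 6 4 8
```

Assigning to df[] keeps the data.frame class while replacing its columns.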

5 votes, 1 answer, 612 views

A: Import data from the central bank into R

Use the package rbcb. The package's title is R Interface to Brazilian Central Bank Web Services. You can find more details on the creator's GitHub.…

r
answered 7 years, 7 months ago

3 votes, 1 answer, 55 views

Q: Delete total rows

I have a data set with the following structure: MES EST.DET1 EST.DET2 EST.DET3 DIAS 2 Curso 1 Turma A Manha 5 2 Curso 1 Turma A Tarde 5 2 Curso 1 Turma B <NA> 5 2 Curso 1 <NA> <NA>…

r
asked 7 years, 8 months ago

4 votes, 1 answer, 755 views

A: Multi-line chart

Using the package ggplot2: dados <- read.table(text = "Animal Dia Ganho 5 6 0.792598868 5 7 0.69531978 5 8 0.69249055 5 9 0.67807778 5 10 0.671494999 5 11 0.655610838 6 7 0.837702569 6 8…

r
answered 7 years, 8 months ago

3 votes, 1 answer, 94 views

A: Organize a time series

I used the packages dplyr and tidyr to solve your problem. dados <- data.frame( Animal = c(rep(5,6), rep(6,6)), Dia = c(2,9,16,23,30,37,5,10,17,24,33,38), Ganho =…

r time-series
answered 7 years, 8 months ago