Posts by Rui Barradas • 15,422 points
432 posts
-
3 votes
2 answers
28 views
A: Command to create multiple columns in a data.frame conditioned to other 2 columns
A base R solution. # first determine the "V" columns with # the same numbers as the "G" columns v <- grep("^V", names(dd)) g <- match(sub("^V", "", names(dd)[v]), sub("^G", "", names(dd))) #…
-
2 votes
1 answer
26 views
A: Modify data columns in R
If you install the cellranger package, you can specify the columns to read. colTypes <- rep("numeric", 5) cols <- cellranger::cell_cols(2:6) expec_anual_ipca <-…
-
2 votes
1 answer
40 views
A: How to group and count lines in a database in R?
First the totals of pregnant women per CODMUNRES and by age group. idades <- c(0, 16, 18, 23, 30, 36, Inf) labels <- c("<16", "17-18", "19-23", "24-30", "31-36", ">36") tbl <-…
-
2 votes
1 answer
35 views
A: Plot order scatterplot ggplot2
One way to plot the smallest points over the largest ones is to notice that the smallest points correspond to Leaf_set == "With" and use two geom_point layers, filtering the Leaf_set corresponding to…
-
2 votes
2 answers
29 views
A: Error compiling R 4.1 package on Debian-based linux
Start by editing the repository file for R version 4.*; see the official documentation on Debian Packages of R Software: sudo add-apt-repository 'deb http://cloud.r-project.org/bin/linux/debian…
-
0 votes
1 answer
31 views
A: GLM with non-significant P values
I think you’re complicating the problem. lm and glm In the case of your question, you have the Gaussian generalized model with identity link. This is equivalent to the linear model. It is not worth…
-
4 votes
1 answer
27 views
A: Combinatorics and Probability
A solution may be the following. Apply a function to each combination to obtain a logical vector indicating whether both (all) the letters are in that combination. Then use which to give a numerical…
-
2 votes
1 answer
27 views
A: Grouped bar chart with moving midline in R
The base graph is simple: call barplot and then legend for the chart legend. The important step is to save the output from barplot and then use it in spline. cores <- c("#2e8b57",…
-
2 votes
1 answer
17 views
A: Problem generating column chart grouped in R
Here are two solutions, base R and the ggplot2 package. 1. The data The data can be read with textConnection and scan. txt <- "7 2 3 3 2 4 3 4 4" riq1 <- scan(textConnection(txt))…
-
2 votes
1 answer
71 views
A: Changing chart data label
1. First load the required packages, leaving out lubridate, which is not used in the code below. library(ggplot2) library(gridExtra) library(ggpubr) And simplify both plots by creating a…
-
0 votes
1 answer
31 views
A: loop to replicate a dataframe in R
The expected output is unclear: it may be the whole data frame repeated times times, for each value in times, or each of the rows repeated times times, also for each value in times. Here are two ways to do what…
-
3 votes
1 answer
35 views
A: I calculate in R, data value with the mean and standard deviation of the column
Base R Just apply the base function scale to each of the columns. res <- teste res[-1] <- lapply(res[-1], scale) res # ANO C1 C2 #1 2011 -0.7071068 -0.7071068 #2 2012 0.7071068 0.7071068…
-
0 votes
1 answer
23 views
A: R language - Column separation
Here are three ways to separate words from numbers in the two columns. The first uses sub; the second and third use str_extract from the stringr package. library(dplyr) library(tidyr) library(stringr) df2…
-
3 votes
1 answer
30 views
A: How to compare values of the same variable and generate another variable that has the result of this comparison in R
Here is a one-line solution with ave and seq_along. dados$Numero.do.Cargo.2 <- as.integer(ave(dados$Cargo, dados$Cargo, FUN = seq_along)) As you can see, the third column is identical to the…
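The ave and seq_along idiom can be tried on a small example. This is only a sketch: the data frame below is made up for illustration, and only the column names follow the snippet.

```r
# Hypothetical data: one row per person, with a job title (Cargo)
dados <- data.frame(Cargo = c("A", "B", "A", "A", "B"),
                    stringsAsFactors = FALSE)

# Running count within each Cargo group, in row order.
# ave() returns a character vector here, hence as.integer().
dados$Numero.do.Cargo.2 <- as.integer(
  ave(dados$Cargo, dados$Cargo, FUN = seq_along)
)
dados$Numero.do.Cargo.2
```

Each row gets its position within its own group, so repeated values of Cargo are numbered 1, 2, 3 and so on.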
-
1 vote
2 answers
41 views
A: R, How to calculate the mean of a variable x for each group of other variables
Reformat to long format and then group and calculate the averages all at once. library(dplyr) library(tidyr) mtcars %>% select(mpg, vs, am, gear, carb) %>% pivot_longer(-mpg) %>%…
-
2 votes
1 answer
33 views
A: Function in R is not taking the parameter correctly
The current way evaluates the variable with {{.}}. packageVersion("dplyr") #[1] ‘1.0.7’ df_funcao = function(PERIODO){ df %>% dplyr::select({{PERIODO}}) %>% filter(!is.na({{PERIODO}})) %>%…
-
2 votes
1 answer
46 views
A: How not to display levels without values using scale_fill_manual?
Values that are not in the column CAATINGA appear in the legend because they are in the vector cols. And since only some of those values are also names of the vector labels, those that are not are in the…
-
1 vote
3 answers
61 views
A: How to transform data frame variables into the indexes of a matrix with R?
The base function xtabs solves the problem in one line of code, but first the columns userID and itemID must be converted to "factor" with the full set of levels. dados$userID <- factor(dados$userID,…
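A minimal sketch of the xtabs idea, under assumptions: the data and the value column rating are hypothetical, since the snippet is cut off before the levels and the formula are shown.

```r
# Hypothetical ratings; user 2 and item "b" never occur in the data
dados <- data.frame(
  userID = c(1, 1, 3),
  itemID = c("a", "c", "a"),
  rating = c(5, 3, 4),
  stringsAsFactors = FALSE
)

# Declare the full set of levels so that absent users/items are kept
dados$userID <- factor(dados$userID, levels = 1:3)
dados$itemID <- factor(dados$itemID, levels = c("a", "b", "c"))

# Rows are users, columns are items, empty cells are 0
m <- xtabs(rating ~ userID + itemID, data = dados)
m
```

Without the explicit levels, the row for user 2 and the column for item "b" would not appear in the result.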
-
1 vote
2 answers
39 views
A: Using If in R in a data frame where the column is string
Here are two ways without if or ifelse, one in base R and the other with the dplyr package. First, to make the code more readable, vectors are created with the UF codes. Sudeste <- c("SP", "RJ",…
-
3 votes
3 answers
102 views
A: How to do linear interpolation on R?
I believe that the problem has more to do with regression than with interpolation. If so, a linear regression model will be fit <- lm(Income ~ ., data = dados, subset = YEAR %in% c(1960, 1980))…
-
3 votes
1 answer
26 views
A: How to make a filter based on a condition?
Here’s a way with the dplyr package. Since CLASSWK can take several values, %in% is used rather than equality. library(dplyr) x %>% filter((CLASSWK %in% 1:2 & IND != 0) | IND == 0) # CLASSWK IND #1 1…
-
2 votes
1 answer
55 views
A: Quantitative position
Here are two solutions, one for calculating the percentages of values per year/month and the other only per month. Per year/month Lines are counted with count twice, without and with the column…
-
1 vote
2 answers
42 views
A: Loop is not walking, does not leave the first position
Here’s a base R solution without for loops. It uses cumsum and findInterval to determine the values of y. f3 <- function(n, fmp){ prob_acum <- cumsum(fmp) p <- runif(n) findInterval(p,…
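The snippet is cut off, so the following is only a sketch of how cumsum and findInterval can be combined for this kind of sampling, not a reconstruction of f3. The helper name amostra_indices is hypothetical, and fixed values of p stand in for runif(n) so that the result is reproducible.

```r
# Hypothetical helper: map uniform values p to indices 1..length(fmp),
# using the cumulative probabilities as interval breakpoints
amostra_indices <- function(p, fmp) {
  prob_acum <- cumsum(fmp)
  findInterval(p, prob_acum) + 1L
}

fmp <- c(0.2, 0.5, 0.3)               # probability mass function
p <- c(0.10, 0.30, 0.69, 0.75, 0.99)  # stand-ins for runif(5)
idx <- amostra_indices(p, fmp)
idx
```

A value of p below the first cumulative probability maps to index 1, a value between the first and second maps to index 2, and so on, which is the inverse transform method for a discrete distribution.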
-
0 votes
2 answers
40 views
A: I want to group 3 columns into one, separated by a comma in the R
A base R solution with apply. dados[[1]] <- apply(dados, 1, paste, collapse = ", ") dados <- dados[1] dados # x #A 1.1, 1.2, 1.3 #B 2.1, 2.2, 2.3 Data dados <- read.table(text = " x y z A…
-
3 votes
1 answer
18 views
A: Expand / insert new rows into a data frame based on the value of a discrete variable
The problem can be solved in a line of code: output <- input[rep(row.names(input), input$volume), ]
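To see what the one-liner does, here is a small sketch with made-up data; only the names input and volume come from the snippet.

```r
# Hypothetical input: each row should appear `volume` times
input <- data.frame(id = c("a", "b", "c"),
                    volume = c(2, 0, 3),
                    stringsAsFactors = FALSE)

# Repeat each row name `volume` times, then index the rows by name
output <- input[rep(row.names(input), input$volume), ]
output
```

A row with volume 0 is dropped, and the repeated rows get row names made unique by R (for example "1" and "1.1").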
-
6 votes
1 answer
35 views
A: How to check if a value is contained in another list
One can do what the question asks with a logical index created with %in%. i <- df$Situação %in% unlist(lista) df$resultado <- 0 df$resultado[i] <- df$valor[i] df # Nome Situação valor…
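A self-contained sketch of the same logical-index pattern, assuming made-up data. The column situacao is a hypothetical ASCII stand-in for Situação, and the contents of lista are invented.

```r
# Hypothetical data frame and list of accepted statuses
df <- data.frame(
  nome = c("A", "B", "C"),
  situacao = c("ativo", "inativo", "pendente"),
  valor = c(10, 20, 30),
  stringsAsFactors = FALSE
)
lista <- list("ativo", "pendente")

# TRUE where the status is one of the values in the list
i <- df$situacao %in% unlist(lista)

# Start at 0 and copy valor only where the condition holds
df$resultado <- 0
df$resultado[i] <- df$valor[i]
df
```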
-
2 votes
1 answer
53 views
A: Problems when grouping information through ifelse and case_when functions
Here’s a solution with the package dplyr. To make the code more readable, vectors are first created with the states of each region. Then, in a pipe, the case_when assigns a region to each state.…
-
1 vote
3 answers
44 views
A: How to index subgroups in R
If the result of the processing is equal for all groups, then it does not depend on the group. This solution uses the fact that a logical condition corresponds to the integers 0/1. It is then…
-
3 votes
2 answers
98 views
A: How to plot a graph of columns grouped with ggplot
This type of problem is usually related to data reformatting. The format should be long and the data is in wide format. See this post on how to reformat data from wide to long format. In the code…
-
4 votes
2 answers
56 views
A: Apply a function to a dataframe R
Here’s a very simple answer to a frequent problem. When you want to create a condition-based binary variable, neither if nor ifelse is necessary. Since the logical values FALSE/TRUE are coded…
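As a sketch of the idea, a logical condition can be converted directly to 0/1. The vector and the threshold below are hypothetical.

```r
# Hypothetical values; flag those greater than 5
x <- c(3, 8, 5, 10)

# The comparison gives FALSE/TRUE, and as.integer() turns that into 0/1
flag <- as.integer(x > 5)
flag
```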
-
3 votes
1 answer
45 views
A: Automatically creating new variables through interaction between two pre-existing variables
This base R solution is not a function, but gives an idea of how to get the desired result. It uses combn to apply interaction to the combinations, two at a time, of the columns "letras", "numeros" and "cores".…
-
0 votes
1 answer
19 views
A: Is there a way to replace the 0 of one column with values of another column in the dataframe in R?
A logical vector is created to index the elements of nu_funcio that are zero. This vector is then used to assign the values of qt_funcio. Then just remove the extra column. i <- dados$nu_funcio == 0…
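The three steps described can be sketched as follows. The values are made up, and the lines after the first are inferred from the prose, since the original code is cut off.

```r
# Hypothetical data with zeros in nu_funcio
dados <- data.frame(nu_funcio = c(0, 4, 0), qt_funcio = c(7, 9, 2))

# 1. logical index of the zeros
i <- dados$nu_funcio == 0
# 2. replace them with the values of the other column
dados$nu_funcio[i] <- dados$qt_funcio[i]
# 3. drop the extra column
dados$qt_funcio <- NULL
dados
```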
-
2 votes
2 answers
39 views
A: Picking a vector within an R matrix
The function below uses only one apply loop; the rest is vectorized. get_neighbor <- function(x, m, k){ # helper function dist_xy <- function(x, y) sqrt(sum((x - y)^2)) # check whether k is…
-
2 votes
1 answer
34 views
A: how to determine the percentile that a given value has in a sample in R
There is no inverse function of quantile but it is not difficult to get the percentile of any value x0. x0 <- 35 First, findInterval determines where x0 falls in the sorted vector x. To…
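Since the snippet is truncated, the following is only one possible way to finish the calculation, assuming the percentile is taken as the proportion of values less than or equal to x0. The sample x is hypothetical.

```r
# Hypothetical sample
x <- c(12, 35, 20, 48, 35, 60, 27, 41)
x0 <- 35

# Position of x0 in the sorted vector: number of values <= x0
pos <- findInterval(x0, sort(x))

# Proportion of the sample at or below x0
percentil <- pos / length(x)
percentil
```

Under this assumption the result agrees with the empirical distribution function, ecdf(x)(x0).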
-
1 vote
1 answer
29 views
A: Sort values without repeating them, in R
Here’s a solution with the dplyr package. Assign to the identifier variable the numbers of the match between substancia and unique(substancia). library(dplyr) prec_med %>% mutate(cod_subs =…
-
3 votes
1 answer
32 views
A: How to paint a graph in r from a specific value?
To paint the area below the curve, you can use geom_ribbon. The data has to be restricted to values of x from the minimum point onward, in this case 10. And it should be used before geom_line so as not to overlap…
-
2 votes
1 answer
37 views
A: looping with dplyr in R
Here is a function that divides the table by values of "classe1" and writes the corresponding files. Tested with raiz as defined below. fun_dplyr <- function(x, col, raiz){ x %>%…
-
4 votes
1 answer
33 views
A: How to remove cluster data in the ggplot?
This solution is not exactly what the question asks, since the blue background panel fills the area completely. Some differences in the code are as follows: The column titles starting with "GPHY_G_" were…
-
3 votes
1 answer
50 views
A: Is there any function in R, whose return is identical to the return of excel’s SEERRO() function?
Perhaps this artificial example, as it is called in one of the examples of help("nls") on which it is based, can lend a hand. It shows how you can try several models and continue even…
-
2 votes
1 answer
27 views
A: Compare line values in a data frame - R
This can be solved with ave. No for loops or extra packages are required. And ave is reasonably fast. qtde <- with(Count_Qualific, ave(key_find, key_find, FUN = seq_along))…
-
4 votes
2 answers
63 views
A: Create hatched area below normal distribution curve in R
Here’s a solution with the ggplot2 package. A function is defined to create a data frame with values for the normal density and the area of the desired range. This area is given by the limits lower and upper that…
-
5 votes
1 answer
36 views
A: Heteroscedastic mixed effects model via lmer function
In a reply on R-sig-mixed-models, Douglas Bates (Google Scholar, ResearchGate) gives the following answer to a question on the possibility of modeling the problem of the distribution…
-
2 votes
1 answer
39 views
A: Diagnostic analysis in mixed effects models via Plot Half-Normal chart
Here is a solution to get plots with "deviance" and "response" residuals, which seem to me to be the most useful. The code has also been simplified to show how to use facets directly, without…
-
4 votes
2 answers
36 views
A: Count frequency of occurrences including zeroes
To include the values for which the frequency is zero, convert to factor with all levels and apply table. table(factor(x, levels = 1:12)) # # 1 2 3 4 5 6 7 8 9 10 11 12 # 6 6 5 3 3 8 5 5 5 0 2 2 The…
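A smaller sketch of the same technique, with a hypothetical vector in which the values 2 and 4 never occur.

```r
# Hypothetical data; 2 and 4 are absent
x <- c(1, 3, 3, 5)

# Declaring all levels makes table() report zero counts as well
tab <- table(factor(x, levels = 1:5))
tab
```

Calling table(x) directly would list only the values 1, 3 and 5.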
-
1 vote
1 answer
27 views
A: Find higher row value and return the column title of a data frame
This solution is in base R. First an apply loop determines the index of the numeric column with the highest value in each row. Then the new column is created with the column names corresponding to…
-
2 votes
1 answer
45 views
A: Mark the highest and lowest value of a column with kableExtra in R
For ifelse to work as intended, it needs a logical condition, and it has none. max(MEDIA$Média) is not being compared with the values of this vector. Since this is not even a logical value:…
-
1 vote
1 answer
35 views
A: Problems adjusting a mixed effects model using the gamlss package
The problem must be how the data is read from Google Drive. The linked file is a CSV file with commas as the decimal mark. And the column separator is the semicolon, ";". To read files in this CSV…
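The snippet is cut off before naming a function. One base R option for this CSV variant is read.csv2, which defaults to sep = ";" and dec = ",". The sketch below uses inline text with hypothetical column names in place of the real file.

```r
# Hypothetical file contents: semicolon-separated, decimal commas
txt <- "grupo;valor\nA;1,5\nB;2,25"

# read.csv2() assumes sep = ";" and dec = ","
dados <- read.csv2(text = txt)
str(dados)
```

With read.csv and its defaults, the whole line would be read as a single column, because no comma-separated fields would be found where expected.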
-
2 votes
1 answer
46 views
A: Time series, how to eliminate gaps between missing periods in ggplot2?
One possible way to solve the problem is with facet_*. The graph is divided into three, one for each group. For this, a variable faceta is created. The breaks are omitted since dias$Time is not in…
-
3 votes
1 answer
27 views
A: How to plot expressions within facet_grid()?
For each facet to have different labels, one can create a variable var of class "factor" with the desired labels. To make the ggplot code more readable, this is done beforehand in a dplyr pipe.…
-
1 vote
1 answer
53 views
A: Graph of joint curves in ggplot2
There is nothing wrong with the graph; if the lines are separated, the x axis can vary between 5 and 17.5 in the first graph and between 10 and 40 in the last two. And the y axes are…