Posts by Carlos Eduardo Lagosta • 5,497 points
162 posts
-
2 votes
2 answers
28 views
A: Command to create multiple columns in a data.frame conditioned on 2 other columns
You can compare the "G*" and "V*" column sets directly: dd[grep("V", names(dd))] == dd[grep("G", names(dd))] #> V1 V2 V3 V4 #> [1,] FALSE FALSE FALSE TRUE #> [2,] FALSE…
-
3 votes
1 answer
36 views
A: Graph ggplot R prints x-axis variables in non-cohesive charts when using facet_grid()
As pointed out by @Vinícius-Félix in the comments, use "free_x" for the options scales (to display only the X-axis factors of that facet) and space (so that widths are proportional to the number of…
-
1 vote
2 answers
76 views
A: Detect outliers in grouped data
Identify outliers. There are several procedures to identify outliers; the choice of method and cut-off criteria depends on your data and the purpose of the analysis. The most commonly used are interquartile…
ranswered Carlos Eduardo Lagosta 5,497 -
2 votes
1 answer
36 views
A: How to layer in ggplot2 with two graphs of different quantities?
Modify the values of the variable to be overlaid (PMI - 50, in this case). To make the data easier to interpret, add a corresponding secondary axis and pass sec_axis the opposite…
-
2 votes
1 answer
34 views
A: Group vertices as a function of Weights
Using igraph, you can generate a graph from a subset and use clusters to check the groups formed: grafo <- graph_from_data_frame(subset(df, valor >= 3), directed = FALSE) # Or, if you already have the…
-
3 votes
2 answers
31 views
A: Edges with incorrect values
You are filling the simplified graph but labeling it with the values of the complete graph. Store the simplified graph in a new object, telling simplify what to do with attributes if they…
-
2 votes
3 answers
53 views
A: maximum values
Same principle as the solution by Josiane Souza (sort the data and select the first 5, or the last 5, depending on the sort order), but with base R: tail(sort(dados$km), 5) # or head(sort(dados$km, decreasing =…
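The sort-then-slice idea can be sketched as below. The vector km is hypothetical and stands in for dados$km; its values are invented for illustration.

```r
# Hypothetical data standing in for dados$km
km <- c(12, 85, 40, 97, 3, 61, 74, 29)

# Sort ascending and keep the last five values (the five largest)
top5_tail <- tail(sort(km), 5)

# Sort descending and keep the first five values
top5_head <- head(sort(km, decreasing = TRUE), 5)

# Both hold the same five values, in opposite order
identical(top5_tail, rev(top5_head))
```

Either form works; which one reads better probably depends on the order you want the result in.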
-
4 votes
1 answer
44 views
A: Extract position of values in a matrix in R
I’m not sure what you intend with the first line of your code, but check the help for all.equal. Combined with the use of apply over the matrix rows, it probably isn’t doing what you expect. And anyway,…
-
1 vote
2 answers
38 views
A: how to identify and delete columns with characters and factor in R
You can apply is.numeric to the columns and use the resulting logical vector to index the data frame: df[sapply(df, is.numeric)] #> j k #> 1 1 50 #> 2 2 2 #> 3 3 42 #> 4 4 3658 #> 5 5…
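A minimal, self-contained sketch of the same indexing. The data frame and its column names are hypothetical, not the asker's data.

```r
# Hypothetical data frame with mixed column types
df <- data.frame(
  id    = 1:3,
  name  = c("a", "b", "c"),
  group = factor(c("x", "y", "x")),
  value = c(1.5, 2.5, 3.5)
)

# Logical vector: TRUE for numeric columns (integer or double)
is_num <- sapply(df, is.numeric)

# Keep only the numeric columns; character and factor columns are dropped
df_num <- df[is_num]
names(df_num)
```

Factors are stored as integer codes, but is.numeric returns FALSE for them, so they are dropped along with the character columns.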
-
1 vote
1 answer
80 views
A: Combination of different values
A solution using data.table: library(data.table) setDT(df)[, unique(expand.grid(destino, destino)), by = origem][Var1 != Var2] #> origem Var1 Var2 #> 1: A A B #> 2: A C B #> 3: A B A…
-
4 votes
1 answer
110 views
A: Reorder levels of a categorical variable within panels according to the group to which they belong
The function tidytext::reorder_within does what you need: library(ggplot2) library(tidytext) ggplot(mpg, aes(x = reorder_within(class, by = hwy, within = year, fun = median), y = hwy)) +…
-
4 votes
3 answers
64 views
A: Obtaining single records based on two columns
With base R, you can use aggregate. For more than one function, you can join the results of different aggregations: agg <- merge( aggregate(df1[3], by = df1[1:2], length), aggregate(df1[3], by =…
-
3 votes
2 answers
55 views
A: Label of data overlapping
You can use the nudge_* options to move the labels. For greater control, since the lines cross, you can make the shift depend on the position of one Variety relative to another: adj <- with(dt,…
-
1 vote
1 answer
90 views
A: Turn igraph.vs into dataframe
The *path functions in igraph return a list of connections (edges). Some attributes can be extracted from it but, most importantly, you can use the indexes to index the graph or the data.frame with the…
-
4 votes
1 answer
40 views
A: Vanishing symbol when saving graphic image
Not all fonts and glyphs are supported by the postscript driver. The simplest fix is to use the cairo driver. It is not supported by ggsave, so open the device manually. Here’s a minimal…
-
0 votes
1 answer
46 views
A: Iterative link of tables in R
A simpler way to do what you need is to represent your data as a graph. The igraph package is great for this. I expanded your example to include links that do not start from A: library(igraph) ligacoes…
-
2 votes
2 answers
47 views
A: Plotted graphs separately in single window
Complementing the answer by Marcus Nunes (and reusing the same data) with one more option: the ggpubr package. It is less flexible than gridExtra, but for multiple plots it has an option to…
-
3 votes
1 answer
30 views
A: Hiding values from the chart
You can use subset to specify the subset of the data to be plotted and expand the axis limits with the limits option of scale_*. Here’s a simplified example: ggplot(subset(dados, Mes %in% c("Jan",…
-
1 vote
1 answer
27 views
A: How to rename a row?
You can use row.names with logical indexing to identify the row named "Null". Since you did not provide your data in a reproducible format, I am creating a simple example: # example matrix tabela…
-
3 votes
1 answer
46 views
A: Plot grouped bars associated with dots connected by lines
To group the bars, you need a variable that tells ggplot which row belongs to which group: library(ggplot2) dt$grupo <- rep(c("A", "B"), each = nrow(dt)/2) # Order the levels of the…
-
1 vote
2 answers
41 views
A: R, How to calculate the mean of a variable x for each group of other variables
The question is tagged dplyr and Rui Barradas already provided a great answer. But since the question says "either by dplyr or another package", here are two alternatives, for the record: R…
-
1 vote
3 answers
61 views
A: How to transform data frame variables into the indexes of a matrix with R?
You can create an empty matrix with the number of rows equal to the maximum value of userID and the number of columns equal to the maximum value of itemID, and use a loop to fill in the values according to each row…
-
3 votes
3 answers
102 views
A: How to do linear interpolation in R?
Interpolation is used for xy coordinates, which is not exactly your case; just take the average of YEAR-10 & COHORT+1 and YEAR+10 & COHORT-1: ano <- 1970 for (c in dados[dados$YEAR == ano, "COHORT"])…
-
0 votes
3 answers
82 views
A: Merge two dataframes of the same name keeping all columns
You can use cbind to put the two data.frames side by side and then sort the columns by name: df1 <- read.table( text = "Nom1 Nom2 15.1 20.3 45.5 40.1 32.1 50.2", header = TRUE) df2 <- read.table(…
-
1 vote
1 answer
43 views
A: Replace NA, based on other data frame by equal columns. R
You can use merge to create a data.frame with the Table 1 data aligned with Table 2: TabelaM <- merge(Tabela1, Tabela2, by = c("Estado", "Loja"), all.x = TRUE, suffix = c("", ".y")) TabelaM #> Estado…
-
1 vote
2 answers
39 views
A: Create a table in R from a data frame grouping the values per month
With base R: df$Mes <- format(as.Date(df$data), "%B") # or %m for the month as a number table(df$Mes) #> #> agosto outubro setembro #> 1 2 2 You can use as.data.frame(table(...)) if you need it as…
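The month tabulation can be sketched as follows. The dates are hypothetical, and "%m" is used instead of "%B" so that the result does not depend on the locale.

```r
# Hypothetical data frame with a date column stored as text
df <- data.frame(data = c("2020-08-15", "2020-09-01", "2020-09-20",
                          "2020-10-05", "2020-10-30"))

# Month as a two-digit number ("%B" would give the locale's month name)
df$Mes <- format(as.Date(df$data), "%m")

# Count rows per month
contagem <- table(df$Mes)

# Convert the counts to a data frame (columns Var1 and Freq)
as.data.frame(contagem, stringsAsFactors = FALSE)
```

With month numbers the table comes out in calendar order, whereas month names are sorted alphabetically.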
-
1 vote
1 answer
35 views
A: Repeat loop does not leave first position
The counter needs to be inside the repeat, before checking the condition: f10 <- function(n, mean, sigma) { lista <- numeric(1000) i <- 1 repeat { output <- rnorm(n, mean = mean, sd =…
-
2 votes
2 answers
100 views
A: Correctly convert value into scientific notation for text in R
You can use format to specify the display format: n <- c(25351001641201357706367, 72952982679250725702754) as.character(n) #> [1] "2.53510016412014e+22" "7.29529826792507e+22"…
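A small sketch of the idea, assuming the goal is fixed (non-scientific) notation. The argument scientific = FALSE is an assumption about how format would be called here, since the rest of the answer is cut off.

```r
# A value too large for a double to hold exactly
n <- 25351001641201357706367

# as.character() falls back to scientific notation for this magnitude
as.character(n)

# format() with scientific = FALSE writes out all integer digits as text
txt <- format(n, scientific = FALSE)
nchar(txt)
```

A double carries only about 15 to 16 significant digits, so the trailing digits of txt will generally differ from the literal typed in; exact values of this size need a string or a big-number package.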
-
3 votes
2 answers
35 views
A: How to select rows that have text searching in all columns of a data frame
A one-line solution: my.data[apply(my.data, 1, function(x) any(grepl("tryp", x, ignore.case = TRUE))), ] #> A B c d e #> 1 prot trypsina catalic 1416 b 1 please #> 3 123 123trypsina…
-
2 votes
2 answers
43 views
A: Left_join returning dataframe with more lines than the original
As indicated in the answer by ALS.Meyer, if in some year a code has been split into two or more (something common in official classification systems), then the join will return more rows than the…
-
2 votes
3 answers
54 views
A: Replacing NA values of a column by the value of the top row of the same column of a dataframe
One more option: the function data.table::nafill. The variant setnafill can be used to modify by reference: library(data.table) setnafill(DADOS, "locf") DADOS #> a b #> 1 1 5 #> 2 2 6…
-
4 votes
1 answer
36 views
A: Create an index by searching part of the text only
As already indicated in the comments, grep and its variants do this: grep("a", x) #> [1] 4 7 The value option returns the matching values: grep("a", x, value = TRUE) #> [1] "ab" "aa" Use grepl to…
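The three calls can be compared side by side on a hypothetical vector x, chosen here only so that it reproduces the outputs quoted above; the asker's actual vector is not shown.

```r
# Hypothetical vector; elements 4 and 7 contain the letter "a"
x <- c("b", "c", "cd", "ab", "d", "e", "aa")

# Positions of the matching elements
idx <- grep("a", x)                  # 4 7

# The matching elements themselves
vals <- grep("a", x, value = TRUE)   # "ab" "aa"

# One TRUE/FALSE per element, convenient for logical indexing
hit <- grepl("a", x)                 # FALSE FALSE FALSE TRUE FALSE FALSE TRUE

# Logical indexing gives the same result as value = TRUE
identical(x[hit], vals)
```

grepl is usually the safer choice for subsetting, because an empty match yields an all-FALSE vector rather than integer(0).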
-
5 votes
2 answers
63 views
A: Create hatched area below normal distribution curve in R
Create a second set of data for the polygon. I removed the titles, colors, etc. to highlight the relevant parts of the code: # Plot the PDF curve x <- seq(100, 200, length = 1000) y <- dnorm(x,…
-
3 votes
1 answer
42 views
A: Create a repeat function using two data frames
In general, it is neither necessary nor advisable to use loops in R; the apply family is more efficient. For group operations, combine it with split: # Split the data by month…
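A minimal sketch of the split-then-apply pattern. The data frame, its column names (mes, valor) and the choice of mean as the group operation are all hypothetical.

```r
# Hypothetical data: one value per observation, labelled by month
dados <- data.frame(
  mes   = c("jan", "jan", "fev", "fev", "fev"),
  valor = c(10, 20, 1, 2, 3)
)

# Split the values into a list with one element per month
por_mes <- split(dados$valor, dados$mes)

# Apply a function to each group; sapply returns a named vector
medias <- sapply(por_mes, mean)
medias
```

The same pattern works with any function of one group, including one that receives a whole sub-data-frame when you split dados itself instead of a single column.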
-
2 votes
1 answer
37 views
A: How to obtain bordering municipalities from a geom_sf + ggplot
ggplot2 is only what you use to plot. What you need are spatial operations on your data. The latest versions of sf have implemented the joint operations of RGEOS. In your case, a combination of…
-
2 votes
1 answer
55 views
A: Coloring map with ggplot_2
The scale is correct; it’s just that a lot of the data isn’t showing up on the map. The data are by municipality; after the join, each state has several rows, each corresponding to a…
-
2 votes
2 answers
36 views
A: Count frequency of occurrences including zeroes
table uses the function tabulate, which returns all bins by default. You can use it directly: set.seed(2) x <- sample(12, 20, replace = TRUE) table(x) #> x #> 1 2 3 5 6 7 9 10 11 12 #>…
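The difference between the two functions can be sketched with a small hypothetical vector in which some values never occur. The nbins value and the factor alternative are additions for illustration.

```r
# Hypothetical values between 1 and 6; 2 and 5 never occur
x <- c(1, 3, 3, 4, 6, 6, 6)

# table() only lists the values that are present
table(x)

# tabulate() counts every integer bin from 1 to nbins, zeros included
counts <- tabulate(x, nbins = 6)
names(counts) <- 1:6
counts

# Alternative: declare the full set of levels and keep using table()
counts_tab <- table(factor(x, levels = 1:6))
```

tabulate only handles positive integers, so the factor version is the more general of the two when the categories are text or include zero.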
-
3 votes
1 answer
46 views
A: Hide part of a chart caption
ggplot includes in the legends all the variables specified within the aesthetics (except the axes). If the value used for some aesthetic (in your case, text size) is constant or independent of some…
-
2 votes
1 answer
54 views
A: How to find larger increasing or decreasing subsequence in a vector?
You can use rle to skip the step of creating the split list: long_seq_dec <- function(seq) { seq_inds <- split(1:length(seq), cumsum(c(0, diff(seq)) > 0)) ind_list <- lengths(seq_inds)…
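As an illustration of the rle idea, here is a sketch that finds the longest strictly increasing run. It is not the function from the answer, which is cut off above; the name longest_increasing and the test vector are hypothetical.

```r
# Longest strictly increasing run of consecutive elements, found with rle()
longest_increasing <- function(v) {
  # TRUE where an element is larger than its predecessor
  up <- diff(v) > 0
  runs <- rle(up)

  # No increasing step at all: return the first element
  if (!any(runs$values)) return(v[1])

  # Pick the longest run of TRUE steps (FALSE runs count as length 0)
  best <- which.max(ifelse(runs$values, runs$lengths, 0))

  # Convert run position to step indexes, then to element indexes
  end_step <- cumsum(runs$lengths)[best]
  start_step <- end_step - runs$lengths[best] + 1
  v[start_step:(end_step + 1)]
}

res <- longest_increasing(c(5, 1, 2, 3, 2, 4, 6, 8, 9, 1))
res
```

Replacing `> 0` with `< 0` would give the decreasing case; on ties which.max keeps the first run found.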
-
1 vote
1 answer
41 views
A: How to position geom_text above geom_errorbar?
For the text, use the same coordinate as the upper error bar, with the nudge_y option so the text doesn’t overlap it: library(ggplot2) # Example data set.seed(2038) media <- sample(10:20, 5)…
-
3 votes
1 answer
36 views
A: Problem with dots that don’t join lines in ggplot
Lines need a continuous X-axis; you must use x = ano and not x = as.factor(ano). Use the breaks option of scale_x_continuous to control the intervals and place the tick marks year by year:…
-
1 vote
1 answer
52 views
A: How to change the layout of a Plot radar to Circumplex (Polar Bar) Charts in R
You can do it with ggplot2; it is a bar chart with polar coordinates on the X axis: library(ggplot2) dados <- data.frame(x = paste("Grupo", 1:9), y = mtcars$disp[1:9]) ggplot(dados, aes(x, y, fill =…
-
2 votes
4 answers
285 views
A: Separate data by values on the line?
As answered by @Guilherme-Parreira, using split is the best way to separate the data by a variable: dados <- data.frame( comprar = c(rep("a",times = 4), rep("b",times = 4), rep("c",times = 4),…
-
3 votes
1 answer
48 views
A: Transformation of a dataframe column into date returns NA
@Rui-Arradas already indicated the problem in the comments; I will expand on the answer. A date is a single day: the Date class requires that day, month, and year be specified; partial formats are…
-
1 vote
1 answer
33 views
A: How to make a subset in a zoo-type series choosing certain years or months?
As indicated by @Jorge-Mendes in the comments, the lubridate package has several functions to easily work with dates and does exactly what you want. I will use a smaller example than yours to…
-
6 votes
1 answer
113 views
A: Line segments leaving the interior of a map of Brazil using package ggrepel
I do not use geobr; I used shapefiles that I already have (simplified IBGE shapefiles). Since geobr accesses the IBGE FTP, the result will be the same. library(sf) library(ggrepel) states <-…
-
2 votes
1 answer
61 views
A: set a path according to the user to a folder
u is the object in which the string is stored, not the string itself. You can use paste to compose the path name or, better yet, as pointed out by @Rui-Arradas, file.path: file.path("C:/Users",…
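A short sketch of both approaches. The user name "maria" and the "Documents" folder are hypothetical, since the original call is cut off after its first argument.

```r
# Hypothetical user name stored in an object
u <- "maria"

# Quoting the name would insert the literal letter "u" into the path
errado <- file.path("C:/Users", "u", "Documents")

# Passing the object inserts its contents
caminho <- file.path("C:/Users", u, "Documents")

# paste() with sep = "/" builds the same string
caminho_paste <- paste("C:/Users", u, "Documents", sep = "/")

caminho
```

file.path always joins with "/", which R accepts on Windows as well, so there is no need to escape backslashes.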
-
1 vote
1 answer
74 views
A: How to create a data frame with vectors of different sizes?
As pointed out by @Guilherme-Parreira in the comments, if your function returns vectors with different lengths, it is better to use a list. Since you did not post reproducible code, I wrote an…
-
5 votes
1 answer
64 views
A: How to extract only the first value from a row of concatenated values?
A regular expression that captures the content between strings solves it: a <- c("c(182752.414, 179107.7)", "c(200491.435, 195097.2)", "c(217566.642, 211641.4)") as.numeric(sub("^c\\((.*),.*",…
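One way such a capture can look is sketched below. The pattern is an assumption written for this example, not necessarily the one in the truncated answer: it keeps whatever sits between the opening "c(" and the first comma.

```r
a <- c("c(182752.414, 179107.7)",
       "c(200491.435, 195097.2)",
       "c(217566.642, 211641.4)")

# ^c\\(      literal "c(" at the start of the string
# ([^,]*)    capture group 1: everything up to the first comma
# ,.*$       the comma and the rest of the string, which is discarded
first <- as.numeric(sub("^c\\(([^,]*),.*$", "\\1", a))
first
```

Using `[^,]*` rather than `.*` inside the group keeps the match from running past the first comma if a string ever holds more than two values.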
-
1 vote
1 answer
84 views
A: Adjusted line of the binomial regression model made in ggplot2
You are providing two points as xy coordinates; you must specify the same number of points to be plotted. But providing coordinates independent of the dataset is bad practice; instead, let the…