Create column with sum and percentage of the maximum of other columns

Asked

Viewed 1,920 times

7

i am a beginner in the R language and wanted to know how I create a function mutate that creates a new column in my table and at the same time store in this new column the sum of the values contained in the other 6 columns of my table as in the example below, also wanted to create another column called Ratio on the side of the Total Soma_column that would take the largest number present between columns A and F and divide it by the value contained in the Total Soma_column. For example, in the first row of the ratio column I would have 100 (higher value between columns A and F)/150 (value contained in the Total Soma_column).

Important details: the columns of A: F are of the Character type and I have a series of NA’s in these columns that I would like to disregard when making my sum.

A   B   C   D   E   F   Soma_total Proporcao
100 50  NA  NA  NA  NA  150        (100/150)
49  51  1   NA  NA  NA  101        (51/101)
30  20  5   1   NA  NA  56         (30/56)
11  10  2   NA  NA  NA  23         (11/23)
7   3   5   1   1   1   18         (7/18)
0   10  NA  NA  NA  NA  10         (10/10)
1   2   NA  NA  NA  NA  3          (2/3)
5   6   2   NA  NA  NA  13         (6/13)
7   3   1   3   1   NA  15         (7/15)
12  3   1   2   NA  NA  18         (12/18) 

4 answers

8

recreating the table in code (so everyone can reproduce the example):

dt_so_bruna <- 
structure(list(A = c(100, 49, 30, 11, 7, 0, 1, 5, 7, 12), B = c(50, 
51, 20, 10, 3, 10, 2, 6, 3, 3), C = c(0, 1, 5, 2, 5, 0, 0, 2, 
1, 1), D = c(0, 0, 1, 0, 1, 0, 0, 0, 3, 2), E = c(0, 0, 0, 0, 
1, 0, 0, 0, 1, 0), F = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0)), .Names = c("A", 
"B", "C", "D", "E", "F"), row.names = c(NA, 10L), class = "data.frame")

solution with two mutate lines() :

   dt_so_bruna %>%
   mutate(Soma        = A + B + C + D + E + F)  %>%
   mutate(Proporcao_A = pmax(A, B, C, D, E, F, na.rm = TRUE) / Soma)

         A  B C D E F Soma Proporcao_A
    1  100 50 0 0 0 0  150   0.6666667
    2   49 51 1 0 0 0  101   0.4851485
    3   30 20 5 1 0 0   56   0.5357143
    4   11 10 2 0 0 0   23   0.4782609
    5    7  3 5 1 1 1   18   0.3888889
    6    0 10 0 0 0 0   10   0.0000000
    7    1  2 0 0 0 0    3   0.3333333
    8    5  6 2 0 0 0   13   0.3846154
    9    7  3 1 3 1 0   15   0.4666667
    10  12  3 1 2 0 0   18   0.6666667

Assumi que seus NAs eram 0
  • All you need to do is pack the proportion numerator, swapping the A for the maximum between A and F! abs

  • It’s true, @Athos. I didn’t really read the question! Thanks for your answer, I learned about Pmax(), and rowSums()

5

An alternative:

library(dplyr)


dados3 <- dados %>%
  mutate_at(c("A", "B", "C", "D", "E", "F"), as.numeric) %>%
  mutate(Soma_total_2 = data_frame(A, B, C, D, E, F) %>% rowSums(na.rm = TRUE),
         Maximo = pmax(A, B, C, D, E, F, na.rm = TRUE), 
         Proporcao_2 = sprintf("(%s/%s)", Maximo, Soma_total_2))

New functions there are

  • mutate_at - applies the same transformation for specified columns.
  • rowSums - Sum values of each row of a matrix or data.frame.
  • pmax - Take as much as possible from each row of an array or date..
  • sprintf - format texts.

Hugs!

4

Less verbiage response (not need to write the name of all columns), but with more code

library(tibble)
library(dplyr)
library(tidyr)

d %>% 
  # adiciona um "id" com os nomes das linhas
  rownames_to_column('id') %>% 
  # empilha as colunas A até F. ao invés de -id poderia ser A:F
  gather(key, val, -id) %>% 
  # convert val em numeric
  mutate(val = as.numeric(val)) %>% 
  # agrupa pelo id
  group_by(id) %>%
  # cria soma total e proporcao (desconsiderando NAs)
  mutate(Soma_total = sum(val, na.rm = TRUE),
         Proporcao = max(val, na.rm = TRUE) / Soma_total) %>% 
  # desagrupa
  ungroup() %>% 
  # joga A:F nas colunas novamente
  spread(key, val) %>% 
  # reordena colunas e retira o "id"
  select(A:F, Soma_total, Proporcao)

Upshot:

# A tibble: 10 × 8
       A     B     C     D     E     F Soma_total Proporcao
*  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>      <dbl>     <dbl>
1    100    50    NA    NA    NA    NA        150 0.6666667
2     12     3     1     2    NA    NA         18 0.6666667
3     49    51     1    NA    NA    NA        101 0.5049505
4     30    20     5     1    NA    NA         56 0.5357143
5     11    10     2    NA    NA    NA         23 0.4782609
6      7     3     5     1     1     1         18 0.3888889
7      0    10    NA    NA    NA    NA         10 1.0000000
8      1     2    NA    NA    NA    NA          3 0.6666667
9      5     6     2    NA    NA    NA         13 0.4615385
10     7     3     1     3     1    NA         15 0.4666667

0

You can do everything easily with the R base functions as well. In your case you are applying a line function.

Using the base dt_so_bruna created by Dan:

dt_so_bruna$soma <- rowSums(dt_so_bruna, na.rm = TRUE)
dt_so_bruna$prop <- apply(dt_so_bruna[-7], 1, max, na.rm = TRUE)/dt_so_bruna$soma
dt_so_bruna

     A  B C D E F soma      prop
1  100 50 0 0 0 0  150 0.6666667
2   49 51 1 0 0 0  101 0.5049505
3   30 20 5 1 0 0   56 0.5357143
4   11 10 2 0 0 0   23 0.4782609
5    7  3 5 1 1 1   18 0.3888889
6    0 10 0 0 0 0   10 1.0000000
7    1  2 0 0 0 0    3 0.6666667
8    5  6 2 0 0 0   13 0.4615385
9    7  3 1 3 1 0   15 0.4666667
10  12  3 1 2 0 0   18 0.6666667

Browser other questions tagged

You are not signed in. Login or sign up in order to post.