R Regression by condition using apply or for

0

I have the following sample:

x <- structure(list(POP = structure(c(1L, 12L, 15L, 16L, 17L, 18L, 
19L, 20L, 21L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 13L, 
14L), .Label = c("pop1", "pop10", "pop11", "pop12", "pop13", 
"pop14", "pop15", "pop16", "pop17", "pop18", "pop19", "pop2", 
"pop20", "pop21", "pop3", "pop4", "pop5", "pop6", "pop7", "pop8", 
"pop9"), class = "factor"), a1 = c(91, 26.7, 51.9, 14, 0, 15.3, 
34.4, 19.1, 10.2, 52.5, 43.6, 13.1, 47.1, 34.7, 0, 58.9, 66.8, 
0, 0, 0, 0), a2 = c(92.9, 27.7, 54.1, 14.3, 0, 16.2, 35, 19.1, 
11.1, 52.5, 44.6, 13.4, 48.7, 34.4, 0, 59.5, 72.3, 0, 0, 0, 0
), a3 = c(92.6, 27.4, 54.7, 13.7, 0, 16.2, 36, 0, 11.1, 53.2, 
45.2, 13.7, 49.3, 0, 0, 59.5, 74.5, 0, 0, 0, 0), a4 = c(95.5, 
28.3, 57.3, 14.6, 0, 16.9, 36.9, 0, 11.8, 56.3, 47.1, 14, 53.2, 
0, 0, 62.7, 84.4, 0, 0, 0, 0), a5 = c(97.4, 28.6, 61.4, 14.3, 
0, 17.5, 36.9, 0, 12.4, 55.7, 47.4, 14.6, 53.8, 0, 0, 62.4, 0, 
0, 0, 0, 0), a6 = c(97.7, 29.3, 63.3, 14.6, 0, 18.5, 38.8, 0, 
13.1, 57.3, 49, 15.3, 55.4, 0, 0, 62.7, 0, 0, 0, 0, 0), a7 = c(102.2, 
0, 68.1, 14.6, 11.1, 20.1, 43.3, 0, 14.6, 64.9, 53.2, 0, 60.5, 
0, 0, 62.7, 0, 0, 0, 0, 0), a8 = c(106.3, 0, 71.9, 14.3, 0, 19.7, 
45.8, 0, 15.9, 70.7, 57.3, 0, 67.8, 0, 10.5, 0, 0, 0, 10, 0, 
0), a9 = c(113.2, 0, 75.5, 15, 0, 21.7, 49, 0, 18.5, 73, 59.8, 
0, 0, 0, 14.7, 0, 0, 0, 10.4, 0, 0), a10 = c(114.9, 0, 75.2, 
15, 0, 22.6, 49.6, 0, 19.8, 73.8, 59.9, 0, 0, 0, 16.6, 0, 0, 
10.5, 10.5, 0, 0), a11 = c(114.9, 0, 75.5, 15.1, 0, 23.2, 50.6, 
0, 19.8, 74.6, 59.2, 0, 0, 0, 18.2, 0, 0, 10.5, 10.6, 0, 0), 
    a12 = c(115, 0, 76, 15.9, 0, 26.1, 0, 0, 22.7, 75.4, 60.8, 
    0, 0, 0, 21, 0, 0, 10.3, 11.1, 0, 0), a13 = c(115.2, 11.6, 
    76, 16, 0, 26.6, 0, 0, 23.3, 75.5, 61.3, 0, 0, 0, 22.6, 0, 
    0, 10.7, 11.1, 0, 0), a14 = c(0, 11.6, 77.6, 0, 0, 29.5, 
    0, 0, 25.3, 76.2, 64, 0, 0, 0, 25.5, 0, 0, 11.6, 11.8, 10.2, 
    11)), class = "data.frame", row.names = c(NA, -21L))

And the annual data:

temp <- structure(list(ano = structure(c(1L, 7L, 8L, 9L, 10L, 11L, 12L, 
13L, 14L, 2L, 3L, 4L, 5L, 6L), .Label = c("a1", "a10", "a11", 
"a12", "a13", "a14", "a2", "a3", "a4", "a5", "a6", "a7", "a8", 
"a9"), class = "factor"), temp = c(0L, 2L, 2L, 6L, 2L, 3L, 13L, 
8L, 7L, 3L, 2L, 5L, 2L, 5L)), class = "data.frame", row.names = c(NA, 
-14L))

I can extract the regressions with the apply functions when the data are complete for all 14 years, after cleaning the data and keeping only the populations (POP) that have the full 14-year series.

However, I want to run the regression on the data that are not complete. I use the following code:

y <- temp$temp

log_x <- apply(x[-1], 2, log)

model_list <- apply(log_x, 1, function(x) lm(x ~ y))

coef_list <- t(sapply(model_list, coef))

model_smry <- lapply(model_list, summary)

The apply call that fits the models fails because of the log, which results in -Inf wherever the data contain zeros.
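
To illustrate the failure described above, here is a minimal sketch with made-up values (not the data from the question): log(0) is -Inf, and lm() stops with an error when the response contains a non-finite value. The names vals, log_vals and res are only for illustration.

```r
# Made-up values: the last one is zero, as in the incomplete series
vals <- c(91, 26.7, 0)
log_vals <- log(vals)      # log(0) evaluates to -Inf
yy <- c(0, 2, 2)

# lm() raises an error because the response contains -Inf;
# tryCatch() captures the message instead of stopping the script
res <- tryCatch(lm(log_vals ~ yy),
                error = function(e) conditionMessage(e))
```

After running this, res holds the error message as a character string rather than a fitted model.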

Is there a way to run the regression only up to the point before the error, using the y values that correspond to the years before the series stopped?

For example:

There are 14 years (columns). If year 4 has -Inf, I want to calculate the regression using only years 1, 2 and 3, with the y values corresponding to those years (stopping the apply at the -Inf), and do this for every row.

Maybe this can be done with a for loop, but I don’t know how to proceed. If you can help me get started or point me to something to study, that would already be a great help.

  • I’m unable to reproduce your error. By any chance, should the line log_x <- apply(log_x[-1], 2, log) be log_x <- apply(base[-1], 2, log)? Also, what is the data frame temp? Please review these details and edit the question so that it is reproducible.

  • Oops, sorry. I have just fixed the example.

2 answers

1

I hope I understood the question correctly. It’s not the most sophisticated way, but it should work. Let’s look at the data:

 x
 #     POP   a1   a2   a3   a4   a5   a6    a7    a8    a9   a10   a11   a12   a13  a14
 #1   pop1 91.0 92.9 92.6 95.5 97.4 97.7 102.2 106.3 113.2 114.9 114.9 115.0 115.2  0.0
 #2   pop2 26.7 27.7 27.4 28.3 28.6 29.3   0.0   0.0   0.0   0.0   0.0   0.0  11.6 11.6
 #3   pop3 51.9 54.1 54.7 57.3 61.4 63.3  68.1  71.9  75.5  75.2  75.5  76.0  76.0 77.6
 #4   pop4 14.0 14.3 13.7 14.6 14.3 14.6  14.6  14.3  15.0  15.0  15.1  15.9  16.0  0.0
 #5   pop5  0.0  0.0  0.0  0.0  0.0  0.0  11.1   0.0   0.0   0.0   0.0   0.0   0.0  0.0
 #6   pop6 15.3 16.2 16.2 16.9 17.5 18.5  20.1  19.7  21.7  22.6  23.2  26.1  26.6 29.5
 #7   pop7 34.4 35.0 36.0 36.9 36.9 38.8  43.3  45.8  49.0  49.6  50.6   0.0   0.0  0.0
 #8   pop8 19.1 19.1  0.0  0.0  0.0  0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  0.0
 #9   pop9 10.2 11.1 11.1 11.8 12.4 13.1  14.6  15.9  18.5  19.8  19.8  22.7  23.3 25.3
 #10 pop10 52.5 52.5 53.2 56.3 55.7 57.3  64.9  70.7  73.0  73.8  74.6  75.4  75.5 76.2
 #11 pop11 43.6 44.6 45.2 47.1 47.4 49.0  53.2  57.3  59.8  59.9  59.2  60.8  61.3 64.0
 #12 pop12 13.1 13.4 13.7 14.0 14.6 15.3   0.0   0.0   0.0   0.0   0.0   0.0   0.0  0.0
 #13 pop13 47.1 48.7 49.3 53.2 53.8 55.4  60.5  67.8   0.0   0.0   0.0   0.0   0.0  0.0
 #14 pop14 34.7 34.4  0.0  0.0  0.0  0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  0.0
 #15 pop15  0.0  0.0  0.0  0.0  0.0  0.0   0.0  10.5  14.7  16.6  18.2  21.0  22.6 25.5
 #16 pop16 58.9 59.5 59.5 62.7 62.4 62.7  62.7   0.0   0.0   0.0   0.0   0.0   0.0  0.0
 #17 pop17 66.8 72.3 74.5 84.4  0.0  0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  0.0
 #18 pop18  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.0   0.0  10.5  10.5  10.3  10.7 11.6
 #19 pop19  0.0  0.0  0.0  0.0  0.0  0.0   0.0  10.0  10.4  10.5  10.6  11.1  11.1 11.8
 #20 pop20  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0 10.2
 #21 pop21  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0 11.0
 y <- temp$temp
 y
 # [1]  0  2  2  6  2  3 13  8  7  3  2  5  2  5
 log_x <- apply(x[-1], 2, log)

Here I create a list to receive the fitted regressions, which will be produced by a for loop:

 # accumulates the fitted models
 model_list <- list()

I must drop the rows that have no valid observations:

 # rows in log_x with -Inf in every position
 linhas <- c(3,6,9,10,11)
 # rows with some value (lav)
 lav <- (1:21)[-linhas]
 lav
 # [1]  1  2  4  5  7  8 12 13 14 15 16 17 18 19 20 21

Now I create a loop that (i) removes the positions without observations, both in log_x and in y, and (ii) fits the regressions:

 for (i in lav) { # loop over rows
    # positions with a finite log value (log(0) gives -Inf)
    keep <- is.finite(log_x[i, ])
    # new log_x1, keeping only the valid positions
    log_x1 <- log_x[i, keep]
    # new y1, restricted to the same positions
    y1 <- y[keep]
    # fit the regression for this row
    model_list[[i]] <- lm(log_x1 ~ y1)
 }

See how the first five regressions look. Note that for rows excluded from the loop (row 3), the list model_list is empty at that position:

head(model_list,5)
#[[1]]
#
#Call:
#lm(formula = log_x1 ~ y1)
#
#Coefficients:
#(Intercept)           y1  
#   4.614607     0.005504  
# 
#
#[[2]]
#
#Call:
#lm(formula = log_x1 ~ y1)
#
#Coefficients:
#(Intercept)           y1  
#    3.23414     -0.04457  
#
#
#[[3]]
#NULL
#
#[[4]]
#
#Call:
#lm(formula = log_x1 ~ y1)
#
#Coefficients:
#(Intercept)           y1  
#    2.68385      0.00109  
#
#
#[[5]]
#
#Call:
#lm(formula = log_x1 ~ y1)
#
#Coefficients:
#(Intercept)           y1  
#      2.407           NA  
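
The loop above drops every -Inf position, wherever it occurs. If the series should instead stop at the first -Inf, as the question describes, the idea could be sketched as follows. The toy matrix, the helper name fit_until_first_gap and the minimum of 2 observations are assumptions for illustration, not part of the original answer.

```r
# Toy data (illustrative values, not the full data set from the question)
y <- c(0, 2, 2, 6, 2, 3)
x_toy <- rbind(
  full    = c(51.9, 54.1, 54.7, 57.3, 61.4, 63.3),  # no zeros
  stopped = c(66.8, 72.3, 74.5, 84.4, 0, 0),        # stops after year 4
  gap     = c(26.7, 27.7, 27.4, 0, 11.6, 11.6)      # zero in year 4, values after
)
log_toy <- log(x_toy)  # zeros become -Inf

# Hypothetical helper: fit only on the leading run of finite values
fit_until_first_gap <- function(row, y, min_obs = 2) {
  bad <- which(!is.finite(row))
  n <- if (length(bad) > 0) bad[1] - 1 else length(row)
  if (n < min_obs) return(NULL)  # too short to estimate a slope
  resp <- row[seq_len(n)]
  pred <- y[seq_len(n)]
  lm(resp ~ pred)
}

fits <- lapply(seq_len(nrow(log_toy)),
               function(i) fit_until_first_gap(log_toy[i, ], y))
names(fits) <- rownames(log_toy)

# Number of observations actually used in each fit
n_used <- sapply(fits, function(m) if (is.null(m)) 0 else nobs(m))
```

In this sketch the row named gap is fitted on its first 3 years only, even though it has valid values again in years 5 and 6; the loop above would have used all 5 valid years.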

  • Thank you. The process worked as expected, but in the end it omits the NA values. Do you know a way to make all of them appear in the list? At the end of the loop I get the following message: Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases

  • Thank you so much. You pointed me in the right direction. Cheers

  • Great, you’re welcome! I used head(model_list,5) only to show the first five regressions. As for the error you mention, it did not appear here at all. Cheers

  • A question: when using head(model_list,5), observation 3 is NULL. Is there a way to calculate the regression for it? It has the complete data series, so it should be possible to calculate it with y and its 14 observations. I tried to change the loop for this, but could not.

  • At least in the data from the earlier version of the example, observation 3 had -Inf in every position. However, if you have other data with valid values, just include the index 3 in the vector lav so that this position enters the loop.
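
Rather than typing the row indices into lav by hand, they could be derived from the log matrix itself. A minimal sketch on a toy matrix (made-up values); the names x_toy, log_toy and n_valid are hypothetical, and the threshold of 1 valid observation is an assumption that can be raised.

```r
# Toy matrix: one row per population, one column per year (made-up values)
x_toy <- rbind(c(10, 12, 0),
               c(0, 0, 0),
               c(0, 0, 11),
               c(5, 6, 7))
log_toy <- log(x_toy)  # zeros become -Inf

# Count the finite (usable) values in each row
n_valid <- rowSums(is.finite(log_toy))

# Keep the rows that have at least one usable value
lav <- which(n_valid >= 1)
```

With these values, row 2 is the only one left out, because all of its entries are -Inf.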

0


I managed to extend the code so that it no longer gives this error.

Here is the solution using a for loop:

x_list <- list()
y_list <- list()
for (i in lav) { # loop over rows
  # positions with a finite log value (log(0) gives -Inf)
  keep <- is.finite(log_x[i, ])
  # store the x values without the -Inf entries
  x_list[[i]] <- log_x[i, keep]
  # store the matching y values
  y_list[[i]] <- y[keep]
  # condition: at least 3 valid observations
  if (length(x_list[[i]]) >= 3) {
    # run the regression
    model_list[[i]] <- lm(x_list[[i]] ~ y_list[[i]])
  } else {
    # condition not met: store NA instead of a model
    model_list[[i]] <- NA
  }
}
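
Once model_list mixes fitted models with NA placeholders, calling coef() on every element no longer works directly. One possible way to collect the coefficients into a single table is sketched below on a small stand-in list. The helper get_coef, the toy responses and the column names are assumptions for illustration.

```r
# Stand-in for model_list: two fitted models and one NA placeholder
y <- c(0, 2, 2, 6)
model_list <- list(
  lm(c(1.0, 1.2, 1.1, 1.6) ~ y),
  NA,
  lm(c(2.0, 2.1, 2.3, 2.9) ~ y)
)

# Hypothetical helper: coefficients for a model, NA for a placeholder
get_coef <- function(m) {
  if (inherits(m, "lm")) coef(m) else c(NA_real_, NA_real_)
}

# One row per list element, two columns (intercept and slope)
coef_table <- t(vapply(model_list, get_coef, numeric(2)))
colnames(coef_table) <- c("intercept", "slope")
```

Each row of coef_table lines up with the same position in model_list, so skipped populations show up as rows of NA instead of disappearing.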
