How to run a model for each subset of data of a time series to see

Question

How to run a model for each subset of data of a time series to see

Asked 7 years, 10 months ago

Viewed 97 times

1

I have a database that has feed consumption information (190 animals) every day (88 days).
Goes below

table1 <- read.csv("Data.csv", header = TRUE) 
table1

Animal	Dia	Consumo
1	1	245
1	2	256
1	3	300
1	4	450
2	1	245
2	2	256
2	3	300
2	4	450

x = table1$Dia

y = table1$Consumo

Animal=table1$Animal

First I need to run a DLM model (I already have the program) only for an animal.

runDLM = function(beta) { x, y,........ }

Then I need to rotate for all the animals. For all the animals I thought of the command described below, but I don’t know if it’s right

for (Animal in 1:190) {
    runDLM(beta)
    }

Seems to be wrong, the function runDLM does not depend on Animal, is not a function argument, the only argument is beta.

– Rui Barradas

2017/09/18 at 10:23

2 answers

0

Well, as they usually say in R tutorials, "if you’re using a FOR-loop, you’re doing it wrong". In addition to slowing down the code, you are not using one of the most important factors of the language, which is its functional orientation, extremely useful in problems like yours: applying complex functions structurally in data.

Robert’s answer is absolutely correct. I only offered a more modern version using the packages dplyr + tidyr, broom and purrr for:

generate data with the same structure that showed us
estimate a simple linear model in the generated data
extract the estimated coefficients from each model

The structure of my code should help you estimate your models smoothly.

library(tidyverse)

## gerando dados com a mesma estrutura apresentada

set.seed(134)

df <- 
  tibble(Animal = 1:190, taxa = runif(n = 190, min = 3, max = 23)) %>%  
  crossing(dia = 0:3) %>%
  mutate(consumo = 245 + dia*taxa)


df


# A tibble: 760 x 4
   Animal    taxa   dia  consumo
   <int>     <dbl> <int>    <dbl>
1      1  7.007272     0 245.0000
2      1  7.007272     1 252.0073
3      1  7.007272     2 259.0145
4      1  7.007272     3 266.0218
5      2 15.551850     0 245.0000
6      2 15.551850     1 260.5518
7      2 15.551850     2 276.1037
8      2 15.551850     3 291.6555
9      3 20.629976     0 245.0000
10     3 20.629976     1 265.6300
# ... with 750 more rows

Now let’s estimate a simple linear model with lm for the function consumo ~ dia and extract the coefficients of each model using the function broom::tidy, which extracts the coefficients and statistics in a data.frame "Tidy"(clean):

## gerando um modelo linear no dia para cada animal e extraindo os coeficientes  --------------

df %>% 
  group_by(Animal) %>% 
  nest(.key = dados_animal) %>% 
  mutate(model = map(dados_animal, ~lm(consumo ~ dia, data = .))) %>% 
  mutate(coef = map(model, broom::tidy)) %>% 
  unnest(coef)

Resulting in:

# A tibble: 380 x 6
   Animal        term   estimate    std.error    statistic      p.value
   <int>       <chr>      <dbl>        <dbl>        <dbl>        <dbl>
1      1 (Intercept) 245.000000 1.174950e-15 2.085196e+17 2.299886e-35
2      1         dia   7.007272 6.280370e-16 1.115742e+16 8.032903e-33
3      2 (Intercept) 245.000000 0.000000e+00          Inf 0.000000e+00
4      2         dia  15.551850 0.000000e+00          Inf 0.000000e+00
5      3 (Intercept) 245.000000 4.699798e-15 5.212990e+16 3.679818e-34
6      3         dia  20.629976 2.512148e-15 8.212086e+15 1.482836e-32
7      4 (Intercept) 245.000000 0.000000e+00          Inf 0.000000e+00
8      4         dia   8.025121 0.000000e+00          Inf 0.000000e+00
9      5 (Intercept) 245.000000 0.000000e+00          Inf 0.000000e+00
10     5         dia  15.198999 0.000000e+00          Inf 0.000000e+00
# ... with 370 more rows

To learn more about the "tidyverse" way of programming, I recommend taking a look at the articles on website of the pacute and in the book R for Data Sceince, especially in the part about how to work with many models, because that’s basically how I presented that answer.

Browser other questions tagged r

You are not signed in. Login or sign up in order to post.

by Robert • **887** points · Answer 1 · 2017-09-21T02:30:22+00:00

I invented data and a function, but you can use the same logic:

set.seed(123)
table1 = data.frame(Animal=rep(1:6,each=10),Dia=rep(1:4,15), Consumo=round(runif(60,240,500),0))

runDLM = function(beta,dat) { 
  x=dat$Dia 
  y=dat$Consumo
  coef(lm(y~I(x^beta)))
  }

sdat=split(table1,table1$Animal)
betat=2
sapply(sdat,runDLM,beta=betat)


#                         1          2          3          4          5          6
#    (Intercept) 355.804959 319.156589 376.897521 398.223256 313.494215 369.547287
#    I(x^beta)     5.322314   6.699225   3.538843  -2.167442   2.523967  -1.993798