A: Insert date difference in a time difference function

I found a function written by J. Ahumada that is very relevant to my work. It groups the photographic records of a species in a given sample unit (ua) according to a chosen independence interval.

I created an object with the information of a particular species in a single sample unit. The object is called "paca".

paca <- select(filter(meus.dados, ua == " GF", especie == "Cuniculus paca"), ua, data, hora, especie)
paca

     ua       data     hora        especie
   (chr)     (time)    (chr)          (chr)
1     GF 2012-06-02 01:12:00 Cuniculus paca
2     GF 2012-06-11 23:50:00 Cuniculus paca
3     GF 2012-06-12 00:06:00 Cuniculus paca
4     GF 2012-06-12 01:16:00 Cuniculus paca
5     GF 2012-07-11 20:35:00 Cuniculus paca
6     GF 2012-07-24 23:52:00 Cuniculus paca
7     GF 2012-08-01 21:39:00 Cuniculus paca
8     GF 2012-08-09 02:37:00 Cuniculus paca
9     GF 2012-08-11 00:24:00 Cuniculus paca
10    GF 2012-08-13 00:55:00 Cuniculus paca
11    GF 2012-08-13 19:47:00 Cuniculus paca
12    GF 2012-08-15 19:16:00 Cuniculus paca
13    GF 2012-08-18 02:35:00 Cuniculus paca
14    GF 2012-08-18 22:28:00 Cuniculus paca
15    GF 2012-08-24 02:27:00 Cuniculus paca

When I run the function, it returns a sequence of numbers corresponding to the row numbers created by R (1 to 15). When a record does not respect the 60 min interval, it repeats the number of the row where that record is.

reg.independentes <- function(dados, independencia) {
  l <- length(dados$data)
  intervalo <- diff(dados$data)
  intervalo <- intervalo / 60  # the independence interval must be given in minutes
  intervalo <- as.numeric(intervalo)
  ev <- 1
  res <- numeric()
  cond <- intervalo > independencia
  for (i in 1:(l - 1)) {
    # start a new event only when the gap exceeds the independence interval
    if (cond[i]) ev <- ev + 1
    res <- c(res, ev)
  }
  c(1, res)
}

 reg.independentes(paca, 60)
 [1]  1  2  3  3  4  5  6  7  8  9  9 10 11 11 12 12 13 14 15 

The function ignores the fact that records fall on different dates; it only considers the time. This produces two problems:

First: it repeats row numbers for records on the same date that are more than 60 min apart. For example, it repeats row 3, yet when I check the records they are more than 60 min apart, as desired (same date, different times: 00:06 and 01:16). I don't understand why; this record should not have been flagged.

Second: it repeats row numbers for records on different dates whose times are merely similar. The function does not take the differing dates into account. For example, it flagged rows 9, 11 and 12, but those records are on different dates and are therefore independent.

A record is considered not independent if it occurs on the same day and within an interval of less than 60 min. Records at similar times but on different dates are considered independent (this is what I need the function to do).

I tried to change the function, but I can't get it to work. I would also like the function to return a table with only the independent records. Can anyone help me?

  • Well, let's try. To do that I need to understand the code better. First, what did you mean by "data set": is it an array? Second, assuming this "data set" of yours is an array, I would separate each item with split(cjdados[index]); since I know which column of the array returned by split I want (here, the date and time), I would compare those values directly inside the function.

  • I expressed myself badly. The data set is the species information I stored in an object, which I called "paca".

  • I'm still confused. What does this object return: a JSON list, a string, or something similar, with the values that are passed to the function as parameters? I need to know in order to compare the values in the way I described.

  • Look, I'm a Java, Python and PHP programmer; I can't help with Excel, sorry!

  • I'm just starting with R. My data are all in an Excel spreadsheet, so I read it in and work on it from there. But thanks for your attention.

  • I almost never use Excel for anything, but if you were going to program this in one of the languages I mentioned, I could solve it in no time!

  • There should be a way, yes, in VB I think; I never use Excel. In Java, for example, I would put this "data" in a separate array and loop over it in the function, splitting each line into another array with something like if (linha[i].split()[2].equals('00:60')){ return 'independente';} and so on.

  • Hmm, I'll try. Does anyone have another suggestion? I have a spreadsheet with 4 years of data, from several different ua, with photographic records at different dates and times for different species.


1 answer

Working with functions created by other people is not very simple (especially without the explanation of the algorithm), so I found it simpler to make it from scratch.

First, it is important to convert your data to a date-time format that R understands, which simplifies measuring the intervals. Your printed data show that the date is stored as a time class, but the hour is not (it is character). Since I started with everything in text format, it would look like this:

paca <- read.table(text = "ua       data     hora        especie
     GF 2012-06-02 01:12:00 Cuniculus_paca
     GF 2012-06-11 23:50:00 Cuniculus_paca
     GF 2012-06-12 00:06:00 Cuniculus_paca
     GF 2012-06-12 01:16:00 Cuniculus_paca
     GF 2012-07-11 20:35:00 Cuniculus_paca
     GF 2012-07-24 23:52:00 Cuniculus_paca
     GF 2012-08-01 21:39:00 Cuniculus_paca
     GF 2012-08-09 02:37:00 Cuniculus_paca
     GF 2012-08-11 00:24:00 Cuniculus_paca
    GF 2012-08-13 00:55:00 Cuniculus_paca
    GF 2012-08-13 19:47:00 Cuniculus_paca
    GF 2012-08-15 19:16:00 Cuniculus_paca
    GF 2012-08-18 02:35:00 Cuniculus_paca
    GF 2012-08-18 22:28:00 Cuniculus_paca
    GF 2012-08-24 02:27:00 Cuniculus_paca", 
                   stringsAsFactors = FALSE, header = TRUE)

paca$data_completa <- strptime(paste(paca$data, paca$hora),
                              format = "%Y-%m-%d %H:%M:%S")

I pasted the date and time together into a single string and used the function strptime to convert it into a date-time format.
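
To see why the combined timestamp matters, here is a small sketch using three records from the table above. The object names (`data_txt`, `hora_txt`, `so_data`, `completa`) are only illustrative, and I assume UTC to keep daylight-saving changes out of the picture. With the date alone, two records on the same day are 0 minutes apart; with date and time together, the real gap appears.

```r
# Three records from the example table, as text
data_txt <- c("2012-06-11", "2012-06-12", "2012-06-12")
hora_txt <- c("23:50:00", "00:06:00", "01:16:00")

# Date only: the time of day is ignored, so gaps are whole days
so_data <- as.POSIXct(data_txt, format = "%Y-%m-%d", tz = "UTC")
gap_data <- diff(so_data)
units(gap_data) <- "mins"
as.numeric(gap_data)
# 1440 0

# Date and time together: gaps reflect the real elapsed time
completa <- as.POSIXct(paste(data_txt, hora_txt),
                       format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
gap_completa <- diff(completa)
units(gap_completa) <- "mins"
as.numeric(gap_completa)
# 16 70
```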

To duplicate the index of the records that meet your criterion, we only need to check which intervals are shorter than the limit (in this case, 60 minutes) and repeat those positions. The final function looks like this:

reg_independentes <- function(dados, independencia) {
  # Only the time information is needed. diff() computes the interval
  # between each value and the previous one.
  intervalo <- diff(dados)
  # Ensures the comparison is always made in minutes.
  units(intervalo) <- "mins"
  # Find which intervals are shorter than independencia.
  repetir <- which(intervalo < independencia)
  # Combine the values in ascending order. The 0 and the + 1 are needed
  # because there is always one interval fewer than the number of values.
  sort(c(0, seq_along(intervalo), repetir) + 1)
}

Using the function:

reg_independentes(paca$data_completa, 60)
# [1]  1  2  3  3  4  5  6  7  8  9 10 11 12 13 14 15
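
To make the last line of the function easier to follow, here is a step-by-step sketch on the first four records only. The variable names `tempo` and `indices` are mine, and UTC is an assumption.

```r
# Rows 1 to 4 of the example table
tempo <- as.POSIXct(c("2012-06-02 01:12:00", "2012-06-11 23:50:00",
                      "2012-06-12 00:06:00", "2012-06-12 01:16:00"),
                    format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

intervalo <- diff(tempo)           # 3 intervals for 4 records
units(intervalo) <- "mins"
repetir <- which(intervalo < 60)   # interval 2 (23:50 to 00:06) is only 16 min
repetir
# 2

# Interval i ends at record i + 1, hence the "+ 1"; the 0 stands for record 1
indices <- c(0, seq_along(intervalo), repetir) + 1
indices
# 1 2 3 4 3
sort(indices)
# 1 2 3 3 4
```

The repeated 3 marks record 3 as falling less than 60 minutes after record 2.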

I think the result is now correct, but if it is not you should be able to make the necessary adjustment.
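
The question also asks for a table with only the independent records, and a comment mentions several ua and species. One possible sketch follows, assuming a record counts as independent when it comes at least `independencia` minutes after the previous record in the same group. The function `filtrar_independentes`, its `coluna` argument, the `registros` data frame and the sample unit "XX" are all invented for illustration.

```r
filtrar_independentes <- function(dados, independencia, coluna = "data_completa") {
  tempo <- as.POSIXct(dados[[coluna]])
  ord <- order(tempo)                      # make sure records are in time order
  dados <- dados[ord, , drop = FALSE]
  intervalo <- diff(tempo[ord])
  units(intervalo) <- "mins"
  # The first record is always kept; later ones only if far enough
  # from the record immediately before them
  manter <- c(TRUE, as.numeric(intervalo) >= independencia)
  dados[manter, , drop = FALSE]
}

# Made-up example: three real GF timestamps plus an invented unit "XX"
registros <- data.frame(
  ua = c("GF", "GF", "GF", "XX", "XX"),
  especie = "Cuniculus paca",
  data_completa = as.POSIXct(c("2012-06-11 23:50:00", "2012-06-12 00:06:00",
                               "2012-06-12 01:16:00", "2012-06-12 00:10:00",
                               "2012-06-12 00:40:00"),
                             format = "%Y-%m-%d %H:%M:%S", tz = "UTC"),
  stringsAsFactors = FALSE
)

# Apply the filter separately to each ua and species, then stack the results
por_grupo <- split(registros, list(registros$ua, registros$especie), drop = TRUE)
independentes <- do.call(rbind, lapply(por_grupo, filtrar_independentes,
                                       independencia = 60))
independentes
```

Note that this compares each record with the one immediately before it, not with the last record that was kept; if your definition of independence differs, the `manter` line is the place to change.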
