Filling lines with correct data in R, joining successive lines


3

I have a data frame with the following structure:

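library(tibble)  # provides tribble()

# column a is character, so the continuation marker is written as "0"
a=as.data.frame(tribble(
  ~a, ~texto, ~x2007, ~x2008,
  "a","aa",0,0,
  "0","--",12,13,
  "b","bb",1,2,
  "c","cc",0,0,
  "0","dd",0,0,
  "0","ee",7,8))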

The rows starting with zero are continuations of the immediately preceding row that starts with a letter. The values for the third and fourth columns are those found in the last continuation row (for example, the data for row 4 are at the end of row 6). In addition, the text in column 2 needs to be combined. The desired result would be:

  a    texto x2007 x2008
1 a    aa --    12    13
3 b       bb     1     2
4 c cc dd ee     7     8

I tried the following:

b=vector()  
for (i in 2:nrow(a)) {  
  if(a[i,1]==0) {  
    a[i-1,2]=paste(a[i-1,2],a[i,2])  
    a[i-1,3:4]=a[i,3:4]  
    b=c(b,i)  
    }  
}  
a=a[-b,]  # drop the continuation rows

but it only works for two consecutive lines:

  a texto x2007 x2008
1 a aa --    12    13
3 b    bb     1     2
4 c cc dd     5     6

Can someone help me, or does anyone have a simpler solution?

1 answer

4


One possible solution is the following.
It uses the function na.locf from package zoo to fill the zeros in column a with the preceding letter and, at the end, puts the zeros back where they were.

zeros <- a$a == 0
is.na(a$a) <- zeros
a$a <- zoo::na.locf(a$a)

res <- lapply(split(a, a$a), function(DF){
  data.frame(a = DF$a[1],
             texto = paste(DF$texto, collapse = " "),
             x2007 = max(DF$x2007),
             x2008 = max(DF$x2008))
})
res <- do.call(rbind, res)
row.names(res) <- NULL
res
#  a    texto x2007 x2008
#1 a    aa --    12    13
#2 b       bb     1     2
#3 c cc dd ee     7     8

a$a[zeros] <- 0
a
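
For comparison, here is a minimal base R sketch of the same idea without zoo. It is only an illustration and makes a few assumptions that are not in the original answer: the continuation marker is stored as the string "0", the first row is never a continuation row, and the numeric values are taken from the last row of each group (as the question describes) rather than the maximum. The name grp is hypothetical.

```r
# Same data as in the question, built with base R only.
a <- data.frame(
  a     = c("a", "0", "b", "c", "0", "0"),
  texto = c("aa", "--", "bb", "cc", "dd", "ee"),
  x2007 = c(0, 12, 1, 0, 0, 7),
  x2008 = c(0, 13, 2, 0, 0, 8),
  stringsAsFactors = FALSE
)

# A new group starts at every row whose first column is not the "0" marker,
# so the running count of such rows labels each block of rows.
grp <- cumsum(a$a != "0")

res <- do.call(rbind, lapply(split(a, grp), function(DF) {
  n <- nrow(DF)
  data.frame(
    a     = DF$a[1],                          # label from the first row
    texto = paste(DF$texto, collapse = " "),  # join all text pieces
    x2007 = DF$x2007[n],                      # values from the last row
    x2008 = DF$x2008[n],
    stringsAsFactors = FALSE
  )
}))
row.names(res) <- NULL
res
#  a    texto x2007 x2008
#1 a    aa --    12    13
#2 b       bb     1     2
#3 c cc dd ee     7     8
```

Because the groups come from the row positions rather than from the labels themselves, this variant keeps the original row order and does not merge two separate blocks that happen to share the same letter.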
