In R, what is a Tibble?


What is a tibble? How does it differ from a data.frame?

The code below creates a data.frame.

set.seed(123)
df <- base::data.frame(
  id = 1:10,
  texto = letters[1:10],
  numero = rnorm(10)
)

df
#>    id texto      numero
#> 1   1     a -0.56047565
#> 2   2     b -0.23017749
#> 3   3     c  1.55870831
#> 4   4     d  0.07050839
#> 5   5     e  0.12928774
#> 6   6     f  1.71506499
#> 7   7     g  0.46091621
#> 8   8     h -1.26506123
#> 9   9     i -0.68685285
#> 10 10     j -0.44566197
class(df)
#> [1] "data.frame"

And this one creates a tibble with the same data.

set.seed(123)
tbl <- tibble::tibble(
  id = 1:10,
  texto = letters[1:10],
  numero = rnorm(10)
)

tbl
#> # A tibble: 10 x 3
#>       id texto  numero
#>    <int> <chr>   <dbl>
#>  1     1 a     -0.560 
#>  2     2 b     -0.230 
#>  3     3 c      1.56  
#>  4     4 d      0.0705
#>  5     5 e      0.129 
#>  6     6 f      1.72  
#>  7     7 g      0.461 
#>  8     8 h     -1.27  
#>  9     9 i     -0.687 
#> 10    10 j     -0.446
class(tbl)
#> [1] "tbl_df"     "tbl"        "data.frame"

Created on 2020-06-17 by the reprex package (v0.3.0)

  • Besides pretty printing, a tibble carries its groupings with it, such as those created by group_by() or rowwise(), and it also supports, in a very elegant way, data frames and tibbles nested inside it.

3 answers



In more detail:

  1. Tibbles within tibbles
  2. Automatic conversion to a tibble to support tidyverse groupings
  3. Pretty printing
  4. Multiple dispatch: data.frame functions work on tibbles
library(tidyverse)

# Tibbles inside tibbles 

df_a <- data.frame(a = c(1,2),b = c("a","b"))
df_b <- data.frame(a = c(1,2,3),b = c("a","b","c"),c = c(3,4,5))

# Error: base data.frame() cannot nest these data frames (arguments imply differing number of rows)
# df_c <- data.frame(a = list(df_a),b = list(df_b))


tb_a <- tibble(a = c(1,2),b = c("a","b"))
tb_b <- tibble(a = c(1,2,3),b = c("a","b","c"),c = c(3,4,5))

tb_c <- tibble(a = list(tb_a),b = list(tb_b))

tb_c
#> # A tibble: 1 x 2
#>   a                b               
#>   <list>           <list>          
#> 1 <tibble [2 x 2]> <tibble [3 x 3]>


# Automatic conversion: group_by() and rowwise() on a data.frame return a tibble

df_a %>% 
  group_by(a)
#> # A tibble: 2 x 2
#> # Groups:   a [2]
#>       a b    
#>   <dbl> <fct>
#> 1     1 a    
#> 2     2 b

df_a %>% 
  rowwise(b)
#> # A tibble: 2 x 2
#> # Rowwise:  b
#>       a b    
#>   <dbl> <fct>
#> 1     1 a    
#> 2     2 b


# Pretty printing: by default, tables with more than 20 rows are cut to the first 10

df_a <- data.frame(a = rep(1,21))

df_a
#>    a
#> 1  1
#> 2  1
#> 3  1
#> 4  1
#> 5  1
#> 6  1
#> 7  1
#> 8  1
#> 9  1
#> 10 1
#> 11 1
#> 12 1
#> 13 1
#> 14 1
#> 15 1
#> 16 1
#> 17 1
#> 18 1
#> 19 1
#> 20 1
#> 21 1


tibble(df_a)
#> # A tibble: 21 x 1
#>        a
#>    <dbl>
#>  1     1
#>  2     1
#>  3     1
#>  4     1
#>  5     1
#>  6     1
#>  7     1
#>  8     1
#>  9     1
#> 10     1
#> # ... with 11 more rows



# Multiple dispatch via class inheritance: a tibble is also a data.frame

class(df_a)
#> [1] "data.frame"

class(tb_a)
#> [1] "tbl_df"     "tbl"        "data.frame"

Created on 2020-06-17 by the reprex package (v0.3.0)


A tibble object is not a data.frame, and a data.frame is not a tibble. Saying otherwise is a mistake, since they are different data structures, but for the end user and for the final code there may be no major differences. What I mean is that basically whatever you know about data.frame you can use on objects of class tbl_df. Note that R is an object-oriented (OO) language, and the class tbl_df is implemented so that it inherits from the classes tbl and data.frame.

> library(tibble)
> as_tibble(iris)
# A tibble: 150 x 5
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
 1          5.1         3.5          1.4         0.2 setosa 
 2          4.9         3            1.4         0.2 setosa 
 3          4.7         3.2          1.3         0.2 setosa 
 4          4.6         3.1          1.5         0.2 setosa 
 5          5           3.6          1.4         0.2 setosa 
 6          5.4         3.9          1.7         0.4 setosa 
 7          4.6         3.4          1.4         0.3 setosa 
 8          5           3.4          1.5         0.2 setosa 
 9          4.4         2.9          1.4         0.2 setosa 
10          4.9         3.1          1.5         0.1 setosa 
# … with 140 more rows
> class(as_tibble(iris))
[1] "tbl_df"     "tbl"        "data.frame"
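A minimal sketch of the inheritance point, assuming only that the tibble package is installed and reusing the built-in iris data: the class vector lists tbl_df first and data.frame last, so base functions written for data.frame still apply to the tibble.

```r
library(tibble)

tb <- as_tibble(iris)

# S3 class vector: tbl_df first, then the parent classes tbl and data.frame
class(tb)
#> [1] "tbl_df"     "tbl"        "data.frame"

# inherits() confirms the data.frame parent class
inherits(tb, "data.frame")
#> [1] TRUE

# Base functions written for data.frame keep working on the tibble
nrow(tb)
#> [1] 150
mean(tb$Sepal.Length)
#> [1] 5.843333

# Converting back to a plain data.frame removes the extra classes
class(as.data.frame(tb))
#> [1] "data.frame"
```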

One of the goals of the tibble is to be a more organized data structure than those implemented in older R classes, such as data.frame. For example, a tibble does not use row names. Note also that a tibble adapts to the width of your console and prints only the first rows of the dataset; without this, printing a dataset with many rows would flood the console. In addition, just by looking at a tibble you can tell the data structure and/or data type of each variable. These are the main differences.
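To illustrate the row-name point, here is a small sketch using the built-in mtcars data, whose car names are stored as row names. It assumes a recent tibble version, where as_tibble() drops row names unless you pass rownames =; has_rownames() and rownames_to_column() are tibble helpers, and the column name "car" is just an example.

```r
library(tibble)

# mtcars is a base data.frame that keeps the car names as row names
has_rownames(mtcars)
#> [1] TRUE

# as_tibble() drops the row names by default...
tb <- as_tibble(mtcars)
has_rownames(tb)
#> [1] FALSE

# ...unless you ask to keep them as an ordinary column
tb_named <- as_tibble(mtcars, rownames = "car")
names(tb_named)[1:3]
#> [1] "car" "mpg" "cyl"

# rownames_to_column() does the same starting from the data.frame
identical(rownames_to_column(mtcars, var = "car")$car, tb_named$car)
#> [1] TRUE
```

Keeping identifiers in a regular column means they survive filtering, joining and other verbs like any other variable.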


"Tibbles are data.frames, but they tweak some older behaviors to make life a little easier." R is an old language, and some things that were useful 10 or 20 years ago now get in the way (Grolemund & Wickham).

The book R for Data Science points out two main differences between tibbles and data.frames: printing and subsetting.
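A short sketch of the subsetting differences, assuming the tibble package (the column names value and label are made up for the example): [ with a single column returns a vector from a data.frame but a tibble from a tibble, and $ partially matches column names only on a data.frame.

```r
library(tibble)

df <- data.frame(value = 1:3, label = c("a", "b", "c"))
tb <- tibble(value = 1:3, label = c("a", "b", "c"))

# [ with one column: a data.frame drops to a vector, a tibble does not
class(df[, "value"])
#> [1] "integer"
class(tb[, "value"])
#> [1] "tbl_df"     "tbl"        "data.frame"

# [[ and $ extract the column as a vector in both cases
identical(df[["value"]], tb[["value"]])
#> [1] TRUE

# $ partially matches "val" to "value" on a data.frame...
df$val
#> [1] 1 2 3

# ...while a tibble returns NULL and warns about an unknown column
tb$val
```

The tibble behavior is stricter on purpose: a typo in a column name surfaces as a warning instead of silently returning a different column.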

Also, "tibbles are designed so that you don't accidentally overwhelm your console when you print large data frames".
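A sketch of how that printing behavior can be controlled, assuming the tibble package and its options tibble.print_max and tibble.print_min (defaults 20 and 10): tables with more than print_max rows are cut down to print_min rows, and the n argument of print() overrides this for a single call.

```r
library(tibble)

tb <- as_tibble(iris)  # 150 rows

# Default: more than 20 rows, so only the first 10 are shown
print(tb)

# Show more rows for a single call with the n argument
print(tb, n = 15)

# Or change the thresholds for the session, then restore them:
# tables with more than tibble.print_max rows show tibble.print_min rows
old <- options(tibble.print_max = 30, tibble.print_min = 5)
print(tb)
options(old)
```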

Source: R for Data Science https://r4ds.had.co.nz/tibbles.html

Reference: Tibbles vs. data.frame
