As stated in the link in the question, a minimal reproducible example should contain the following:
- A small data set;
- The smallest runnable code that reproduces the error on that small data set;
- Information on the version of R and the system on which the code is running, as well as the packages used;
- If random data is used, a seed, so that the results are the same on every run.
In this answer I will list some of the main functions in R that help with these tasks.
It is worth remembering that the examples in the help pages of R functions can give a good idea of how a minimal reproducible example is structured. In general, the code in those examples meets these requirements.
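As a quick illustration of the point about help pages, base R's example() function retrieves the examples section of a help page. The sketch below uses mean purely as an illustrative topic (it is not mentioned in this answer) and asks for the example lines without running them.

```r
# Fetch the example code from the help page of mean() without running it.
ex_lines <- example("mean", package = "base", give.lines = TRUE)

# Drop blank lines and show the first few lines of the example.
ex_lines <- ex_lines[nzchar(ex_lines)]
head(ex_lines)
```

Calling example("mean") without give.lines = TRUE runs the example code in your session instead of just returning it.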
Producing the data set
To share your own data set, the function dput(), together with head(), can be quite useful. For example, the code below prints the first 10 observations of the iris data set, already in the structure needed to "reassemble" it. Whoever tries to answer your question only has to copy and paste the output, which begins with structure(), and assign it to an object.
dput(head(iris, 10))
#> structure(list(Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5, 5.4, 4.6,
#> 5, 4.4, 4.9), Sepal.Width = c(3.5, 3, 3.2, 3.1, 3.6, 3.9, 3.4,
#> 3.4, 2.9, 3.1), Petal.Length = c(1.4, 1.4, 1.3, 1.5, 1.4, 1.7,
#> 1.4, 1.5, 1.4, 1.5), Petal.Width = c(0.2, 0.2, 0.2, 0.2, 0.2,
#> 0.4, 0.3, 0.2, 0.2, 0.1), Species = structure(c(1L, 1L, 1L, 1L,
#> 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("setosa", "versicolor", "virginica"
#> ), class = "factor")), .Names = c("Sepal.Length", "Sepal.Width",
#> "Petal.Length", "Petal.Width", "Species"), row.names = c(NA,
#> 10L), class = "data.frame")
Reproducing the data:
dados <- structure(list(Sepal.Length = c(
5.1, 4.9, 4.7, 4.6, 5, 5.4, 4.6,
5, 4.4, 4.9
), Sepal.Width = c(
3.5, 3, 3.2, 3.1, 3.6, 3.9, 3.4,
3.4, 2.9, 3.1
), Petal.Length = c(
1.4, 1.4, 1.3, 1.5, 1.4, 1.7,
1.4, 1.5, 1.4, 1.5
), Petal.Width = c(
0.2, 0.2, 0.2, 0.2, 0.2,
0.4, 0.3, 0.2, 0.2, 0.1
), Species = structure(c(
1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L
), .Label = c("setosa", "versicolor", "virginica"), class = "factor")), .Names = c(
"Sepal.Length", "Sepal.Width",
"Petal.Length", "Petal.Width", "Species"
), row.names = c(
NA,
10L
), class = "data.frame")
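If you want to confirm that the pasted dput() output really rebuilds the same object, a round trip like the one below may help. It is only a sketch: capture.output() and eval(parse()) stand in for the manual copy and paste, and the name dput_text is made up for this illustration.

```r
# Capture the text that dput() prints for the first 10 rows of iris.
dput_text <- capture.output(dput(head(iris, 10)))

# Rebuild the data frame by evaluating that text, as a reader would by pasting it.
dados <- eval(parse(text = dput_text))

# Compare the rebuilt object with the original: values, column types and attributes.
all.equal(dados, head(iris, 10))

# The factor keeps all of its levels, even the ones absent from these 10 rows.
levels(dados$Species)
```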
A less ideal alternative is to provide the data as plain text, as in the example below:
texto <- "Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa"
In this case, whoever answers your question can reassemble the data set using the function read.table():
dados <- read.table(text = texto, header = TRUE) # the first line holds the column names
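To see why the plain-text route is less ideal, the sketch below compares a small text version with the corresponding rows of iris. The two-column text is a shortened, made-up variant of the example above, and stringsAsFactors = TRUE is requested only so that the factor levels can be compared.

```r
# A shortened text version of the data (two columns, two rows).
texto <- "Sepal.Length Species
1 5.1 setosa
2 4.9 setosa"

from_text <- read.table(text = texto, header = TRUE, stringsAsFactors = TRUE)
from_iris <- head(iris[, c("Sepal.Length", "Species")], 2)

# The text only contains the level that appears in the printed rows.
nlevels(from_text$Species)  # 1
# The original object still knows about all three species.
nlevels(from_iris$Species)  # 3
```

Information such as unused factor levels or exact column classes is not visible in printed text, which is why dput() is usually the safer choice.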
Another way to produce a data set is to generate random values, for example with the function rnorm() (you can also draw from distributions other than the normal, if relevant) or with the function sample(), which samples values from a vector. The built-in constant letters (a character vector, not a function) can be handy for generating characters or factors. In this case, be sure to set the seed with set.seed() so that the example is reproducible.
Example:
set.seed(1) # ensure reproducibility
dados <- data.frame(x = rnorm(10), y = sample(letters, 10))
dados
#> x y
#> 1 -0.6264538 y
#> 2 0.1836433 f
#> 3 -0.8356286 p
#> 4 1.5952808 c
#> 5 0.3295078 z
#> 6 -0.8204684 i
#> 7 0.4874291 a
#> 8 0.7383247 h
#> 9 0.5757814 x
#> 10 -0.3053884 v
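The effect of the seed can be checked directly. In the sketch below (the object names first, second and third are only illustrative), resetting the seed reproduces the data exactly, while continuing without a reset does not.

```r
# Same seed, same data.
set.seed(1)
first <- data.frame(x = rnorm(10), y = sample(letters, 10))

set.seed(1)
second <- data.frame(x = rnorm(10), y = sample(letters, 10))

identical(first, second)  # TRUE

# Without resetting the seed, the generator continues from where it stopped.
third <- data.frame(x = rnorm(10), y = sample(letters, 10))
identical(first$x, third$x)
```

The exact values produced for a given seed can differ between R versions, so it also helps to state which version you are using.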
Other useful functions here are those of the as.*() family, such as as.factor() and as.Date(), which convert the data to the required format.
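A small sketch of these conversions, using made-up values and column names (group, day and value are assumptions for illustration only):

```r
# Everything starts as text, as often happens after reading a file.
raw <- data.frame(
  group = c("a", "b", "a"),
  day   = c("2013-05-16", "2013-05-17", "2013-05-18"),
  value = c("1.5", "2", "3.25"),
  stringsAsFactors = FALSE
)

# Convert each column to the type the example actually needs.
dados <- data.frame(
  group = as.factor(raw$group),
  day   = as.Date(raw$day),
  value = as.numeric(raw$value)
)

# Check the resulting classes.
sapply(dados, class)
```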
Producing the minimum code
Try to identify the smallest part of your code that still produces the error or illustrates your question. Before posting the code, make sure you have listed the packages it needs in order to be reproducible. To check this, it is a good idea to test your code after restarting R, to make sure that everything necessary is there.
Example:
library(lattice) # the package used
set.seed(1) # the seed
dados <- data.frame(
  x = as.character(rnorm(10)), y = sample(letters, 10),
  stringsAsFactors = TRUE # the default before R 4.0.0; needed to reproduce the output below
) # the data set
densityplot(as.numeric(dados$x))
as.numeric(dados$x)
#> [1] 2 5 4 10 6 3 7 9 8 1
This example would correspond to a question such as: "I'm trying to draw a density plot with lattice, as in the code above. Why do the values become 2, 5, 4, ... when I convert the data to numeric, instead of remaining the original values from rnorm?"
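One way to approximate the "restart R" check without closing your session is to run the example in a separate, clean R process. The sketch below is only one possible approach: it writes a simplified version of the example to a temporary file and runs it with Rscript --vanilla, assuming the lattice package is installed.

```r
# Lines of the example to be checked; lattice is the package used in this answer.
code <- c(
  "library(lattice)",
  "set.seed(1)",
  "dados <- data.frame(x = rnorm(10))",
  "p <- densityplot(dados$x)  # build the plot object without drawing it"
)
script <- tempfile(fileext = ".R")
writeLines(code, script)

# Run the script in a new R process that ignores saved workspaces and profile files.
rscript <- file.path(R.home("bin"), "Rscript")
status <- system2(rscript, c("--vanilla", shQuote(script)))

# An exit status of 0 means the script ran without error in the clean session.
status
```

If a library() call is missing from the script, the clean process fails with a "could not find function" error and a non-zero status, which is exactly what a reader of your question would see.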
System information
Finally, when necessary, you can provide your system information with sessionInfo(), which gives detailed information about your session. In my case, this information was:
R version 3.0.1 (2013-05-16)
Platform: i386-w64-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=Portuguese_Brazil.1252 LC_CTYPE=Portuguese_Brazil.1252
[3] LC_MONETARY=Portuguese_Brazil.1252 LC_NUMERIC=C
[5] LC_TIME=Portuguese_Brazil.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lattice_0.20-15
loaded via a namespace (and not attached):
[1] grid_3.0.1 tools_3.0.1
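When the full output is more than the question needs, individual pieces can be pulled out of the object returned by sessionInfo(). A minimal sketch (the choice of which fields to show is just an illustration):

```r
# Store the session information instead of printing all of it.
si <- sessionInfo()

# R version and platform.
si$R.version$version.string
si$platform

# Names of the attached non-base packages (NULL if there are none).
names(si$otherPkgs)

# Version of one specific package, here the base package stats.
as.character(packageVersion("stats"))
```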
Reprex package
To help create a reproducible example, the reprex package can be quite useful; in fact, the previous examples were generated with it. It is a package designed specifically to help create and run reproducible examples (the name reprex is short for REPRoducible EXample), with output already formatted for sites like GitHub and Stack Overflow.
A simple way to create a reproducible example with the package is to copy your R code to the clipboard. Then just load the package with library(reprex) and run the command reprex(venue = "so"); the code, with its results as comments and already formatted, will be ready to paste into the chosen venue (in this example, "so" stands for Stack Overflow). Any generated images are uploaded to Imgur and the link is created automatically, so you only need to paste the result.
The package has other quite useful features. For example, you can automatically include system information with the argument si = TRUE, and automatically format your code in the style suggested by Hadley with the argument style = TRUE. For more information, see the package page.
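Besides the clipboard, reprex() also accepts the code directly as an expression. The sketch below is hedged in two ways: it only runs in an interactive session where the reprex package is installed, and it uses session_info = TRUE, which is, as far as I know, the name recent reprex releases use for the argument written as si in this answer. Check the argument names against your installed version.

```r
# Only attempt this interactively and when reprex is available.
if (interactive() && requireNamespace("reprex", quietly = TRUE)) {
  reprex::reprex(
    {
      set.seed(1)      # the seed
      x <- rnorm(10)   # a small random data set
      mean(x)
    },
    venue = "so",          # format the result for Stack Overflow
    session_info = TRUE    # append session information (assumed argument name)
  )
}
```

The rendered result is placed on the clipboard when one is available, so it can be pasted straight into the question.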
I think asking several questions in a single post goes a little beyond the scope. You asked 6 questions in one (counting the title).
– Sam
@dvd This question has been discussed here
– Daniel Falbel
@Danielfalbel wouldn't you like to post an answer explaining how to use the reprex package?
– Carlos Cinelli