Interaction graph in ggplot2

I’m trying to adapt some standard R graphics to the style of ggplot2. One of the graphs I want to convert is the interaction plot used when fitting linear models.

The following data were taken from Example 9-1 of the book Design and Analysis of Experiments, by Douglas C. Montgomery, 6th Edition.

montgomery <- structure(list(Nozzle = c("A1", "A1", "A1", "A1", "A1", "A1", 
"A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", 
"A1", "A2", "A2", "A2", "A2", "A2", "A2", "A2", "A2", "A2", "A2", 
"A2", "A2", "A2", "A2", "A2", "A2", "A2", "A2", "A3", "A3", "A3", 
"A3", "A3", "A3", "A3", "A3", "A3", "A3", "A3", "A3", "A3", "A3", 
"A3", "A3", "A3", "A3"), Speed = c("B1", "B1", "B1", "B1", "B1", 
"B1", "B2", "B2", "B2", "B2", "B2", "B2", "B3", "B3", "B3", "B3", 
"B3", "B3", "B1", "B1", "B1", "B1", "B1", "B1", "B2", "B2", "B2", 
"B2", "B2", "B2", "B3", "B3", "B3", "B3", "B3", "B3", "B1", "B1", 
"B1", "B1", "B1", "B1", "B2", "B2", "B2", "B2", "B2", "B2", "B3", 
"B3", "B3", "B3", "B3", "B3"), Pressure = c("C1", "C1", "C2", 
"C2", "C3", "C3", "C1", "C1", "C2", "C2", "C3", "C3", "C1", "C1", 
"C2", "C2", "C3", "C3", "C1", "C1", "C2", "C2", "C3", "C3", "C1", 
"C1", "C2", "C2", "C3", "C3", "C1", "C1", "C2", "C2", "C3", "C3", 
"C1", "C1", "C2", "C2", "C3", "C3", "C1", "C1", "C2", "C2", "C3", 
"C3", "C1", "C1", "C2", "C2", "C3", "C3"), Loss = c(-35, -25, 
110, 75, 4, 5, -45, -60, -10, 30, -40, -30, -40, 15, 80, 54, 
31, 36, 17, 24, 55, 120, -23, -5, -65, -58, -55, -44, -64, -62, 
20, 4, 110, 44, -20, -31, -39, -35, 90, 113, -30, -55, -55, -67, 
-28, -26, -62, -52, 15, -30, 110, 135, 54, 4)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -54L), .Names = c("Nozzle", 
"Speed", "Pressure", "Loss"))

With standard R graphics, the chart I want is created with

interaction.plot(montgomery$Nozzle, montgomery$Speed, montgomery$Loss)

[Figure: traditional interaction plot]
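For reference, interaction.plot() draws, by default (fun = mean), the mean response in each Nozzle × Speed cell, averaging over Pressure and the replicates (six observations per cell). Below is a minimal base-R sketch of those cell means. It assumes a compact rebuild of the montgomery data above, with the values and row order copied from the dput() output; the names cell_means and the rebuilt data frame are illustrative only.

```r
# Compact rebuild of the montgomery data from the question (assumed: same values, same row order)
montgomery <- data.frame(
  Nozzle   = rep(c("A1", "A2", "A3"), each = 18),
  Speed    = rep(rep(c("B1", "B2", "B3"), each = 6), times = 3),
  Pressure = rep(rep(c("C1", "C2", "C3"), each = 2), times = 9),
  Loss = c(-35, -25, 110, 75, 4, 5, -45, -60, -10, 30, -40, -30,
           -40, 15, 80, 54, 31, 36, 17, 24, 55, 120, -23, -5,
           -65, -58, -55, -44, -64, -62, 20, 4, 110, 44, -20, -31,
           -39, -35, 90, 113, -30, -55, -55, -67, -28, -26, -62, -52,
           15, -30, 110, 135, 54, 4)
)

# One mean of Loss per Nozzle x Speed cell (Pressure and replicates are averaged out)
cell_means <- tapply(montgomery$Loss,
                     list(Nozzle = montgomery$Nozzle, Speed = montgomery$Speed),
                     mean)

# Rows are the x-axis levels, columns are the lines; e.g. the A2/B2 cell is -58
print(round(cell_means, 2))
```

These are the same numbers that the dplyr summary below computes as Average.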

I can create a similar chart using ggplot2:

library(dplyr)
library(ggplot2)

interaction <- montgomery %>%
  select(Nozzle, Speed, Loss) %>%
  group_by(Nozzle, Speed) %>%
  summarise(Average = mean(Loss))

ggplot(interaction, aes(x=Nozzle, y=Average, colour=Speed, group=Speed)) + 
geom_line()

[Figure: interaction plot with ggplot2]

What I want now is to create a function called interaction.plot.ggplot2 that produces the previous chart automatically. The problem is that I don’t know how to refer to the columns passed as arguments inside the dplyr commands that prepare the data for plotting.

interaction.plot.ggplot2 <- function(response, predictor, group, data){

    interaction <- data %>%
      select(predictor, group, response) %>%
      group_by(predictor, group) %>%
      summarise(average = mean(response))

    p <- ggplot(interaction, aes(x=predictor, y=average, colour=group, group=group)) + 
    geom_line()

    print(p)
}

interaction.plot.ggplot2(Loss, Nozzle, Speed, montgomery)

Error in eval(expr, envir, enclos) : object 'Nozzle' not found

What should I do so that my function interaction.plot.ggplot2 creates the chart I want?

1 answer

Writing programs with dplyr and ggplot2 in which the variable names are not fixed can be quite tedious.

Here’s a function that works for what you want:

library(dplyr)
library(ggplot2)
library(lazyeval)

interaction.plot.ggplot2 <- function(response, predictor, group, data){

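  # Capture the unevaluated column names passed by the caller
  l_response <- lazy(response)
  l_predictor <- lazy(predictor)
  l_group <- lazy(group)

  # The standard-evaluation verbs (select_, group_by_, summarise_) accept the captured names;
  # interp() substitutes the captured response into ~mean(response)
  interaction <- data %>%
    select_(.dots = list(l_predictor, l_group, l_response)) %>%
    group_by_(.dots = list(l_predictor, l_group)) %>%
    summarise_(
      .dots = setNames(list(interp(~mean(response), response = l_response)), "average")
    )

  # expr_text() turns the original arguments into strings for aes_string()
  p <- ggplot(interaction, aes_string(x=expr_text(predictor), y="average", colour=expr_text(group), group=expr_text(group))) +
    geom_line()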

  print(p)
}


interaction.plot.ggplot2(Loss, Nozzle, Speed, montgomery)

[Figure: interaction plot produced by interaction.plot.ggplot2]
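As a side note, the underscore verbs used above (select_(), group_by_(), summarise_()) have since been deprecated in dplyr. Below is a minimal sketch of the same function with current tidy evaluation, assuming dplyr >= 1.0.0 and ggplot2 >= 3.2.0: the {{ }} ("embrace") operator forwards the caller's unquoted column names to dplyr and to aes(), so lazy(), interp() and expr_text() are not needed. The name interaction_plot_tidy is hypothetical.

```r
library(dplyr)
library(ggplot2)

# Hypothetical tidy-eval rewrite of interaction.plot.ggplot2 (same argument order).
# {{ }} passes the caller's unquoted column names through to dplyr and aes().
interaction_plot_tidy <- function(response, predictor, group, data) {
  # One mean of the response per predictor x group cell
  means <- data %>%
    group_by({{ predictor }}, {{ group }}) %>%
    summarise(average = mean({{ response }}), .groups = "drop")

  # Same mapping as in the question: predictor on x, one coloured line per group
  ggplot(means, aes(x = {{ predictor }}, y = average,
                    colour = {{ group }}, group = {{ group }})) +
    geom_line()
}
```

Usage, with the montgomery data from the question: interaction_plot_tidy(Loss, Nozzle, Speed, montgomery). The sketch returns the plot instead of calling print(), so it can still be extended (for example with + theme_bw()); at the top level it is printed automatically. The select() step is dropped because summarise() already keeps only the grouping columns and the new average column.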

All this is reasonably well described in these links:


To accept unquoted variable names, as in select(data, nome_var), dplyr uses what is called lazy evaluation, or non-standard evaluation. The name comes from the fact that R normally evaluates a function’s arguments before using them inside the function.

For example:

myfun <- function(x){
  return(x)
}
myfun(x = 1 + 1)
[1] 2

Lazy evaluation is a way to delay the evaluation of an argument so that the function can capture the expression the user typed as that argument.

myfun <- function(x){
  return(lazy(x))
}
myfun(x = 1 + 1)
<lazy>
  expr: 1 + 1
  env:  <environment: R_GlobalEnv>
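Base R can capture the expression as well. Below is a minimal sketch with substitute(), which returns the unevaluated argument; unlike lazy(), it does not bundle the environment along with it. The name capture_expr is hypothetical.

```r
# Base-R counterpart of lazy(): substitute() returns the argument unevaluated
capture_expr <- function(x) {
  substitute(x)
}

e <- capture_expr(1 + 1)
print(e)           # 1 + 1  (a call, not the value)
print(class(e))    # "call"
print(eval(e))     # 2, once we decide to evaluate it
print(deparse(e))  # "1 + 1", the text the user typed
```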

This style of programming enables non-standard scoping, which is very useful for interactive data analysis. The trade-off is more complex code when the analysis is not interactive (as in your problem).
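To see non-standard scoping at work without dplyr, here is a minimal base-R sketch (the helper column_mean and the small data frame are hypothetical). It captures an unquoted expression and evaluates it with the columns of a data frame in scope, falling back to the caller’s environment for any other name; base R’s subset() uses the same eval(e, data, parent.frame()) idiom. This is also why Nozzle could not be found in the question: it only exists as a column of the data, not as an object in the calling environment.

```r
# Hypothetical helper: names in `expr` are looked up in `data` first,
# then in the environment the function was called from
column_mean <- function(data, expr) {
  e <- substitute(expr)                                    # e.g. the symbol Loss
  values <- eval(e, envir = data, enclos = parent.frame()) # data acts as the scope
  mean(values)
}

df <- data.frame(Loss = c(-35, -25, 110, 75))
print(column_mean(df, Loss))                 # 31.25: Loss is found as a column of df

scale_factor <- 2
print(column_mean(df, Loss * scale_factor))  # 62.5: scale_factor comes from the caller
```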

Here is the relevant passage from the lazyeval documentation:

Non-standard scoping (NSS) is an important part of R because it makes it easy to write functions tailored for interactive data exploration. These functions require less typing, at the cost of some ambiguity and "magic". This is a good trade-off for interactive data exploration because you want to get ideas out of your head and into the computer as quickly as possible. If a function does make a bad guess, you’ll spot it quickly because you’re working interactively.

I’d stress that, to understand this well, it is well worth reading that document.

    Excellent answer, Daniel. I was unaware of this concept of lazyeval.
