Plot of the CL50 Curve in R using ggplot2 with estimates obtained in Stan

Asked

Viewed 54 times

0

I am trying to make the following graph below in ggplot2. Note that I am adjusting a logistic regression model under the Bayesian approach using the Stan package.

dice: https://drive.google.com/file/d/1NNA6DFYFSVkwuL92RYe58E_Z99BXl7HE/view?usp=sharing

library(dplyr)
library(rstanarm)
library(ggplot2)
dados<- read.table("dados.csv", header = T, sep=";", dec = ",")
dados$periodo <-  as.factor(dados$periodo)
dados <- dados %>% mutate(proporcao =  (dados$resposta)/60)
dados <- dados %>% mutate(logdose = log(dados$concentracao))
dados<- mutate(dados, 
               C_resposta=60-resposta)
fitstanglm24 <- stan_glm(cbind(resposta, C_resposta)~ logdose, 
                         family = binomial(link = "logit"), 
                         data = dados,
                         subset=periodo=="24h")
summary(fitstanglm24)
plot(dados$proporcao[dados$periodo=="24h"]~dados$logdose[dados$periodo=="24h"], 
     pch=16, main="Experiment Duration: 24h", col="black", lwd=3, xlim=c(-8,-1),
     ylim=c(0,1),xlab="log(Concentrations g.a.i/L)",
     ylab="Proportion of bees killed")
lines(c(-3.53,-3.53),c(0,0.50),lty=3)
lines(c(-3.53,-8),c(0.50,0.50),lty=3)
legend(-7.8,0.52,c(expression(paste(LC[50], "= -3.53"))),bty="n",cex=1.1)
points(fitstanglm24$fitted.values~dados$logdose[dados$periodo=="24h"], 
       pch=16, col="red", lwd=3)
curve(plogis(coef(fitstanglm48)[1] + coef(fitstanglm48)[2]*x),-10,-1,
      add=TRUE, col="black",lty=2)
legend("topleft",c("Real Values","Adjusted Values"), 
       col=c("black","red"),pch=16)

However, the following error appears.

ggplot(data = dados, aes(x = dados$logdose[dados$periodo=="24h"], y = (dados$proporcao[dados$periodo=="24h"]))) + 
   geom_point(size = 2, colour = "#FF4D87") +
   labs(title="Experiment Duration: 24h",
        x ="log(Concentrations g.a.i/L)", y = "Proportion of bees killed") +
   theme(
      plot.title = element_text(hjust = 0.5, size = 15),
      plot.subtitle = element_text(hjust = 0.5, size = 12),
      axis.title.x = element_text(size = 13),
      axis.title.y = element_text(size = 13)) +
   geom_smooth(method = "glm", lwd = 1.4, 
               method.args = list(family = binomial(link = "logit")), 
               aes(weight = total), colour = "#FF4D87", se = TRUE, fill = "#FFA2C1")
Erro: Aesthetics must be either length 1 or the same as the data (15): x and y
Run `rlang::last_error()` to see where the error occurred.

Moreover:

Warning messages:
1: Use of `dados$logdose` is discouraged. Use `logdose` instead. 
2: Use of `dados$periodo` is discouraged. Use `periodo` instead. 
3: Use of `dados$proporcao` is discouraged. Use `proporcao` instead. 
4: Use of `dados$periodo` is discouraged. Use `periodo` instead. 

1 answer

1


The warnings give a good hint of what is happening:

Warning messages:
1: Use of `dados$logdose` is discouraged. Use `logdose` instead. 
2: Use of `dados$periodo` is discouraged. Use `periodo` instead. 
3: Use of `dados$proporcao` is discouraged. Use `proporcao` instead. 
4: Use of `dados$periodo` is discouraged. Use `periodo` instead.

It is recommended not to reference the columns of the datasets in the schema dados$coluna to make a chart on ggplot. It is always preferable to call directly the names of the variables. Therefore, I suggest filtering the data for the 24-hour period before making the chart, as follows:

dados %>%
  filter(periodo == "24h") %>%
  ggplot(aes(x = logdose, y = proporcao)) + 
  geom_point(size = 2, colour = "#FF4D87") +
  labs(title="Experiment Duration: 24h",
       x ="log(Concentrations g.a.i/L)", y = "Proportion of bees killed") +
  theme(
    plot.title = element_text(hjust = 0.5, size = 15),
    plot.subtitle = element_text(hjust = 0.5, size = 12),
    axis.title.x = element_text(size = 13),
    axis.title.y = element_text(size = 13)) +
  geom_smooth(method = "glm", lwd = 1.4, 
              method.args = list(family = binomial(link = "logit")), 
              aes(weight = total), colour = "#FF4D87", se = TRUE, fill = "#FFA2C1")
#> `geom_smooth()` using formula 'y ~ x'

Created on 2020-12-31 by the reprex package (v0.3.0)

Just be aware that the function plotted on the graph is calculated in a frequentist way. Thus, its coefficients are slightly different from those estimated by the Bayesian model (which, in reality, will always be different due to the nature of the parameters estimation via MCMC).

fitstanglm24$coefficients
#> (Intercept)     logdose 
#>   2.2127430   0.6233138
dados_24 <- filter(dados, periodo == "24h")
glm(proporcao ~ logdose, data = dados_24, family = binomial(link = "logit"))
#> Warning in eval(family$initialize): non-integer #successes in a binomial glm!
#> 
#> Call:  glm(formula = proporcao ~ logdose, family = binomial(link = "logit"), 
#>     data = dados_24)
#> 
#> Coefficients:
#> (Intercept)      logdose  
#>       2.184        0.616  
#> 
#> Degrees of Freedom: 4 Total (i.e. Null);  3 Residual
#> Null Deviance:       1.003 
#> Residual Deviance: 0.04174   AIC: 8.217

Created on 2020-12-31 by the reprex package (v0.3.0)

Browser other questions tagged

You are not signed in. Login or sign up in order to post.