Error in fitdist: "should not have NA or NaN values"

I have been trying to fit the Weibull, exponential and lognormal distributions to my data and keep getting the following error:

should not have NA or NaN values

Data: https://drive.google.com/file/d/12fc38jWMFiAME3ImgED2I-nt69jJHfrh/view?usp=sharing

Here is my code:

rm(list = ls())
library(readxl)
library(survival)
library(muhaz)
library(fitdistrplus)

setwd("C:\\Users\\breni\\Google Drive\\Acadêmica\\Mestrado\\TrabalhosSubmetidos\\SubmissãoWASA\\CoxEstratificado")
dados = readxl::read_excel('dados1.xlsx')
dados$Estagio = dados$Extensão
dados$Estagio[dados$Extensão=='SEM INFORMAÇÃO'] <- NA
dados$Estagio[dados$Extensão=='IN SITU'] <- NA

dados$Grau.de.Instrução[dados$Grau.de.Instrução=='FUNDAMENTAL'] <- 'ENSINO FUNDAMENTAL'

dados <- data.frame(dados)
head(dados)
attach(dados)
x11()
hist(dados$tempo_vida_meses)

####################################
cbind(table(Estado.Civil),prop.table(table(Estado.Civil))*100)
cbind(table(Raca.Cor),prop.table(table(Raca.Cor))*100)
cbind(table(Grau.de.Instrução),prop.table(table(Grau.de.Instrução))*100)

######################################################################
Weibdist = fitdist(dados$tempo_vida_meses, "weibull")
Expdist = fitdist(dados$tempo_vida_meses, "exp")
lgnormdist = fitdist(dados$tempo_vida_meses, "lnorm")

1 answer

The function that fitdist calls to compute default starting values, startarg, fails if the vector passed to fitdist contains any value equal to zero. This is the relevant part for the Weibull case:

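# Excerpt from startarg (Weibull case), with comments added
if (distr == "weibull") {
    if (any(x < 0))                  # a zero passes this check...
      stop("values must be positive to fit an Weibull  distribution")
    m <- mean(log(x))                # ...but log(0) = -Inf, so m = -Inf
    v <- var(log(x))                 # and the variance of log(x) is NaN
    shape <- 1.2/sqrt(v)             # NaN
    scale <- exp(m + 0.572/shape)    # NaN
    start <- list(shape = shape, scale = scale)
    # ...
}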

Because log(0) is -Inf, any zero in the vector turns the starting values computed from log(x) into NaN, even though the original vector itself contains no NA.
The same applies to lgnormdist: the input values must be strictly positive.
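To see the mechanism in isolation, the sketch below re-runs the starting-value formulas from the excerpt above in base R on a small made-up vector (illustrative values, not the asker's data). The helper name `weibull_start` is hypothetical; it only mirrors the excerpt and is not fitdistrplus's actual internal function.

```r
# Hypothetical helper mirroring the Weibull starting-value formulas quoted
# above (not the real fitdistrplus internal code)
weibull_start <- function(x) {
  if (any(x < 0)) stop("values must be positive")  # zeros pass this check
  m <- mean(log(x))
  v <- var(log(x))
  shape <- 1.2 / sqrt(v)
  scale <- exp(m + 0.572 / shape)
  list(m = m, v = v, shape = shape, scale = scale)
}

# Made-up vector with a single zero
x <- c(0, 3, 7, 12, 25)

log(0)        # -Inf
s <- weibull_start(x)
s$m           # -Inf: one -Inf pulls the mean of log(x) to -Inf
s$v           # NaN: (-Inf) - (-Inf) is undefined, so the variance is NaN
s$shape       # NaN
s$scale       # NaN

# The lognormal starting values are (presumably) also built from log(x),
# so they break in the same way
mean(log(x))  # -Inf
sd(log(x))    # NaN
```

fitdist then rejects these NaN starting values, which is presumably where the "should not have NA or NaN values" message comes from, even though `dados$tempo_vida_meses` itself has no NA.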

To work around this, drop the zeros before fitting, for example with filter() from the dplyr package, or with base R:

entrada <- dados$tempo_vida_meses[dados$tempo_vida_meses > 0]
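A slightly fuller base-R version of that workaround, as a sketch: it also reports how many zeros are discarded, since dropping them silently changes the data being modeled. The vector `tempo` is a made-up stand-in for `dados$tempo_vida_meses`; with the real data you would pass `entrada` to `fitdist()` as before.

```r
# Made-up stand-in for dados$tempo_vida_meses (illustrative values only)
tempo <- c(0, 3, 7, 0, 12, 25, 40)

# Keep strictly positive values; the !is.na() guard avoids NA indices
# if the real column ever contains missing values
entrada <- tempo[!is.na(tempo) & tempo > 0]
# dplyr equivalent on the data frame: dplyr::filter(dados, tempo_vida_meses > 0)

# Report how much data the filter removes before fitting anything
n_zeros <- sum(tempo == 0, na.rm = TRUE)
cat("Dropped", n_zeros, "of", length(tempo), "observations\n")

# With the zeros gone, log(entrada) is finite, so log-based starting
# values like those in the Weibull excerpt are finite as well
all(is.finite(log(entrada)))
```

If the count of dropped zeros is more than a handful, that is a sign the comments below apply: the zeros are part of the data, and a distribution that allows them may be more appropriate.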
  • Excellent answer. But it raises a theoretical question: if these data are a sample and 0 is a possible value, should the Weibull or lognormal distributions be candidates for modeling them at all? In my opinion, no. It may be necessary to look for another probability distribution, such as a zero-inflated Weibull.

  • To reinforce Marcus Nunes's point: the answer is great for identifying the cause of the problem, but Breno, if the probability distribution you are using does not accept zeros and your data contain zeros, look for a more appropriate one. The model should be fitted to the data, never the other way around.
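Following up on the zero-inflated suggestion, here is a minimal sketch in base R, assuming a simple mixture: a point mass at zero with probability p, and a Weibull distribution for the positive values. Because the Weibull itself puts no probability on exactly 0, the "zero-inflated" and "hurdle" formulations coincide here, and the likelihood splits into a Bernoulli part and a Weibull part. The function name `fit_zi_weibull` and the simulated data are hypothetical, and the sketch ignores censoring, which matters if these are survival times (the question loads the survival package).

```r
# Hypothetical sketch of a zero-inflated Weibull fit:
#   P(X = 0) = p,  and X | X > 0 ~ Weibull(shape, scale)
# Uses only stats::dweibull and stats::optim; ignores censoring.
fit_zi_weibull <- function(x) {
  stopifnot(all(x >= 0), any(x > 0))
  pos <- x[x > 0]
  n0 <- sum(x == 0)
  n <- length(x)

  # The log-likelihood separates, so the MLE of p is the proportion of zeros
  p_hat <- n0 / n

  # Negative Weibull log-likelihood on the positive values; parameters are
  # optimized on the log scale so shape and scale stay positive
  nll <- function(theta) {
    -sum(dweibull(pos, shape = exp(theta[1]), scale = exp(theta[2]), log = TRUE))
  }

  # Start at shape 1 (exponential) and scale = mean of the positive values
  fit <- optim(c(0, log(mean(pos))), nll)

  # Bernoulli part of the log-likelihood, guarding against 0 * log(0)
  bern_ll <- (if (n0 > 0) n0 * log(p_hat) else 0) + (n - n0) * log(1 - p_hat)

  list(p_zero = p_hat,
       shape = exp(fit$par[1]),
       scale = exp(fit$par[2]),
       converged = fit$convergence == 0,
       loglik = bern_ll - fit$value)
}

# Simulated data (illustrative only): about 20% zeros,
# Weibull(shape = 1.5, scale = 10) otherwise
set.seed(1)
x <- ifelse(runif(500) < 0.2, 0, rweibull(500, shape = 1.5, scale = 10))
fit <- fit_zi_weibull(x)
str(fit)
```

Optimizing on the log scale is a design choice that avoids constrained optimization; a derivative-based method or a dedicated package could be used instead.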
