Correlation test on categorical data with low number of counts

-2

I ran a correlation test in R and the p-value was NA. Can someone explain this to me?

I was testing the association between schooling and the person's occupation. Some occupation values are missing (NA in the table). Is that the cause? If so, how should I proceed to rerun the test? The data and the command I ran were:

    dput(head(bruno$ocupacao, 20))
    structure(c(4L, 3L, 3L, 3L, 4L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
    3L, 4L, 3L, 4L, 5L, 3L, 4L), .Label = c("agricultora_e_artesa", 
    "aposentado", "atv.domesticas", "campo", "prof.liberais"), class = "factor")
    dput(head(bruno$tempo.esc, 20))
    structure(c(3L, 2L, 3L, 2L, 3L, 3L, 5L, 3L, 3L, 3L, 3L, 3L, 5L, 
    3L, 5L, 2L, 2L, 2L, 3L, 3L), .Label = c("analfabeto", "fund.completo", 
    "fund.incompleto", "med.completo", "med.incompleto"), class = "factor")

    Tabela2<-table(dados$ocupacao,dados$tempo.esc)

    chisq.test(Tabela2)
    ## Pearson's Chi-squared test
    ## data:  Tabela2
    ## X-squared = NaN, df = 16, p-value = NA
  • Welcome to Stack Overflow! Unfortunately, this question cannot be reproduced by anyone trying to answer it. Please take a look at this link (especially the use of the dput function) to see how to ask a reproducible question in R. That way, people who want to help will be able to do so in the best possible way.

  • Could you please edit the question to include the output of dput(dados) or, if the data set is too large, dput(head(dados, 20))?

  • dput(head(bruno$ocupacao, 20)) structure(c(4L, 3L, 3L, 3L, 4L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 4L, 3L, 4L, 5L, 3L, 4L), .Label = c("agricultora_e_artesa", "aposentado", "atv.domesticas", "campo", "prof.liberais"), class = "factor") dput(head(bruno$tempo.esc, 20)) structure(c(3L, 2L, 3L, 3L, 3L, 5L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 5L, 3L, 5L, 2L, 2L, 3L, 3L), .Label = c("analfabeto", "fund.completo", "fund.incompleto", "med.completo", "med.incompleto"), class = "factor")

2 answers

1

The counts are very low in several cells of the table (many contain 0, for example).

The warning issued when you run chisq.test is:

    Warning message:
    In chisq.test(Tabela2) : Chi-squared approximation may be incorrect

This warning appears because several expected counts are very small, so the p-value from the chi-squared approximation may not be reliable (chisq.test is meant for tables with larger counts).
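To see where the NaN comes from, it can help to look at the expected counts that chisq.test computes. The sketch below rebuilds the table from only the 20 rows posted in the question (the vectors are copied from the dput output; the names ocupacao, tempo.esc, tab and res are just for illustration), so the full data may behave differently. In these rows some factor levels never occur, which gives all-zero rows and columns, an expected count of 0 in those cells, and 0/0 = NaN in the statistic. Note that table() drops NA values by default, so the missing occupations by themselves should not be the cause.

```r
# First 20 rows, copied from the dput() output in the question
ocupacao <- factor(
  c(4, 3, 3, 3, 4, 3, 4, 4, 4, 4, 4, 4, 4, 3, 4, 3, 4, 5, 3, 4),
  levels = 1:5,
  labels = c("agricultora_e_artesa", "aposentado", "atv.domesticas",
             "campo", "prof.liberais")
)
tempo.esc <- factor(
  c(3, 2, 3, 2, 3, 3, 5, 3, 3, 3, 3, 3, 5, 3, 5, 2, 2, 2, 3, 3),
  levels = 1:5,
  labels = c("analfabeto", "fund.completo", "fund.incompleto",
             "med.completo", "med.incompleto")
)

# Unused factor levels still appear in the table as all-zero rows/columns
tab <- table(ocupacao, tempo.esc)
rowSums(tab)
colSums(tab)

# Warning suppressed here only to keep the output short
res <- suppressWarnings(chisq.test(tab))
res$expected    # expected count is 0 wherever a row or column total is 0
res$statistic   # NaN, because those cells contribute 0/0 to the sum
```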

An alternative is Fisher’s exact test (a common rule of thumb is to use fisher.test when any expected cell count is below 5):

    fisher.test(Tabela2)
    ## Fisher's Exact Test for Count Data
    ## data:  Tabela2
    ## p-value = 0.109
    ## alternative hypothesis: two.sided
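If empty factor levels are part of the problem, one possible approach is to drop the all-zero rows and columns before testing. This is only a sketch on the 20 rows posted in the question, not on the full data, and tab, tab2, ft and sim are illustrative names. simulate.p.value is an existing argument of chisq.test that replaces the chi-squared approximation with a Monte Carlo p-value, which may be an option when counts are small.

```r
# First 20 rows, copied from the dput() output in the question
ocupacao <- factor(
  c(4, 3, 3, 3, 4, 3, 4, 4, 4, 4, 4, 4, 4, 3, 4, 3, 4, 5, 3, 4),
  levels = 1:5,
  labels = c("agricultora_e_artesa", "aposentado", "atv.domesticas",
             "campo", "prof.liberais")
)
tempo.esc <- factor(
  c(3, 2, 3, 2, 3, 3, 5, 3, 3, 3, 3, 3, 5, 3, 5, 2, 2, 2, 3, 3),
  levels = 1:5,
  labels = c("analfabeto", "fund.completo", "fund.incompleto",
             "med.completo", "med.incompleto")
)
tab <- table(ocupacao, tempo.esc)

# Keep only rows and columns with at least one observation
tab2 <- tab[rowSums(tab) > 0, colSums(tab) > 0, drop = FALSE]
dim(tab2)

# Exact test on the reduced table
ft <- fisher.test(tab2)
ft$p.value

# Chi-squared statistic with a simulated (Monte Carlo) p-value
set.seed(1)  # arbitrary seed, only to make the simulation repeatable
sim <- chisq.test(tab2, simulate.p.value = TRUE, B = 2000)
sim$statistic
sim$p.value
```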

-3

I recall that in past versions of R, chisq.test had a bug where the data could not be proportions (for example 0.12). I do not know whether it has been fixed since. To clear up the doubt, please post the script you are running.

  • My data are not percentages.
