Chi-Square calculation for proportions

Question

Asked 5 years, 1 month ago

I have a data frame with death proportions in populations A and B. I want to run a test to check whether the populations are independent. Here is my data frame:

DATE<- c("2017/jan","2017/feb","2017/mar","2017/apr","2017/may","2017/jun","2017/jul","2017/aug","2017/sep","2017/oct","2017/nov","2017/dec","2018/jan","2018/feb","2018/mar","2018/apr","2018/may","2018/jun","2018/jul","2018/aug","2018/sep","2018/oct","2018/nov","2018/dec","2019/jan","2019/feb","2019/mar","2019/apr","2019/may","2019/jun","2019/jul","2019/aug","2019/sep","2019/oct","2019/nov","2019/dec")
POP_A<- c(0.0304,0.0394,0.0346,0.0331,0.0411,0.0453,0.0443,0.0476,0.0423,0.0331,0.0416,0.0368,0.0407,0.0439,0.0404,0.0414,0.0464,0.0414,0.0494,0.0497,0.041,0.0454,0.0372,0.0448,0.0464,0.034,0.0514,0.0462,0.0416,0.0428,0.058,0.0392,0.0397,0.051,0.0435,0.0437)
POP_B<- c(0.01,0.0242,0.031,0.0155,0.0324,0.0274,0.04,0.0251,0.0208,0.0255,0.0371,0.0211,0.0265,0.0291,0.0202,0.0233,0.019,0.0213,0.0103,0.034,0.0196,0.0175,0.0233,0.038,0.0327,0.0235,0.0236,0.0231,0.0228,0.0172,0.0211,0.0272,0.0398,0.0218,0.0301,0.031)
DF<- data.frame(DATE,POP_A,POP_B)

How would I run a chi-square test on populations A and B?

What are the population totals? And do you want the test per month?

– Rui Barradas

2020/06/16 at 09:01

2 answers

I do not believe the data as given in the question are sufficient to carry out a chi-square test of independence. That test needs counts (a contingency table), from which the proportions are then derived; proportions alone are not enough. It is also unclear whether you want to test the independence of the two variables, POP_A and POP_B, over time, month by month (variable DATE). See the discussion in the comments on @Danielly Xavier's answer.
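To illustrate why counts are needed, here is a minimal sketch that assumes the population totals are known. The totals n_a and n_b below are hypothetical placeholders, not values from the question; only the two January 2017 proportions come from the question's data. With real totals, each monthly proportion could be turned back into a count and a 2x2 table built for that month.

```r
# Hypothetical population sizes: NOT from the question, for illustration only
n_a <- 10000
n_b <- 8000

# January 2017 proportions, taken from the question's data
p_a <- 0.0304
p_b <- 0.0100

# Recover approximate death counts as proportion * total
deaths_a <- round(p_a * n_a)
deaths_b <- round(p_b * n_b)

# 2x2 contingency table: rows = population, columns = outcome
tab <- matrix(
  c(deaths_a, n_a - deaths_a,
    deaths_b, n_b - deaths_b),
  nrow = 2, byrow = TRUE,
  dimnames = list(population = c("A", "B"),
                  outcome = c("died", "survived"))
)

# Chi-square test of independence between population and outcome
res <- chisq.test(tab)
res$statistic
res$p.value
```

The result depends entirely on the assumed totals, which is why the proportions alone cannot be tested.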

I would start by looking at the data.
First, plot the relationship between the two continuous variables.

library(tidyverse)
library(lubridate)
library(Hmisc)

ggplot(DF, aes(POP_A, POP_B)) +
  geom_point()

There is no obvious pattern in the scatter plot; the populations appear to be independent.
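The visual impression from the scatter plot can also be checked numerically. The following is a minimal sketch, assuming a Pearson correlation test on the two monthly series is an acceptable check; note that this is a different technique from the chi-square test and only looks at linear association between the proportions. The vectors are copied from the question.

```r
# Monthly death proportions, copied from the question
POP_A <- c(0.0304, 0.0394, 0.0346, 0.0331, 0.0411, 0.0453, 0.0443, 0.0476,
           0.0423, 0.0331, 0.0416, 0.0368, 0.0407, 0.0439, 0.0404, 0.0414,
           0.0464, 0.0414, 0.0494, 0.0497, 0.041, 0.0454, 0.0372, 0.0448,
           0.0464, 0.034, 0.0514, 0.0462, 0.0416, 0.0428, 0.058, 0.0392,
           0.0397, 0.051, 0.0435, 0.0437)
POP_B <- c(0.01, 0.0242, 0.031, 0.0155, 0.0324, 0.0274, 0.04, 0.0251,
           0.0208, 0.0255, 0.0371, 0.0211, 0.0265, 0.0291, 0.0202, 0.0233,
           0.019, 0.0213, 0.0103, 0.034, 0.0196, 0.0175, 0.0233, 0.038,
           0.0327, 0.0235, 0.0236, 0.0231, 0.0228, 0.0172, 0.0211, 0.0272,
           0.0398, 0.0218, 0.0301, 0.031)

# Test whether the linear correlation between the two series differs from zero
ct <- cor.test(POP_A, POP_B, method = "pearson")
ct$estimate  # sample correlation coefficient
ct$p.value   # p-value for H0: correlation = 0
```

A correlation near zero with a large p-value would be consistent with the plot, although it does not by itself establish independence.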

Next, plot the proportions over time. For this I reshape the data to long format with the function pivot_longer from the tidyr package, which is part of the tidyverse.

DF %>%
  mutate(DATE = ymd(paste(DATE, '01'))) %>%
  pivot_longer(
    cols = matches('POP'),
    names_to = 'POP',
    values_to = 'VALOR'
  ) %>%
  ggplot(aes(DATE, VALOR, colour = POP)) +
  geom_point() +
  geom_smooth(method = 'lm', formula = y ~ x, se = FALSE)

Again there seems to be no relationship between the variables.

by Danielly Xavier • 1 point · Answer 1 · 2020-06-15T18:45:17+00:00

First, identify which categories you are comparing. To calculate X² you will need four categories: success A, success B, failure A and failure B.
In this case, your data.frame should have four columns, one for each category being compared. Below is an example of how I calculated X² for a sample of deaths from dengue and other causes.

# X² (chi-square) test

## assumptions
## categorical variables

# example: was mortality from dengue higher than mortality from other causes?

library(foreign)
obtdf18 = read.dbf('OBTDF18.dbf')

dobt = subset(obtdf18, obtdf18$CAUSABAS == 'A91')

# contingency table
## deaths from dengue
denobt = NROW(dobt)

## deaths from other causes
obtr = NROW(obtdf18) - denobt

## dengue cases
## (den_14 is not defined in this snippet; it is assumed to be loaded already)
dtot = NROW(den_14) - denobt

## rest of the population
## (popdf is likewise assumed to be loaded already)
popr = popdf$POP - dtot
tab_cont = c(denobt, obtr, dtot, popr)
tab_cont = data.frame(rbind(tab_cont, tab_cont, tab_cont))

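# calculating X²
# Each row's four counts are arranged as a 2x2 table so that chisq.test runs
# a test of independence; on a plain vector it would run a goodness-of-fit test.
tab_cont2 <- cbind(tab_cont, t(apply(tab_cont, 1, function(x) {
  ch <- chisq.test(matrix(x, nrow = 2))
  c(unname(ch$statistic), ch$p.value)
})))
colnames(tab_cont2) = c('denobt', 'obtr', 'dtot', 'popr', 'x-squared', 'p-value')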
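
For reference, the four categories named in this answer (success A, failure A, success B, failure B) form a 2x2 table, and X² is the sum over its cells of (observed minus expected)² divided by expected. The following is a minimal sketch with made-up counts (hypothetical, unrelated to the dengue data), checked against chisq.test without the continuity correction.

```r
# Hypothetical counts, for illustration only
obs <- matrix(c(30, 970,
                15, 985),
              nrow = 2, byrow = TRUE,
              dimnames = list(group = c("A", "B"),
                              outcome = c("success", "failure")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson chi-square statistic; a 2x2 table has 1 degree of freedom
x2 <- sum((obs - expected)^2 / expected)
p  <- pchisq(x2, df = 1, lower.tail = FALSE)

# chisq.test applies Yates' continuity correction to 2x2 tables by default;
# correct = FALSE gives the uncorrected statistic computed by hand above
ref <- chisq.test(obs, correct = FALSE)
all.equal(x2, unname(ref$statistic))
```

Working through the statistic by hand makes it clear that the test operates on counts, which is why the row and column totals must be known.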