survey package and counting

I am working with the IBGE PNAD database with the help of the survey package. To count responses (for example, the proportion of people under 15 years of age in the Northern region), is it necessary to use this package? If so, what would the command be?

    You should use the survey package not for its own sake, but as a means of incorporating the sampling design into the analysis. This matters because, due to the PNAD sampling procedure, individuals have different weights. I think your question, and others you may have, can be answered by the following book: http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470284307.html
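    To see why unequal weights matter, here is a toy illustration with made-up numbers (not PNAD data): four respondents, two of them under 15, where each weight is the number of people that respondent represents. The names idade and peso are hypothetical.

```r
# Made-up respondents: ages and sampling weights (people each respondent represents)
idade <- c(10, 40, 12, 50)
peso  <- c(100, 300, 100, 500)

# Unweighted: every respondent counts once, so 2 of 4 are under 15
prop_unweighted <- mean(idade < 15)
# Weighted: the under-15s represent 200 of the 1000 people
prop_weighted <- weighted.mean(idade < 15, peso)

print(c(unweighted = prop_unweighted, weighted = prop_weighted))  # 0.5 and 0.2
```

    Ignoring the weights would overstate the share of under-15s in this toy population by a factor of 2.5.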

2 answers

I don’t know the survey package, but I like to try solving problems with base R functions before resorting to an extra package.

Since you did not provide example data, I will use the built-in mtcars dataset:

head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

To count how many cases there are for each value, just use the table function. Remember that counts/frequencies only make sense for discrete or categorical variables:

> table(mtcars$gear)

 3  4  5 
15 12  5 

That is, there are 15 cases with gear == 3, 12 with gear == 4 and 5 with gear == 5. The same can be done by cross-tabulating two variables:

> table(mtcars$carb , mtcars$gear)

    3 4 5
  1 3 4 0
  2 4 4 2
  3 3 0 0
  4 5 4 1
  6 0 0 1
  8 0 0 1
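Since the question asks for proportions rather than counts, base R's prop.table() can divide a table by its grand total or, with margin = 2, by each column total. A minimal sketch on the same mtcars data; note that these proportions are unweighted, so every row counts equally, just as with table().

```r
# Share of cars with each number of gears: counts divided by the total (32 cars)
p_gear <- prop.table(table(mtcars$gear))
print(p_gear)  # 3: 0.46875, 4: 0.375, 5: 0.15625

# Share of each carb value within each gear value: margin = 2 makes every column sum to 1
p_carb_by_gear <- prop.table(table(mtcars$carb, mtcars$gear), margin = 2)
print(round(p_carb_by_gear, 3))
```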

It is not strictly necessary to use survey, especially for simpler procedures. In this case, a simple aggregate() would be enough. I don’t have the PNAD data on my computer right now, but here is an example:

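library(survey)

# Example data: Peso = sampling weight, Idade = age (seed set only for reproducibility)
set.seed(123)
dados <- data.frame(Peso = rchisq(100, 10), Idade = rnorm(100, 40, 10))

# Design with no clusters or strata (ids = ~1), only unequal weights
delineamento <- svydesign(ids = ~ 1, weights = ~ Peso, data = dados)
# Weighted counts of Idade > 30 (sum of the weights in each category)
svytable(~ Idade > 30, delineamento)

# Same weighted counts in base R: sum the weights within each group
aggregate(Peso ~ Idade > 30, dados, FUN = sum)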
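The weighted counts can be turned into weighted proportions by dividing by the total weight. A minimal sketch, regenerating data of the same shape as above; the seed, the name counts and the column Prop are additions for illustration only.

```r
# Data of the same shape as the example above (seed chosen arbitrarily)
set.seed(1)
dados <- data.frame(Peso = rchisq(100, 10), Idade = rnorm(100, 40, 10))

# Weighted counts: the sum of the weights in each group (FALSE / TRUE)
counts <- aggregate(Peso ~ Idade > 30, dados, FUN = sum)

# Divide by the total weight to get weighted proportions (they sum to 1)
counts$Prop <- counts$Peso / sum(counts$Peso)
print(counts)
```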

Note that the two results are equal. Importantly, a simple table() will not do, because each observation has a different weight:

with(dados, table(Idade > 30))
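For the original question, the same idea gives a weighted proportion of people under 15 per region. The sketch below uses a small made-up data frame; the column names Regiao, Idade and Peso are hypothetical stand-ins, not the actual PNAD variable names. It yields point estimates only: standard errors that reflect the PNAD sampling plan (strata and clusters) require a design object, for example with survey's svyby() and svymean().

```r
# Hypothetical PNAD-like data: column names and values are made up for illustration
pessoas <- data.frame(
  Regiao = c("Norte", "Norte", "Norte", "Sul", "Sul", "Sul"),
  Idade  = c(8, 30, 12, 5, 40, 60),
  Peso   = c(200, 500, 300, 100, 400, 500)
)

# Weighted proportion of people under 15 within each region:
# weights of the under-15s divided by the region's total weight
prop_under_15 <- sapply(
  split(pessoas, pessoas$Regiao),
  function(d) weighted.mean(d$Idade < 15, d$Peso)
)
print(prop_under_15)  # Norte 0.5, Sul 0.1
```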

I wrote a blog post about survey that may be useful (especially the links in the comments).