Understanding the survey package

Here goes...

I'm studying the survey package.

I started with this page:

http://www.ats.ucla.edu/stat/r/faq/svy_r_scpsu.htm

but my questions are more basic.

I have already loaded the following data:

> mydata
  id str clu     wt hou85 ue91 lab91
1  1   2   1  0.500 26881 4123 33786
2  2   2   1  0.500 26881 4123 33786
3  3   1  10  1.004  9230 1623 13727
4  4   1   4  1.893  4896  760  5919
5  5   1   7  2.173  4264  767  5823
6  6   1  32  2.971  3119  568  4011
7  7   1  26  4.762  1946  331  2543
8  8   1  18  6.335  1463  187  1448
9  9   1  13 13.730   675  129   927
> 

I would like to understand exactly what the following code does:

mydesign <- 
svydesign(
    id = ~clu ,
    data = mydata ,
    weight = ~wt ,
    strata = ~str
)

What is the role of the argument id = ~clu?

And what is the role of the argument strata = ~str?

From the little I have read, it seems that mydata is being divided or split up in some way, but I can't see how...

Now look at the following sequence of commands:

> summary(mydata$ue91)
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
     129     331     760    1401    1623    4123 
>
> options(survey.lonely.psu = "adjust")
> svymean(~ue91, mydesign)
       mean     SE
ue91 445.18 185.56

First the mean is 1401 and then it is 445.18. Why?

And what does SE mean?

Well, those are my questions for now.

Thank you

Answer

The survey package is used to analyze complex samples, that is, samples in which not every element has the same probability of being selected. That is where the strata, weights and id arguments come in.

With your data it is hard to explain what each argument represents, because I did not fully understand how it was collected, so I will try to explain it using the IBGE census sample data instead. The sample works as follows: 5% of the households are sampled (this percentage may vary from one city to another), and all residents of the sampled households answer the sample questionnaire, which is more complete than the universe questionnaire. These households are then grouped into AEDs (Áreas de Expansão da Amostra, or Sample Data Expansion Areas). The data (available here) include, among many others, the following variables:

V0010 - Sample weight (Peso amostral)
V0011 - AED
V0300 - Control (household code)

The variable V0010 (sample weight) is computed after the sample and the census are complete, using the variables that the sample questionnaire and the universe questionnaire have in common. With survey, we declare the design with the following arguments:

 svydesign(ids = ~ V0300,  strata = ~ V0011, weights = ~ V0010, data = dados)

Here the ids argument receives V0300, because it is the code of the sampled household and all members of that household are interviewed (so the household is a cluster, not a stratum). The strata are the AEDs (V0011), because within each AED only a percentage of its population (of households) was sampled. The weights argument receives the sample weight, V0010.
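To connect this with the data in the question, the base R sketch below (it does not call survey at all) retypes the printed data frame and counts how many distinct clusters (clu) appear in each stratum (str). As far as I understand, this grouping is what svydesign() records: mydata itself is not split or changed, the design object just remembers which rows belong to which cluster and stratum so the standard errors can be computed later. The name psu_per_stratum is only illustrative.

```r
# Retype the data frame printed in the question (only the columns used here)
mydata <- data.frame(
  id   = 1:9,
  str  = c(2, 2, 1, 1, 1, 1, 1, 1, 1),
  clu  = c(1, 1, 10, 4, 7, 32, 26, 18, 13),
  wt   = c(0.500, 0.500, 1.004, 1.893, 2.173, 2.971, 4.762, 6.335, 13.730),
  ue91 = c(4123, 4123, 1623, 760, 767, 568, 331, 187, 129)
)

# Cross-tabulate strata against clusters: which PSUs sit inside each stratum
print(table(stratum = mydata$str, cluster = mydata$clu))

# Number of distinct clusters (PSUs) per stratum
psu_per_stratum <- tapply(mydata$clu, mydata$str, function(x) length(unique(x)))
print(psu_per_stratum)
```

Stratum 2 contains a single cluster (rows 1 and 2 both have clu = 1), while stratum 1 has seven clusters of one row each. A stratum with only one PSU gives no within-stratum variability to estimate, which is presumably why the question sets options(survey.lonely.psu = "adjust") before calling svymean().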

The results differ because the second one is a weighted mean, using the variable wt as the weight. You can get the same value with:

with(mydata, sum(wt * ue91) / sum(wt))
[1] 445.1821
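A minimal base R sketch comparing the two numbers from the question, retyping only the wt and ue91 columns. weighted.mean() is the base R function for the same weighted calculation.

```r
# wt and ue91 columns retyped from the question
mydata <- data.frame(
  wt   = c(0.500, 0.500, 1.004, 1.893, 2.173, 2.971, 4.762, 6.335, 13.730),
  ue91 = c(4123, 4123, 1623, 760, 767, 568, 331, 187, 129)
)

unweighted       <- mean(mydata$ue91)                          # what summary() reports
weighted         <- with(mydata, sum(wt * ue91) / sum(wt))     # what svymean() reports
weighted_builtin <- weighted.mean(mydata$ue91, mydata$wt)      # same thing, base R helper

print(round(c(unweighted = unweighted, weighted = weighted, builtin = weighted_builtin), 2))
```

The weighted mean is much lower because the largest weights (13.730 and 6.335) belong to the rows with the smallest ue91 values (129 and 187), while the two rows with ue91 = 4123 each have a weight of only 0.5.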

SE is the standard error of the estimated mean.
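To make the SE less of a black box, here is a base R sketch of the Taylor linearization (ratio) variance estimator for a one-stage stratified cluster design with no finite population correction. It is my own reconstruction, not code taken from survey: the treatment of the single-PSU stratum follows my reading of the survey documentation for survey.lonely.psu = "adjust" (center that stratum at the overall mean instead of its own stratum mean). The names z, stratum_var and se are illustrative. With the question's data it should reproduce the svymean() output shown above (mean 445.18, SE 185.56).

```r
# Columns retyped from the question
mydata <- data.frame(
  str  = c(2, 2, 1, 1, 1, 1, 1, 1, 1),
  clu  = c(1, 1, 10, 4, 7, 32, 26, 18, 13),
  wt   = c(0.500, 0.500, 1.004, 1.893, 2.173, 2.971, 4.762, 6.335, 13.730),
  ue91 = c(4123, 4123, 1623, 760, 767, 568, 331, 187, 129)
)

# Point estimate: weighted mean
w_total <- sum(mydata$wt)
ybar    <- sum(mydata$wt * mydata$ue91) / w_total

# Linearized value of the weighted mean for each row (these sum to zero)
mydata$z <- mydata$wt * (mydata$ue91 - ybar) / w_total

# Variance contribution of each stratum, based on cluster (PSU) totals of z
stratum_var <- sapply(split(mydata, mydata$str), function(s) {
  psu_totals <- tapply(s$z, s$clu, sum)
  n_h <- length(psu_totals)
  if (n_h > 1) {
    # Usual between-PSU variance within the stratum
    n_h / (n_h - 1) * sum((psu_totals - mean(psu_totals))^2)
  } else {
    # Lonely PSU under "adjust" (assumed): deviation from the overall mean of z, which is 0
    sum(psu_totals^2)
  }
})

se <- sqrt(sum(stratum_var))
print(round(c(mean = ybar, SE = se), 2))
```

The key point is that the clusters, not the individual rows, are the units whose variability drives the SE, and the variability is measured separately inside each stratum. That is why svydesign() needs both id and strata even though neither changes the point estimate.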

  • Thanks for the answer, it helped me understand a little. And what about svytotal(~ue91, mydesign)? What does it mean? How could I get the same result without the survey package functions, like the one you gave: with(mydata, sum((wt * ue91)/sum(wt)))?

  • svytotal() estimates the population total of the variable. In this case, with(mydata, sum(wt * ue91)) should return the same value (note that you no longer divide by the sum of wt).
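A base R sketch of that reply, again retyping wt and ue91 from the question. It also shows how the two commands in this thread relate: the sum of the weights plays the role of an estimated population size, and the weighted mean is the estimated total divided by it. Only the point estimate of svytotal() is reproduced here, not its standard error; the names total and n_hat are illustrative.

```r
# wt and ue91 columns retyped from the question
mydata <- data.frame(
  wt   = c(0.500, 0.500, 1.004, 1.893, 2.173, 2.971, 4.762, 6.335, 13.730),
  ue91 = c(4123, 4123, 1623, 760, 767, 568, 331, 187, 129)
)

total <- with(mydata, sum(wt * ue91))   # point estimate that svytotal() should report
n_hat <- sum(mydata$wt)                 # sum of the weights (estimated population size)

# Dividing the total by the sum of the weights gives back the weighted mean
print(c(total = total, n_hat = n_hat, mean = total / n_hat))
```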
