How to group census person microdata by household?

4

I am trying to answer the following question: among couples with children under 18, in how many do both parents work outside the home?

Given a table censo from the 2010 census (such as the one for Acre), the first thing I did was filter the table to couples with children.

  censo <- read.csv("AC.csv", sep = "\t")

  # V5090: family composition type of single families and primary cohabiting families
  #  1 - Couple without child(ren)
  #  2 - Couple without child(ren) and with relative(s)
  #  3 - Couple with child(ren)   <----------------------
  #  4 - Couple with child(ren) and with relative(s)
  #  5 - Woman without spouse, with child(ren)
  #  6 - Woman without spouse, with child(ren) and with relative(s)
  #  7 - Man without spouse, with child(ren)
  #  8 - Man without spouse, with child(ren) and with relative(s)
  #  9 - Other
  #  Blank

  censo_cf <- censo[which(censo$V5090 == 3), ]

Then I filtered so that at least one of the children was under 18:

# V6660: age of the last child born alive as of July 31, 2010
censo_cf18 <- censo_cf[which(censo_cf$V6660  < 18),]

My next step would be to group the respondents by household (and then check in which households both parents work). Although I can't find it documented anywhere for the 2010 census, according to the documentation of the 2000 census (page 83) the variable controle is:

Household identification

Thus, I would expect that within this subset of mine (couples with children) all households had at least three interviewees (husband, wife and child). However, only three households had this:

# V0300 CONTROLE
table_V0300 <- table(censo_cf18$V0300)
pessoas_por_domicilio  <- table(table_V0300)
pessoas_por_domicilio

   1    2    3 
9340   57    3

What is my mistake?

2 answers

3


Your mistake is in this part:

censo_cf18 <- censo_cf[which(censo_cf$V6660  < 18),]

The moment you do that, you are dropping 1) the men (this variable only exists for women) and 2) the children. As a result, you are counting occurrences of V0300 (which is also the control variable in the 2010 census) on the wrong set of rows, hence the unexpected result.
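
To illustrate the point, here is a minimal sketch on made-up data (the `role` column and all values are hypothetical, not from the census file). It assumes, as described above, that V6660 is only filled in on the mother's record:

```r
# Toy data: one household (V0300 = 1) with father, mother and child.
# V6660 (age of the last child born alive) is only present for the mother.
toy <- data.frame(
  V0300 = c(1, 1, 1),
  role  = c("father", "mother", "child"),  # hypothetical helper column
  V6660 = c(NA, 5, NA)
)

# Filtering rows on V6660 keeps only the mother's row ...
kept <- toy[which(toy$V6660 < 18), ]
nrow(kept)

# ... so the household looks as if it had a single person.
table(table(kept$V0300))

# Selecting by household identifier keeps every member instead.
ids  <- unique(toy$V0300[which(toy$V6660 < 18)])
full <- toy[toy$V0300 %in% ids, ]
nrow(full)
```

The row filter answers "which people satisfy the condition", while the `%in%` selection answers "which households contain such a person", which is what the question needs.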

What you should do is store the values of that variable (V0300) for the cases you want (households formed by a couple with child(ren), with at least one child under 18, where both members of the couple work) and then select those households.

Here is some code (using the data.table package; the sample dataset I have already has the value labels, but it is easy to adapt to a data.frame and to data without labels):

library(data.table)  # `dados` is assumed to be a data.table with labelled values

# First filter: get the household codes of couples with children

Filtro1 <- dados[V5090 == 'Casal com filho(s)', V0300]

# Second filter: among couples with children, get the mothers whose last child is under 18

Filtro2 <- dados[V0300 %in% Filtro1 & V6660 < 18, V0300]

# Now keep only the person responsible for the household or their spouse
# (only the variables of interest are kept):

temp <- dados[V0300 %in% Filtro2 &
                V0502 %in% c('Pessoa responsável pelo domicílio',
                             'Cônjuge ou companheiro(a) de sexo diferente',
                             'Cônjuge ou companheiro(a) do mesmo sexo'),
              .(V0300, PessoaTrabalhando = V0641 == 'Sim')]

# Third filter: among couples with at least one child under 18, get those where both work

Filtro3 <- temp[, .(PessoasTrabalhando = sum(PessoaTrabalhando)), by = V0300][PessoasTrabalhando == 2, V0300]

# Now the analysis can be done

novosdados <- dados[V0300 %in% Filtro3, ]
novosdados[, .(N = .N), by = V0300][, table(N)]

# Result for Porto Alegre:
# N
#    3    4    5    6    7    8    9   10   11   12   13 
# 1570 1139  397  141   52   17   13    6    1    1    1
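
For readers working with a plain data.frame and numeric codes, the same steps can be sketched in base R. This is only an illustration on made-up data: V5090 == 3 comes from the codebook in the question and V0502 in 1:3 from the other answer, while V0641 == 1 meaning "worked" is an assumption to check against the documentation.

```r
# Hypothetical toy microdata with numeric codes (no value labels).
# Assumed coding: V5090 == 3 couple with children, V0502 in 1:3 head or spouse,
# V0641 == 1 worked (assumption, check the codebook).
dados <- data.frame(
  V0300 = c(1, 1, 1,   2, 2, 2,   3, 3),
  V5090 = c(3, 3, 3,   3, 3, 3,   1, 1),
  V0502 = c(1, 2, 4,   1, 2, 4,   1, 2),
  V6660 = c(NA, 4, NA, NA, 10, NA, NA, NA),
  V0641 = c(1, 1, NA,  1, 2, NA,  1, 1)
)

# Filter 1: households formed by a couple with children
filtro1 <- unique(dados$V0300[which(dados$V5090 == 3)])

# Filter 2: of those, households where a mother's last child is under 18
filtro2 <- unique(dados$V0300[which(dados$V0300 %in% filtro1 & dados$V6660 < 18)])

# Keep only head of household and spouse, then count workers per household
casal <- dados[dados$V0300 %in% filtro2 & dados$V0502 %in% 1:3, ]
trabalhando <- tapply(casal$V0641 %in% 1, casal$V0300, sum)

# Filter 3: households where both partners work
filtro3 <- as.numeric(names(trabalhando)[trabalhando == 2])

# All members of the selected households, and the household-size distribution
novosdados <- dados[dados$V0300 %in% filtro3, ]
table(table(novosdados$V0300))
```

Using `which()` and `%in%` avoids NA rows creeping into the subsets when a variable is blank for some people.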

Keep in mind that the sample data should be weighted by the variable V0010. If I'm not mistaken, the weight of the household/family is the same as that of the person responsible for the household. Also, you can download the 2010 Census documentation at this link (IBGE's own FTP).
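
As a rough sketch of what weighting could look like, under the same "if I'm not mistaken" assumption that a household's weight is the V0010 of its responsible person (the data below are invented, and V0502 == 1 as the code for the responsible person is also an assumption):

```r
# Hypothetical example: three selected households, one row per person.
# V0010 is the sample weight; V0502 == 1 marks the responsible person (assumed code).
sel <- data.frame(
  V0300 = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  V0502 = c(1, 2, 4, 1, 2, 4, 1, 2, 4),
  V0010 = c(10.5, 10.5, 10.5, 8, 8, 8, 12.25, 12.25, 12.25)
)

# One row per household: the record of the responsible person
resp <- sel[sel$V0502 == 1, ]

# Unweighted count of sampled households vs. weighted estimate of households
n_amostra  <- nrow(resp)
n_estimado <- sum(resp$V0010)
```

Summing the weights over one record per household, rather than over every person, avoids counting a household once per member.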

1

Good afternoon rcoster and celacanto. Thank you for the discussion, it cleared up one of my doubts.

I would like to add that your solutions ignore secondary families, e.g. the family of a daughter living in her parents' household. In this case, the primary family has V5040=1 and the secondary family V5040=2. V5090=3, "couple with children", only applies to the primary family; for the others you need V5100=2. In other words, your solutions should also include secondary families (which are the most likely to have children under 18).

Members of a family in the same household (V0300) are grouped by V5020. Grouping only by V0300 ignores the boundaries between families. Is "N" in rcoster's solution the number of people per household? The high numbers make me think of households with two families.

For the family of the head of the household, V0502 IN (1,2,3) is sufficient to identify both parents. For a secondary family, you should group by V0300 and V5020 and test whether the mother and one other person have V5100=2. In four-generation households this could go wrong, but this is where my passion for genealogy ends :-)
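
A small sketch of the difference between grouping by household and grouping by family, on invented data. It assumes, as stated above, that V5020 numbers the families within a household; the `familia_id` key is a hypothetical helper:

```r
# Hypothetical toy data: household 1 holds two families (V5020 = 1 and 2),
# household 2 holds a single family.
pessoas <- data.frame(
  V0300 = c(1, 1, 1, 1, 1, 1, 2, 2, 2),
  V5020 = c(1, 1, 1, 2, 2, 2, 1, 1, 1)
)

# Composite key: one identifier per family within a household
familia_id <- paste(pessoas$V0300, pessoas$V5020, sep = "_")

# Sizes per household: the two families in household 1 are merged into 6 people
table(table(pessoas$V0300))

# Sizes per family: three families of 3 people each
table(table(familia_id))
```

Any per-couple test (for example, counting working partners) would then be applied within each `familia_id` rather than within each V0300.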
