What is the difference between how R and SAS perform merge?
The SAS MERGE statement returns 205546 rows, whereas R's merge() returns 207208 rows.
An example follows.
I am working with the IBGE file available at:
ftp://ftp.ibge.gov.br/PNS/2013/microdados/pns_2013_microdados.zip
The DOMPNS2013.txt and PESPNS2013.txt data files are used.
SAS:
1) Variable assignment: run the files "DOMPNS2013 input" and "PESPNS2013 input"
2) Selection of the value of interest and merge:
data dompns2013v3;
set dompns2013;
if V0015 = 1;
run;
/*NOTE: There were 81187 observations read from the data set WORK.DOMPNS2013.
NOTE: The data set WORK.DOMPNS2013V2 has 64348 observations and 20 variables.*/
data arq.dompes2013v3;
merge dompns2013v3 pespns2013;
by v0001 v0024 upa_pns v0006;
run;
/*NOTE: There were 64348 observations read from the data set WORK.DOMPNS2013V2.
NOTE: There were 205546 observations read from the data set WORK.PESPNS2013.
NOTE: The data set ARQ.DOMPES2013V2 has 205546 observations and 388 variables.
NOTE: DATA statement used (Total process time):*/
R:
1) Variable assignment:
d2013 = read.fwf(file='DOMPNS2013.txt',widths=c(2,8,7,4,2,6,1,1))
names(d2013) = c("v0001","v0024","upa_pns","v0006","v0015","skip1","v0026","v0031")
d2013 = subset(d2013,select=c("v0001","v0024","upa_pns","v0006","v0015","v0026","v0031"))
p2013 = read.fwf(file='PESPNS2013.txt',widths=c(2,8,7,4,1,2,2,2,1,8,3))
names(p2013)=c("v0001","v0024","upa_pns","v0006","v0025","skip1","c00301","c004","c006","skip2","c008")
p2013=subset(p2013,select=c("v0001","v0024","upa_pns","v0006","v0025","c00301","c004","c006","c008"))
2) Selection of the value of interest and merge:
dim(d2013)
[1] 81187 7
d2013 = subset(d2013, d2013$v0015 == 1)
dim(d2013)
[1] 64348 7
dim(p2013)
[1] 205546 9
dpmerge = merge( p2013,d2013,by=c("v0001","v0024","upa_pns","v0006"))
dim(dpmerge)
[1] 207208 12
Henry, it seems to me that SAS is removing the duplicate records of DOMPNS before merging. If you run d2013 <- unique(d2013) before merging in R, the number of observations will be equal. – Carlos Cinelli
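To make the effect of duplicate keys concrete, here is a minimal sketch on toy data. The `households` and `persons` data frames and the single `id` key are hypothetical stand-ins, not the PNS files. It only illustrates how R's merge() treats repeated key values, and how to count them before merging.

```r
# Toy data (hypothetical, not the PNS files): household id 1 appears twice.
households <- data.frame(id = c(1, 1, 2), v0015 = c(1, 1, 1))
persons <- data.frame(id = c(1, 1, 2), c008 = c(30, 5, 41))

# merge() pairs every matching row from both sides, so each person in
# household 1 is matched to both copies of that household row.
inflated <- merge(persons, households, by = "id")
nrow(inflated)  # 5: 2 persons x 2 household rows, plus 1 row for household 2

# Count household rows whose key repeats an earlier row.
sum(duplicated(households["id"]))  # 1

# Dropping fully identical household rows first removes the inflation.
deduped <- merge(persons, unique(households), by = "id")
nrow(deduped)  # 3, the same as nrow(persons)
```

Note that unique() only drops rows that are identical in every column. If two household rows share the key but differ elsewhere, something like `d2013[!duplicated(d2013[key_cols]), ]` would be needed instead, where `key_cols` is an assumed vector holding the four key names. As far as I know, a SAS MERGE with a BY statement pairs rows within each BY group in order rather than forming every combination, which would also be consistent with SAS returning exactly as many rows as the persons file.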