Contingency table in R

I have a table that looks like this:

Destino; Proposito; Custo;
Chicago; Negocios; 35;
Nova York; Negocios; 30;
Miami; Turismo; 25;
Chicago; Estudo; 50;
Nova York; Turismo; 40;
Miami; Estudo; 90;
Miami; Estudo; 110;
Chicago; Turismo; 30;
Miami; Negocios; 20;
Chicago; Turismo; 35;
Nova York; Negocios; 40;
Chicago; Estudo; 150;
Nova York; Turismo; 40;
Miami; Negocios; 30;
Nova York; Estudo; 140;
Chicago; Turismo; 35;
Nova York; Turismo; 40;
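
To make the example reproducible, the rows above can be loaded into a data frame. This is only a minimal sketch: it assumes the data frame is named df (the name used in the SQL attempt further down) and that the trailing semicolons are dropped before parsing.

```r
# Parse the semicolon-separated rows into a data frame.
# strip.white = TRUE trims the spaces around each field.
df <- read.table(
  text = "Destino;Proposito;Custo
Chicago;Negocios;35
Nova York;Negocios;30
Miami;Turismo;25
Chicago;Estudo;50
Nova York;Turismo;40
Miami;Estudo;90
Miami;Estudo;110
Chicago;Turismo;30
Miami;Negocios;20
Chicago;Turismo;35
Nova York;Negocios;40
Chicago;Estudo;150
Nova York;Turismo;40
Miami;Negocios;30
Nova York;Estudo;140
Chicago;Turismo;35
Nova York;Turismo;40",
  header = TRUE, sep = ";", strip.white = TRUE,
  stringsAsFactors = FALSE
)

# Inspect the column types: two character columns and one numeric column.
str(df)
```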

I am trying to write an R script that turns this data into a contingency table with the following layout:

Destino; Negocios; Turismo; Estudo; Total;
Miami; 50; 25; 200; 275;
Chicago; 35; 100; 200; 335;
Nova York; 70; 120; 140; 330;
Total; 155; 245; 540; 940;

The idea is to have a matrix holding the sum of costs (Custo) for each combination of destination (Destino) and purpose (Proposito), plus the row and column totals.

The furthest I got, with a lot of help from colleagues who have SQL experience, was:

library(sqldf)

# One conditional sum per purpose, grouped by destination
sqldf("
  select Destino,
         sum(case when Proposito = 'Estudo'   then Custo else 0 end) as P_Estudo,
         sum(case when Proposito = 'Turismo'  then Custo else 0 end) as P_Turismo,
         sum(case when Proposito = 'Negocios' then Custo else 0 end) as P_Negocios,
         sum(Custo) as Total
  from df
  group by Destino
")

The task would be easier if the goal were only the total cost by destination or only the total cost by purpose. There are several ways to do that, such as group_by from the dplyr package, aggregate, or xtabs.
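
For illustration, the one-way totals might look like the sketch below. It uses only the first six rows of the table above, and df_small, by_destino and by_proposito are names introduced here just for the example.

```r
# First six rows of the table above, enough to show a one-way total.
df_small <- data.frame(
  Destino   = c("Chicago", "Nova York", "Miami", "Chicago", "Nova York", "Miami"),
  Proposito = c("Negocios", "Negocios", "Turismo", "Estudo", "Turismo", "Estudo"),
  Custo     = c(35, 30, 25, 50, 40, 90)
)

# Total cost per destination, using aggregate with a formula.
by_destino <- aggregate(Custo ~ Destino, data = df_small, FUN = sum)
print(by_destino)

# Total cost per purpose, using xtabs with a single grouping variable.
by_proposito <- xtabs(Custo ~ Proposito, data = df_small)
print(by_proposito)
```

Either call handles one grouping variable at a time; what I am missing is how to get both dimensions, with totals, in a single table.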

I appreciate any suggestion that might help me solve the problem.

1 answer

Try running the commands prefixed with > below, assuming your data is in a data frame called df:

> library(reshape2)
> acast(df, Destino ~ Proposito, fun.aggregate=sum)

Using Custo as value column: use value.var to override.
           Estudo  Negocios  Turismo
Chicago       200        35      100
Miami         200        50       25
Nova York     140        70      120

Now just add the margins with the totals:

> addmargins(acast(df, Destino ~ Proposito, fun.aggregate=sum))

Using Custo as value column: use value.var to override.
           Estudo  Negocios  Turismo  Sum
Chicago       200        35      100  335
Miami         200        50       25  275
Nova York     140        70      120  330
Sum           540       155      245  940

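Notice that the second block repeats the acast call. The two steps do not need to be run separately; I split them only to show the process step by step and make it easier to follow. The message printed before each result means that acast guessed Custo as the value column; passing value.var = "Custo" makes that choice explicit.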
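
If you would rather not depend on reshape2, a base R version along the same lines could use xtabs, which the question already mentions. This is only a sketch: it rebuilds df from the rows in the question so that it runs on its own, and tab and tab_total are names made up for the example.

```r
# Rebuild the question's data so this block is self-contained.
df <- read.table(
  text = "Destino;Proposito;Custo
Chicago;Negocios;35
Nova York;Negocios;30
Miami;Turismo;25
Chicago;Estudo;50
Nova York;Turismo;40
Miami;Estudo;90
Miami;Estudo;110
Chicago;Turismo;30
Miami;Negocios;20
Chicago;Turismo;35
Nova York;Negocios;40
Chicago;Estudo;150
Nova York;Turismo;40
Miami;Negocios;30
Nova York;Estudo;140
Chicago;Turismo;35
Nova York;Turismo;40",
  header = TRUE, sep = ";", strip.white = TRUE
)

# Sum Custo for every Destino x Proposito combination.
tab <- xtabs(Custo ~ Destino + Proposito, data = df)

# Append row and column totals; the margins are labelled "Sum" by default.
tab_total <- addmargins(tab)
print(tab_total)
```

The result is a table object rather than a plain matrix; as.data.frame.matrix(tab_total) should convert it to a data frame if you need to export it.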
