Frequency Table with two variables

Hello, I have a question that I can't find the answer to. I have a data set from a statistics book that I'm studying (link to the dataset).

When imported, it appears as in the image.

[Image of the dataset]

What I would like to do is transform the data set and view it as shown in the following image.

[Image: the desired table]

Thank you very much to anyone who can help me.

  • What are these numbers you're looking for?

  • A cross-tabulation of frequencies... it's a concise way of visualizing data... it appears in every statistics book, and I wanted to know how to do it with pandas.

  • Sorry, I guess I didn't make myself clear in my previous comment. Can you describe in detail how those numbers are filled in? Which column do they come from? What happens when the "Capital" value repeats? Are the numbers in each column summed? Is the average calculated?

  • Ah yes, excuse me... these numbers are frequencies. For example, the column Grau de Instrução (education level) holds qualitative data; if we count the values we get 12 for ensino fundamental (elementary school), 18 for ensino médio (high school) and 6 for superior (higher education). These data are related to the region: the table from the book that I posted shows, for example, that of the elementary school cases, 4 belong to the capital. What I don't know is how to cross-reference the information from these two columns. Thank you very much for your interest in helping me, Terry.

1 answer


To do this, use groupby on the columns Região de Procedência (region of origin) and Grau de Instrução (education level), then call size to count the rows in each group. After that, unstack moves the Grau de Instrução index level into columns, like this:

df2 = df.groupby(['Região de Procedência', 'Grau de Instrução']).size().unstack(1)
df2.head()

Grau de Instrução   ensino fundamental  ensino médio    superior
Região de Procedência           
capital             4                   5               2
interior            3                   7               2
outra               5                   6               2
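The same groupby/size/unstack chain can be tried without the book's file. The sketch below is only an illustration: the six rows are made up (only the column names come from the question), and passing fill_value=0 to unstack is an addition not present in the answer, used so that combinations that never occur show 0 instead of NaN.

```python
import pandas as pd

# Hypothetical six-row sample. The column names match the question's dataset,
# but the values are invented for illustration only.
df = pd.DataFrame({
    "Região de Procedência": [
        "capital", "capital", "interior", "interior", "outra", "capital",
    ],
    "Grau de Instrução": [
        "superior", "ensino médio", "ensino médio",
        "ensino médio", "ensino fundamental", "superior",
    ],
})

# Count rows per (region, education level) pair, then move the
# education level from the index into the columns.
# fill_value=0 fills pairs that never occur, which would otherwise be NaN.
counts = (
    df.groupby(["Região de Procedência", "Grau de Instrução"])
      .size()
      .unstack("Grau de Instrução", fill_value=0)
)
print(counts)
```

Naming the level in unstack (instead of using position 1) is a readability choice; both refer to the same index level here.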

To calculate the totals, sum each column with sum(axis=0) and store the result in a new row labelled "Total". Then repeat the operation row by row with sum(axis=1) to create a new "Total" column. Note that the counts are displayed as floats after this step.

df2.loc['Total',:]= df2.sum(axis=0)
df2.loc[:,'Total'] = df2.sum(axis=1)
df2.head()

Grau de Instrução   ensino fundamental  ensino médio    superior    Total
Região de Procedência               
capital             4.0                 5.0             2.0         11.0
interior            3.0                 7.0             2.0         12.0
outra               5.0                 6.0             2.0         13.0
Total               12.0                18.0            6.0         36.0 
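As an alternative to the groupby approach in this answer (a different technique, not the one the answer uses), pandas also provides pd.crosstab, which can build the frequency table and the totals in one call through its margins and margins_name parameters. The sketch below uses a made-up six-row sample; only the column names are taken from the question.

```python
import pandas as pd

# Hypothetical six-row sample. The column names match the question's dataset,
# but the values are invented for illustration only.
df = pd.DataFrame({
    "Região de Procedência": [
        "capital", "capital", "interior", "interior", "outra", "capital",
    ],
    "Grau de Instrução": [
        "superior", "ensino médio", "ensino médio",
        "ensino médio", "ensino fundamental", "superior",
    ],
})

# Rows come from the first argument, columns from the second.
# margins=True appends a totals row and a totals column,
# and margins_name sets the label used for both.
table = pd.crosstab(
    df["Região de Procedência"],
    df["Grau de Instrução"],
    margins=True,
    margins_name="Total",
)
print(table)
```

With the real data, df would be the imported data set, and the bottom-right cell should equal the number of rows.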
