How to make a boxplot with ggplot

I have the following data:

                     sample OPN1SW  OPN1MW OPN1LW    RHO   OPN3   OPN4   OPN5
1: GTEX-11WQK-1026-SM-5EQLX  2.365  0.0000      0  4.138 86.322 40.199 12.533
2:  GTEX-XQ3S-1426-SM-4BOPR 22.317  0.0000      0 30.693 84.376 33.564  0.000
3:  GTEX-WHPG-2626-SM-3NMBR 21.142  0.6874      0 29.372 89.879 48.453  0.000
4:  GTEX-WEY5-2326-SM-3GIKK  0.000 16.2860      0 28.632 83.683 23.741  0.000
5: GTEX-14A5H-0826-SM-5QGPJ 20.448  0.0000      0 28.585 80.831 44.142 13.579
6: GTEX-132AR-0326-SM-5KM2C 12.052  0.0000      0 26.375 78.887 29.123 12.052

Full data: https://pastebin.com/hSghfm2d

It’s a small sample from a Xena Browser (bioinformatics) database; the columns hold gene expression values, and the rows are the samples.

I need to make a boxplot chart with the genes on the x-axis and the expression values on the y-axis.

The problem is that I can’t get it to work. I tried it this way:

qplot(OPN1SW,sample,data = sk, geom='boxplot')

[Image: plot produced with ggplot]

But it’s not what I need; what I need is something like this:

[Image: the correct plot]

The names on the X-axis would be the samples of my table. I suspect I’m getting x and y wrong when plotting, but I don’t know how to fix it, because each column should become one boxplot computed from its own values. How can I solve this? Thank you for your help.

1 answer

The problem lies in how the data frame sk is organized. ggplot2 is quite strict about the shape of the data it plots. Ideally, one column holds all the gene expression values and another column holds the corresponding gene names.

A practical way to reorganize the data is the melt function from the reshape2 package:

library(reshape2)

# Reshape from wide to long, keeping "sample" as the identifier column
sk_melt <- melt(sk, id.vars = "sample")

Note that the data frame sk_melt has 3 columns:

  • sample: self-explanatory

  • variable: with the names of the genes of the original dataset

  • value: with the gene expression values associated with each gene in each sample
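To see the reshaping on a small scale, here is a minimal sketch with a made-up three-row table. The sample names and values in toy are invented for illustration; only melt and its id.vars argument come from reshape2.

```r
library(reshape2)

# Toy wide table: one row per sample, one column per gene (values are invented)
toy <- data.frame(
  sample = c("S1", "S2", "S3"),
  OPN1SW = c(2.4, 22.3, 21.1),
  RHO    = c(4.1, 30.7, 29.4),
  stringsAsFactors = FALSE
)

# Long format: one row per (sample, gene) pair
toy_melt <- melt(toy, id.vars = "sample")

# Inspect the three resulting columns: sample, variable, value
str(toy_melt)
```

The result has 6 rows (3 samples times 2 genes), which is the shape the boxplot code expects.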

Now just draw the boxplot from the new dataset sk_melt. Personally, I don’t like qplot; I find the code better organized with the full ggplot syntax. Use the following commands and your chart will be ready:

library(ggplot2)

ggplot(sk_melt, aes(x=variable, y=value)) +
  geom_boxplot() +
  labs(x="Gene", y="Gene expression")

  • 1

    That’s just what I needed, thank you! However, my CSV file is just over 3 GB, and when I tried to reshape it with melt() I got an error. I believe it could be done in "blocks" so that it does not consume all the RAM, but do you know of another library that can do this without melt? Or another efficient way to do it?

  • 1

    Try using the data.table package to read your data. It uses memory more efficiently than R’s traditional methods. Here is an example of its use.
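As a rough sketch of that suggestion, the snippet below reads only the needed columns with fread and reshapes with data.table's own melt method. The temporary CSV and its values are stand-ins invented for illustration; in practice csv_path would point to the real file, and whether this is enough to avoid the memory error on a 3 GB file is not guaranteed.

```r
library(data.table)

# Stand-in for the real file: write a tiny CSV to a temporary path.
# In practice csv_path would point to the large file instead.
csv_path <- tempfile(fileext = ".csv")
fwrite(data.table(sample = c("S1", "S2"),
                  OPN1SW = c(2.365, 22.317),
                  RHO    = c(4.138, 30.693),
                  OPN3   = c(86.322, 84.376)), csv_path)

# Read only the columns needed for the plot to limit memory use
genes <- c("OPN1SW", "RHO")
sk <- fread(csv_path, select = c("sample", genes))

# data.table ships its own melt method, so reshape2 is not needed here
sk_melt <- melt(sk, id.vars = "sample", measure.vars = genes)
```

The resulting sk_melt has the same sample, variable, and value columns, so the ggplot call from the answer should work on it unchanged.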
