Create new column from existing column in R

I have a dataset in which the first column holds the codes of my institution's disciplines and the second column holds the name of each discipline. I want to create a third column whose entries depend on the discipline code. For example, if the code starts with AGF, the new column should show the course AGRONOMY; if it starts with ADF, the new column should show ADMINISTRATION. Likewise, MAF corresponds to MATHEMATICS, MFF corresponds to CHEMISTRY, and so on. Here is the dataset I have:

link <- "https://raw.githack.com/fsbmat/StackOverflow/master/tab.txt"
tab <- read.table(link, sep = "\t", header = TRUE)

How can I build this column in R?

1 answer


The code below does what you want.

library(dplyr)

tab_mini <- head(tab)

tab_mini %>%
  mutate(Cd_Disciplina_Simples = sub("^([[:alpha:]]*).*", "\\1", Cd_Disciplina)) %>%
  mutate(Curso = recode(Cd_Disciplina_Simples,
                        ADF = "ADMINISTRACAO",
                        AGF = "AGRONOMIA",
                        BQF = "BIOQUIMICA")) %>%
  select(-Cd_Disciplina_Simples)

  Cd_Disciplina             Nome_Disciplina         Curso
1       ADF 401            SOCIOLOGIA RURAL ADMINISTRACAO
2       AGF 100      INTRODUÇÃO À AGRONOMIA     AGRONOMIA
3       AGF 150             DESENHO TÉCNICO     AGRONOMIA
4       BQF 100      BIOQUÍMICA FUNDAMENTAL    BIOQUIMICA
5       BQF 101 LABORATÓRIO DE BIOQUÍMICA I    BIOQUIMICA
6       BQF 102           BIOQUÍMICA BÁSICA    BIOQUIMICA

I applied it only to the first six rows of the original dataset because there are many different codes. I think my example is enough for you to extend it to the rest.
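When there are many codes, one alternative to listing every pair inside recode is a named lookup vector in base R. This is only a sketch, not what the answer above does: the data frame tab_demo, the vector cursos and the code "XYZ 999" are made up for illustration. One difference to keep in mind: as far as I know, recode leaves unmatched character values unchanged unless .default is supplied, while indexing a named vector returns NA for prefixes that are not in it.

```r
# Toy data mirroring a few codes from the output above; "XYZ 999" is a
# hypothetical code added only to show what happens to an unmapped prefix.
tab_demo <- data.frame(
  Cd_Disciplina = c("ADF 401", "AGF 100", "BQF 100", "XYZ 999"),
  stringsAsFactors = FALSE
)

# Lookup table: names are the code prefixes, values are the courses.
cursos <- c(ADF = "ADMINISTRACAO",
            AGF = "AGRONOMIA",
            BQF = "BIOQUIMICA")

# Same regular expression as in the answer: keep only the leading letters.
prefixo <- sub("^([[:alpha:]]*).*", "\\1", tab_demo$Cd_Disciplina)

# Index the named vector by prefix; prefixes missing from it become NA.
tab_demo$Curso <- unname(cursos[prefixo])

print(tab_demo)
```

Rows with NA in Curso are then easy to find with is.na(tab_demo$Curso), which helps spot the codes that still need an entry in the lookup vector.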

What my code does is this:

  1. Using a regular expression, I create a column called Cd_Disciplina_Simples. Since the discipline codes have the form ABC XYZ, only the ABC part is needed to determine the course. The regular expression therefore extracts only the leading letters of the discipline code.

  2. The function recode is applied to Cd_Disciplina_Simples to make the conversion requested in the original question: ADF becomes ADMINISTRACAO, for example. Since R does not know what each three-letter discipline code means, you have to enter the meanings manually.

  3. Since the column Cd_Disciplina_Simples is not needed at the end, the function select removes it from the final dataset. If there is a conflict between select functions from different packages, replace the line

    select(-Cd_Disciplina_Simples)

with

    dplyr::select(-Cd_Disciplina_Simples)

This tells R to use the select function from the dplyr package.
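To see what the regular expression from step 1 does on its own, here is a small base R check. The vector codes just reuses three codes from the output above, and "401" is a hypothetical input with no leading letters.

```r
# Three discipline codes taken from the output shown in the answer.
codes <- c("ADF 401", "AGF 100", "BQF 102")

# "^([[:alpha:]]*)" captures the run of letters at the start of the string,
# ".*" consumes the rest, and "\\1" keeps only the captured letters.
prefixes <- sub("^([[:alpha:]]*).*", "\\1", codes)
print(prefixes)
# [1] "ADF" "AGF" "BQF"

# A string with no leading letters gives an empty string, not NA,
# so such rows would need separate handling.
no_letters <- sub("^([[:alpha:]]*).*", "\\1", "401")
print(no_letters)
# [1] ""
```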

  • Thanks Marcus Nunes, solved!!!
