Creating a dataframe based on two other dataframes using dplyr in R

These are my dataframes:

df <- as.data.frame(matrix(rexp(200), ncol = 25))
colnames(df) <- c("A","B","C","D","E","F","G","H","I","J",
                  "K","L","M","N","O","P","Q","R","S","T",
                  "U","V","X","Z","W")


df.new <- as.data.frame(matrix(rexp(200), ncol = 20))
colnames(df.new) <- c("A.B","A.B.new",
                      "A.C","A.C.new",
                      "A.F","A.F.new",
                      "B.C","B.C.new",
                      "C.D","C.D.new",
                      "F.G","F.G.new",
                      "H.I","H.I.new",
                      "H.K","H.K.new",
                      "L.M","L.M.new",
                      "N.Q","N.Q.new")
df.new <- df.new[-c(9:10), ]


finaldf<-cbind(df,df.new)
colnames(finaldf)<-c("A","B","C","D","E","F","G","H","I","J",
        "K","L","M","N","O","P","Q","R","S","T",
        "U","V","X","Z","W","A.B","A.B.new",
            "A.C","A.C.new",
            "A.F","A.F.new",
                "B.C","B.C.new",
                    "C.D","C.D.new",
                    "F.G","F.G.new",
                "H.I","H.I.new",
                    "H.K","H.K.new",
                    "L.M","L.M.new",
            "N.Q","N.Q.new")

I would like the columns of finaldf to be selected and ordered like this (the names below show the desired result):

colnames(finaldf)<-c("A","A.B","A.B.new",
            "A","A.C","A.C.new",
            "A","A.F","A.F.new",
            "B","B.C","B.C.new",
            "C","C.D","C.D.new",
            "F","F.G","F.G.new",
            "H","H.I","H.I.new",
            "H","H.K","H.K.new",
            "L","L.M","L.M.new",
            "N","N.Q","N.Q.new")

Since my original dataframe is much larger, I need more robust code, which is beyond my ability because I am new to R.

Note that the idea is simply to take columns from the dataframe df and insert them into the dataframe df.new. These columns must follow the order established by the dataframe df.new.

I would like to do this using the dplyr package. Is it possible?

Edited:

Well, my original data uses the names of Bovespa stocks:

nms.new<-c("ABEV3.BBAS3", "ABEV3.BBAS3.new", "ABEV3.BRAP4", "ABEV3.BRAP4.new", 
"ABEV3.BRKM5", "ABEV3.BRKM5.new", "ABEV3.CSAN3", "ABEV3.CSAN3.new", 
"ABEV3.CSNA3", "ABEV3.CSNA3.new", "ABEV3.CYRE3", "ABEV3.CYRE3.new", 
"ABEV3.DTEX3", "ABEV3.DTEX3.new", "ABEV3.ELPL4", "ABEV3.ELPL4.new", 
"ABEV3.EVEN3", "ABEV3.EVEN3.new", "ABEV3.FIBR3", "ABEV3.FIBR3.new", 
"ABEV3.GGBR4", "ABEV3.GGBR4.new", "ABEV3.GOAU4", "ABEV3.GOAU4.new", 
"ABEV3.HYPE3", "ABEV3.HYPE3.new", "ABEV3.JBSS3", "ABEV3.JBSS3.new","SANB11.BBAS3","SANB11.BBAS3.new")

I need to split each name into what comes before and after the dot, rather than taking a fixed number of characters. Sometimes there are 5 characters before the dot, sometimes 6.

  • The column A appears 3 times, is that right? Repeated before each A.C and A.F? And the same for B, C, etc.?

  • Exactly, @Ruibarradas.

1 answer

Here it comes.
The trick is to group by the first characters of names(finaldf) and then process the group list.

First we choose the names that matter, that is, the ones longer than one character.

nms <- names(finaldf)
# nchar() is vectorised, so sapply() is not needed
nms <- nms[nchar(nms) > 1]

Now group by the first 3 characters.

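# One group per pair, e.g. group "A.B" holds "A.B" and "A.B.new"
sp <- split(nms, substr(nms, 1, 3))
# Put the single-letter column name in front of each group
nms <- lapply(sp, function(s) c(substr(s[1], 1, 1), s))
nms <- unlist(nms)
# Repeated names such as "A" select the same column more than once
result <- finaldf[nms]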

The resulting data frame, result, has the dimensions asked for in the question.

dim(result)
#[1]  8 30

But beware: column names are made unique when the same column is selected more than once, so the three copies of A are named A, A.1 and A.2. The same happens wherever a prefix repeats, for example H, which precedes both H.I and H.K and so appears as H and H.1.

names(result)
# [1] "A"       "A.B"     "A.B.new" "A.1"     "A.C"     "A.C.new"
# [7] "A.2"     "A.F"     "A.F.new" "B"       "B.C"     "B.C.new"
#[13] "C"       "C.D"     "C.D.new" "F"       "F.G"     "F.G.new"
#[19] "H"       "H.I"     "H.I.new" "H.1"     "H.K"     "H.K.new"
#[25] "L"       "L.M"     "L.M.new" "N"       "N.Q"     "N.Q.new"
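
The renaming described above can be reproduced on a much smaller data frame. This is only a minimal sketch: the data frame d and its three columns are made up for illustration and are not part of the question's data.

```r
# Hypothetical toy data frame, not taken from the question
d <- data.frame(A = 1:2, A.B = 3:4, A.C = 5:6)

# Selecting "A" twice duplicates the column; the second copy is renamed
out <- d[c("A", "A.B", "A", "A.C")]

names(out)
# [1] "A"   "A.B" "A.1" "A.C"

# The renamed copy holds the same values as the original column
identical(out[["A.1"]], d[["A"]])
# [1] TRUE
```

If the suffixed names are unwanted, they can be overwritten afterwards with names(out) <- ..., at the cost of having duplicated column names in the data frame.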
  • Thank you so much for the effort, @Rui Barradas! It works perfectly! As I mentioned, my original dataframe has more than 200 columns, and I need to group by the first 5 characters. How would I modify your answer? I tried:
    sp <- split(nms, substr(nms, 1, 5))
    nms <- lapply(sp, function(s) c(substr(s[1], 1, 1), s))
    nms <- unlist(nms)
    result <- finaldf[nms]
    But I get the following message: Error in `[.data.frame`(finaldf.1, nms.new) : undefined columns selected

  • @Laura Can you give an example of the names with the first 5 characters? Preferably by editing the question, please.

  • I edited it. Those are the names from my original dataframe that I need to work with. Now the rule changes: I must split each name into what comes before and after the dot. If you look closely, some names have 5 characters before the dot and others have 6. To get sp I did this: split(nms.new, gsub("\\.new$", "", nms.new)). But then, to get nms, the problem is that the part before the dot can have 5 or 6 characters, and this breaks the rest of the code.
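
One possible way to handle the variable-length prefixes raised in the comments is to cut each name at its first dot instead of at a fixed position. The error quoted above presumably arises because substr(s[1], 1, 1) yields a single character such as "A", which is not a column of the real data. The following is only a sketch under assumptions: it uses a subset of nms.new from the question, and the data frame built here (base_cols, the 3 rows of random values) is a made-up stand-in for the real finaldf.

```r
# A subset of nms.new from the question
nms.new <- c("ABEV3.BBAS3", "ABEV3.BBAS3.new",
             "ABEV3.BRAP4", "ABEV3.BRAP4.new",
             "SANB11.BBAS3", "SANB11.BBAS3.new")

# Hypothetical stand-in for the real finaldf: single-ticker columns plus pairs
base_cols <- c("ABEV3", "BBAS3", "BRAP4", "SANB11")
all_cols <- c(base_cols, nms.new)
finaldf <- as.data.frame(matrix(rexp(3 * length(all_cols)), nrow = 3))
names(finaldf) <- all_cols

# Group key: the pair name without the trailing ".new"
pair <- sub("\\.new$", "", nms.new)

# Using a factor with explicit levels keeps the groups in their original order
sp <- split(nms.new, factor(pair, levels = unique(pair)))

# The prefix is everything before the first dot, whatever its length
ordered <- unlist(lapply(sp, function(s) c(sub("\\..*$", "", s[1]), s)),
                  use.names = FALSE)

result <- finaldf[ordered]
dim(result)
# [1] 3 9
```

As in the answer, repeated prefixes get a suffix in the result, so the second copy of ABEV3 is named ABEV3.1.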
