Creating a dataframe based on two other dataframes using dplyr in R

Question


These are my dataframes:

df <- as.data.frame(matrix(rexp(200), ncol = 25))
colnames(df) <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J",
                  "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T",
                  "U", "V", "X", "Z", "W")


df.new <- as.data.frame(matrix(rexp(200), ncol = 20))
colnames(df.new) <- c("A.B", "A.B.new",
                      "A.C", "A.C.new",
                      "A.F", "A.F.new",
                      "B.C", "B.C.new",
                      "C.D", "C.D.new",
                      "F.G", "F.G.new",
                      "H.I", "H.I.new",
                      "H.K", "H.K.new",
                      "L.M", "L.M.new",
                      "N.Q", "N.Q.new")
# drop rows 9 and 10 so df.new has the same 8 rows as df
df.new <- df.new[-c(9:10), ]


finaldf <- cbind(df, df.new)
# cbind() already keeps these names; this assignment only restates them
colnames(finaldf) <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J",
                       "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T",
                       "U", "V", "X", "Z", "W",
                       "A.B", "A.B.new", "A.C", "A.C.new", "A.F", "A.F.new",
                       "B.C", "B.C.new", "C.D", "C.D.new", "F.G", "F.G.new",
                       "H.I", "H.I.new", "H.K", "H.K.new", "L.M", "L.M.new",
                       "N.Q", "N.Q.new")

I would like to reorder the columns of finaldf like this:

colnames(finaldf)<-c("A","A.B","A.B.new",
            "A","A.C","A.C.new",
            "A","A.F","A.F.new",
            "B","B.C","B.C.new",
            "C","C.D","C.D.new",
            "F","F.G","F.G.new",
            "H","H.I","H.I.new",
            "H","H.K","H.K.new",
            "L","L.M","L.M.new",
            "N","N.Q","N.Q.new")

Since my original dataframe is much larger, I need more robust code, which is beyond my ability because I am new to R.

Note that the idea is simply to take columns from df and place them in df.new. But these columns must follow the order established by df.new.

I would like to do this using the dplyr package. Is it possible?

Edit:

Well, in my original code the columns are named after Bovespa stocks (tickers):

nms.new<-c("ABEV3.BBAS3", "ABEV3.BBAS3.new", "ABEV3.BRAP4", "ABEV3.BRAP4.new", 
"ABEV3.BRKM5", "ABEV3.BRKM5.new", "ABEV3.CSAN3", "ABEV3.CSAN3.new", 
"ABEV3.CSNA3", "ABEV3.CSNA3.new", "ABEV3.CYRE3", "ABEV3.CYRE3.new", 
"ABEV3.DTEX3", "ABEV3.DTEX3.new", "ABEV3.ELPL4", "ABEV3.ELPL4.new", 
"ABEV3.EVEN3", "ABEV3.EVEN3.new", "ABEV3.FIBR3", "ABEV3.FIBR3.new", 
"ABEV3.GGBR4", "ABEV3.GGBR4.new", "ABEV3.GOAU4", "ABEV3.GOAU4.new", 
"ABEV3.HYPE3", "ABEV3.HYPE3.new", "ABEV3.JBSS3", "ABEV3.JBSS3.new","SANB11.BBAS3","SANB11.BBAS3.new")

I need to split the names at the dot rather than taking a fixed number of characters: sometimes there are 5 characters before the dot, other times 6.
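A minimal sketch of splitting at the dot instead of by a fixed width, assuming every name has the form FIRST.SECOND or FIRST.SECOND.new (only a few of the tickers above are reused here). strsplit() with fixed = TRUE treats "." literally, so the length of each ticker does not matter:

```r
# A few of the ticker-pair names from above
nms.new <- c("ABEV3.BBAS3", "ABEV3.BBAS3.new",
             "SANB11.BBAS3", "SANB11.BBAS3.new")

# Split every name at the literal dot (fixed = TRUE, so "." is not a regex)
parts <- strsplit(nms.new, ".", fixed = TRUE)

# First and second ticker of each name, whatever their length
first  <- vapply(parts, `[`, character(1), 1)
second <- vapply(parts, `[`, character(1), 2)

first
# [1] "ABEV3"  "ABEV3"  "SANB11" "SANB11"
second
# [1] "BBAS3" "BBAS3" "BBAS3" "BBAS3"
nchar(first)
# [1] 5 5 6 6
```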

The column A appears 3 times, is that right? Repeated before each A.C and A.F? And the same for B, C, etc.?

– Rui Barradas

2018/09/23 at 10:46
Exactly, @Ruibarradas.

– Laura

2018/09/23 at 13:25

1 answer


by Rui Barradas • **15,422** points · Answer 1 · 2018-09-23T15:43:29+00:00

Here it comes.
The trick is to group by the first characters of names(finaldf) and then process the group list.

First, we keep only the names that matter: those with more than one character, i.e. the columns that came from df.new.

nms <- names(finaldf)
nms <- nms[nchar(nms) > 1]

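Now group them by their first 3 characters, so that, for example, A.B and A.B.new fall in the same group, and put the first letter in front of each group.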

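sp <- split(nms, substr(nms, 1, 3))                       # one group per pair: "A.B", "A.C", ...
nms <- lapply(sp, function(s) c(substr(s[1], 1, 1), s))   # prepend the first letter, e.g. "A"
nms <- unlist(nms)                                        # back to one character vector
result <- finaldf[nms]                                    # select (and repeat) the columns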

The data frame result has the dimensions asked for in the question.

dim(result)
#[1]  8 30

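But beware: when finaldf[nms] selects the same column more than once, R makes the names unique, so instead of three columns called A there will be A, A.1 and A.2. The same happens for any other repeated name, for example H, which precedes both H.I and H.K (giving H and H.1).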
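The renaming comes from the data frame [ method, which passes duplicated column names through make.unique(). If the exact repeated names are really needed (for printing or export, say), one option is to assign them back with names<-, which does not check for duplicates; selecting by name afterwards will then only find the first match. A small sketch on a toy data frame (toy and wanted are illustrative names, not from the question):

```r
# Toy data frame standing in for finaldf
toy <- data.frame(A = 1:2, A.B = 3:4, A.C = 5:6)

wanted <- c("A", "A.B", "A", "A.C")
out <- toy[wanted]
names(out)
# [1] "A"   "A.B" "A.1" "A.C"

# The same renaming rule, applied directly
make.unique(c("A", "A", "A"))
# [1] "A"   "A.1" "A.2"

# Optionally put the exact (duplicated) names back
names(out) <- wanted
names(out)
# [1] "A"   "A.B" "A"   "A.C"
```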

names(result)
# [1] "A"       "A.B"     "A.B.new" "A.1"     "A.C"     "A.C.new"
# [7] "A.2"     "A.F"     "A.F.new" "B"       "B.C"     "B.C.new"
#[13] "C"       "C.D"     "C.D.new" "F"       "F.G"     "F.G.new"
#[19] "H"       "H.I"     "H.I.new" "H.1"     "H.K"     "H.K.new"
#[25] "L"       "L.M"     "L.M.new" "N"       "N.Q"     "N.Q.new"
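The substr() calls above assume one-character names, so they do not cover the ticker names from the question's edit, and split() sorts the groups alphabetically (it converts the key to a factor with sorted levels) rather than keeping df.new's order. Below is a sketch of a generalization, assuming every paired name is FIRST.SECOND or FIRST.SECOND.new and that FIRST is itself a column of the data; reorder_pairs is a hypothetical helper, not part of any package:

```r
# Hypothetical helper: for each pair (FIRST.SECOND, FIRST.SECOND.new) in
# pair_cols, select FIRST followed by the pair, keeping pair_cols' order.
reorder_pairs <- function(data, pair_cols) {
  # Group key: drop a trailing ".new" so both columns of a pair match
  key <- sub("\\.new$", "", pair_cols)
  # Levels in order of first appearance stop split() from sorting groups
  groups <- split(pair_cols, factor(key, levels = unique(key)))
  # Prepend the first ticker (text before the first dot) to each group
  cols <- unlist(lapply(groups, function(g) c(sub("\\..*$", "", g[1]), g)),
                 use.names = FALSE)
  data[cols]
}

# Usage on a toy frame whose pair order is not alphabetical
toy <- as.data.frame(matrix(1:12, nrow = 2))
names(toy) <- c("ABEV3", "SANB11",
                "SANB11.BBAS3", "SANB11.BBAS3.new",
                "ABEV3.BBAS3", "ABEV3.BBAS3.new")
res <- reorder_pairs(toy, names(toy)[3:6])
# column order: SANB11, SANB11.BBAS3, SANB11.BBAS3.new,
#               ABEV3, ABEV3.BBAS3, ABEV3.BBAS3.new
names(res)
```

With the question's data, reorder_pairs(finaldf, names(df.new)) should select the same 30 columns as the answer, because df.new's pairs already happen to be in alphabetical order.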