How to compare information from one table to another table

I have a very large spreadsheet in CSV format, with about 8 million rows and several columns, one of which holds people's phone numbers. I also have a second spreadsheet, in Excel, with 2 thousand rows of phone numbers. I need to generate a third spreadsheet containing only the phone numbers that appear in both spreadsheets. How do I do that?

  • Your question is very vague and may not even belong on Stack Overflow if you intend to do this in a spreadsheet. If it is a spreadsheet, is it Excel or another product? If you want to do it with code, which programming language do you know and want to use?

1 answer

Use the inner_join function from the dplyr package:

df1 <- data.frame(nome = c("Ana", "Bernardo", "Carlos"),
                  telefone = c("123", "456", "789"),
                  altura = c(1.70, 1.75, 1.80))

##       nome telefone altura
## 1      Ana      123   1.70
## 2 Bernardo      456   1.75
## 3   Carlos      789   1.80


df2 <- data.frame(nome = c("Bernardo", "Carlos", "Daniel"),
                  telefone = c("456", "789", "555"),
                  peso = c(75, 80, 70))

##       nome telefone peso
## 1 Bernardo      456   75
## 2   Carlos      789   80
## 3   Daniel      555   70

library(dplyr)

inner_join(df1, df2, by = "telefone")

##     nome.x telefone altura   nome.y peso
## 1 Bernardo      456   1.75 Bernardo   75
## 2   Carlos      789   1.80   Carlos   80
## Warning message:
## Column `telefone` joining factors with different levels, coercing to character vector
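
For the case in the question, where only the phones present in both spreadsheets are wanted, a minimal sketch might look like the one below. It swaps inner_join for dplyr's semi_join, which filters the first table without adding columns from the second. The helper name phones_in_both, the file names in the comments, and the output path are all assumptions made for illustration. The phone column is converted to character first, which should also avoid the factor warning shown above and preserve leading zeros.

```r
library(dplyr)

# Hypothetical helper: keep the rows of `big` whose phone also appears in `small`.
# Phones are compared as character strings, so factor columns are converted first.
phones_in_both <- function(big, small, col = "telefone") {
  big[[col]]   <- as.character(big[[col]])
  small[[col]] <- as.character(small[[col]])
  # distinct() drops repeated phones in the small table; semi_join() only filters `big`.
  semi_join(big, distinct(small[col]), by = col)
}

# Stand-ins for the two spreadsheets. With real files they might be read with
# read.csv("big.csv", colClasses = "character") and
# readxl::read_excel("small.xlsx", col_types = "text") (file names are assumptions).
big   <- data.frame(telefone = c("123", "456", "789"),
                    altura = c(1.70, 1.75, 1.80))
small <- data.frame(telefone = c("456", "789", "555"))

result <- phones_in_both(big, small)

# Write the third spreadsheet; the output location is an assumption.
out_file <- file.path(tempdir(), "phones_in_both.csv")
write.csv(result, out_file, row.names = FALSE)
```

With 8 million rows, base read.csv may be slow. A faster reader such as data.table::fread could be substituted for the CSV step without changing the join.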
