Change <Chr> to number in R

Viewed 1,524 times

5

Hi all, I am trying to convert the data in columns 4 and 5 to numeric, but I get the error below. Any suggestions? Thanks in advance!

Error converting <chr> to number in R

library(tidyverse)

dadosarrumados <- tibble(
  Região = c("Brasil", "Norte", "Rondônia", "Acre", "Amazonas"),
  Total = c(102083, 6715, 711, 285, 1597),
  `Anos de estudo` = rep("menor que 4 anos", 5),
  Quantidade = c("5068.075", "348.574", "42.42", "18.042", "73.231"),
  Porcentagem = c("5", "5.2", "6", "6.3", "4.")
)

as.numeric(dadosarrumados[, c(4, 5)])

Error: (list) object cannot be coerced to type 'double'

  • 2

    Try this: library(tidyverse); dadosarrumados %>% mutate_at(c('Quantidade', 'Porcentagem'), as.numeric). Note that <dbl> (double) is a numeric type.

  • 2

    I suggest you edit your question with the output of this command in R: dput(dadosarrumados). If your data is too big, use dput(head(dadosarrumados)). This will help keep your question from being closed.

  • 1

    Welcome to Stackoverflow! Take a look at how to improve your next question to make it easier to help you.

2 answers

5

I would do it like this:

library(tidyverse)
dadosarrumados %>% 
  mutate_at(vars(Quantidade, Porcentagem), parse_number)

# A tibble: 5 x 5
  Região    Total `Anos de estudo` Quantidade Porcentagem
  <chr>     <dbl> <chr>                 <dbl>       <dbl>
1 Brasil   102083 menor que 4 anos     5068.          5  
2 Norte      6715 menor que 4 anos      349.          5.2
3 Rondônia    711 menor que 4 anos       42.4         6  
4 Acre        285 menor que 4 anos       18.0         6.3
5 Amazonas   1597 menor que 4 anos       73.2         4
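
In dplyr 1.0.0 and later, mutate_at is superseded by across(). A minimal sketch of the same conversion in that style, assuming a recent dplyr and readr are installed; the small tibble dados is a stand-in that holds only the two character columns from the question:

```r
library(dplyr)
library(readr)

# Stand-in data: only the two character columns from the question.
dados <- tibble(
  Quantidade = c("5068.075", "348.574", "42.42", "18.042", "73.231"),
  Porcentagem = c("5", "5.2", "6", "6.3", "4.")
)

# across() selects the columns; parse_number is applied to each one.
convertido <- dados %>%
  mutate(across(c(Quantidade, Porcentagem), parse_number))

str(convertido)
```

mutate_at still works, so this is a matter of style rather than a required change.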

The advantage of using parse_number instead of as.numeric is that it has several other options, for example specifying which character is the decimal separator and which is the thousands separator:

> parse_number(c("1,10"), locale = locale(decimal_mark = ","))
[1] 1.1

> as.numeric("1,1")
[1] NA
Warning message:
NAs introduced by coercion
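
The thousands separator can be set in the same locale object through grouping_mark. A small sketch, assuming numbers written in the Brazilian style (period for thousands, comma for decimals); the name br is just illustrative:

```r
library(readr)

# Illustrative locale: comma as decimal mark, period as grouping mark.
br <- locale(decimal_mark = ",", grouping_mark = ".")

valor <- parse_number("1.234,56", locale = br)
valor
```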

It also works when the number is surrounded by non-numeric characters:

> parse_number("1%")
[1] 1
> as.numeric("1%")
[1] NA
Warning message:
NAs introduced by coercion

A possible problem in your case is that missing values were recorded with some unwanted character instead of being left empty; it may be a . or something like that. In that case you could use the na argument of parse_number, like this:

dadosarrumados %>% 
  mutate_at(vars(Quantidade, Porcentagem), ~parse_number(.x, na = c(".")))

Note: parse_number is a function from the readr package, which is part of the tidyverse.
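
To see the effect of the na argument in isolation, here is a small sketch on a made-up vector (valores is hypothetical, not taken from the question's data) in which one entry is a lone period standing in for a missing value:

```r
library(readr)

# Hypothetical input: the second entry marks a missing value with ".".
valores <- c("5068.075", ".", "42.42")

# Strings listed in `na` become NA instead of being parsed.
resultado <- parse_number(valores, na = ".")
resultado
```

Only strings that match "." exactly are treated as missing, so a value such as "4." in the question's data is still parsed as a number.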

  • This solution is even more elegant.

  • Is there a reason for the "." to be inside c() in the last snippet?

  • It covers the case where . is the character used for missing values in the spreadsheet.

  • That is the reason for passing it to the na argument, not for wrapping it in c(). As far as I can tell, c(".") is the same as ".".

  • Ah yes! No reason then, it is just a reminder that you can pass a vector there.

4

The problem is that R treats dadosarrumados[, c(4, 5)] as a list (a data frame is a list of columns):

is.list(dadosarrumados[, c(4, 5)])
[1] TRUE
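
The same behavior can be reproduced in base R with a tiny data frame. In this sketch, df and its columns a and b are made up for illustration; the error is captured with tryCatch so the script keeps running:

```r
# Made-up data frame with two character columns.
df <- data.frame(a = c("1.5", "2"), b = c("3", "4.25"),
                 stringsAsFactors = FALSE)

sub <- df[, c("a", "b")]
is.list(sub)        # a data frame is a list of columns
is.data.frame(sub)

# as.numeric() on the whole data frame fails; keep the message instead of stopping.
resultado <- tryCatch(as.numeric(sub), error = function(e) conditionMessage(e))
resultado

# On a single column, which is an ordinary character vector, it works.
as.numeric(sub[["a"]])
```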

One way to solve this problem is to flatten the list with unlist and then convert to numeric:

as.numeric(unlist(dadosarrumados[, c(4, 5)]))
[1] 5068.075  348.574   42.420   18.042   73.231    5.000    5.200    6.000
[9]    6.300    4.000

But notice that we solved one problem and created another: we lost the two-column structure. The function unlist turned the data set into a single vector. We could turn this vector back into a data frame, but I prefer another approach.

Use the function apply. It applies another function over the columns or the rows of a data frame. For example, when running

apply(dadosarrumados[, c(4, 5)], 2, as.numeric)
     Quantidade Porcentagem
[1,]   5068.075         5.0
[2,]    348.574         5.2
[3,]     42.420         6.0
[4,]     18.042         6.3
[5,]     73.231         4.0

I am telling R to apply the function as.numeric to the columns (the 2 in the second argument) of the data frame dadosarrumados[, c(4, 5)]. If I had used 1 instead of 2 as the second argument of apply, as.numeric would have been applied to the rows, and we would not have gotten the desired result.
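
The difference between the two values of the second argument is easiest to see in the shape of the result. A small sketch on a made-up character matrix m, a three-row stand-in for the two columns:

```r
# Made-up character matrix standing in for the two columns.
m <- cbind(
  Quantidade  = c("5068.075", "348.574", "42.42"),
  Porcentagem = c("5", "5.2", "6")
)

por_coluna <- apply(m, 2, as.numeric)  # as.numeric applied to each column
por_linha  <- apply(m, 1, as.numeric)  # as.numeric applied to each row

dim(por_coluna)  # same shape as m
dim(por_linha)   # transposed: one column per original row
```

With 1, each row's result becomes a column of the output, so the values are the same but the matrix comes back transposed.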

One way to get the complete data frame, with the columns converted to numeric, is this:

bind_cols(dadosarrumados[, 1:3],
          as_tibble(apply(dadosarrumados[, c(4, 5)], 2, as.numeric)))
# A tibble: 5 x 5
  Região    Total `Anos de estudo` Quantidade Porcentagem
  <chr>     <dbl> <chr>                 <dbl>       <dbl>
1 Brasil   102083 menor que 4 anos     5068.          5  
2 Norte      6715 menor que 4 anos      349.          5.2
3 Rondônia    711 menor que 4 anos       42.4         6  
4 Acre        285 menor que 4 anos       18.0         6.3
5 Amazonas   1597 menor que 4 anos       73.2         4

I’m using the function bind_cols to join two data frames: columns 1 to 3 of the original, and the result of the conversion we did above.
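
A common base R alternative, not part of the answer above, converts the columns in place with lapply, which avoids both the intermediate matrix and the step of binding the pieces back together. A sketch on a shortened stand-in for the question's data (dados and colunas are illustrative names):

```r
# Shortened stand-in for the question's data.
dados <- data.frame(
  Total = c(102083, 6715, 711),
  Quantidade = c("5068.075", "348.574", "42.42"),
  Porcentagem = c("5", "5.2", "6"),
  stringsAsFactors = FALSE
)

colunas <- c("Quantidade", "Porcentagem")

# lapply returns a list of numeric vectors; assigning it back replaces
# those columns and leaves the rest of the data frame untouched.
dados[colunas] <- lapply(dados[colunas], as.numeric)

str(dados)
```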
