Convert a factor (of decimal numbers) into a numeric vector


I know a similar question already exists, but my case is different:

  • I want to convert the factor into a numeric vector without losing the decimal separator (in this case, a comma).

Data:

df <- structure(list(Data = structure(1:43, .Label = c("2009 T1", "2009 T2", 
"2009 T3", "2009 T4", "2010 T1", "2010 T2", "2010 T3", "2010 T4", 
"2011 T1", "2011 T2", "2011 T3", "2011 T4", "2012 T1", "2012 T2", 
"2012 T3", "2012 T4", "2013 T1", "2013 T2", "2013 T3", "2013 T4", 
"2014 T1", "2014 T2", "2014 T3", "2014 T4", "2015 T1", "2015 T2", 
"2015 T3", "2015 T4", "2016 T1", "2016 T2", "2016 T3", "2016 T4", 
"2017 T1", "2017 T2", "2017 T3", "2017 T4", "2018 T1", "2018 T2", 
"2018 T3", "2018 T4", "2019 T1", "2019 T2", "2019 T3"), class = "factor"), 
    confianca = structure(c(17L, 24L, 37L, 40L, 38L, 36L, 41L, 
    39L, 35L, 27L, 28L, 31L, 32L, 29L, 30L, 36L, 34L, 23L, 22L, 
    25L, 19L, 15L, 21L, 18L, 5L, 1L, 3L, 2L, 4L, 8L, 14L, 10L, 
    13L, 11L, 7L, 9L, 12L, 6L, 16L, 33L, 26L, 20L, 22L), .Label = c("37,9", 
    "38,7", "38,8", "39,8", "40,2", "40,3", "40,6", "41,4", "41,6", 
    "41,8", "42", "42,5", "42,8", "43,3", "45,1", "45,3", "45,5", 
    "46,1", "46,4", "47", "47,2", "47,3", "47,5", "47,9", "48,3", 
    "48,4", "48,7", "48,8", "49,1", "49,3", "49,4", "49,5", "49,8", 
    "50", "50,2", "50,4", "50,8", "51,2", "51,5", "51,7", "52,3"
    ), class = "factor")), class = "data.frame", row.names = c(NA, 
-43L))

Simple example of what I tried:

library(tidyverse)

df %>% 
  mutate(var = as.numeric(as.character(confianca))) %>% 
  head(20)
      Data confianca var
1  2009 T1      45,5  NA
2  2009 T2      47,9  NA
3  2009 T3      50,8  NA
4  2009 T4      51,7  NA
5  2010 T1      51,2  NA
6  2010 T2      50,4  NA
7  2010 T3      52,3  NA
8  2010 T4      51,5  NA
9  2011 T1      50,2  NA
10 2011 T2      48,7  NA
11 2011 T3      48,8  NA
12 2011 T4      49,4  NA
13 2012 T1      49,5  NA
14 2012 T2      49,1  NA
15 2012 T3      49,3  NA
16 2012 T4      50,4  NA
17 2013 T1        50  50
18 2013 T2      47,5  NA
19 2013 T3      47,3  NA
20 2013 T4      48,3  NA

Warning message: NAs introduced by coercion

The new column does come out as numeric:

lapply(df, class)
$Data
[1] "factor"

$confianca
[1] "factor"

$var
[1] "numeric"

However, every value that contains a separator ends up as NA. Only whole numbers are kept (observe row 17, whose value is 50). I thought about using a regex to do the conversion, but I can't simply remove the separator, since that would change the values (45,5 would become 455).
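A minimal sketch of what is going on, using a small hypothetical vector `x` instead of the full data frame: converting the factor directly returns its level codes, and converting its labels fails for any value written with a decimal comma.

```r
# Hypothetical three-value factor mirroring the `confianca` column
x <- factor(c("45,5", "50", "47,9"))

# as.numeric() on a factor returns the internal level codes
# (positions of the labels in sorted order), not the values themselves
codes <- as.numeric(x)
print(codes)

# Going through the labels avoids that, but as.numeric() only accepts "."
# as the decimal mark, so "45,5" and "47,9" become NA; suppressWarnings()
# hides the "NAs introduced by coercion" warning for this demonstration
from_labels <- suppressWarnings(as.numeric(as.character(x)))
print(from_labels)
```

This is why `as.character()` is needed in the attempt above, and also why it is not enough on its own.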

3 answers



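From what I understand, the problem is that your data uses a comma as the decimal separator instead of a dot, and as.numeric() only recognizes the dot. One option is to turn the comma into a dot and then convert to a numeric vector (the inner gsub() first removes any dots, such as thousands separators):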

df$var <- as.numeric(gsub(",", ".", gsub("\\.", "", df$confianca)))

> head(df)
#       Data confianca  var
# 1  2009 T1      45,5 45.5
# 2  2009 T2      47,9 47.9
# 3  2009 T3      50,8 50.8
# 4  2009 T4      51,7 51.7
# 5  2010 T1      51,2 51.2
# 6  2010 T2      50,4 50.4


> str(df)
'data.frame':   43 obs. of  3 variables:
 $ Data     : Factor w/ 43 levels "2009 T1","2009 T2",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ confianca: Factor w/ 41 levels "37,9","38,7",..: 17 24 37 40 38 36 41 39 35 27 ...
 $ var      : num  45.5 47.9 50.8 51.7 
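The nested gsub() calls in this answer run in a specific order. A sketch with hypothetical values shows why that order matters when a value also uses "." as a thousands separator (the question's data has none, so this is only an illustration):

```r
# Hypothetical values: "." as thousands separator, "," as decimal mark
x <- c("1.234,5", "45,5", "50")

# Step 1: remove the thousands separator. In a regex "." matches any
# character, so it must be escaped as "\\." to match a literal dot
no_thousands <- gsub("\\.", "", x)

# Step 2: replace the decimal comma with a dot
with_dot <- gsub(",", ".", no_thousands)

result <- as.numeric(with_dot)
print(result)
```

Reversing the two steps would turn "1.234,5" into "1.234.5", which as.numeric() cannot parse.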
  • That’s right, @Willian, I get it now. Thank you.


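You can use parse_number() from the readr package, which is part of the tidyverse. It works like a more flexible as.numeric(), and its locale argument lets you tell it which decimal mark your data uses: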

library(tidyverse)

df %>% 
  mutate(var = parse_number(as.character(confianca), locale = locale(decimal_mark = ","))) %>% 
  head(20)
#>       Data confianca  var
#> 1  2009 T1      45,5 45.5
#> 2  2009 T2      47,9 47.9
#> 3  2009 T3      50,8 50.8
#> 4  2009 T4      51,7 51.7
#> 5  2010 T1      51,2 51.2
#> 6  2010 T2      50,4 50.4
#> 7  2010 T3      52,3 52.3
#> 8  2010 T4      51,5 51.5
#> 9  2011 T1      50,2 50.2
#> 10 2011 T2      48,7 48.7
#> 11 2011 T3      48,8 48.8
#> 12 2011 T4      49,4 49.4
#> 13 2012 T1      49,5 49.5
#> 14 2012 T2      49,1 49.1
#> 15 2012 T3      49,3 49.3
#> 16 2012 T4      50,4 50.4
#> 17 2013 T1        50 50.0
#> 18 2013 T2      47,5 47.5
#> 19 2013 T3      47,3 47.3
#> 20 2013 T4      48,3 48.3

Created on 2019-10-26 by the reprex package (v0.3.0)
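The "more flexible" part refers to parse_number() ignoring non-numeric text around the number. A minimal sketch with hypothetical strings, assuming readr is installed:

```r
library(readr)

# Hypothetical strings with a decimal comma plus extra characters
x <- c("45,5%", "R$ 47,9", "1.234,5")

# With decimal_mark = ",", locale() also defaults the grouping mark to ".";
# parse_number() drops the text before and after the number
result <- parse_number(x, locale = locale(decimal_mark = ","))
print(result)
```

Plain as.numeric() would return NA for all three of these strings, even after swapping the comma for a dot.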


@Willian's solution answers the question. An alternative is to do it with the tidyverse, using stringr's str_replace_all():

library(tidyverse)

df %>% 
   mutate(var = as.numeric(str_replace_all(string = confianca, pattern = ',', replacement = '.'))) %>% 
   head(20)

      Data confianca  var
1  2009 T1      45,5 45.5
2  2009 T2      47,9 47.9
3  2009 T3      50,8 50.8
4  2009 T4      51,7 51.7
5  2010 T1      51,2 51.2
6  2010 T2      50,4 50.4
7  2010 T3      52,3 52.3
8  2010 T4      51,5 51.5
9  2011 T1      50,2 50.2
10 2011 T2      48,7 48.7
11 2011 T3      48,8 48.8
12 2011 T4      49,4 49.4
13 2012 T1      49,5 49.5
14 2012 T2      49,1 49.1
15 2012 T3      49,3 49.3
16 2012 T4      50,4 50.4
17 2013 T1        50 50.0
18 2013 T2      47,5 47.5
19 2013 T3      47,3 47.3
20 2013 T4      48,3 48.3
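If you would rather not depend on stringr, a base R sketch of the same idea uses chartr(), which translates characters one for one. This is a swapped-in alternative, not part of the answer above, and `confianca` here is a hypothetical three-value sample:

```r
# Hypothetical sample shaped like the question's `confianca` factor
confianca <- factor(c("45,5", "47,9", "50"))

# as.character() takes the labels (not the level codes);
# chartr() then turns every "," into "."
values <- as.numeric(chartr(",", ".", as.character(confianca)))
print(values)
```

Since only single characters are swapped, no regex escaping is involved.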
