Transformation of a dataframe column into date returns NA

I've found very similar questions, but none of them solved my problem. I'm trying to read a CSV file and turn it into a time series.

download.file('https://drive.google.com/u/3/uc?id=1hAQfAKXmzwWv0ZEcE2j_kiUcXplHLOSTB&export=download', 
destfile = 'STP.csv')

STP = read.csv2('STP.csv', header = FALSE, sep = ';', col.names = c('Data', 'Indice'), skip = 1)

This creates a data frame that looks like this (the file actually has 336 rows; only the first 20 are shown):

[image: first 20 rows of the data frame]

STP$Data = as.Date(STP$Data, format = '%m/%Y')

When I do this my dataframe looks like this:

[image: the data frame after the conversion, with NA in the date column]

If I try without the format argument, R complains that the character string is not in a standard unambiguous format.

I spent a few hours searching Google for similar cases and I still can't understand what is happening. I ran sapply(STP, class) before and after the conversion. Before, it says the date column is of class character; after, when the values are NA, it says Date.

  • Welcome to Stack Overflow! Unfortunately, nobody trying to answer can reproduce this question as posted. Instead of sharing the file through Google Drive, take a look at this link (especially the part about the dput function) on how to ask a reproducible question in R. That way, people who want to help can do so as effectively as possible.

  • Use dput or simulate the data; your question does not really depend on your spreadsheet. Also remove the references to the readr library, since it is not used in the posted code.

  • The problem is that the values have only a month and a year; a date also needs a day. Try adding one, for example day 1: paste0("1/", STP$Data).

1 answer

@Rui-Arradas already pointed out the problem in the comments; I will expand on it here.

A date is a single day, so the Date class requires the day, month, and year to be specified; partial formats are for display only. Without the day, the date cannot be determined: the object will be of class Date, but its value will be set to not available (NA):

as.Date("11/2011", format = "%m/%Y")
#> [1] NA

as.Date("11/11/2011", format = "%d/%m/%Y")
#> [1] "2011-11-11"
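
To tie this back to the two symptoms in the question, here is a small sketch (the variable names x, msg and parsed are only illustrative). Without a format, as.Date() stops with an error; with the partial format, it returns an NA that still has class Date, which is why sapply(STP, class) reports Date after the conversion:

```r
x <- "11/2011"

# No format given: as.Date() cannot match its default formats and raises an error.
# tryCatch() captures the error message instead of stopping the script.
msg <- tryCatch(
  {
    as.Date(x)
    "parsed"
  },
  error = function(e) conditionMessage(e)
)
msg
#> [1] "character string is not in a standard unambiguous format"

# Partial format: no error, but the result is NA because the day is missing.
parsed <- as.Date(x, format = "%m/%Y")
is.na(parsed)
#> [1] TRUE
class(parsed)
#> [1] "Date"
```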

If your data is monthly, you can 1) use a fixed day to generate dates or 2) use a class made to store monthly data (e.g. yearmon, from the zoo package):

datas <- paste0(10:12, "/2011")

as.Date(paste0("01/", datas), format = "%d/%m/%Y")
#> [1] "2011-10-01" "2011-11-01" "2011-12-01"

zoo::as.yearmon(datas, format = "%m/%Y")
#> [1] "out 2011" "nov 2011" "dez 2011"
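
Applied to a data frame shaped like the one in the question, option 1 could look like the sketch below. I do not have the original file, so the rows here are made up; only the column names Data and Indice and the month/year format come from the question:

```r
# Hypothetical stand-in for the asker's data: month/year strings plus an index.
STP <- data.frame(
  Data = c("10/2011", "11/2011", "12/2011"),
  Indice = c(100.0, 101.5, 103.2),
  stringsAsFactors = FALSE
)

# Prepend a fixed day so that as.Date() has a complete date to parse.
STP$Data <- as.Date(paste0("01/", STP$Data), format = "%d/%m/%Y")

# The column is now of class Date and contains no NA values.
sapply(STP, class)
anyNA(STP$Data)
#> [1] FALSE
```

The choice of day 1 is arbitrary; any fixed day that exists in every month would work just as well.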

If you are going to use the ts (time series) class and your data is already ordered and regularly spaced, you do not need the date vector; just give the start and the frequency:

ts(1:20, start = c(2011, 11), frequency = 12)
#>      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
#> 2011                                           1   2
#> 2012   3   4   5   6   7   8   9  10  11  12  13  14
#> 2013  15  16  17  18  19  20                        
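
If you would rather not hard-code the start, it can be read from the first row of the data. This is only a sketch with made-up values (the names first, start_ym and serie are illustrative), and it assumes the rows are already in chronological order with no missing months:

```r
# Hypothetical monthly data; the values and the starting month are made up.
STP <- data.frame(
  Data = c("11/2011", "12/2011", "01/2012", "02/2012"),
  Indice = c(100.0, 101.5, 103.2, 102.8),
  stringsAsFactors = FALSE
)

# Take the start year and month from the first row instead of typing them in.
first <- as.Date(paste0("01/", STP$Data[1]), format = "%d/%m/%Y")
start_ym <- as.integer(c(format(first, "%Y"), format(first, "%m")))

# Build the monthly series from the value column only.
serie <- ts(STP$Indice, start = start_ym, frequency = 12)
start(serie)
#> [1] 2011   11
end(serie)
#> [1] 2012    2
```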
