What is the best way to perform a descriptive analysis of birth dates in R?

My data table has a column of birth dates in the English format (month/day/year), but I don't know how to analyze it descriptively or which R tools to use.

Should I use a bar chart or a histogram? How do I make R understand that these values are dates? I tried the following:

x = as.Date(rehab.1$Data.Nascimento)

hist(x, main = "Date of birth", breaks = "years", axes = TRUE, xlab = "Date", ylab = "Absolute frequency", col = "green")

But the result didn't look right:

[image: the resulting histogram]

  • What type of data is rehab.1$Data.Nascimento? Can you show an example of these values, e.g. head(rehab.1$Data.Nascimento)?

  • head(rehab.1$Data.Nascimento)
    [1] 1/10/1953 6/1/1941 4/11/1941 5/2/1946 6/18/1938 7/12/1941
    376 Levels: 1/1/1924 1/10/1953 1/11/1924 1/11/1950 1/15/1923 1/16/1936 ... 9/8/1941

1 answer


Your code has some problems:

  • By default, as.Date expects dates in year-month-day order (it tries "%Y-%m-%d", then "%Y/%m/%d"), but your dates are in month/day/year order. Change the first call to x <- as.Date(rehab.1$Data.Nascimento, format = "%m/%d/%Y")
  • What information do you want to show on your histogram? How many people were born each year? Each month? Depending on what you want, you will use different methods.
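
As a minimal sketch of the conversion in the first point, the snippet below parses the six values shown in the question's head() output. The names birth_raw and birth are made up for illustration, and storing the values as a factor is an assumption based on the "376 Levels" line in that output.

```r
# Sample values copied from the question's head() output, stored as a factor
# to mimic how the column appears to have been read in (an assumption).
birth_raw <- factor(c("1/10/1953", "6/1/1941", "4/11/1941",
                      "5/2/1946", "6/18/1938", "7/12/1941"))

# Convert via character so the labels, not the factor codes, are parsed.
birth <- as.Date(as.character(birth_raw), format = "%m/%d/%Y")

# Values that do not match the format become NA, so count them as a check.
sum(is.na(birth))

# Basic descriptive summary of the dates: earliest, median, latest, etc.
summary(birth)
range(birth)
```

Once the column is a real Date vector, summary() and range() already give a simple descriptive overview before any plotting.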

For example, to show the number of births per year, you can keep the approach you already tried (hist with breaks = "years"), as in the code below, which uses randomly generated dates.

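# Generate N random date-times between st and et (both given as m/d/Y).
randomDates <- function(N, st = "1/1/1920", et = "12/31/2015") {
    st <- as.POSIXlt(as.Date(st, "%m/%d/%Y"))
    et <- as.POSIXlt(as.Date(et, "%m/%d/%Y"))
    # Length of the interval in seconds
    dt <- as.numeric(difftime(et, st, units = "secs"))
    # Uniform random offsets, in seconds, from the start
    ev <- runif(N, 0, dt)
    st + ev
}
x <- randomDates(1000)
# One bar per calendar year; freq = TRUE plots counts rather than densities.
hist(x, freq = TRUE,
     breaks = "years", col = "green",
     xlab = "Year", ylab = "Frequency",
     main = "Year of birth")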
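
If the goal is births per month instead, one possible approach (a sketch, not the only way) is to count the months with table() and draw a bar chart, since month is a category with twelve fixed values. The data below are the six dates from the question; birth, birth_month and month_counts are illustrative names.

```r
# Example data: the six dates shown in the question, parsed as m/d/Y.
birth <- as.Date(c("1/10/1953", "6/1/1941", "4/11/1941",
                   "5/2/1946", "6/18/1938", "7/12/1941"),
                 format = "%m/%d/%Y")

# Extract the month number (1-12) and keep all twelve levels so that
# months with no births still appear with a count of zero.
birth_month <- factor(as.integer(format(birth, "%m")), levels = 1:12)
month_counts <- table(birth_month)

# Month is treated as a category here, so a bar chart is used
# instead of a histogram.
barplot(month_counts, col = "green",
        xlab = "Month", ylab = "Frequency",
        main = "Month of birth")
```

The same pattern works for other groupings: replacing "%m" with "%Y" in format() would count births per year.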
  • Thanks, Carlos, thank you so much! :D
