Split Column into Other Two Lat Long

Asked

Viewed 75 times

4

I have the following datasheets:

  BIRDS.GENERO         SP1        SP2                       XLOCAL
  Thamnomanes      caesius    glaucus                0°32'S52°12'W 
  Thamnomanes   ardesiacus  obidensis                0°10'S51°50'W
  Thamnomanes      caesius    glaucus         00°53'44"N52°00'08"W 
  Thamnomanes   ardesiacus  obidensis         00°53'44"N52°00'08"W 
  Thamnomanes   ardesiacus  obidensis               01°40'N51°23'W
  Thamnomanes   ardesiacus  obidensis               01°40'N51°23'W
  Thamnomanes      caesius hoffmannsi   03°58'14,82"S49°52'56,37"W
  Thamnomanes      caesius hoffmannsi   03°58'14,82"S49°52'56,37"W
  Thamnomanes      caesius hoffmannsi     03°31'46,9"S51°44'05,2"W
  Thamnomanes      caesius hoffmannsi     03°31'46,9"S51°44'05,2"W

I’d like to split the column XLOCAL in Latitude and Longitude. Thus:

 BIRDS.GENERO         SP1        SP2             XLOCAL                   Lat            Long
 Thamnomanes      caesius    glaucus                0°32'S52°12'W         0°32'S        52°12'W
 Thamnomanes   ardesiacus  obidensis                0°10'S51°50'W         0°10'S        51°50'W
 Thamnomanes      caesius    glaucus         00°53'44"N52°00'08"W     00°53'44"N     52°00'08"W
 Thamnomanes   ardesiacus  obidensis         00°53'44"N52°00'08"W     00°53'44"N     52°00'08"W 
 Thamnomanes   ardesiacus  obidensis               01°40'N51°23'W        01°40'N        51°23'W
 Thamnomanes   ardesiacus  obidensis               01°40'N51°23'W        01°40'N        51°23'W
 Thamnomanes      caesius hoffmannsi   03°58'14,82"S49°52'56,37"W  03°58'14,82"S  49°52'56,37"W 
 Thamnomanes      caesius hoffmannsi   03°58'14,82"S49°52'56,37"W  03°58'14,82"S  49°52'56,37"W
 Thamnomanes      caesius hoffmannsi     03°31'46,9"S51°44'05,2"W   03°31'46,9"S   51°44'05,2"W
 Thamnomanes      caesius hoffmannsi     03°31'46,9"S51°44'05,2"W   03°31'46,9"S   51°44'05,2"W

Searching the forum I found something like this:

dados$lat<- str_sub(dados$XLOCAL, end = 6)
head(x)
dados$long<- str_sub(dados$XLOCAL, start  = 7)

As you can see there is a large variation in the size of the information. So the above argument does not work.

Thanks in advance.

  • 1

    Your script has to be smart enough to search for the letter S. If you find copy from start to S and from the character after S to the end. If not, copy from the beginning to the letter N and after the N until the end.

2 answers

5


You can use strsplit to separate the coordinates, taking advantage of the "look back" option of regular expressions. Because strsplit returns a list, the most practical is to first throw the result to an object and then include each column in the original date.frame.

dados <- read.table(text = c(
 "BIRDS.GENERO         SP1        SP2                       XLOCAL
  Thamnomanes      caesius    glaucus                0°32'S52°12'W 
  Thamnomanes   ardesiacus  obidensis                0°10'S51°50'W
  Thamnomanes   ardesiacus  obidensis               01°40'N51°23'W
  Thamnomanes   ardesiacus  obidensis               01°40'N51°23'W"),
  header = TRUE, stringsAsFactors = FALSE
)

listaC <- strsplit(dados$XLOCAL, "(?<=N|S)", perl = TRUE)
dados$Lat  <- unlist(lapply(listaC, '[', 1))
dados$Long <- unlist(lapply(listaC, '[', 2))

> dados
  BIRDS.GENERO        SP1       SP2         XLOCAL     Lat    Long
1  Thamnomanes    caesius   glaucus  0°32'S52°12'W  0°32'S 52°12'W
2  Thamnomanes ardesiacus obidensis  0°10'S51°50'W  0°10'S 51°50'W
3  Thamnomanes ardesiacus obidensis 01°40'N51°23'W 01°40'N 51°23'W
4  Thamnomanes ardesiacus obidensis 01°40'N51°23'W 01°40'N 51°23'W

2

I couldn’t get a direct solution, I used the function separate package tidyverse twice and made some manipulations on the resulting variables to get the result you expect.

Being dados your bank, follows code:

library(tidyverse)

dados %>% 
  separate("XLOCAL", into = c("Lat1", "Long1"), sep = "N", remove = F, extra = "drop", fill = "right") %>% 
  separate("XLOCAL", into = c("Lat2", "Long2"), sep = "S", remove = F, extra = "drop", fill = "right") %>% 
  mutate( Lat1 = paste0(Lat1, "N"), Lat2 = paste0(Lat2, "S") ) %>% 
  mutate( Long2 = ifelse(is.na(Long2), "", Long2), Long1 = ifelse(is.na(Long1), "", Long1) ) %>% 
  mutate( Lat = ifelse(Long1 == "", Lat2, Lat1), Long = paste0(Long2, Long1) ) %>% 
  select(BIRDS.GENERO:XLOCAL, Lat, Long)
  • the first separate is breaking XLOCAL, in two variables (Lat1 and Long1), by the separator N;
  • the second separate is breaking XLOCAL, in two variables (Lat2 and Long2), by the separator S;
  • the first mutate is including the indicators N and S the variables Lat1 and Lat2, respectively, since the function separate excludes used separators;
  • the second mutate is modifying what is NA to a blank field;
  • the third mutate is creating the variables Lat and Long using the variables "auxiliary" Lat1, Lat2, Long1, Long2;
  • finally, the function of select is selecting the variables of interest

Browser other questions tagged

You are not signed in. Login or sign up in order to post.