Reading a txt file with read.table fails with "did not have 5 elements"

2

I am trying to read the txt file with two columns below:

+-----------------------------------------------------------------------------+
|                      Category Information                        |    square|
| #|description                                                    |     miles|
|-----------------------------------------------------------------------------|
| 3| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  2.096540|
| 4| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | 14.719017|
|15| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  4.763791|
|19| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  0.002395|
|21| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  2.780825|
|25| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  0.087930|
|33| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  0.484098|
|-----------------------------------------------------------------------------|
|TOTAL                                                             | 24.934597|
+-----------------------------------------------------------------------------+

I’m using the following line of code:

rawdata<-read.table("1986.txt", sep = "|",skip = 5)

But it reads nothing and fails with an error saying that a line did not have 5 elements.

2 answers

4

Knowing what is in the file, the following works. It is by no means a general solution.

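# Read the file; using "+" as the comment character drops the border lines
dados <- read.table(file = "Artur.txt",
                    sep = "|", comment.char = "+",
                    skip = 4, fill = TRUE)

# Drop the all-NA columns created by the leading and trailing "|"
dados <- dados[!sapply(dados, function(x) all(is.na(x)))]
# Drop the "|-----|" separator row
dados <- dados[apply(dados, 1, function(x) !any(grepl("----", x))), ]
# The TOTAL row has one field fewer, so its value landed in V3
dados$V4[nrow(dados)] <- as.numeric(as.character(dados$V3[nrow(dados)]))
dados <- dados[-2]
# as.character() works whether V2 was read as factor or character
dados$V2 <- trimws(as.character(dados$V2))
names(dados) <- c("number", "sq.miles")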

dados
#  number  sq.miles
#1      3  2.096540
#2      4 14.719017
#3     15  4.763791
#4     19  0.002395
#5     21  2.780825
#6     25  0.087930
#7     33  0.484098
#9  TOTAL 24.934597

3


Reproducing the problem:

tf <- tempfile()
write.table(
  "+-----------------------------------------------------------------------------+
   |                      Category Information                        |    square|
   | #|description                                                    |     miles|
   |-----------------------------------------------------------------------------|
   | 3| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  2.096540|
   | 4| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | 14.719017|
   |15| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  4.763791|
   |19| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  0.002395|
   |21| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  2.780825|
   |25| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  0.087930|
   |33| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  0.484098|
   |-----------------------------------------------------------------------------|
   |TOTAL                                                             | 24.934597|
   +-----------------------------------------------------------------------------+",
   tf, row.names = FALSE, col.names = FALSE)

rawdata <- read.table(tf, sep = "|", skip = 5)

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 7 did not have 5 elements

The file has several problems:

  1. Some lines do not contain tabular information (|---...---|). When read.table() reaches such a row, it does not find the same 5 columns it found in the previous rows and throws the error.
  2. The "#" character: by default read.table() treats it as the start of a comment.
  3. The last line, with the total, does not follow the pattern of the rest of the file (there is no | after "TOTAL").

In addition, the leading and trailing | add no information to the table and produce two useless columns when the data is read.
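As a small illustration of points 2 and 4 above, the sketch below counts how many fields the header row is split into, with and without comment handling. The helper `count_cols()` is a hypothetical name introduced only for this example; `count.fields()` is the base R function that read.table() relies on to count fields.

```r
# The header row of the table, as it appears in the file
header <- "| #|description                                                    |     miles|"

# Hypothetical helper: number of "|"-separated fields for a given comment.char
count_cols <- function(line, comment.char) {
  con <- textConnection(line)
  on.exit(close(con))
  count.fields(con, sep = "|", comment.char = comment.char)
}

# Default behaviour: everything from "#" onwards is discarded
n_default <- count_cols(header, comment.char = "#")

# Comment handling switched off: the whole row is split on "|"
n_no_comment <- count_cols(header, comment.char = "")

n_default
n_no_comment
```

With comments switched off the row yields five fields for only three real columns, because the leading and trailing | each contribute an empty field.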

I see at least two ways to read this data.

Base R

One way is to read the data with readLines(), remove the troublesome lines, and then pass the "clean" data to read.table():

txt <- readLines(tf)
limpo <- txt[! grepl("----|TOTAL", txt)]
rawdata <- read.table(text = limpo, sep = "|", skip = 1, comment.char = "")
rawdata

  V1 V2                                                              V3         V4 V5
1 NA  # description                                                          miles NA
2 NA  3  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    2.096540 NA
3 NA  4  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   14.719017 NA
4 NA 15  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    4.763791 NA
5 NA 19  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    0.002395 NA
6 NA 21  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    2.780825 NA
7 NA 25  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    0.087930 NA
8 NA 33  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    0.484098 NA
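The result above still carries the header row and the columns produced by the border characters. A possible follow-up, sketched here on an abbreviated in-memory copy of the table (the dotted column is shortened, and the names `body`, `raw` and `clean` are only illustrative), is to keep just the rows that start with a category number and then pick the two useful columns:

```r
# Abbreviated copy of the table so the sketch runs on its own
txt <- c(
  "+-------------------------------+",
  "|   Category Info    |    square|",
  "| #|description      |     miles|",
  "|-------------------------------|",
  "| 3| . . . . . . . . |  2.096540|",
  "| 4| . . . . . . . . | 14.719017|",
  "|15| . . . . . . . . |  4.763791|",
  "|19| . . . . . . . . |  0.002395|",
  "|21| . . . . . . . . |  2.780825|",
  "|25| . . . . . . . . |  0.087930|",
  "|33| . . . . . . . . |  0.484098|",
  "|-------------------------------|",
  "|TOTAL               | 24.934597|",
  "+-------------------------------+"
)

# Keep only the rows whose first field is a category number
body <- grep("^ *\\| *[0-9]+\\|", txt, value = TRUE)

# strip.white = TRUE trims the padding around each field
raw <- read.table(text = body, sep = "|", strip.white = TRUE,
                  stringsAsFactors = FALSE)

# V1 and V5 come from the leading/trailing "|"; V3 is the dotted filler
clean <- data.frame(number = raw$V2, sq.miles = raw$V4)
clean
```

Filtering on the row pattern rather than on "----|TOTAL" also drops the two header rows, so read.table() can infer numeric column types directly.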

tidyverse

In the tidyverse, the package for reading files is readr. Using it we would have:

library(tidyverse)
rawdata2 <- read_delim(tf, "|", skip = 2, comment = "----")
rawdata2 %>% filter(!is.na(` #`))

# A tibble: 8 x 5
  `   ` ` #`                        `description                   ~ `     miles` X5   
  <chr> <chr>                       <chr>                            <chr>        <chr>
1 "   " " 3"                        " . . . . . . . . . . . . . . .~ "  2.096540" NA   
2 "   " " 4"                        " . . . . . . . . . . . . . . .~ " 14.719017" NA   
3 "   " 15                          " . . . . . . . . . . . . . . .~ "  4.763791" NA   
4 "   " 19                          " . . . . . . . . . . . . . . .~ "  0.002395" NA   
5 "   " 21                          " . . . . . . . . . . . . . . .~ "  2.780825" NA   
6 "   " 25                          " . . . . . . . . . . . . . . .~ "  0.087930" NA   
7 "   " 33                          " . . . . . . . . . . . . . . .~ "  0.484098" NA   
8 "   " "TOTAL                    ~ " 24.934597"                     NA           NA 

Note that the two results are not identical: in the second case the row with the total can be kept.
