Reproducing the problem:
tf <- tempfile()
write.table(
"+-----------------------------------------------------------------------------+
| Category Information | square|
| #|description | miles|
|-----------------------------------------------------------------------------|
| 3| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | 2.096540|
| 4| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | 14.719017|
|15| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | 4.763791|
|19| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | 0.002395|
|21| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | 2.780825|
|25| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | 0.087930|
|33| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | 0.484098|
|-----------------------------------------------------------------------------|
|TOTAL | 24.934597|
+-----------------------------------------------------------------------------+",
tf, row.names = FALSE, col.names = FALSE)
rawdata <- read.table(tf, sep = "|", skip = 5)
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 7 did not have 5 elements
The file has a few problems.
- Some lines do not contain tabular data (the |---...--- separator lines). When R reaches such a line, it does not find the same 5 columns it found in the previous rows and throws the error.
- The "#" character: read.table() treats it as the start of a comment by default.
- The last line, with the total, does not follow the pattern of the rest of the file (there is no | after "TOTAL").
In addition, the leading and trailing | add no information to the table and produce two useless columns when the data is read.
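One way to see these problems before calling read.table() is to count the fields on each line. The snippet below is only a sketch: sample_lines and tf_demo are made-up names, and the text is a shortened stand-in for the real file, not the original data.

```r
# Hypothetical, shortened stand-in for the report file discussed above.
sample_lines <- c(
  "+---------------------------+",
  "| #|description | miles|",
  "|---------------------------|",
  "| 3| . . . . . | 2.096540|",
  "|TOTAL | 24.934597|"
)
tf_demo <- tempfile()
writeLines(sample_lines, tf_demo)

# Count "|"-separated fields per line. comment.char = "" keeps "#" from
# truncating the header line, and quote = "" disables quote handling.
n_fields <- count.fields(tf_demo, sep = "|", quote = "", comment.char = "")

# Show each line next to its field count; lines whose count differs from
# the data rows are the ones read.table() will trip over.
data.frame(n_fields = n_fields, line = sample_lines)
```

The border, separator and TOTAL lines should stand out with a different field count than the data rows, which is what the error message is complaining about.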
I see at least two possible ways to read this data.
r-base
One way is to read the data with readLines(), remove the troublesome lines, and then pass the "clean" data to read.table():
txt <- readLines(tf)
limpo <- txt[! grepl("----|TOTAL", txt)]
rawdata <- read.table(text = limpo, sep = "|", skip = 1, comment.char = "")
rawdata
V1 V2 V3 V4 V5
1 NA # description miles NA
2 NA 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.096540 NA
3 NA 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14.719017 NA
4 NA 15 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.763791 NA
5 NA 19 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0.002395 NA
6 NA 21 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.780825 NA
7 NA 25 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0.087930 NA
8 NA 33 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0.484098 NA
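The result still carries the header text as its first row and two empty columns from the outer pipes. A possible follow-up, sketched here on a shortened stand-in for limpo, is to skip both header lines and keep only the inner columns. This is a variation on the call above, not the original code: skip = 2, strip.white = TRUE and the column names category, description and sq_miles are choices made for this illustration.

```r
# Shortened stand-in for the cleaned lines ("limpo") built above.
limpo <- c(
  "| Category Information | square|",
  "| #|description | miles|",
  "| 3| . . . . . | 2.096540|",
  "| 4| . . . . . | 14.719017|",
  "|15| . . . . . | 4.763791|"
)

# skip = 2 drops both header lines, so every remaining row is data;
# strip.white = TRUE trims the padding around each field.
rawdata <- read.table(text = limpo, sep = "|", skip = 2, comment.char = "",
                      strip.white = TRUE, stringsAsFactors = FALSE)

# Columns 1 and 5 come from the leading and trailing "|" and are all NA,
# so keep only columns 2 to 4 and give them illustrative names.
tidy <- rawdata[, 2:4]
names(tidy) <- c("category", "description", "sq_miles")

str(tidy)
```

Because the header row is no longer mixed in with the data, the category and area columns should be read as numbers rather than as text.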
tidyverse
In the tidyverse, the package for reading files is readr. Using it, we would have:
library(tidyverse)
rawdata2 <- read_delim(tf, "|", skip = 2, comment = "----")
rawdata2 %>% filter(!is.na(` #`))
# A tibble: 8 x 5
` ` ` #` `description ~ ` miles` X5
<chr> <chr> <chr> <chr> <chr>
1 " " " 3" " . . . . . . . . . . . . . . .~ " 2.096540" NA
2 " " " 4" " . . . . . . . . . . . . . . .~ " 14.719017" NA
3 " " 15 " . . . . . . . . . . . . . . .~ " 4.763791" NA
4 " " 19 " . . . . . . . . . . . . . . .~ " 0.002395" NA
5 " " 21 " . . . . . . . . . . . . . . .~ " 2.780825" NA
6 " " 25 " . . . . . . . . . . . . . . .~ " 0.087930" NA
7 " " 33 " . . . . . . . . . . . . . . .~ " 0.484098" NA
8 " " "TOTAL ~ " 24.934597" NA NA
Note that the two resulting tables are not identical: in the second case the row with the total is kept.