How to delete values in a text column from a csv file in R?

Asked

Viewed 39 times

-2

I am learning R and need help with CSV files. I can read the file but need to perform operations with a few columns:

The file is like this:

1   {date:2018-08-01    state:RN    store_id:3162633    sale_id:326463633336323 off_product_id:613665646663346  quantity:1  price:229.0 customer_id:null}
2   {date:2018-08-01    state:RN    store_id:3162633    sale_id:333331346330323 off_product_id:343739666265353  quantity:1  price:179.0 customer_id:null}
3   {date:2018-08-01    state:RN    store_id:3162633    sale_id:393461383633336 off_product_id:373034633762613  quantity:1  price:269.0 customer_id:null}
4   {date:2018-08-01    state:RN    store_id:3162633    sale_id:616531626536373 off_product_id:343634373062363  quantity:1  price:29.0  customer_id:null}
5   {date:2018-08-01    state:RN    store_id:3162633    sale_id:626462353638383 off_product_id:616339353035393  quantity:1  price:459.0 customer_id:null}
6   {date:2018-08-01    state:RN    store_id:3162633    sale_id:653363623539323 off_product_id:346165376638356  quantity:1  price:369.0 customer_id:null}

But I want to leave it this way:

inserir a descrição da imagem aqui

  • Add the shape you leave directly to the question. (Transfer the image)

1 answer

0


dados <- read.table(text = 
  "1   {date:2018-08-01    state:RN    store_id:3162633    sale_id:326463633336323 off_product_id:613665646663346  quantity:1  price:229.0 customer_id:null}
   2   {date:2018-08-01    state:RN    store_id:3162633    sale_id:333331346330323 off_product_id:343739666265353  quantity:1  price:179.0 customer_id:null}
   3   {date:2018-08-01    state:RN    store_id:3162633    sale_id:393461383633336 off_product_id:373034633762613  quantity:1  price:269.0 customer_id:null}
   4   {date:2018-08-01    state:RN    store_id:3162633    sale_id:616531626536373 off_product_id:343634373062363  quantity:1  price:29.0  customer_id:null}
   5   {date:2018-08-01    state:RN    store_id:3162633    sale_id:626462353638383 off_product_id:616339353035393  quantity:1  price:459.0 customer_id:null}
   6   {date:2018-08-01    state:RN    store_id:3162633    sale_id:653363623539323 off_product_id:346165376638356  quantity:1  price:369.0 customer_id:null}",
  stringsAsFactors = FALSE)

# Remove a primeira coluna (desnecessária)
dados <- dados[, !(names(dados) %in% "V1")]

# Remove os colchetes
dados[] <- lapply(dados, gsub, pattern = "\\{|\\}", replacement = "")

# Pega os nomes das colunas
colnames(dados) <- gsub(":.*$", "", dados[1,])

# Remove caracteres até ":"
dados[] <- lapply(dados, gsub, pattern = ".*:", replacement = "")


> dados
        date state store_id         sale_id  off_product_id quantity price customer_id
1 2018-08-01    RN  3162633 326463633336323 613665646663346        1 229.0        null
2 2018-08-01    RN  3162633 333331346330323 343739666265353        1 179.0        null
3 2018-08-01    RN  3162633 393461383633336 373034633762613        1 269.0        null
4 2018-08-01    RN  3162633 616531626536373 343634373062363        1  29.0        null
5 2018-08-01    RN  3162633 626462353638383 616339353035393        1 459.0        null
6 2018-08-01    RN  3162633 653363623539323 346165376638356        1 369.0        null

All data will be in format Character; you will have the additional job of converting to the appropriate formats.

JSON file

By the appearance of your raw data, you have a JSON file that was erroneously saved as CSV. If this is the case and you are the one capturing the data, fix the script to save in json format. The reading can be done in a simple way and preserving the formats with the package jsonlite:

library(jsonlite)

dados.json <- '[{"date":"2018-08-01","state":"RN","store_id":3162633,"sale_id":326463633336323,"off_product_id":613665646663346,"quantity":1,"price":229.0,"customer_id":"null"},
                {"date":"2018-08-01","state":"RN","store_id":3162633,"sale_id":333331346330323,"off_product_id":343739666265353,"quantity":1,"price":179.0,"customer_id":"null"},
                {"date":"2018-08-01","state":"RN","store_id":3162633,"sale_id":393461383633336,"off_product_id":373034633762613,"quantity":1,"price":269.0,"customer_id":"null"},
                {"date":"2018-08-01","state":"RN","store_id":3162633,"sale_id":616531626536373,"off_product_id":343634373062363,"quantity":1,"price":29.0,"customer_id":"null"},
                {"date":"2018-08-01","state":"RN","store_id":3162633,"sale_id":626462353638383,"off_product_id":616339353035393,"quantity":1,"price":459.0,"customer_id":"null"},
                {"date":"2018-08-01","state":"RN","store_id":3162633,"sale_id":653363623539323,"off_product_id":346165376638356,"quantity":1,"price":369.0,"customer_id":"null"}]'

dados <- fromJSON(dados.json)


> dados
        date state store_id      sale_id off_product_id quantity price customer_id
1 2018-08-01    RN  3162633 3.264636e+14   6.136656e+14        1   229        null
2 2018-08-01    RN  3162633 3.333313e+14   3.437397e+14        1   179        null
3 2018-08-01    RN  3162633 3.934614e+14   3.730346e+14        1   269        null
4 2018-08-01    RN  3162633 6.165316e+14   3.436344e+14        1    29        null
5 2018-08-01    RN  3162633 6.264624e+14   6.163394e+14        1   459        null
6 2018-08-01    RN  3162633 6.533636e+14   3.461654e+14        1   369        null

> str(dados)
'data.frame':   6 obs. of  8 variables:
$ date          : chr  "2018-08-01" "2018-08-01" "2018-08-01" "2018-08-01" ...
$ state         : chr  "RN" "RN" "RN" "RN" ...
$ store_id      : int  3162633 3162633 3162633 3162633 3162633 3162633
$ sale_id       : num  3.26e+14 3.33e+14 3.93e+14 6.17e+14 6.26e+14 ...
$ off_product_id: num  6.14e+14 3.44e+14 3.73e+14 3.44e+14 6.16e+14 ...
$ quantity      : int  1 1 1 1 1 1
$ price         : num  229 179 269 29 459 369
$ customer_id   : chr  "null" "null" "null" "null" ...
  • Thank you very much

Browser other questions tagged

You are not signed in. Login or sign up in order to post.