Remove NA from a csv in R

0

Good morning, folks! I'm trying to remove the NA values, but my code returns NULL. My code is below:

Note: I only need to keep the columns Population and Area (sq. mi.).

rm(list=ls())

dados<-read.csv(file = "C:/Users/lucas/Downloads/projeto/dados_corrigido.csv",header = TRUE,sep = ";",dec = ".",na.strings = "NA")
dados1 <- na.omit(dados[,2:3]) 
str(dados)

# Start -------------------------------------------------------------------

summary(dados)
# Country              X.Population.       X.Area..sq..mi...  X.Infant.mortality..per.1000.births.. X.GDP....per.capita.. X.Literacy.....
# Afghanistan   :  1   Min.   :7.026e+03   Min.   :       2   Min.   :  2.29                        Min.   :  500         Min.   : 17.60
# Albania       :  1   1st Qu.:4.376e+05   1st Qu.:    4648   1st Qu.:  8.15                        1st Qu.: 1900         1st Qu.: 70.60
# Algeria       :  1   Median :4.787e+06   Median :   86600   Median : 21.00                        Median : 5550         Median : 92.50
# American Samoa:  1   Mean   :2.874e+07   Mean   :  598227   Mean   : 35.51                        Mean   : 9690         Mean   : 82.84
# Andorra       :  1   3rd Qu.:1.750e+07   3rd Qu.:  441811   3rd Qu.: 55.70                        3rd Qu.:15700         3rd Qu.: 98.00
# Angola        :  1   Max.   :1.314e+09   Max.   :17075200   Max.   :191.19                        Max.   :55100         Max.   :100.00
# (Other)       :221                                          NA's   :3                             NA's   :1             NA's   :18
#   X.Birthrate.    X.Deathrate.    X.Agriculture.       X.Continent.    X.
#  Min.   : 7.29   Min.   : 2.290   Min.   :  0.000   "AFRICA" :57    Mode:logical
#  1st Qu.:12.67   1st Qu.: 5.910   1st Qu.:  0.415   "AMERICA":50    NA's:227
# Median :18.79   Median : 7.840   Median : 54.000   "ASIA"   :56
# Mean   :22.11   Mean   : 9.241   Mean   :103.015   "EUROPE" :43
# 3rd Qu.:29.82   3rd Qu.:10.605   3rd Qu.:163.500   "OCEANIA":21
# Max.   :50.73   Max.   :29.740   Max.   :769.000
# NA's   :3       NA's   :4        NA's   :15

Data: https://drive.google.com/file/d/1fRuX54rw5NBrxlNcXGR7Kx1TAZ6U3hPu/view

  • Please make your data available so that others can help you. As written, your code reads a CSV file from your own Downloads folder, so we have no way to reproduce the problem. Use dput(dados) to share the data.

  • Good morning. Here is my dataset: [https://drive.google.com/file/d/1fRuX54rw5NBrxlNcXGR7Kx1TAZ6U3hPu/view?usp=sharing]

  • You need to make it publicly available; I cannot access the data through your link.

  • Sorry. The file is public now.

  • Your data has no NA values in the Population and Area columns.

  • Then I believe I got confused. I only need to isolate the Population and Area columns to compute descriptive statistics.


2 answers

1


There are several ways to select the columns you want. I will show two examples: one with the dplyr package and one with base R.

With the dplyr package:

library(tidyverse)

dados <- read_csv("~/Downloads/dados_corrigido.csv") %>%
  select(Country, `"Population"`, `"Area (sq. mi.)"`)

summary(dados)

With base R:

library(readr)

dados <- read_csv("~/Downloads/dados_corrigido.csv") 
dados <- dados[,1:3]

summary(dados)

Note: I used the readr package in both examples to get a better reading of the CSV file; only the column selection in the second example is base R.
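If you prefer to avoid extra packages entirely, the same selection can be sketched with base R alone. The data frame below is a hypothetical stand-in for the CSV (its values and the column name Area are made up for illustration), so adjust the names to whatever names(dados) shows for your file:

```r
# Hypothetical stand-in for the CSV; the real column names may differ.
dados <- data.frame(
  Country    = c("A", "B", "C"),
  Population = c(1000, 2500, NA),
  Area       = c(10, NA, 30),
  Literacy   = c(90, 80, NA)
)

# Keep only the two columns of interest, selected by name.
dados1 <- dados[, c("Population", "Area")]

# Drop every row that has an NA in either of the kept columns.
dados1 <- na.omit(dados1)

summary(dados1)
```

Selecting by name rather than by position (2:3) keeps the code working if the column order in the file changes.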

-1

Have you tried drop_na() from the tidyr package?
library(dplyr)
library(tidyr)
df <- tibble(x = c(1, 2, NA), y = c("a", NA, "b"))
df %>% drop_na()
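
Since only some columns matter here, it may help to know that drop_na() can also be restricted to named columns. A small sketch using the same toy tibble (the object names all_complete and x_complete are only illustrative):

```r
library(tidyr)
library(tibble)

df <- tibble(x = c(1, 2, NA), y = c("a", NA, "b"))

# With no column arguments, drop_na() removes every row containing any NA.
all_complete <- drop_na(df)

# Naming a column restricts the NA check to that column only.
x_complete <- drop_na(df, x)
```

Here all_complete keeps only the first row, while x_complete keeps the first two rows because the NA in y is ignored.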
