Reading .xlsx files in R

5

What is the problem with, or the difference between, reading a .txt file and a .xlsx file in R?

Can reading .xlsx cause more problems than reading .txt during an analysis?

A friend asked me to do everything in .txt because it’s better but I don’t understand why.

Another detail is that I have a .xlsx file with 4 tabs (sheets), and when I change the tab name in the script it keeps reading the previous one. Is this caused by Excel?

2 answers

6


What is the problem and/or difference between reading a .txt file and a .xlsx file in R?

Strictly speaking, none. Both are valid ways to store data for analysis, as are .csv, .sav and .dat. The only drawback of .xlsx is that it requires, almost necessarily, a spreadsheet editor to view the files, while the .txt format can be read by virtually any program installed on the computer.

Can reading .xlsx cause more problems than reading .txt during an analysis?

If both files are read correctly, no problems should occur while analyzing the data.

A friend asked me to do everything in .txt because it’s better but I don’t understand why.

See the first answer I gave. Also, it might just be his personal preference. Personally, I prefer .txt and .csv files because I can read them directly in the terminal without additional programs. In addition, of course, .txt files take up less disk space than .xlsx files (although these days this is not so relevant).

Another detail is that I have a .xlsx file with 4 tabs (sheets), and when I change the name of the tab in the script it keeps reading the previous one. Is this caused by Excel?

I can't answer this question because I don't have your code, so I can't evaluate what might be wrong in it or in the .xlsx file being read. What I can say is that I use something similar to the code below when I work with people who use Excel, and this code, adapted to the needs of each analysis, works very well, even with .xlsx files that have more than one sheet. To read a different sheet, I just change the parameter sheet=1 to sheet=2. I refer to sheets by their position within the .xlsx file rather than by name.

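library(readxl)
# Read the first and second sheets by position; col_names=TRUE treats the first row as column names
read_excel("arquivo.xlsx", sheet=1, col_names=TRUE)
read_excel("arquivo.xlsx", sheet=2, col_names=TRUE)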

Note that you need to install the package readxl before running the above commands.
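
If you prefer to read a sheet by name, it can help to list the names stored in the workbook first, since a stray space in the name is easy to miss. The following is only a minimal sketch, assuming the readxl package is installed. It uses the example workbook that ships with readxl (via readxl_example()) in place of your own file, and the variable names (path, sheets, wanted, dados) are illustrative.

```r
library(readxl)

# Example workbook bundled with readxl; replace with the path to your own file
path <- readxl_example("datasets.xlsx")

# List the sheet names exactly as they are stored in the workbook
sheets <- excel_sheets(path)
print(sheets)

# A name typed in a script may carry a stray space; trimws() removes
# leading and trailing whitespace before the name is used
wanted <- trimws(" mtcars ")
stopifnot(wanted %in% sheets)

# Read the sheet by name instead of by position
dados <- read_excel(path, sheet = wanted, col_names = TRUE)
head(dados)
```

Note that trimws() only cleans the name typed in the script. If the extra space is in the workbook itself, the output of excel_sheets() will show it, and the name has to be matched as stored or fixed in Excel.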

  • Dear Marcos Nunes and Marcelo de Andrade, thank you very much for your answer and explanation; it really helped me a lot. As for the error I reported, it has already been corrected. There was an extra space that I had not noticed. I already have more than one package for Excel: xlsx and the one you told me about. As I am a beginner in R, we will probably talk again, and of course you are always helping with direct answers. Thank you very much.

  • That's good to know, Herlon. Since the answer was helpful, consider voting for it as explained in this link; users of the site are rewarded this way and feel encouraged to continue helping other users.

5

Well, there are still some important differences between the two formats, and in general I would say that your friend is right.

1) Reading txt is faster, and you depend less on external packages or other languages.

Until recently we didn't have the readxl package, which is a great help for reading Excel files. You had to choose among packages such as xlsx, XLConnect, openxlsx or several others, and each of them has a different external dependency (Java, C++, etc.). As a result, compatibility issues when reading Excel files were very common.

Moreover, reading txt files is still much faster.
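
To illustrate the point about dependencies, reading a text file needs nothing beyond base R. This is just a small sketch: the data and the temporary file are made up for the example, and a tab-delimited layout is assumed.

```r
# Write a small tab-delimited file to a temporary location (illustrative data)
arquivo <- tempfile(fileext = ".txt")
exemplo <- data.frame(id = 1:3, grupo = c("a", "b", "c"))
write.table(exemplo, arquivo, sep = "\t", row.names = FALSE, quote = FALSE)

# Read it back with base R only; header = TRUE uses the first line as column names
dados <- read.table(arquivo, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
str(dados)
```

No library() call is needed here, so the same script should run on any machine with R installed.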

2) Saving Excel files will probably cause compatibility issues

We still don't have a package for saving Excel files that is as good and reliable as readxl. So you will have to use one of those I mentioned, and even if everything works properly on your computer, it may fail on your friend's computer or on other people's.

3) Excel has a row limit

Excel limits the number of rows (1,048,576 per sheet in the .xlsx format). The files also get unnecessarily heavy. If you're working with large databases, forget it.

So you have to think carefully about what you want to save in Excel; for everything else, work with txt.

There are situations where saving in Excel is useful, generally when you want to present the final result in an Excel spreadsheet. But note that this usually happens in the final stage of the analysis. While you are manipulating and exchanging data, it is better to avoid this format.
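
If the data already lives in a .xlsx file, one possible way to move to txt is to read each sheet once and write it out as a tab-delimited file. This is only a sketch, assuming readxl is installed. It uses the example workbook bundled with readxl instead of a real file, writes to a temporary folder, and the names (path, pasta, aba, saida) are illustrative.

```r
library(readxl)

# Example workbook bundled with readxl; replace with the path to your own file
path <- readxl_example("datasets.xlsx")
pasta <- tempdir()  # output folder (a temporary directory here)

# Read each sheet and write it as a tab-delimited .txt file named after the sheet
for (aba in excel_sheets(path)) {
  dados <- read_excel(path, sheet = aba)
  saida <- file.path(pasta, paste0(aba, ".txt"))
  write.table(dados, saida, sep = "\t", row.names = FALSE)
}

# Show the files that were created
list.files(pasta, pattern = "\\.txt$")
```

After this one-time conversion, the rest of the analysis can read the .txt files with base R and no longer depends on an Excel package.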

  • Dear Carlos, thank you very much for the additional information. That's exactly what I'm going through: my database is very large, and on the PC of the friend in question R cannot read it. I need to convert it to .txt. Thank you.
