Reading and storing several files using R

A practical method to read and store data in R (statistical programming language)

Introduction

Usually we need to read several files to create a data bank (data frame with all data). If you have two or three files it is simple. However, if you have five or more files like .csv, .dat, etc, it can demand more time. Using R, one of the options to solve it is the list object, we will show how to handle with.

Case I - reading several files

Supposing you have some .csv files. If you have not one we will create them.

Creating a hypothetical data frame


DF <-
    data.frame(BLOCO = rep(seq(1,4), each = 12),
               TRAT=rep(c("T1", "T2", "T3"), each = 4),
               REP = seq(1,4),
               Z = rnorm(48, 30, 2.6))
head(DF)
##   BLOCO TRAT REP        Z
## 1     1   T1   1 30.50221
## 2     1   T1   2 30.11910
## 3     1   T1   3 27.47513
## 4     1   T1   4 30.58861
## 5     1   T2   1 30.31546
## 6     1   T2   2 28.90498

Recording five .csv files

sapply(c("foo1.csv", "foo2.csv", "foo3.csv", "foo4.csv", "foo5.csv"),
       function(i){
       write.csv(DF, paste(i))})
## $foo1.csv
## NULL
## 
## $foo2.csv
## NULL
## 
## $foo3.csv
## NULL
## 
## $foo4.csv
## NULL
## 
## $foo5.csv
## NULL

Checking if the five files have been created

dir()
## [1] "170514_READING_FILES.md"  "170514_READING_FILES.Rmd"
## [3] "foo1.csv"                 "foo2.csv"                
## [5] "foo3.csv"                 "foo4.csv"                
## [7] "foo5.csv"

Reading file’s (.csv) names

temp = list.files(pattern="*.csv")
temp
## [1] "foo1.csv" "foo2.csv" "foo3.csv" "foo4.csv" "foo5.csv"

All .csv file in only one list, and transforming it to data frame

In my opinion, a good way to store data from several files is the list format. Especially for files with the same header.

DF_LIST <-
    lapply(seq(1, length(temp)),
           function(z)
               read.csv(temp[z],
                        header = TRUE,
                        dec = ".",
                        sep = ","))

DF_LIST_agrup <-
    do.call(rbind.data.frame, DF_LIST)

head(DF_LIST_agrup)
##   X BLOCO TRAT REP        Z
## 1 1     1   T1   1 30.50221
## 2 2     1   T1   2 30.11910
## 3 3     1   T1   3 27.47513
## 4 4     1   T1   4 30.58861
## 5 5     1   T2   1 30.31546
## 6 6     1   T2   2 28.90498

Checking dimensions. All looks great.

dim(DF_LIST_agrup)
## [1] 240   5

You can see it.

View(DF_LIST_agrup)

Case II - Excel files, dealing with …

If you work with spread sheets (.xls or .xlsx), you can use readxl package.

library(readxl)

DF1 <- 
read_excel("FILE_NAME.xlsx")

Just in case, if you require a specif sheet from your file .xls or .xlsx, you can set the options.

#library(readxl)

DF2 <-
    read_excel("FILE_NAME.xlsx",
               sheet = "SHEET_NAME")

Case III - more than one sheet in your excel file

If you .xls or xlxs file has more than one sheet, you can use the same logic for Case I.

Create a list with sheet’s names

library(readxl)

LIST_SHEET <- c("SHEET1",
                "SHEET2",
                "SHEET3")

Reading file and sheets

DF3_LIST <-
    sapply(LIST_SHEET,
           function(L){
               read_excel("FILE_NAME.xlsx",
                          sheet = L)
               },
           simplify = FALSE)

Transform to data frame (if necessary)

### DB ACCOUNT
DF3_DF <-
    do.call(rbind.data.frame, DF3_LIST)

More details about reading and storing data in R you can find in References

[]

“Without data, you’re just another person with an opinion.”
W. Edwards Deming

References

Convert a list of data frames into one data frame Concatenating a list of data frames readxl

 Share!

 
comments powered by Disqus