Introduction
Usually we need to read several files to create a data bank (data frame
with all data). If you have two or three files it is simple. However, if
you have five or more files like .csv
, .dat
, etc, it can demand more
time. Using R, one of the options to solve it is the list object,
we will show how to handle with.
Case I - reading several files
Supposing you have some .csv
files. If you have not one we will create them.
Creating a hypothetical data frame
DF <-
data.frame(BLOCO = rep(seq(1,4), each = 12),
TRAT=rep(c("T1", "T2", "T3"), each = 4),
REP = seq(1,4),
Z = rnorm(48, 30, 2.6))
head(DF)
## BLOCO TRAT REP Z
## 1 1 T1 1 30.50221
## 2 1 T1 2 30.11910
## 3 1 T1 3 27.47513
## 4 1 T1 4 30.58861
## 5 1 T2 1 30.31546
## 6 1 T2 2 28.90498
Recording five .csv
files
sapply(c("foo1.csv", "foo2.csv", "foo3.csv", "foo4.csv", "foo5.csv"),
function(i){
write.csv(DF, paste(i))})
## $foo1.csv
## NULL
##
## $foo2.csv
## NULL
##
## $foo3.csv
## NULL
##
## $foo4.csv
## NULL
##
## $foo5.csv
## NULL
Checking if the five files have been created
dir()
## [1] "170514_READING_FILES.md" "170514_READING_FILES.Rmd"
## [3] "foo1.csv" "foo2.csv"
## [5] "foo3.csv" "foo4.csv"
## [7] "foo5.csv"
Reading file’s (.csv
) names
temp = list.files(pattern="*.csv")
temp
## [1] "foo1.csv" "foo2.csv" "foo3.csv" "foo4.csv" "foo5.csv"
All .csv
file in only one list, and transforming it to data frame
In my opinion, a good way to store data from several files is the list format. Especially for files with the same header.
DF_LIST <-
lapply(seq(1, length(temp)),
function(z)
read.csv(temp[z],
header = TRUE,
dec = ".",
sep = ","))
DF_LIST_agrup <-
do.call(rbind.data.frame, DF_LIST)
head(DF_LIST_agrup)
## X BLOCO TRAT REP Z
## 1 1 1 T1 1 30.50221
## 2 2 1 T1 2 30.11910
## 3 3 1 T1 3 27.47513
## 4 4 1 T1 4 30.58861
## 5 5 1 T2 1 30.31546
## 6 6 1 T2 2 28.90498
Checking dimensions. All looks great.
dim(DF_LIST_agrup)
## [1] 240 5
You can see it.
View(DF_LIST_agrup)
Case II - Excel files, dealing with …
If you work with spread sheets (.xls
or .xlsx
), you can use readxl
package.
library(readxl)
DF1 <-
read_excel("FILE_NAME.xlsx")
Just in case, if you require a specif sheet from your file .xls
or
.xlsx
, you can set the options.
#library(readxl)
DF2 <-
read_excel("FILE_NAME.xlsx",
sheet = "SHEET_NAME")
Case III - more than one sheet in your excel file
If you .xls
or xlxs
file has more than one sheet, you can use the
same logic for Case I.
Create a list with sheet’s names
library(readxl)
LIST_SHEET <- c("SHEET1",
"SHEET2",
"SHEET3")
Reading file and sheets
DF3_LIST <-
sapply(LIST_SHEET,
function(L){
read_excel("FILE_NAME.xlsx",
sheet = L)
},
simplify = FALSE)
Transform to data frame (if necessary)
### DB ACCOUNT
DF3_DF <-
do.call(rbind.data.frame, DF3_LIST)
More details about reading and storing data in R you can find in References
[]
“Without data, you’re just another person with an opinion.”
W. Edwards Deming
References
Convert a list of data frames into one data frame Concatenating a list of data frames readxl