Introduction
Usually we need to read several files to create a data bank (data frame
with all data). If you have two or three files it is simple. However, if
you have five or more files like .csv, .dat, etc, it can demand more
time. Using R, one of the options to solve it is the list object,
we will show how to handle with.
Case I - reading several files
Supposing you have some .csv files. If you have not one we will create them.
Creating a hypothetical data frame
DF <-
data.frame(BLOCO = rep(seq(1,4), each = 12),
TRAT=rep(c("T1", "T2", "T3"), each = 4),
REP = seq(1,4),
Z = rnorm(48, 30, 2.6))
head(DF)
## BLOCO TRAT REP Z
## 1 1 T1 1 30.50221
## 2 1 T1 2 30.11910
## 3 1 T1 3 27.47513
## 4 1 T1 4 30.58861
## 5 1 T2 1 30.31546
## 6 1 T2 2 28.90498
Recording five .csv files
sapply(c("foo1.csv", "foo2.csv", "foo3.csv", "foo4.csv", "foo5.csv"),
function(i){
write.csv(DF, paste(i))})
## $foo1.csv
## NULL
##
## $foo2.csv
## NULL
##
## $foo3.csv
## NULL
##
## $foo4.csv
## NULL
##
## $foo5.csv
## NULL
Checking if the five files have been created
dir()
## [1] "170514_READING_FILES.md" "170514_READING_FILES.Rmd"
## [3] "foo1.csv" "foo2.csv"
## [5] "foo3.csv" "foo4.csv"
## [7] "foo5.csv"
Reading file’s (.csv) names
temp = list.files(pattern="*.csv")
temp
## [1] "foo1.csv" "foo2.csv" "foo3.csv" "foo4.csv" "foo5.csv"
All .csv file in only one list, and transforming it to data frame
In my opinion, a good way to store data from several files is the list format. Especially for files with the same header.
DF_LIST <-
lapply(seq(1, length(temp)),
function(z)
read.csv(temp[z],
header = TRUE,
dec = ".",
sep = ","))
DF_LIST_agrup <-
do.call(rbind.data.frame, DF_LIST)
head(DF_LIST_agrup)
## X BLOCO TRAT REP Z
## 1 1 1 T1 1 30.50221
## 2 2 1 T1 2 30.11910
## 3 3 1 T1 3 27.47513
## 4 4 1 T1 4 30.58861
## 5 5 1 T2 1 30.31546
## 6 6 1 T2 2 28.90498
Checking dimensions. All looks great.
dim(DF_LIST_agrup)
## [1] 240 5
You can see it.
View(DF_LIST_agrup)
Case II - Excel files, dealing with …
If you work with spread sheets (.xls or .xlsx), you can use readxl package.
library(readxl)
DF1 <-
read_excel("FILE_NAME.xlsx")
Just in case, if you require a specif sheet from your file .xls or
.xlsx, you can set the options.
#library(readxl)
DF2 <-
read_excel("FILE_NAME.xlsx",
sheet = "SHEET_NAME")
Case III - more than one sheet in your excel file
If you .xls or xlxs file has more than one sheet, you can use the
same logic for Case I.
Create a list with sheet’s names
library(readxl)
LIST_SHEET <- c("SHEET1",
"SHEET2",
"SHEET3")
Reading file and sheets
DF3_LIST <-
sapply(LIST_SHEET,
function(L){
read_excel("FILE_NAME.xlsx",
sheet = L)
},
simplify = FALSE)
Transform to data frame (if necessary)
### DB ACCOUNT
DF3_DF <-
do.call(rbind.data.frame, DF3_LIST)
More details about reading and storing data in R you can find in References
[]
“Without data, you’re just another person with an opinion.”
W. Edwards Deming
References
Convert a list of data frames into one data frame Concatenating a list of data frames readxl