Both are two-dimensional data structures.
Matrices are atomic vectors with dimensions and can only have one mode (type) of data.
Dataframes can have columns of different data modes.
Matrices are an extension of numeric or character vectors.
They are not a separate type of object but simply an atomic vector with dimensions: the number of rows and columns.
Because they are atomic vectors, all the elements are the same type (integer, numeric, character, etc.).
Matrices are used frequently in mathematical models and statistics.
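As a minimal sketch of the point that a matrix is just an atomic vector with dimensions, the snippet below attaches a dim attribute to a plain vector and then shows what happens when types are mixed. The object names v and mixed are illustrative only.

```r
# A plain integer vector becomes a matrix once it has a "dim" attribute.
v <- 1:6
dim(v) <- c(2, 3)        # 2 rows, 3 columns, filled by column
is.matrix(v)             # TRUE
is.atomic(v)             # TRUE: it is still an atomic vector

# A matrix can hold only one type, so mixed input is coerced
# to the most general type present (here, character).
mixed <- matrix(c(1, "a", TRUE, 2.5), nrow = 2)
typeof(mixed)            # "character"
```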
We can create an empty matrix and fill it later, with matrix().
matrix(ncol = 2, nrow = 2)
[,1] [,2]
[1,] NA NA
[2,] NA NA
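Filling the empty matrix afterwards might look like the following sketch, which assigns values by row and by single element using square-bracket indexing. The name m_empty and the values are made up for illustration.

```r
# Start with a 2 x 2 matrix of NA values.
m_empty <- matrix(ncol = 2, nrow = 2)

# Fill a whole row at once using [row, ] indexing.
m_empty[1, ] <- c(10, 20)

# Fill individual cells using [row, column] indexing.
m_empty[2, 1] <- 30
m_empty[2, 2] <- 40

m_empty
```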
Or we can create one from a vector (here we supply the optional data = argument of matrix()).
matrix(1:4, ncol = 2, nrow = 2)
[,1] [,2]
[1,] 1 3
[2,] 2 4
By default, matrices are filled by columns. We can fill by rows instead.
matrix(1:4, ncol = 2, nrow = 2, byrow = TRUE)
[,1] [,2]
[1,] 1 2
[2,] 3 4
We can label the columns with colnames() …
m <- matrix(1:4, ncol = 2, nrow = 2)
colnames(m) <- c("A", "B")
m
A B
[1,] 1 3
[2,] 2 4
… and the rows, with rownames()
rownames(m) <- c("a", "b")
m
A B
a 1 3
b 2 4
We can add extra columns with cbind(),
cbind(m, 5:6)
A B
a 1 3 5
b 2 4 6
or rows with rbind().
rbind(m, 7:8)
A B
a 1 3
b 2 4
7 8
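In the outputs above the added column and row have no label. One way to label them, sketched here, is to pass the new vector as a named argument to cbind() or rbind(). The names C and c, and the objects m_wide and m_long, are arbitrary choices for this example.

```r
# Rebuild the labelled matrix used above.
m <- matrix(1:4, ncol = 2, nrow = 2)
colnames(m) <- c("A", "B")
rownames(m) <- c("a", "b")

# A named argument supplies the label for the new column or row.
m_wide <- cbind(m, C = 5:6)
m_long <- rbind(m, c = 7:8)

colnames(m_wide)   # "A" "B" "C"
rownames(m_long)   # "a" "b" "c"

# cbind() and rbind() return new matrices; m itself is unchanged.
dim(m)             # 2 2
```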
We can calculate the sum of each row with rowSums().
rowSums(m)
a b
4 6
And the sum of each column with colSums().
colSums(m)
A B
3 7
We can transpose the entire matrix with t().
t(m)
a b
A 1 2
B 3 4
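To see how these three functions relate, the sketch below uses a non-square matrix (a hypothetical 2 x 3 example named x) so that the effect of transposing is visible in the dimensions. Row sums of a matrix should match the column sums of its transpose.

```r
# A 2 x 3 matrix, so rows and columns are easy to tell apart.
x <- matrix(1:6, nrow = 2)

dim(x)       # 2 3
dim(t(x))    # 3 2: rows and columns are swapped

# Summing over rows of x is the same as summing over columns of t(x).
rowSums(x)
colSums(t(x))
isTRUE(all.equal(rowSums(x), colSums(t(x))))
```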
Dataframes are also two-dimensional objects. But, unlike matrices, dataframes can include columns of any data type.
Technically, data frames are a type of list where every part (i.e., column) of the list has the same length. A data frame is therefore a rectangular list.
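The "rectangular list" idea can be checked directly. This is a small sketch with made-up column names (id, value, flag): each column keeps its own type, and data.frame() refuses columns whose lengths cannot be reconciled.

```r
# Columns of different types live side by side.
df_mixed <- data.frame(
  id    = c("a", "b", "c"),
  value = c(1.5, 2.5, 3.5),
  flag  = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE   # keep id as character in any R version
)
sapply(df_mixed, class)      # one class per column

# Columns of incompatible lengths (3 and 2) are rejected with an error.
result <- tryCatch(
  data.frame(a = 1:3, b = 1:2),
  error = function(e) conditionMessage(e)
)
result                       # the error message, captured as a string
```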
Data frames are akin to the usual “observations and variables” model that most statistical software uses. They are also similar to database tables.
So, each row of a data frame should represent an observation and the elements in a given row represent information about that observation. Each column has all the information about a particular variable for the entire data set.
Dataframes are most likely what you will use to work with your own data.
Each column can only be one mode or data type.
Each column is the same length.
A data frame can be created with data.frame(). This works well for small datasets.
df <- data.frame(sample_id = c('i', 'ii', 'iii', 'iv'), x = c(1,2,3,4), y = c(5, 6, 7, 8))
df
sample_id x y
1 i 1 5
2 ii 2 6
3 iii 3 7
4 iv 4 8
For larger datasets, you will more usually read in external data using read.table() or read.csv() (see Importing Data).
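As a self-contained sketch of reading external data, the example below first writes a tiny CSV to a temporary file and then reads it back with read.csv(). The file and its contents are invented for the example; in practice you would pass the path to your own file.

```r
# Write a small CSV to a temporary location (stand-in for a real data file).
tmp <- tempfile(fileext = ".csv")
write.csv(
  data.frame(sample_id = c("i", "ii"), x = c(1, 2)),
  tmp,
  row.names = FALSE          # do not write row names as an extra column
)

# Read it back; read.csv() returns a data frame.
dat <- read.csv(tmp)
is.data.frame(dat)           # TRUE
str(dat)

unlink(tmp)                  # remove the temporary file
```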
We can label (specific) columns with colnames() …
colnames(df)[2:3] <- c('x1', 'y2')
df
sample_id x1 y2
1 i 1 5
2 ii 2 6
3 iii 3 7
4 iv 4 8
… and the rows with rownames()
rownames(df) <- c('row1', 'row2', 'row3', 'row4')
df
sample_id x1 y2
row1 i 1 5
row2 ii 2 6
row3 iii 3 7
row4 iv 4 8
We can add columns using cbind() and rows using rbind(), as for matrices.
z <- c(9, 10, 11, 12)
df <- cbind(df, z)
df
sample_id x1 y2 z
row1 i 1 5 9
row2 ii 2 6 10
row3 iii 3 7 11
row4 iv 4 8 12
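The row case is not shown above, so here is a minimal sketch of rbind() with data frames. It assumes the new row is supplied as a one-row data frame with the same column names; the objects df_small, new_row and df_more are illustrative.

```r
# A small data frame with the same kind of columns as above.
df_small <- data.frame(
  sample_id = c("i", "ii"),
  x1 = c(1, 2),
  y2 = c(5, 6),
  stringsAsFactors = FALSE
)

# The new row is itself a data frame with matching column names.
new_row <- data.frame(sample_id = "iii", x1 = 3, y2 = 7,
                      stringsAsFactors = FALSE)

df_more <- rbind(df_small, new_row)
df_more
nrow(df_more)   # 3
```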
We can also create a new column using the $ operator and a new column name.
df$col4 <- c(9, 10, 11, 12)
df
sample_id x1 y2 z col4
row1 i 1 5 9 9
row2 ii 2 6 10 10
row3 iii 3 7 11 11
row4 iv 4 8 12 12
We can check if it is a dataframe, with is.data.frame()
is.data.frame(df)
[1] TRUE
And confirm that it is, actually, a list.
is.list(df)
[1] TRUE
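Because a data frame is a list, the usual list tools work on it. The sketch below rebuilds a small data frame (named df_list here to avoid clashing with df) and accesses columns as list elements.

```r
df_list <- data.frame(
  sample_id = c("i", "ii", "iii", "iv"),
  x1 = c(1, 2, 3, 4),
  y2 = c(5, 6, 7, 8)
)

names(df_list)                  # list element names are the column names
length(df_list)                 # number of list elements = number of columns
length(df_list) == ncol(df_list)

# [[ ]] and $ both extract a single column as a vector.
df_list[["x1"]]
identical(df_list[["x1"]], df_list$x1)
```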
We can check the dimensions with dim().
dim(df)
[1] 4 5
And ask how many columns with ncol(),
ncol(df)
[1] 5
and rows with nrow().
nrow(df)
[1] 4
We can check the structure with str().
str(df)
'data.frame': 4 obs. of 5 variables:
$ sample_id: Factor w/ 4 levels "i","ii","iii",..: 1 2 3 4
$ x1 : num 1 2 3 4
$ y2 : num 5 6 7 8
$ z : num 9 10 11 12
$ col4 : num 9 10 11 12
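The str() output above lists sample_id as a Factor, which reflects the older default of data.frame() converting strings to factors. Since R 4.0.0 the default is stringsAsFactors = FALSE, so the same code would report chr instead. The sketch below sets the argument explicitly so the result does not depend on the R version; the object names are illustrative.

```r
ids <- c("i", "ii", "iii", "iv")

# Explicitly convert strings to a factor (the default before R 4.0.0).
df_factor <- data.frame(sample_id = ids, stringsAsFactors = TRUE)
class(df_factor$sample_id)   # "factor"

# Explicitly keep strings as character (the default from R 4.0.0 onwards).
df_char <- data.frame(sample_id = ids, stringsAsFactors = FALSE)
class(df_char$sample_id)     # "character"
```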
We can examine the first few rows (default is 6) with head().
head(df, n = 2)
sample_id x1 y2 z col4
row1 i 1 5 9 9
row2 ii 2 6 10 10
And the last few rows with tail().
tail(df, n = 3)
sample_id x1 y2 z col4
row2 ii 2 6 10 10
row3 iii 3 7 11 11
row4 iv 4 8 12 12
Crawley, M. The R Book. Ch. 4 Dataframes