R allows you to create new data structures^1^
Use different data modes.
Understand how the different data structures are organised.
Create and subset these structures.
These can be put together to create more complex objects.
Data objects can either be restricted to one kind of data, or contain more than one kind.
We will explore the common object types over the next few weeks.
Integer
x <- 1:10
Numeric
x <- c(1.5, 2.5, 3.34, 10.6576)
Character or string
simon_says <- c("hullo", "my", "name", "is", "Simon")
Logical
simon_says_again <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
Complex (numbers with real and imaginary parts)
1 + 4i
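To see which mode R has chosen for a value, typeof() can be applied to each of the kinds of data above. This is a minimal sketch; the variable names (int_vec, num_vec and so on) are made up for illustration.

```r
# One small example of each data mode
int_vec <- 1:10
num_vec <- c(1.5, 2.5, 3.34, 10.6576)
chr_vec <- c("hullo", "my", "name")
lgl_vec <- c(TRUE, FALSE, TRUE)
cpx_val <- 1 + 4i

typeof(int_vec)   # "integer"
typeof(num_vec)   # "double" (how R stores numeric data)
typeof(chr_vec)   # "character"
typeof(lgl_vec)   # "logical"
typeof(cpx_val)   # "complex"

# Quoted "TRUE" is text, not a logical value
typeof("TRUE")    # "character"
```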
R has many different data structures
We can put data classes together in many different ways:
A vector can be thought of as equivalent to a single row or single column in a spreadsheet.
A vector is any number of elements stuck together.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
x
[1] 1 2 3 4 5 6 7 8 9 10
A single element is a vector of length 1.
x <- 1
x
[1] 1
A vector can contain only one class of data (= atomic vector).
All elements in a vector are coerced to be the same kind of data.
x <- c(1, 2, 3, 4, "a")
x
[1] "1" "2" "3" "4" "a"
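The coercion shown above follows a fixed order among the common modes: logical, then integer, then double, then character. A short sketch, with illustrative variable names:

```r
# c() coerces mixed input to the most flexible kind present
lgl_int <- c(TRUE, FALSE, 2L)   # logical + integer
int_dbl <- c(1L, 2.5)           # integer + double
dbl_chr <- c(1, 2, "a")         # double + character

typeof(lgl_int)   # "integer"
lgl_int           # 1 0 2  (TRUE becomes 1, FALSE becomes 0)
typeof(int_dbl)   # "double"
typeof(dbl_chr)   # "character"
```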
Vectors can be created with c() or vector(). With c(), R works out the kind of vector from the elements you supply. vector() creates an empty vector of a given length (the default for vector() is logical).
vector(length = 5)
[1] FALSE FALSE FALSE FALSE FALSE
Or, you can create specific kinds of vector with character(), numeric(), and logical().
character(length = 5)
[1] "" "" "" "" ""
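Each constructor fills the new vector with the "empty" value for its kind. A brief sketch, with variable names chosen only for illustration:

```r
# Empty vectors of a given kind and length
num_empty <- numeric(length = 3)
lgl_empty <- logical(length = 2)
chr_empty <- vector(mode = "character", length = 2)  # same result as character(2)

num_empty   # 0 0 0
lgl_empty   # FALSE FALSE
chr_empty   # "" ""
```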
Matrices are vectors with two dimensions.
Can be created by:
sticking vectors together,
chopping up a longer vector.
matrix(1:20, ncol = 5)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 11 15 19
[4,] 4 8 12 16 20
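Both ways of building a matrix can be sketched as follows. cbind() and rbind() stick vectors together as columns or rows, while matrix() chops up a longer vector, filling column by column unless byrow = TRUE is given. The variable names are illustrative.

```r
# Sticking vectors together
a <- 1:4
b <- 5:8
by_col <- cbind(a, b)   # a and b become columns: 4 rows, 2 columns
by_row <- rbind(a, b)   # a and b become rows:    2 rows, 4 columns
dim(by_col)             # 4 2
dim(by_row)             # 2 4

# Chopping up a longer vector
m_col <- matrix(1:20, ncol = 5)                 # filled down the columns (default)
m_row <- matrix(1:20, ncol = 5, byrow = TRUE)   # filled along the rows

# Subset with [row, column]
m_col[2, 3]   # 10
m_row[2, 3]   # 8
```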
A 3D array in R:
array(1:60, dim = c(4, 5, 3))
, , 1
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 11 15 19
[4,] 4 8 12 16 20
, , 2
[,1] [,2] [,3] [,4] [,5]
[1,] 21 25 29 33 37
[2,] 22 26 30 34 38
[3,] 23 27 31 35 39
[4,] 24 28 32 36 40
, , 3
[,1] [,2] [,3] [,4] [,5]
[1,] 41 45 49 53 57
[2,] 42 46 50 54 58
[3,] 43 47 51 55 59
[4,] 44 48 52 56 60
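Arrays are subset in the same way as matrices, with one index per dimension. A minimal sketch using the array above (the names arr and slice are illustrative):

```r
arr <- array(1:60, dim = c(4, 5, 3))

dim(arr)        # 4 5 3  (rows, columns, layers)
arr[2, 3, 1]    # 10: row 2, column 3 of the first layer
arr[1, 1, 2]    # 21: first element of the second layer

# Leaving an index blank keeps the whole of that dimension
slice <- arr[, , 3]   # the third layer, as a 4 x 5 matrix
dim(slice)            # 4 5
slice[4, 5]           # 60
```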
In a factor, each element comes from a pre-defined set of categories (levels).
Factors can be unordered or ordered.
Factors can be written or coded using any mode (integer, text, logical).
# unordered 3-level factor with integers
x0 <- factor(c(1, 2, 3, 2))
x0
[1] 1 2 3 2
Levels: 1 2 3
table(x0)
x0
1 2 3
1 2 1
# unordered 3-level factor with text (default order is alphanumeric)
x1 <- factor(c("large", "small", "medium", "small"))
table(x1)
x1
large medium small
1 1 2
# ordered 3-level factor with text
x2 <- factor(c("large", "small", "medium", "small"),
ordered = TRUE,
levels = c("small", "medium", "large"))
x2
[1] large small medium small
Levels: small < medium < large
table(x2)
x2
small medium large
2 1 1
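Behind the labels, a factor stores integer codes that point at its levels. The sketch below looks at those codes, and at one common pitfall when the labels themselves look like numbers. The names sizes and num_fac are made up for illustration.

```r
sizes <- factor(c("large", "small", "medium", "small"),
                ordered = TRUE,
                levels = c("small", "medium", "large"))

levels(sizes)         # "small" "medium" "large"
as.integer(sizes)     # 3 1 2 1  (position of each element in the levels)
sizes[1] > sizes[2]   # TRUE: "large" ranks above "small" in an ordered factor

# Pitfall: converting a factor of number-like labels returns the codes
num_fac <- factor(c(10, 20, 10))
as.numeric(num_fac)                 # 1 2 1   (the codes, not the values)
as.numeric(as.character(num_fac))   # 10 20 10
```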
Dataframes can contain different kinds of data
Dataframes are equivalent to a single worksheet in a spreadsheet.
They can contain columns of different kinds of data.
You will likely read your data into R as a dataframe.
Year Colour Size_mm
2017 red 23.5
2016 red 12.67
2017 blue 15.2
2016 blue 1.0
...
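A dataframe like this one can also be built by hand with data.frame(), one vector per column. This sketch uses only the four rows shown above, and the names dat and red_rows are hypothetical.

```r
# One vector per column; all columns must have the same length
dat <- data.frame(
  Year    = c(2017, 2016, 2017, 2016),
  Colour  = c("red", "red", "blue", "blue"),
  Size_mm = c(23.5, 12.67, 15.2, 1.0)
)

str(dat)        # compact summary: one line per column
dim(dat)        # 4 3  (rows, columns)

dat$Size_mm     # a single column, as a vector
red_rows <- dat[dat$Colour == "red", ]   # rows where Colour is red
nrow(red_rows)  # 2
```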
A List is a recursive vector
Lists can contain any other kind of data (including lists!) in a nested hierarchy.
Lists are like a wardrobe (= closet) where you can store many different kinds of hangers and clothes:
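To make the wardrobe analogy concrete, here is a small sketch of a nested list. The contents and names (wardrobe, shirts, drawer and so on) are invented for illustration.

```r
# A list can hold vectors of different kinds, and other lists
wardrobe <- list(
  shirts    = c("white", "blue"),
  n_hangers = 12L,
  drawer    = list(socks = TRUE, sizes = c(9.5, 10))
)

length(wardrobe)             # 3 top-level elements
wardrobe$shirts[2]           # "blue"
wardrobe[["drawer"]]$sizes   # 9.5 10.0

# [ ] returns a smaller list; [[ ]] returns the element itself
class(wardrobe["shirts"])     # "list"
class(wardrobe[["shirts"]])   # "character"
```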
Objects in R can be examined for their contents with functions such as length(), class(), str(), and typeof().
Their attributes can be examined with names(), dimnames(), and dim().
You can verify the object type with the is.*() family of functions: is.vector(), is.matrix(), etc.
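As a short sketch, these functions can be tried on a small matrix with row and column names. The object m and its dimnames are made up for illustration.

```r
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("a", "b"), c("x", "y", "z")))

length(m)      # 6 (total number of elements)
typeof(m)      # "integer"
class(m)       # includes "matrix"
dim(m)         # 2 3
dimnames(m)    # a list holding the row names and the column names
str(m)         # compact summary of the structure

is.matrix(m)   # TRUE
is.vector(m)   # FALSE: m carries a dim attribute
```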
You can convert (coerce) between atomic vectors with the as.*() family of functions:
as.numeric(c('TRUE', 'FALSE', 'TRUE', 'FALSE'))
[1] NA NA NA NA
… although be careful.
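The conversion above returns NA because the text 'TRUE' is not a number. One possible workaround, sketched here with illustrative variable names, is to convert in two steps, from text to logical and then to numeric.

```r
txt <- c("TRUE", "FALSE", "TRUE", "FALSE")

# Text -> logical -> numeric
as.logical(txt)                      # TRUE FALSE TRUE FALSE
two_step <- as.numeric(as.logical(txt))
two_step                             # 1 0 1 0

# Text that does not look like a number becomes NA (R also issues a warning,
# silenced here with suppressWarnings())
mixed <- suppressWarnings(as.numeric(c("1", "2.5", "a")))
mixed                                # 1.0 2.5 NA
```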
R supports missing data, represented as NA.
Inf is infinity. You can have either positive or negative infinity.
1/0
NaN means Not a Number. It’s an undefined value.
0/0
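These special values can be detected with is.na(), is.nan(), and is.finite(). A minimal sketch, with illustrative variable names:

```r
x <- c(1, NA, 3)

is.na(x)                # FALSE TRUE FALSE
mean(x)                 # NA: one missing value makes the result missing
mean(x, na.rm = TRUE)   # 2: NA removed before averaging

pos_inf <- 1 / 0        # Inf
neg_inf <- -1 / 0       # -Inf
not_num <- 0 / 0        # NaN

is.finite(c(1, pos_inf, NA, not_num))   # TRUE FALSE FALSE FALSE
is.nan(c(NA, not_num))                  # FALSE TRUE
is.na(not_num)                          # TRUE: NaN also counts as missing
```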
Phylogenetic trees.
Vector GIS (shape files, …).
Spatial point pattern.
Updated: 2018-09-10