We’re finally ready to start performing statistical tests in R! In this Unit, we will focus on relatively simple cases of grouped data (data that can be binned into categories).
Such tests include comparisons of proportions or ratios using proportion or chi-square tests, and comparisons of means between groups using t-tests, ANOVAs, and ANCOVAs.
We will start with simple tests to compare proportions and ratios, for this example asking if there are more people with blue or brown eyes in a room.
The specific statistical test we use will depend on the number of groups (e.g., blue:brown, or blue:brown:green), the number of rooms (e.g., G01, 319, Burke, …), and whether we want to compare the ratio to some expected value or among rooms.
OK, first, we might want to compare the flowering sex ratio of a population of trees (for example … this has absolutely nothing to do with the research of Simon at all in any way whatsoever), and ask if there are more female or male trees.
This question is interesting, because females are the trees that produce fruit and therefore invest more resources than males (that produce only flowers and pollen) in any one reproductive event. This disparity in investment may have implications for the life history of male and female trees: male trees may become reproductively mature at smaller sizes or flower more frequently than females.
So, our question is: Are there significantly more males or females that flowered in this population? In other words, is the ratio of males:females different from 1:1 (or 50:50)?
We can compare our observed ratio (males:females) to the theoretical (or expected) ratio of 1:1, and ask if they are different.
To compare an observed proportion to a theoretical one when there are only two categories we use a one-proportion z-test.
If we have no a priori expectation, we might ask if the proportion of males is equal to the expected proportion (i.e., 50:50). Our null hypothesis is therefore that the observed proportion (p_o) is equal to the expected proportion (p_e), such that p_o == p_e. Our alternative hypothesis is therefore that the observed proportion is different (p_o != p_e). Thus, we have a two-tailed test (p_o could be greater or less than p_e).
If we have some reason to think that the number of flowering males might be greater than the number of flowering females, we might conduct a one-tailed test, such that our alternative hypothesis is that p_o > p_e.
The R functions binom.test() and prop.test() can be used to carry out a one-proportion test: binom.test() computes an exact binomial test and is recommended when the sample size is small, while prop.test() uses a normal approximation to the binomial distribution and can be used when your sample size is large (n > 30).
The syntax of these two functions is essentially the same. They require the arguments x = the number of successes (the occurrence of the outcome we're interested in, e.g., the number of males), n = the total number of trials (e.g., the number of males + the number of females), p = the expected probability to test against (the default is 0.5), and alternative = a character string specifying your alternative hypothesis ('two.sided', 'greater', or 'less').
Ok, let's try it out! Test whether the sex ratio of a population with 48 male and 52 female flowering trees is significantly different from 1:1.
prop.test(x = 48, n = 100, alternative = 'two.sided')
##
## 1-sample proportions test with continuity correction
##
## data: 48 out of 100, null probability 0.5
## X-squared = 0.09, df = 1, p-value = 0.7642
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
## 0.3798722 0.5816817
## sample estimates:
## p
## 0.48
What does this output tell us? If you used prop.test(), it first tells us the name of the test, then the data, the number of successes and trials, and the p-value. It also provides the alternative hypothesis, the 95% confidence interval, and the estimated probability of success (labelled p in the sample estimates; not to be confused with the p-value). binom.test() returns something similar.
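For comparison, here is a minimal binom.test() call on the same data (a sketch for you to run yourself; the printed output has the same general structure as above):

binom.test(x = 48, n = 100, p = 0.5, alternative = 'two.sided')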
In either function, the p-value is greater than the usual alpha of 0.05, so we fail to reject the null hypothesis: our sample gives no evidence that the sex ratio differs from the expected 1:1.
Usefully, the output of all statistical models or tests in R is a list, so we can extract the parts we want. Run the function again using prop.test(), and assign the result to the object m.
m <- prop.test(x = 48, n = 100, alternative = 'two.sided')
Now, extract the p-value from m using the $ operator.
m$p.value
## [1] 0.7641772
Check if the p-value is less than an alpha of 0.05.
m$p.value < 0.05
## [1] FALSE
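If you are not sure what else is stored in m, you can list its components by name (a quick sketch; these component names are standard for R test objects of class 'htest'):

names(m)      # list all components of the test object
m$estimate    # the estimated proportion (0.48)
m$conf.int    # the 95% confidence interval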
Suppose we have some sensible biological reason to expect more males flowering this year. Alter your previous test to reflect this hypothesis. Do not assign the output to any object.
prop.test(x = 48, n = 100, alternative = 'greater')
##
## 1-sample proportions test with continuity correction
##
## data: 48 out of 100, null probability 0.5
## X-squared = 0.09, df = 1, p-value = 0.6179
## alternative hypothesis: true p is greater than 0.5
## 95 percent confidence interval:
## 0.3946052 1.0000000
## sample estimates:
## p
## 0.48
Ok, great work! You can now test whether a proportion differs from an expected value. But what if we also have data on our trees flowering in the next year? Did the same proportion of males flower in both years?
We can use a two-proportions z-test to compare two observed proportions. As with the one-proportion z-test, we can ask whether the proportion of males flowering in year 1 is equal to the proportion in year 2 (p_1 == p_2), less than it (p_1 < p_2), or greater than it (p_1 > p_2).
If we have no a priori expectation of how male flowering varies from year to year, we might simply ask if the proportion of males flowering in year 1 is equal to that in year 2. Our null hypothesis is therefore that p_1 == p_2, and our alternative hypothesis is that the two proportions differ (p_1 != p_2). Thus, we have a two-tailed test (p_1 could be greater or less than p_2).
If we have some reason to think that the number of flowering males might be greater in year 1 than in year 2, we might conduct a one-tailed test, such that our alternative hypothesis is that p_1 > p_2.
There are two different R functions to compare two proportions. If you have more than 5 individuals or counts in all four groups (e.g., >5 males in year 1, >5 males in year 2, >5 females in year 1, and >5 females in year 2), a two-proportion z-test is fine. If you have fewer than 5 individuals or counts in any category, you should instead use a Fisher's exact test, a non-parametric test for small sample sizes (we will come back to this shortly). A quick way to check this rule of thumb is sketched below.
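Here is that sketch, using the flowering counts introduced below (the object name counts is just for illustration):

counts <- c(48, 52, 58, 43)  # males and females flowering in years 1 and 2
any(counts < 5)              # FALSE, so a two-proportion z-test is fine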
We can use the function prop.test() again to run a two-proportion z-test. Again, we need the number of successes (x =), the total number of trials (n =), and the alternative hypothesis (alternative =).
In the one-proportion test, we passed single integers to x and n (x = 48 and n = 100). In this case, we have two proportions to test, so we need to pass a vector of two values to each argument. In year 1, we had 48 male trees flowering and 52 females. In year 2, we had 58 males flowering and 43 females. Test if the proportion of males was greater in year 2.
prop.test(x = c(48, 58), n = c(100, 101), alternative = 'less')
##
## 2-sample test for equality of proportions with continuity
## correction
##
## data: c(48, 58) out of c(100, 101)
## X-squared = 1.4329, df = 1, p-value = 0.1156
## alternative hypothesis: less
## 95 percent confidence interval:
## -1.0000000 0.0310283
## sample estimates:
## prop 1 prop 2
## 0.4800000 0.5742574
If you look at the help file for prop.test(), you will see that for x it states: 'a vector of counts of successes, a one-dimensional table with two entries, or a two-dimensional table (or matrix) with 2 columns, giving the counts of successes and failures, respectively.' And for n: 'a vector of counts of trials; ignored if x is a matrix or a table'.
So, we can pass a matrix to x instead. Make a matrix of two columns, with the counts of male and female flowering trees in years 1 and 2. Columns should be males and females, rows should be years. (There are several ways to do this, but do it with a vector of values and either an nrow or ncol argument.)
matrix(c(48, 58, 52, 43), ncol = 2)
## [,1] [,2]
## [1,] 48 52
## [2,] 58 43
This matrix is what is known as a contingency table, a type of table in a matrix format that displays the (multivariate) frequency distribution of the variables. In other words, it displays the counts that match the criteria for each cell in the table, e.g., male flowering in year 1.
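As an aside, you can label the rows and columns of such a matrix with the optional dimnames argument, which makes the contingency table easier to read (the labels are purely cosmetic and do not affect the test):

matrix(c(48, 58, 52, 43), ncol = 2,
       dimnames = list(Year = c('1', '2'), Sex = c('Male', 'Female')))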
Now, put this matrix into the call to prop.test() for the argument x. You then do not need to include n (but you will still need to set alternative =).
prop.test(x = matrix(c(48, 58, 52, 43), ncol = 2), alternative = 'less')
##
## 2-sample test for equality of proportions with continuity
## correction
##
## data: matrix(c(48, 58, 52, 43), ncol = 2)
## X-squared = 1.4329, df = 1, p-value = 0.1156
## alternative hypothesis: less
## 95 percent confidence interval:
## -1.0000000 0.0310283
## sample estimates:
## prop 1 prop 2
## 0.4800000 0.5742574
The output is identical to before: passing a matrix is simply another way of supplying the same counts of successes and failures.
However, what if instead we wanted to ask if the ratio of males:females was different from 1:1 in either year? prop.test() has the argument p =, where you can give a specific probability for each pair of successes and failures. Run the two-proportion test again, specifying a 1:1 ratio for each year.
prop.test(x = matrix(c(48, 58, 52, 43), ncol = 2), alternative = 'two.sided', p = c(0.5,0.5))
##
## 2-sample test for given proportions with continuity correction
##
## data: matrix(c(48, 58, 52, 43), ncol = 2), null probabilities c(0.5, 0.5)
## X-squared = 2.0306, df = 2, p-value = 0.3623
## alternative hypothesis: two.sided
## null values:
## prop 1 prop 2
## 0.5 0.5
## sample estimates:
## prop 1 prop 2
## 0.4800000 0.5742574
So, what do we do if our sample size is very small? We can run a Fisher's exact test using the function fisher.test().
This test was devised to test the claim of a friend of Ronald Fisher that she could tell whether the tea or the milk was added first to a cup of tea (she was English, of course). As a good statistician, Fisher proposed an experimental test: he gave her 8 cups of tea in random order, 4 with the milk added first and 4 with the tea added first. The results can be seen in the object tea_test. Have a look.
tea_test
## Truth
## Guess Milk Tea
## Milk 3 1
## Tea 1 3
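(If tea_test is not already loaded in your session, you can recreate it from the printed table above; the labels simply mirror that output.)

tea_test <- matrix(c(3, 1, 1, 3), nrow = 2,
                   dimnames = list(Guess = c('Milk', 'Tea'),
                                   Truth = c('Milk', 'Tea')))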
As with prop.test(), fisher.test() takes a matrix, which, handily, is how the tea_test object is structured. Run a Fisher's exact test on the tea_test data.
fisher.test(tea_test)
##
## Fisher's Exact Test for Count Data
##
## data: tea_test
## p-value = 0.4857
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.2117329 621.9337505
## sample estimates:
## odds ratio
## 6.408309
Hmmm … in this experiment, the friend did not do so well …
So far, we have dealt with variables with only two categories. But many things occur in more than two categories. To compare observed proportions to theoretical ones when there are more than two categories, we can use a chi-square goodness-of-fit test.
The chi-square goodness-of-fit test is used to compare an observed distribution to an expected distribution when we have two or more categories of discrete data. In other words, it compares multiple observed proportions to expected probabilities.
Suppose that you go to Lyman's Orchard to pick apples. You picked 81 Honeycrisp, 50 Empire, and 27 Macoun. We can ask two questions about these apples: First, did we pick equal proportions of all varieties? Second, does the proportion of each variety we picked (i.e., the observed) match the proportions in which they are grown at Lyman's (i.e., the expected)?
We can use the R function chisq.test() to answer both these questions. As usual, we need to provide the arguments x = (a vector or matrix of data) and p = (a vector of probabilities). Test if we picked the three apple varieties in equal proportions.
chisq.test(x = c(81, 50, 27), p = c(1/3, 1/3, 1/3))
##
## Chi-squared test for given probabilities
##
## data: c(81, 50, 27)
## X-squared = 27.886, df = 2, p-value = 8.803e-07
The p-value is much less than 0.05, so we reject the null hypothesis that we picked the varieties in equal proportions.
Ok, does our picking reflect the relative abundance of each variety in Lyman's Orchard? Lyman's grows 10 ha of Honeycrisp, 20 ha of Empire, and 30 ha of Macoun. If we picked apples at random, our bags should reflect these proportions. Check if this was so, simplifying the proportions to fractions of 6 (i.e., 1/6, 2/6, 3/6).
chisq.test(x = c(81, 50, 27), p = c(1/6, 2/6, 3/6))
##
## Chi-squared test for given probabilities
##
## data: c(81, 50, 27)
## X-squared = 147.85, df = 2, p-value < 2.2e-16
The p-value of < 2.2e-16 suggests that someone has a significant preference for Honeycrisp apples …
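To see where that preference lies, compare the observed counts to the counts implied by the orchard's proportions (a quick side calculation; the object names here are just for illustration):

observed <- c(81, 50, 27)                     # Honeycrisp, Empire, Macoun
expected <- sum(observed) * c(1/6, 2/6, 3/6)  # counts implied by orchard area
round(expected, 1)                            # 26.3 52.7 79.0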
OK, now we want to know if apple preference is related to preference for pasta shapes. I asked last year's class* what apples and pasta shapes they preferred. The results are in the object apples_vs_pasta. Have a look. [*not really]
apples_vs_pasta
## Apple
## Pasta Baldwin Golden Russet Newtown Pippin Winesap Honeycrip
## fusilli 10 26 7 14 25
## spaghettini 30 28 20 16 29
## rigatoni 16 12 7 10 15
## orzo 2 19 12 8 8
The chi-squared test of independence is used to analyze the contingency table formed by the counts of levels of two categorical variables. The chi-squared test evaluates whether there is a significant association between the categories of the two variables.
The chi-squared test can be carried out with the function chisq.test(). We used it above with a vector of data, and the function assumed that we wanted to run a goodness-of-fit test. If we pass a data matrix to the argument x = instead, it assumes (correctly!) that we want to run a Pearson chi-squared test of independence.
Run a chi-squared test on the apples_vs_pasta data matrix. Assign the output to the object m.
m <- chisq.test(apples_vs_pasta)
You can see the test results by typing m.
m
##
## Pearson's Chi-squared test
##
## data: apples_vs_pasta
## X-squared = 26.032, df = 12, p-value = 0.01062
Lots of other useful information is stored in our model object m. Look at its structure to get a better idea.
str(m)
## List of 9
## $ statistic: Named num 26
## ..- attr(*, "names")= chr "X-squared"
## $ parameter: Named int 12
## ..- attr(*, "names")= chr "df"
## $ p.value : num 0.0106
## $ method : chr "Pearson's Chi-squared test"
## $ data.name: chr "apples_vs_pasta"
## $ observed : num [1:4, 1:5] 10 30 16 2 26 28 12 19 7 20 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ Pasta: chr [1:4] "fusilli" "spaghettini" "rigatoni" "orzo"
## .. ..$ Apple: chr [1:5] "Baldwin" "Golden Russet" "Newtown Pippin" "Winesap" ...
## $ expected : num [1:4, 1:5] 15.15 22.72 11.08 9.05 22.2 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ Pasta: chr [1:4] "fusilli" "spaghettini" "rigatoni" "orzo"
## .. ..$ Apple: chr [1:5] "Baldwin" "Golden Russet" "Newtown Pippin" "Winesap" ...
## $ residuals: num [1:4, 1:5] -1.322 1.527 1.477 -2.344 0.807 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ Pasta: chr [1:4] "fusilli" "spaghettini" "rigatoni" "orzo"
## .. ..$ Apple: chr [1:5] "Baldwin" "Golden Russet" "Newtown Pippin" "Winesap" ...
## $ stdres : num [1:4, 1:5] -1.7 2.17 1.82 -2.83 1.1 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ Pasta: chr [1:4] "fusilli" "spaghettini" "rigatoni" "orzo"
## .. ..$ Apple: chr [1:5] "Baldwin" "Golden Russet" "Newtown Pippin" "Winesap" ...
## - attr(*, "class")= chr "htest"
We have a list of nine elements, including the chi-squared statistic, the p-value, the observed data, the calculated expected counts, and the residuals.
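For example, the standardized residuals show which cells of the table deviate most from their expected counts (values beyond roughly ±2 are worth a closer look):

round(m$stdres, 2)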
Congratulations! You made it through the first lesson in the statistics series. Hopefully you are comfortable working with counts of categories and checking if they match some a priori expected ratio, or are related to counts of other categorical variables.
In the next lesson, we will look at categorical predictor variables and continuous response variables, and ask if our samples are drawn from populations with different means.
Please submit the log of this lesson to Google Forms so that Simon may evaluate your progress.
I accept the arbitrary nature of socially constructed educational programs and agree to be evaluated.