These labs test and build on the material presented in the SWIRL lessons.
Scroll down or click here to to check how to submit them.
Lab 1 and 2: 2018-09-21 23:59
Lab 3 and recap: 2018-09-28 23:59
(no best practice lab)
a. Create a vector from 1 to 20 using whatever method you like.
a. Convert this vector into a 4 column matrix.
b. Sum the rows of the matrix, with rowSums()
.
c. Sum the columns of the matrix, with colSums()
.
d. Give the matrix column names (whatever you want).
Table 1. Ten random responses from the class survey.
Height | Eye Color | Shoe Size | # Siblings | Home Pop. | HP Books | HP House | Ideal Temp | Fav Color | Tea or Coffee | Pineapple on Pizza? | Roll your Tongue? | Tree or Pollen Allergy |
---|---|---|---|---|---|---|---|---|---|---|---|---|
175 | Brown | 10 | 2 | 4621 | 7 | Hufflepuff | 21 | Blue | Coffee | Yes | Yes | Both |
177.8 | Blue | 10 | 2 | 200000 | 7 | Ravenclaw | 21 | Blue | Coffee | Yes | Yes | Neither |
167.64 | Hazel | 9.5 | 3 | 4988 | 0 | Hufflepuff | 15.56 | Blue | Coffee | Disgusting | No | Both |
183 | Hazel | 11.5 | 2 | 62243 | 5 | Ravenclaw | 21 | Blue | Coffee | Yes | Yes | Both |
163 | Hazel | 9 | 1 | 3300 | 7 | Hufflepuff | 18.9 | Green | Coffee | Disgusting | Yes | Both |
175.26 | Blue | 11 | 2 | 21845 | 7 | Gryffindor | 26 | Red | Tea | Yes | Yes | Neither |
165 | Brown | 8 | 0 | 9000000 | 7 | Hufflepuff | 20 | Purple | Coffee | Yes | Yes | Neither |
162.56 | Brown | 7 | 2 | 27865 | 7 | Gryffindor | 23 | Yellow | Coffee | Disgusting | Yes | Neither |
167 | Brown | 7.5 | 0 | 80000 | 7 | Ravenclaw | 23 | Blue | Coffee | Disgusting | Yes | Both |
167 | Brown | 8.5 | 5 | 3500 | 0 | Hufflepuff | 27 | Blue | Coffee | Disgusting | Yes | Neither |
a. Using these vectors, create a matrix. Ensure that all vectors are numeric or integers (you may need to re-code them).
b. Convert this matrix to a data frame (you may need to google or find external help).
b. Display the first 7 rows of the data frame. Check ?head
if needed.
c. Display the last 4 rows of the data frame. Check ?tail
if needed.
d. Add eye color as another column to the same data frame (do not create a new data frame).
e. Look at the structure and summary of the data frame. Which columns are which data type?
Table 2. Apple production in selected countries in 2016 (Source: FAO).
Country | Harvested area (ha) | Apple production (tonnes) |
---|---|---|
China | 2383815 | 44447793 |
India | 314000 | 2872000 |
Iran | 238638 | 2799197 |
Russia | 214270 | 1843544 |
Poland | 177203 | 3604271 |
Turkey | 173394 | 2925828 |
United States | 130552 | 4649323 |
Uzbekistan | 101726 | 1120209 |
Pakistan | 91928 | 590039 |
Ukraine | 91600 | 1099240 |
Italy | 56164 | 2455616 |
France | 49618 | 1819762 |
Chile | 36063 | 1759421 |
Ukraine | 91600 | 1099240 |
Brazil | 33981 | 1049251 |
Germany | 31334 | 1032913 |
United Kingdom | 16512 | 481100 |
a. Display the head, tail, and summary of data.
b. What are the dimensions of the data frame?
c. What is the mean harvested area across all countries?
d. What is the minimum apple production across all countries?
e. Plot the values of apple production on harvested area. Add labels to the x- and y-axes, and a title to the plot.
Read in the following data frame (copy and paste into your R console). This data details CO2 emissions from various sources in 2015. (Source)
CO2_2015 <- data.frame(
Country = c('World', 'China', 'United States', 'European Union', 'India', 'Russia', 'Japan', 'Germany', 'International Shipping', 'Iran', 'South Korea', 'Canada', 'Saudi Arabia', 'Indonesia', 'International Aviation'),
Total_kt = c(36061710, 10641789, 5172336, 3469671, 2454968, 1760895, 1252890, 777905, 642024, 633750, 617285, 555401, 505565, 502961, 502936),
Percent_World_CO2 = c(100.00, 29.51, 14.34, 9.62, 6.81, 4.88, 3.47, 2.16, 1.78, 1.76, 1.71, 1.54, 1.40, 1.39, 1.39),
Per_capita_t = c(NA , 7.7, 16.1, 6.9, 1.9, 12.3, 9.9, 9.6, NA , 8, 12.3, 15.5, 16, 2, NA),
Kg_per_USD1000_GDP_2014 = c(490.8, 1235, 324.2, 184.7, 1051.5, 999.4, 205.2, 197.4, NA, 1344.4, 475.7, 301, 921.9, 492.7, NA)
)
CO2_2015
.a. Subset the Percent of world emissions. Use $
, and assign to a new variable.
b. Subset the Per_capita_t column, using [‘name’].
c. Subset the Country column. What is the data structure?
a. Display just the first element of the vector
b. Display just the last element of the vector
c. Display the 2, 5, and 9th element of the vector
d. Display the first 5 elements of the vector
e. Remove the first 4 elements of this vector
a. Display all values of the vector less than 10
b. Display all values of the vector more than 2
c. Display all values of the vector more than 2 and less than 10
d. Display all values of the vector less than 2 or more than 10
Read in this data frame (copy and paste into your R console). The data are part of a survey of the state of global happiness, conducted each year by the UN. Each variable measured reveals a populated-weighted average score on a scale running from 0 to 10 that is tracked over time and compared against other countries. (Source)
Happiness <- data.frame(
OverallRank = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20),
Country = c('Finland', 'Norway', 'Denmark', 'Iceland', ' Switzerland', 'Netherlands', 'Canada', 'New Zealand', 'Sweden', 'Australia', 'Israel', 'Austria', 'Costa Rica', 'Ireland', 'Germany', 'Belgium', 'Luxembourg', 'United States', 'United Kingdom', 'United Arab Emirates'),
Score = c(7.632, 7.594, 7.555, 7.495, 7.487, 7.441, 7.328, 7.324, 7.314, 7.272, 7.19, 7.139, 7.072, 6.977, 6.965, 6.927, 6.91, 6.886, 6.814, 6.774),
GDP_per_capita = c(1.305, 1.456, 1.351, 1.343, 1.42, 1.361, 1.33, 1.268, 1.355, 1.34, 1.244, 1.341, 1.01, 1.448, 1.34, 1.324, 1.576, 1.398, 1.301, 2.096),
Social_support = c(1.592, 1.582, 1.59, 1.644, 1.549, 1.488, 1.532, 1.601, 1.501, 1.573, 1.433, 1.504, 1.459, 1.583, 1.474, 1.483, 1.52, 1.471, 1.559, 0.776),
Healthy_life_expectancy = c(0.874, 0.861, 0.868, 0.914, 0.927, 0.878, 0.896, 0.876, 0.913, 0.91, 0.888, 0.891, 0.817, 0.876, 0.861, 0.894, 0.896, 0.819, 0.883, 0.67),
Freedom_life_choices = c(0.681, 0.686, 0.683, 0.677, 0.66, 0.638, 0.653, 0.669, 0.659, 0.647, 0.464, 0.617, 0.632, 0.614, 0.586, 0.583, 0.632, 0.547, 0.533, 0.284),
Generosity = c(0.192, 0.286, 0.284, 0.353, 0.256, 0.333, 0.321, 0.365, 0.285, 0.361, 0.262, 0.242, 0.143, 0.307, 0.273, 0.188, 0.196, 0.291, 0.354, 0.186),
Perceptions_corruption = c(0.393, 0.34, 0.408, 0.138, 0.357, 0.295, 0.291, 0.389, 0.383, 0.302, 0.082, 0.224, 0.101, 0.306, 0.28, 0.24, 0.321, 0.133, 0.272, NA)
)
a. Display the first element of the first column.
b. Display the entire first row.
c. Display the entire 2nd column without its name.
d. Display the 2-5th rows and the 3-4th columns.
e. Display $Score
column values that are more than 7.
f. Display $Score
column values for where column $Generosity
is greater than 0.2.
g. Make a boxplot of ‘Perception of corruption’ for which GDP is greater than 1.3. Make the boxes green.
h. Remove the 3rd row of data.
i. Plot a histogram of Social support. Change the number of breaks in the histogram to 10.
The data for this lab are available on the Data page.
Please upload any cleaned datafiles with your R code to Canvas.
a. Read in CO2_2015.txt (tab-delim). Display the first 10 rows.
b. Read in happiness.csv (comma-delim). What is the mean GDP?
c. Read in apples.txt. How many rows and columns does this data have?
a. Michigan tree species.
b. Harry Potter movie budgets.
c. Galapagos mammal incidence.
birdflu.xls
spreadsheet.Remember that R can only have 1 row as the header.
a. Use the names()
and str()
functions in R to view the data.
b. What is the total number of bird flu cases in 2003 and in 2005?
c. Which country has had the most cases?
d. Which country has had the least bird flu deaths?
e. What is the total number of bird flu cases per country?
f. What is the total number of cases per year?
Data you will need is available from the Data page.
Please upload any cleaned datafiles with your R code to Canvas.
a. Create a matrix of your choice.
b. Create a data frame of your choice.
c. Create a vector of your choice.
d. Put all three objects in a list.
e. Subset the data frame from the list by its name.
f. Subset a column in the data frame in the list.
g. Subset the vector by its position in the list.
birdflu.xls
spreadsheet againa. Edit the birdflu spreadsheet from a wide format to a long format.
Check the best practice on Principles of data files (which I’m sure you have read!) to see the difference.
b. Read in the edited birdflu data.
c. How many deaths were there in 2007?
d. How many cases were there in Thailand in all years?
e. How many cases were there in Indonesia in 2008?
Download the New Haven road race data from 2015.
Check here if you are having trouble.
a. Clean and read in the data.
b. How many males and females ran in the race?
c. How many males under 19 years old ran in the race?
c. What was the fastest race time?
d. What was the mean race time for runners from New Haven?
f. Make a boxplot of pace as a function of gender (male vs female).
g. How fast did your instructor run?
You will need to write R code to answer each of the questions.
Please format your answers as follows:
Copy and paste each question, commented out. This ensures that we know which answer corresponds to which question.
Write your R code answer below each question.
It should look something like this:
# LAB: Unit 1. Lab 1
# Your Name # Put your name here
# 1. Add 7 and 3,456.
7 + 3456
# 2. Assign this value to an object
x <- 7 + 3456
Log into Canvas.
Go to the Assignments page.
Under ‘Labs’, you should find the correct assignment.
Copy and paste your R code into the text box.
Click ‘Submit Assignment’.
You are permitted to submit your answers as many times as you like within each Unit.
Answers will be graded two or three times a week and re-opened if you submit early.
Each lab will close at its respective deadline (see Canvas).
Final grades for each lab will be computed and entered into the Canvas gradebook at the end of each Unit.