HUDM 5026 - Introduction to Data Analysis and Graphics in R
HW 05 - Practice Exam
Instructions.
1. We have the computer lab booked for the class period. The exam will end at the usual class ending time.
2. No outside materials are allowed.
3. The tidyverse and mice packages are installed on your machine and should be accessible through the library() command.
4. Open RStudio and create and save an R code file with a name that is your last name, followed by an underscore, followed by your first name. For me this would be Keller_Bryan.R. This is where you will save your work. When you finish, you will upload this file. It will be the only product from your work today, so make sure you save it frequently.
5. When you are finished, save the final version of your file. Then <upload instructions TBD>.
The Exam.
1. Locate the file mtcars.Rdata on the desktop. Note that I have changed the data set so that it contains some missing values, denoted as numeric values of -999.
(a) Import the data set as a data frame and call it dat. Make sure to convert the -999 missing values to NA system missing values, and note that the automobile names in the first column of the .csv file should be the row names in the R data frame.
(b) Report the number of rows and columns in the data frame.
(c) Briefly describe the variables and note the level of measurement (numeric, ordered categorical, or unordered categorical) for each variable. The level of measurement should be based on the description. That is, I am not simply asking you to report the class of each variable as stored in R. Instead, think about the meaning of each variable.
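One possible way to approach the import step in question 1 is sketched below. The exam file itself is not reproduced here, so the sketch builds a stand-in .csv from R's built-in mtcars data; the positions of the -999 values and the temporary file path are assumptions made for illustration only. If the file on the desktop really is an .Rdata file, load() would be used instead of read.csv(), and the -999 codes could be recoded afterwards as shown in the commented line.

```r
# Build a stand-in file (assumption: the real exam file has a similar layout).
demo <- mtcars
demo$hp[c(3, 10)] <- -999          # hypothetical missing-value positions
demo$wt[5] <- -999
path <- tempfile(fileext = ".csv")
write.csv(demo, path)              # row names are written as the first column

# Import: the first column becomes the row names and -999 is read as NA.
dat <- read.csv(path, row.names = 1, na.strings = c("-999", "NA"))

# Alternative when the data frame is already in memory with -999 codes:
# dat[dat == -999] <- NA

dim(dat)   # number of rows, then number of columns
str(dat)   # storage type of each variable (not the same as level of measurement)
```

Note that str() only reports how R stores each variable; part (c) asks for the level of measurement based on what each variable means.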
2. Load package mice.
(a) Use the md.pattern() function in package mice to quantify the extent of missing data. Use an argument to ask that the variable names be printed vertically.
(b) Copy and paste the table of output from the call to md.pattern() and describe the meaning of the marginal counts on the left, right, and bottom margins in the context of the variables and cases in the data frame.
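A minimal sketch of the md.pattern() call in question 2 follows, again using a stand-in version of mtcars with a few NA values placed at hypothetical positions. The rotate.names argument is the one that prints the variable names vertically in the plot.

```r
library(mice)

# Stand-in data (assumption: NA positions are illustrative, not the exam's).
dat <- mtcars
dat$hp[c(3, 10)] <- NA
dat$wt[5] <- NA

# Returns a matrix of missing-data patterns and draws the pattern plot.
pat <- md.pattern(dat, rotate.names = TRUE)
pat

# Reading the returned matrix:
#   row names (left margin)  : number of cases showing each pattern
#   last column (right margin): number of variables missing in that pattern
#   last row (bottom margin)  : number of missing values for each variable
```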
3. Summary statistics. Use all available observed data when calculating summary statistics. Round results to two decimal places.
(a) Calculate means and standard deviations for the following variables: mpg, cyl, hp, wt, and am.
(b) Calculate the pairwise Pearson correlations between mpg, cyl, hp, wt, and am. What does the estimate for the pairwise correlation between automobile weight and horsepower suggest about the relationship between those two variables?
(c) Calculate means and standard deviations for the variables mpg, cyl, hp, and wt by transmission status (automatic or manual).
(d) Create a table of means and SDs called tab1 for mpg, cyl, hp, and wt by transmission status.
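The summary statistics in question 3 could be computed along the following lines. This is a base R sketch on stand-in data, not the only acceptable approach; a tidyverse solution with group_by() and summarise() would work equally well. The coding of am (0 = automatic, 1 = manual) is taken from the standard mtcars documentation and is assumed to be unchanged in the exam file.

```r
# Stand-in data (assumption: NA positions are illustrative).
dat <- mtcars
dat$hp[c(3, 10)] <- NA
dat$wt[5] <- NA

vars <- c("mpg", "cyl", "hp", "wt", "am")

# (a) Means and SDs; na.rm = TRUE uses every observed value per variable.
round(sapply(dat[vars], mean, na.rm = TRUE), 2)
round(sapply(dat[vars], sd, na.rm = TRUE), 2)

# (b) Pairwise correlations; each pair uses all cases observed on both variables.
round(cor(dat[vars], use = "pairwise.complete.obs"), 2)

# (c) and (d) Means and SDs by transmission status, collected in tab1.
by_vars <- c("mpg", "cyl", "hp", "wt")
groups <- split(dat[by_vars], dat$am)   # list elements named "0" and "1"
tab1 <- round(sapply(groups, function(g) {
  c(mean = sapply(g, mean, na.rm = TRUE),
    sd   = sapply(g, sd, na.rm = TRUE))
}), 2)
colnames(tab1) <- c("automatic", "manual")
tab1
```

The use = "pairwise.complete.obs" setting is what keeps a case in one correlation even when it is missing on a variable that the correlation does not involve.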
4. Plotting. Label axes clearly.
(a) Create parallel boxplots of automobile weight by transmission type and discuss the results.
(b) Create a scatterplot with automobile weight on the horizontal axis and fuel efficiency in miles per gallon on the vertical axis. What can you say about the strength of the linear relationship between weight and fuel efficiency based on the scatterplot and the value of the estimate for the pairwise correlation?
(c) Create parallel boxplots of fuel efficiency in miles per gallon by transmission type. Also report the result of the two-sample t-test for a difference in average fuel efficiency across transmission types. What can you say about the association between fuel efficiency and transmission type based on the boxplots and the results of the t-test?
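The plots and the t-test in question 4 might be produced as in the base R sketch below. The factor name trans and the axis label wording are illustrative choices, and the stand-in data again assume the standard mtcars coding of am (0 = automatic, 1 = manual) and units of wt (1000 lbs).

```r
# Stand-in data (assumption: NA positions are illustrative).
dat <- mtcars
dat$hp[c(3, 10)] <- NA
dat$wt[5] <- NA

# Labelled factor so the boxplot groups are readable.
dat$trans <- factor(dat$am, levels = c(0, 1),
                    labels = c("Automatic", "Manual"))

# (a) Parallel boxplots of weight by transmission type.
boxplot(wt ~ trans, data = dat,
        xlab = "Transmission type", ylab = "Weight (1000 lbs)")

# (b) Scatterplot: weight on the horizontal axis, mpg on the vertical axis.
plot(mpg ~ wt, data = dat,
     xlab = "Weight (1000 lbs)", ylab = "Fuel efficiency (miles per gallon)")

# (c) Parallel boxplots of mpg by transmission type, then the t-test.
boxplot(mpg ~ trans, data = dat,
        xlab = "Transmission type", ylab = "Fuel efficiency (miles per gallon)")

# t.test() runs the Welch (unequal variances) version by default;
# add var.equal = TRUE for the pooled-variance version.
tt <- t.test(mpg ~ trans, data = dat)
tt
```

Cases with missing values on a plotted or tested variable are dropped automatically by these formula interfaces, so no separate filtering step is needed here.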