Department of Statistics
STATS 779:
Professional Skills for Statisticians
Test: May 29, 2019
4:00 pm to 8:00 pm.
INSTRUCTIONS
* Total marks = 70.
* Attempt all questions.
* Note: Some questions are open-ended and it may not be clear how extensive your answer should be. Do not write long answers to these questions. You should be able to answer any question of this type in a few paragraphs at most, or within half a page.
1 Write complete LaTeX code to reproduce the slides given in Figure 1.
Figure 1: Beamer slides.
Tips:
• The slides use Warsaw and rectangles for presentation and inner themes, respectively.
• Use the \institute command to add institution details.
• Specify the word fragile as a frame option to display verbatim text in a frame.
• Create a Verbatim environment using the fancyvrb package to display the LaTeX code shown in the right-most block of Figure 1. [12 marks]
2 Write an R code chunk in a knitr document to reproduce Figure 2 (including the caption). Note: Both plots should be displayed next to each other in the compiled document. [5 marks]
Figure 2: Histograms of the speed and dist variables in the cars dataset.
3 Write the YAML header, R code chunk and inline code to reproduce the output file given in Figure 3.
Note: course.df is a data set provided in the s20x package.
The columns in the data frame are shown in the following output:
'data.frame': 146 obs. of 15 variables:
$ Grade : Factor w/ 4 levels "A","B","C","D": 3 2 1 1 4 1 4 4 3 3 ...
$ Pass : Factor w/ 2 levels "No","Yes": 2 2 2 2 1 2 1 1 2 2 ...
$ Exam : int 42 58 81 86 35 72 42 25 36 48 ...
$ Degree : Factor w/ 4 levels "BA","BCom","BSc",..: 3 2 4 4 4 2 3 1 2 2 ...
$ Gender : Factor w/ 2 levels "Female","Male": 2 1 1 1 2 1 1 2 1 1 ...
$ Attend : Factor w/ 2 levels "No","Yes": 2 2 2 2 1 2 2 1 2 2 ...
$ Assign : num 17.2 17.2 17.2 19.6 8 18.4 14.4 8.8 17.6 12 ...
$ Test : num 9.1 13.6 14.5 19.1 8.2 12.7 7.3 10.9 10.9 9.1 ...
$ B : int 5 12 14 15 4 15 4 3 10 8 ...
$ C : int 13 12 17 17 1 17 14 0 4 8 ...
$ MC : int 12 17 25 27 15 20 12 11 11 16 ...
$ Colour : Factor w/ 4 levels "Blue","Green",..: 1 4 1 4 1 1 2 4 2 3 ...
$ Stage1 : Factor w/ 3 levels "A","B","C": 3 1 1 1 3 1 3 3 2 2 ...
$ Years.Since: num 2.5 2 3 0 3 1.5 0.5 1.5 2.5 4 ...
$ Repeat : Factor w/ 2 levels "No","Yes": 2 1 1 1 1 1 1 1 1 1 ...
Figure 3: R markdown output.
[8 marks]
4 Write BibTeX entries to be included in a .bib file to produce the following bibliography:
References
K. Aas and I. Hobæk Haff. The generalised hyperbolic skew Student’s t-distribution. Journal of Financial Econometrics, 4(2):275–309, 2006.
A. Azzalini. R package sn: The skew-normal and skew-t distributions (version 0.4-2). Università di Padova, Italia, 2006. URL http://azzalini.stat.unipd.it/SN.
D. J. Bartholomew. Stochastic Models for Social Processes. Wiley, London, 2nd edition, 1973. [11 marks]
5 The ToothGrowth dataset comes with base R. It is described as follows:
The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice or ascorbic acid (a form of vitamin C and coded as VC).
The columns in the data frame are shown in the following output:
> str(ToothGrowth)
'data.frame': 60 obs. of 3 variables:
$ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
$ supp : Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
$ dose : num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
Write R code to produce Figure 4 using ggplot2.
Note: To centre a title in ggplot2 use
theme(plot.title = element_text(hjust = 0.5)) [10 marks]
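As an illustration of the note above only (this is not a solution to Question 5), the following sketch centres a title on a boxplot of the built-in mtcars dataset. The dataset, the variables and the title text are assumptions chosen for the example.

```r
library(ggplot2)

# Illustrative boxplot on the built-in mtcars data (not the exam's Figure 4).
# The choice of cyl and mpg and the title wording are assumptions.
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  labs(x = "Number of cylinders", y = "Miles per gallon") +
  ggtitle("Fuel economy by number of cylinders") +
  # hjust = 0.5 centres the title; the ggplot2 default is left-aligned (hjust = 0)
  theme(plot.title = element_text(hjust = 0.5))

print(p)
```

The theme() call is added last so that it is not overridden if a complete theme such as theme_bw() is also added to the plot.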
Figure 4: Boxplots of Tooth Growth
6 For each of the following SELECT statement pairs, explain why the results are either different or the same.
[8 marks]
7 Details of passengers and crew who sailed on the Titanic are contained in the .csv file titanic.csv. The column names and the format of the column entries are shown in Figure 5.
Figure 5: Top of titanic.csv
The passenger name can be very long, up to 90 characters, because wives’ names include their husband’s name.
The fare is in pounds, to 4 decimal places. The original fare was in pounds, shillings and pence, which explains the strange fractions in the fare values.
Note that underscores are permitted in column names in MySQL, although it is generally recommended not to use them. You may use them in this example.
a Write MySQL code to create a table called titanic for this data set. Do not create an automatically incremented variable as the primary key for the data. Instead specify the passenger name as the primary key.
b Write MySQL code to read the data from titanic.csv into the table titanic.
c Write MySQL code to produce a table showing the average fare by passenger class (Pclass), rounded to 1 decimal place.
d Write MySQL code to add a column to the titanic table which is of type DATE, named DateOfBirth, which can be NULL.
e Mr. Owen Harris Braund was born on March 30, 1880.
Write MySQL code to enter his date of birth in the column DateOfBirth. [16 marks]