Math2203: Linear Models and Design of Experiments
Assignment 1 (worth 10%)
Question 1: The following table gives measurements of the area (x, in km²) and pH level (y) of 13 lakes in Ontario, Canada. The second pH value is Max(s6, s7), the larger of the sixth (s6) and seventh (s7) digits of your student number (that you completed above).
Area (x) | 33 | 161 | 189 | 149 | 47 | 170 | 352 | 187 | 76 | 52 | 175 | 53 | 200
pH (y) | 6.6 | Max(s6, s7) | 6.5 | 6.9 | 7.1 | 7.5 | 8.8 | 6.4 | 5.9 | 6.7 | 7.1 | 6.6 | 8.0
(a) Sketch the scatterplot of y vs x and comment on the plot (use R/RStudio/SAS or hand calculations).
(b) Use the Principle of Least Squares to fit the simple linear regression model to the data. Superimpose this line of best fit on the scatterplot in part (a) (use R/RStudio/SAS or hand calculations).
(c) Perform an ANOVA test to deduce whether there is a linear relationship between area and pH level (use R/RStudio/SAS or hand calculations).
(d) Perform all appropriate residual checks using SAS or R/RStudio and clearly explain whether any of the model assumptions have been violated.
(e) Another lake in the same region was found to have an area of 2050 km². Predict its pH level and find a 99% confidence interval for this prediction (use R/RStudio/SAS or hand calculations).
(3 + 4 + 8 + 5 + 3 = 23 points)
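As an illustration of the hand calculations behind parts (b) and (e), the sketch below fits the least squares line and computes a 99% interval in plain Python. It is not a model answer and rests on several assumptions: the areas are read from the table as whole numbers, the digits s6 = 4 and s7 = 7 are hypothetical stand-ins for a real student number, the interval in part (e) is taken to be a prediction interval for a single new lake, and 3.106 is the t-table value for an upper tail of 0.005 with 11 degrees of freedom. The function names are made up for this example.

```python
import math

# Areas (km^2) as read from the table. The second pH value depends on the
# student number, so it is left as None and filled in by lake_data().
AREA = [33, 161, 189, 149, 47, 170, 352, 187, 76, 52, 175, 53, 200]
PH_KNOWN = [6.6, None, 6.5, 6.9, 7.1, 7.5, 8.8, 6.4, 5.9, 6.7, 7.1, 6.6, 8.0]


def lake_data(s6, s7):
    """Return (x, y) with the missing pH replaced by max(s6, s7)."""
    y = [max(s6, s7) if v is None else v for v in PH_KNOWN]
    return list(AREA), y


def fit_line(x, y):
    """Least squares estimates: b1 = Sxy / Sxx, b0 = y_bar - b1 * x_bar."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def prediction_interval(x, y, x0, t_crit):
    """Point prediction at x0 and the interval y_hat +/- t * s * sqrt(1 + 1/n + (x0 - x_bar)^2 / Sxx)."""
    n = len(x)
    b0, b1 = fit_line(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard deviation
    y_hat = b0 + b1 * x0
    half_width = t_crit * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    return y_hat, y_hat - half_width, y_hat + half_width


# Hypothetical digits, for illustration only.
x, y = lake_data(s6=4, s7=7)
b0, b1 = fit_line(x, y)
y_hat, lower, upper = prediction_interval(x, y, x0=2050, t_crit=3.106)
print(f"fitted line: y = {b0:.4f} + {b1:.6f} x")
print(f"prediction at 2050 km^2: {y_hat:.3f}, 99% interval ({lower:.3f}, {upper:.3f})")
```

Under this reading of the table, 2050 km² lies far outside the observed areas (33 to 352 km²), so the term (x0 - x_bar)² / Sxx dominates and the interval is wide. That extrapolation is worth commenting on in part (e).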
Question 2: All calculations in this question need to be done by applying the formulas from the course, without using any statistical software functions. You may use a calculator or an Excel spreadsheet to apply the formulas. All formulas used and all calculations should be clearly shown. If you choose to use Excel for the calculations, you need to upload the spreadsheet as well, with the formulas you used included in its cells. The following data are provided:
a/a | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13
X | 35.3 | 29.7 | 30.8 | 58.8 | 61.4 | 71.3 | 74.4 | 76.7 | 70.7 | 57.5 | 46.4 | 28.9 | 28.1
Y | 10.98 | 11.13 | 12.51 | 8.40 | 9.27 | 8.73 | 6.36 | 8.50 | 7.82 | 9.14 | 8.24 | 12.19 | 11.88

a/a | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25
X | 39.1 | 46.8 | 48.5 | 59.3 | 70 | 70 | 74.5 | 72.1 | 58.1 | 44.6 | 33.4 | 28.6
Y | 9.57 | 10.94 | 9.58 | 10.09 | 8.11 | 6.83 | 8.88 | 7.68 | 8.47 | 8.86 | 10.36 | 11.08
Here, X represents the steam in pounds per month and Y is the mean atmospheric temperature measured in degrees Fahrenheit.
Calculate the following:
a) Fit a linear regression model and give the least squares estimates of the constant and the slope.
b) Calculate the residuals for each of the 25 observations.
c) Construct the ANOVA table and complete it by performing all the required calculations. What is the ANOVA table used for?
d) Find the coefficient of determination and the correlation coefficient. Explain their values.
e) Calculate the standard deviation (std) of the error, of the slope and of the constant.
f) Test whether the slope and the constant are significant. State the required hypotheses and explain your results.
g) Construct the confidence intervals for the slope and for the constant.
(5 + 4 + 7 + 4 + 5 + 4 + 4 = 33 points)
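The quantities in parts a) to g) all follow from the sums Sxx, Syy and Sxy. The sketch below lays out that chain of formulas in plain Python so that a spreadsheet or calculator working can be cross-checked. It is one possible arrangement, not a prescribed solution: the function name and dictionary keys are invented for this example, and the critical value 2.069 is the t-table entry for 23 degrees of freedom at an assumed 95% confidence level, which the question itself does not specify.

```python
import math

# Data from the Question 2 table (observations 1 to 25).
X = [35.3, 29.7, 30.8, 58.8, 61.4, 71.3, 74.4, 76.7, 70.7, 57.5, 46.4, 28.9, 28.1,
     39.1, 46.8, 48.5, 59.3, 70, 70, 74.5, 72.1, 58.1, 44.6, 33.4, 28.6]
Y = [10.98, 11.13, 12.51, 8.40, 9.27, 8.73, 6.36, 8.50, 7.82, 9.14, 8.24, 12.19, 11.88,
     9.57, 10.94, 9.58, 10.09, 8.11, 6.83, 8.88, 7.68, 8.47, 8.86, 10.36, 11.08]


def simple_regression(x, y, t_crit):
    """Simple linear regression by formula; t_crit is read from a t-table."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # a) least squares estimates
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    # b) residuals e_i = y_i - (b0 + b1 * x_i)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

    # c) ANOVA: regression SS (1 df), error SS (n - 2 df), F statistic
    ss_reg = b1 * sxy
    ss_err = sum(e ** 2 for e in residuals)
    ms_err = ss_err / (n - 2)
    f_stat = ss_reg / ms_err

    # d) coefficient of determination and correlation coefficient
    r2 = ss_reg / syy
    r = sxy / math.sqrt(sxx * syy)

    # e) standard deviation of the error, the slope and the constant
    s = math.sqrt(ms_err)
    se_b1 = s / math.sqrt(sxx)
    se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / sxx)

    # f) t statistics for H0: slope = 0 and H0: constant = 0
    t_b1 = b1 / se_b1
    t_b0 = b0 / se_b0

    # g) confidence intervals: estimate +/- t_crit * standard deviation
    ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    ci_b0 = (b0 - t_crit * se_b0, b0 + t_crit * se_b0)

    return {"b0": b0, "b1": b1, "residuals": residuals, "syy": syy,
            "ss_reg": ss_reg, "ss_err": ss_err, "ms_err": ms_err,
            "f_stat": f_stat, "r2": r2, "r": r, "s": s,
            "se_b0": se_b0, "se_b1": se_b1, "t_b0": t_b0, "t_b1": t_b1,
            "ci_b0": ci_b0, "ci_b1": ci_b1}


# 2.069 = t-table value for 23 df; the 95% level is an assumption.
fit = simple_regression(X, Y, t_crit=2.069)
for key in ("b0", "b1", "ss_reg", "ss_err", "f_stat", "r2", "r",
            "s", "se_b0", "se_b1", "t_b0", "t_b1", "ci_b0", "ci_b1"):
    print(key, fit[key])
```

Two identities make useful self-checks on a hand working: the regression and error sums of squares should add up to Syy, and in simple linear regression the F statistic equals the square of the t statistic for the slope.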
Question 3:
For the data provided in Question 2 above, do the following using statistical software (either R/RStudio or SAS will do). Include the R/SAS code you used, with all the details. If you use R, an R Markdown document is recommended. Note that you will also need to add comments and outputs to justify your answers.
1. Generate a scatterplot of the data and comment on it.
2. Answer all parts a)-g) of Question 2 and comment on the resulting outputs.
3. Using the generated residuals, test the assumptions behind the simple linear regression.
(3 + 15 + 6 = 24 points)
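The question requires R or SAS, so the Python below is only an illustration of the numbers that sit behind the usual residual checks in item 3. It computes leverages, internally studentized residuals and a Durbin-Watson statistic. The cut-off |r| > 2 is a common rule of thumb rather than a course requirement, the Durbin-Watson statistic is meaningful only if the observation numbers reflect collection order (an assumption), and the function name is invented for this sketch. Plots such as residuals against fitted values and a normal Q-Q plot are still needed in the chosen package.

```python
import math

# Data from the Question 2 table (observations 1 to 25).
X = [35.3, 29.7, 30.8, 58.8, 61.4, 71.3, 74.4, 76.7, 70.7, 57.5, 46.4, 28.9, 28.1,
     39.1, 46.8, 48.5, 59.3, 70, 70, 74.5, 72.1, 58.1, 44.6, 33.4, 28.6]
Y = [10.98, 11.13, 12.51, 8.40, 9.27, 8.73, 6.36, 8.50, 7.82, 9.14, 8.24, 12.19, 11.88,
     9.57, 10.94, 9.58, 10.09, 8.11, 6.83, 8.88, 7.68, 8.47, 8.86, 10.36, 11.08]


def residual_diagnostics(x, y):
    """Fitted values, residuals, leverages, studentized residuals, Durbin-Watson."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar

    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    ss_err = sum(e ** 2 for e in resid)
    s = math.sqrt(ss_err / (n - 2))

    # Leverage of each point: h_i = 1/n + (x_i - x_bar)^2 / Sxx
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

    # Internally studentized residuals: e_i / (s * sqrt(1 - h_i))
    studentized = [e / (s * math.sqrt(1 - h)) for e, h in zip(resid, leverage)]

    # Durbin-Watson statistic; values near 2 suggest little serial correlation
    dw = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, n)) / ss_err

    return fitted, resid, leverage, studentized, dw


fitted, resid, leverage, studentized, dw = residual_diagnostics(X, Y)

# Observation numbers (1-based) whose studentized residual exceeds the
# rule-of-thumb threshold of 2 in absolute value.
flagged = [i + 1 for i, r in enumerate(studentized) if abs(r) > 2]
print("Durbin-Watson:", dw)
print("Observations with |studentized residual| > 2:", flagged)
```

The pairs (fitted, studentized) are what a residuals-versus-fitted plot displays. A funnel shape there would point to non-constant variance, and a curved pattern would point to a non-linear relationship.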
Question 4:
Select one of the questions above (Question 1, Question 2 or Question 3) and do the following:
1. Generate and upload a few slides to present the solution of this question. Make sure you justify the method and the steps you followed to solve it, and that you cover and explain all your steps clearly.
2. Generate and upload a short video (up to a maximum of 5 minutes) showing you presenting the slides generated in Q4.1 with the solution of the chosen question. Make sure you cover and clearly explain all your steps towards the solution.
(10 + 10 = 20 points)