COMPSCI 4ML3, Introduction to Machine Learning
Assignment 1, Fall 2023
Due date: Thursday, October 5th, 11pm
Notes. Type your solutions in LaTeX and upload to Avenue a single pdf file that includes all
your answers. Use Teams to ask/answer questions.
Review (Linear Algebra). A set of k d-dimensional vectors v_1, v_2, ..., v_k ∈ R^d is linearly
dependent if there exist a_1, a_2, ..., a_k ∈ R such that at least one of the a_i's is non-zero and
Σ_{i=1}^{k} a_i v_i = ⃗0.
Also, a set of vectors is linearly independent if it is not linearly dependent.
Furthermore, the column rank of a matrix (i.e., the number of linearly independent column vectors)
is equal to its row rank (i.e., the number of linearly independent row vectors); in fact, this is why
we can call it just the "rank" of the matrix. A k-by-k square matrix is invertible if and only if it is
full rank (i.e., its rank is k). Also, a k-by-k matrix A is said to be positive definite if for every
non-zero u ∈ R^k we have u^T A u > 0. All positive definite matrices are invertible.
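For instance (an illustrative example, not part of the assignment): the vectors (1, 2) and (2, 4)
in R² are linearly dependent, since 2·(1, 2) − 1·(2, 4) = ⃗0, while the 2-by-2 matrix A with rows
(2, 1) and (1, 2) is positive definite, because u^T A u = 2u_1² + 2u_1u_2 + 2u_2² =
u_1² + u_2² + (u_1 + u_2)² > 0 for every non-zero u.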
Review (Ordinary Least Squares). In the ordinary least squares problem we are given n data
points (X_1, Y_1), ..., (X_n, Y_n) with X_i ∈ R^d and Y_i ∈ R, and the goal is fitting a line/hyperplane
(represented by a d-dimensional vector W) with the minimum sum of squared errors:

    min_{W ∈ R^d} Σ_{i=1}^{n} (X_i^T W − Y_i)² = min_{W ∈ R^d} ∥XW − Y∥²,

where in the matrix form (on the right side) X is an n-by-d matrix whose i-th row is X_i^T, and Y
is an n-dimensional vector.
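As a side illustration (not part of the assignment), here is a minimal NumPy sketch of the
closed-form solution W_LS = (X^T X)^{-1} X^T Y, assuming X^T X is invertible; the data below
are made up.

import numpy as np

# Made-up example: n = 5 data points in d = 2 dimensions.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
Y = np.array([1.0, 2.1, 2.9, 4.2, 5.0])

# Solve the normal equations (X^T X) W = X^T Y; solving is preferred
# to explicitly inverting X^T X for numerical stability.
W_LS = np.linalg.solve(X.T @ X, X.T @ Y)
print(W_LS)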
1. Consider the least-squares setting discussed above and assume (X^T X) is invertible. In each of
the following cases, either prove that (Z^T Z) is necessarily invertible, or give a counter-example.
(a) [5 points] Z = X^T X + 0.1 I_{d×d}, where I_{d×d} is a d-by-d identity matrix.
(b) [5 points] We add an arbitrary new column u ∈ R^n (i.e., a new feature) to X and call it Z
(so Z = [X | u] is an n-by-(d+1) matrix).
(c) [5 points] We add an arbitrary new row v ∈ R^d (i.e., a new data point) to X and call it Z
(so Z = [X^T | v]^T is an (n+1)-by-d matrix).
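If you want to sanity-check your reasoning numerically (optional, and of course not a proof), a
sketch like the following, assuming NumPy, probes whether Z^T Z stays invertible on random
instances of each case:

import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 3
X = rng.standard_normal((n, d))           # generic X, so X^T X is invertible

# Case (a): Z = X^T X + 0.1 I.
Za = X.T @ X + 0.1 * np.eye(d)
print(np.linalg.matrix_rank(Za))          # rank d would mean invertible

# Case (b): append a random new column (feature).
u = rng.standard_normal((n, 1))
Zb = np.hstack([X, u])
print(np.linalg.matrix_rank(Zb.T @ Zb))   # compare against d + 1

# Case (c): append a random new row (data point).
v = rng.standard_normal((1, d))
Zc = np.vstack([X, v])
print(np.linalg.matrix_rank(Zc.T @ Zc))   # compare against d

Note that random draws only probe the generic case; a counter-example, if one exists, may require
a deliberate choice, e.g., a u that already lies in the column span of X.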
2. Consider the least squares problem.
(a) [5 points] Assume Rank(X) = n = d and let W_LS be the solution of the least-squares
problem. Show that X^T (Y − X W_LS) = 0.
(b) [5 points] Assume Rank(X) = n < d, and let W be "one of the solutions" of the least-squares
minimization problem. Can we always say X^T (Y − X W) = 0? Why?
(c) [5 points] Assume Rank(X) = n = d. Prove that ∥X W_LS − Y∥ = 0.
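As a reminder (standard background, not a substitute for your own proofs), the expression
X^T (Y − XW) arises from differentiating the least-squares objective:

    ∇_W ∥XW − Y∥² = ∇_W (W^T X^T X W − 2 Y^T X W + Y^T Y) = 2 X^T X W − 2 X^T Y,

so setting the gradient to zero gives the normal equations X^T X W = X^T Y, i.e.,
X^T (Y − XW) = 0.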
3. [20 points] In this question we will use least squares to find the best line (ŷ = ax + b) that fits
a non-linear function, namely f(x) = 2x − 5x³ + 1. For this, assume that you are given a set
of n training points of the form (x, f(x)), and find the line that fits the training data the best
when n → ∞. Write down your calculations as well as the final values for a and b. (Additional
notes: the n → ∞ assumption basically means that we are dealing with an integral rather than a
finite summation. If it makes it easier for you, instead of working with actual training data you
can assume x is uniformly distributed on [0, 1].)
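To clarify the setup (under the uniform-on-[0, 1] reading suggested in the note, with the solution
left to you), the quantity to minimize takes the form

    min_{a, b ∈ R} ∫₀¹ (ax + b − f(x))² dx,   where f(x) = 2x − 5x³ + 1.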
4. [20 points] This question is similar to the previous one, except that you are allowed to use a
program to find the final answer. Assume the input is three dimensional (x_1, x_2, x_3), and the
target function is f(x_1, x_2, x_3) = x_1 + 3x_2 + 4x_3 + 5x_1x_2 − 5x_2x_3 + x. Find
a, b, c, d ∈ R such that the hyperplane ŷ = ax_1 + bx_2 + cx_3 + d fits the data the best when x is
uniformly distributed in [1, 2]³ (the least-squares solution). Report the values of a, b, c, d, and
include your code in the pdf file for the solutions. You can use the python OLS script that is
provided in Avenue as a starting point.
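Since the provided script is not reproduced here, the following is only a hypothetical sketch of one
way to approximate the n → ∞ fit: sample many points uniformly from [1, 2]³ and solve the
resulting least-squares problem with NumPy. The sample size is arbitrary, and the definition of y
below includes only the terms of f that are legible above (its final term is cut off), so complete it
from the assignment before relying on the output.

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000                                # large n to approximate n -> infinity

# Sample x = (x1, x2, x3) uniformly from [1, 2]^3.
x1, x2, x3 = rng.uniform(1, 2, size=(3, n))

# Target function (add the final term of f from the assignment text).
y = x1 + 3 * x2 + 4 * x3 + 5 * x1 * x2 - 5 * x2 * x3

# Design matrix for the hyperplane a*x1 + b*x2 + c*x3 + d.
X = np.column_stack([x1, x2, x3, np.ones(n)])

# Least-squares fit; coeffs = (a, b, c, d).
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coeffs)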
5. [20 points] In this question we would like to fit a line with zero y-intercept (ŷ = ax) to the
curve y = x. However, instead of minimizing the sum of squared errors, we want to minimize
the following objective function: