
COMPSCI 4ML3, Introduction to Machine Learning

Assignment 1, Fall 2023


Due date: Thursday, October 5th, 11pm

Notes. Type your solutions in LaTeX and upload a single PDF file that includes all your answers to Avenue. Use Teams to ask/answer questions.

Review (Linear Algebra). A set of k d-dimensional vectors $v_1, v_2, \ldots, v_k \in \mathbb{R}^d$ is linearly dependent if there exist $a_1, a_2, \ldots, a_k \in \mathbb{R}$ such that at least one of the $a_i$'s is non-zero and $\sum_{i=1}^{k} a_i v_i = \vec{0}$. Also, a set of vectors is linearly independent if it is not linearly dependent.

Furthermore, the column rank of a matrix (i.e., the number of linearly independent column vectors) is equal to its row rank (i.e., the number of linearly independent row vectors); in fact, this is why we can call it just the "rank" of the matrix. A k-by-k square matrix is invertible if and only if it is full rank (i.e., its rank is k). Also, a k-by-k matrix $A$ is said to be positive definite if $u^{\top} A u > 0$ for every non-zero $u \in \mathbb{R}^k$. All positive definite matrices are invertible.
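As a quick illustration of these definitions (purely for intuition, not part of the assignment), the following numpy sketch checks rank and positive definiteness for two small matrices:

```python
import numpy as np

# A 2-by-2 symmetric matrix with linearly independent columns.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
print(np.linalg.matrix_rank(A))           # 2: full rank, hence invertible

# Positive definiteness: u^T A u > 0 for every non-zero u. For a symmetric
# matrix this is equivalent to all eigenvalues being strictly positive.
print(np.all(np.linalg.eigvalsh(A) > 0))  # True

# Columns are linearly dependent here (second column = 2 * first column),
# so B is rank-deficient and therefore not invertible.
B = np.array([[1.0, 2.0],
              [3.0, 6.0]])
print(np.linalg.matrix_rank(B))           # 1
```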

Review (Ordinary Least Squares). In the ordinary least squares problem we are given n data points $(x_1, y_1), \ldots, (x_n, y_n)$ with $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$, and the goal is fitting a line/hyperplane (represented by a d-dimensional vector $W$) with the minimum sum of squared errors:

$$\min_{W \in \mathbb{R}^d} \sum_{i=1}^{n} \left( x_i^{\top} W - y_i \right)^2 = \min_{W \in \mathbb{R}^d} \| X W - Y \|^2,$$

where in the matrix form (on the right side) $X$ is an n-by-d matrix and $Y$ is an n-dimensional
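For concreteness, here is a minimal numpy sketch of the closed-form solution $W = (X^{\top}X)^{-1} X^{\top} Y$, assuming $X^{\top}X$ is invertible; the data are synthetic and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.normal(size=(n, d))                # n-by-d design matrix
W_true = np.array([1.0, -2.0, 0.5])
Y = X @ W_true + 0.1 * rng.normal(size=n)  # noisy n-dimensional target vector

# Closed-form least squares: solve (X^T X) W = X^T Y.
W_ls = np.linalg.solve(X.T @ X, X.T @ Y)
print(W_ls)                                # approximately W_true

# np.linalg.lstsq is the more numerically robust way to do the same fit.
W_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
```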

vector.

1. Consider the least-squares setting discussed above and assume $(X^{\top} X)$ is invertible. In each of the following cases, either prove that $(Z^{\top} Z)$ is necessarily invertible, or give a counterexample.

(a) [5 points] $Z = X^{\top} X + 0.1\, I_{d \times d}$, where $I_{d \times d}$ is a d-by-d identity matrix.

(b) [5 points] We add an arbitrary new column $u \in \mathbb{R}^n$ (i.e., a new feature) to $X$ and call the result $Z$ (so $Z = [X \mid u]$ is an n-by-(d+1) matrix).

(c) [5 points] We add an arbitrary new row $v \in \mathbb{R}^d$ (i.e., a new data point) to $X$ and call the result $Z$ (so $Z = [X^{\top} \mid v]^{\top}$ is an (n+1)-by-d matrix).
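Before writing the proofs, it can help to probe each case numerically. The sketch below builds a random instance of each construction and tests whether $Z^{\top} Z$ is full rank; it is only an experiment under illustrative dimensions (n = 6, d = 4), not a substitute for a proof or counterexample:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 6, 4
X = rng.normal(size=(n, d))  # for random X, (X^T X) is invertible with probability 1

def invertible(M):
    # A square matrix is invertible iff its rank equals its size.
    return np.linalg.matrix_rank(M) == M.shape[0]

# (a) Z = X^T X + 0.1 I (a d-by-d matrix)
Z_a = X.T @ X + 0.1 * np.eye(d)
print(invertible(Z_a.T @ Z_a))

# (b) append a new column u in R^n (here drawn at random)
u = rng.normal(size=(n, 1))
Z_b = np.hstack([X, u])
print(invertible(Z_b.T @ Z_b))

# (c) append a new row v in R^d (here drawn at random)
v = rng.normal(size=(1, d))
Z_c = np.vstack([X, v])
print(invertible(Z_c.T @ Z_c))

# Caveat: random draws only exercise the generic case; a counterexample,
# if one exists, usually needs a deliberately constructed u or v.
```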

2. Consider the least squares problem.

(a) [5 points] Assume $\mathrm{Rank}(X) = n = d$ and let $W_{\mathrm{LS}}$ be the solution of the least squares. Show that $X^{\top} (Y - X W_{\mathrm{LS}}) = 0$.

(b) [5 points] Assume $\mathrm{Rank}(X) = n < d$, and let $W$ be "one of the solutions" of the least squares minimization problem. Can we always say $X^{\top} (Y - X W) = 0$? Why?

(c) [5 points] Assume $\mathrm{Rank}(X) = n = d$. Prove that $\| X W_{\mathrm{LS}} - Y \| = 0$.
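As a sanity check for parts (a) and (b), one can evaluate $X^{\top}(Y - XW)$ numerically on random instances; a sketch with illustrative dimensions, not a proof:

```python
import numpy as np

rng = np.random.default_rng(2)

# Part (a): square, full-rank case (Rank(X) = n = d).
n = d = 4
X = rng.normal(size=(n, d))
Y = rng.normal(size=n)
W_ls, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(X.T @ (Y - X @ W_ls))   # numerically ~0

# Part (b): underdetermined case (Rank(X) = n < d). lstsq returns one
# particular minimizer (the minimum-norm one) among infinitely many.
n, d = 3, 5
X = rng.normal(size=(n, d))
Y = rng.normal(size=n)
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(X.T @ (Y - X @ W))      # the question asks whether this is ~0 for *every* minimizer
```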

3. [20 points] In this question we will use least squares to find the best line ($\hat{y} = ax + b$) that fits a non-linear function, namely $f(x) = 2x - 5x^3 + 1$. For this, assume that you are given a set of n training points $(x_i, f(x_i))$, and find the line that fits the training data the best when $n \to \infty$. Write down your calculations as well as the final values for a and b. (Additional notes: the $n \to \infty$ assumption basically means that we are dealing with an integral rather than a finite summation. If it makes it easier for you, instead of working with actual training data you can assume x is uniformly distributed on [0, 1].)
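Although this question is meant to be solved by hand, a finely discretized least-squares fit is a useful sanity check for your derived a and b; a minimal sketch, assuming x uniform on [0, 1]:

```python
import numpy as np

# A dense grid on [0, 1] approximates the n -> infinity (integral) regime.
x = np.linspace(0.0, 1.0, 100_001)
y = 2 * x - 5 * x**3 + 1              # f(x) = 2x - 5x^3 + 1

# Design matrix for y_hat = a*x + b: one column for x, one for the intercept.
A = np.column_stack([x, np.ones_like(x)])
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(a, b)                           # should match your hand-derived values
```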

4. [20 points] This question is similar to the previous one, except that you are allowed to use a program to find the final answer. Assume the input is three-dimensional $(x_1, x_2, x_3)$, and the target function is $f(x_1, x_2, x_3) = x_1 + 3x_2 + 4x_3 + 5x_1x_2 - 5x_2x_3 + x\ldots$ Find $a, b, c, d \in \mathbb{R}$ such that the hyperplane $\hat{y} = a x_1 + b x_2 + c x_3 + d$ fits the data the best when x is uniformly distributed in $[1, 2]^3$ (the least squares solution). Report the values of a, b, c, d, and include your code in the PDF file for the solutions. You can use the python OLS script that is provided in Avenue as a starting point.
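In the same spirit as the provided OLS script (whose contents are not reproduced here), one possible setup is Monte Carlo sampling on $[1, 2]^3$; note that the last term of $f$ is cut off above, so the placeholder `missing_term` below is a stand-in that must be replaced with the actual term:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000_000
x1, x2, x3 = rng.uniform(1.0, 2.0, size=(3, N))   # x uniform on [1, 2]^3

# Target function. Its last term is truncated in this copy of the handout,
# so a placeholder is used here; substitute the actual term before running.
missing_term = 0.0  # TODO: replace with the truncated last term of f
y = x1 + 3 * x2 + 4 * x3 + 5 * x1 * x2 - 5 * x2 * x3 + missing_term

# Design matrix for y_hat = a*x1 + b*x2 + c*x3 + d.
A = np.column_stack([x1, x2, x3, np.ones(N)])
(a, b, c, d), *_ = np.linalg.lstsq(A, y, rcond=None)
print(a, b, c, d)
```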

5. [20 points] In this question we would like to fit a line with zero y-intercept ($\hat{y} = ax$) to the curve $y = x^{\ldots}$. However, instead of minimizing the sum of squares of errors, we want to minimize the following objective function: