DSCI 550: Data Science at Scale

Homework 2, Spring 2024

1. (30 pts) You have the following training dataset about houses. We want to classify whether a house is a mid-class family home based on its price and size.

1) (10 pts) Define the most specific hypothesis S. Use an algebraic expression to define the hypothesis.

2) (10 pts) Define the most general hypothesis G. Use an algebraic expression to define the hypothesis.

3) (5 pts) Assuming the most general hypothesis, state whether each of the following cases is an FP, FN, TP, or TN.

a. P = 330 and S = 220 and Mid Family = Yes

b. P = 500 and S = 550 and Mid Family = Yes

c. P = 600 and S = 400 and Mid Family = No

d. P = 600 and S = 550 and Mid Family = No
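The labeling in part 3 can be sketched in code. This is a minimal illustration that assumes the hypothesis class is an axis-aligned rectangle in (price, size) space, which the assignment does not state explicitly. The names `Rect`, `predict`, `outcome`, and `placeholder_g` are hypothetical, and the bounds in `placeholder_g` are placeholders, not values derived from the training table.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Assumed hypothesis form: p_min <= P <= p_max AND s_min <= S <= s_max."""
    p_min: float
    p_max: float
    s_min: float
    s_max: float


def predict(h: Rect, price: float, size: float) -> bool:
    """Return True if the hypothesis classifies the house as Mid Family."""
    return h.p_min <= price <= h.p_max and h.s_min <= size <= h.s_max


def outcome(h: Rect, price: float, size: float, actual: bool) -> str:
    """Compare the prediction with the true label and name the case."""
    predicted = predict(h, price, size)
    if predicted and actual:
        return "TP"
    if predicted and not actual:
        return "FP"
    if not predicted and actual:
        return "FN"
    return "TN"


# Placeholder bounds for illustration only; replace them with the G
# derived from the actual training dataset.
placeholder_g = Rect(p_min=0, p_max=1000, s_min=0, s_max=1000)

cases = [(330, 220, True), (500, 550, True), (600, 400, False), (600, 550, False)]
for price, size, actual in cases:
    # The printed labels reflect the placeholder bounds, not the real answer.
    print(price, size, actual, outcome(placeholder_g, price, size, actual))
```

With the real bounds of G substituted for `placeholder_g`, the same loop labels cases a through d.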

4) (5 pts) Suppose a model produced the following results on a test dataset. Calculate Recall, Precision, Accuracy, False Positive Rate, and F1 Score.
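The five metrics follow directly from the four confusion-matrix counts. The sketch below uses the standard definitions; the function name `classification_metrics` and the example counts are hypothetical and are not taken from the assignment's results table.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard binary-classification metrics from confusion counts."""
    recall = tp / (tp + fn)                   # share of actual positives found
    precision = tp / (tp + fp)                # share of predicted positives that are correct
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    false_positive_rate = fp / (fp + tn)      # share of actual negatives flagged positive
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "recall": recall,
        "precision": precision,
        "accuracy": accuracy,
        "false_positive_rate": false_positive_rate,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
example = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

The sketch does not guard against zero denominators, which would need handling if a count such as `tp + fp` could be zero.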

2. (30 pts) Line separator question: In the dataset “Grade.xlsx”, students received pass/fail grades based on their HW, midterm, and final scores. Using this as a training dataset, build a binary classification model (i.e., a simple linear model) that decides whether a student with given scores would pass or fail. Consider only two of the three features in your model. What is your model, and why is it the best? What is the accuracy of your model on the training dataset? This question does not ask for a specific classification algorithm (do not use any program or tool); it requires a conceptual discussion that shows an understanding of classification. A plain explanation with supporting numbers is sufficient as the answer.

3. (30 pts) A Priori: You have the following 20 transactions with the items a - g.

Suppose that the minimum support requirement is 20%. Using the Apriori algorithm, find all frequent item sets. Show your work step by step.
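The level-wise procedure can be sketched as follows. This is a minimal, unoptimized illustration of Apriori (candidate generation by joining frequent (k-1)-itemsets, pruning by the subset property, then support counting). The function name `apriori` and the transactions in `demo_transactions` are hypothetical and are not the 20 transactions from the assignment.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items.
    items = {item for b in baskets for item in b}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        prev = set(current)
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        # Count step: keep candidates that meet the minimum support.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
    return frequent


# Hypothetical transactions for illustration only.
demo_transactions = [
    {"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"},
]
for itemset, s in sorted(apriori(demo_transactions, 0.6).items(),
                         key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

For the assignment's data, the call would use the 20 transactions and `min_support=0.2`, so an itemset must appear in at least 4 transactions to be frequent.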

4. (10 pts) A Priori: Suppose that the itemset {beer, cheese, eggs} has 30% support in the DB, {beer, cheese} has 40%, {beer, eggs} has 30%, {cheese, eggs} has 50%, and each of beer, cheese, and eggs alone has 50% support. Given that the minimum confidence level for a strong association rule is 70%, state whether each of the following rules is strong.

1) (5 pts) IF one buys {Beer, Cheese}, THEN one also buys {Eggs}.

2) (5 pts) IF one buys {Beer}, THEN one also buys {Cheese, Eggs}.
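Rule strength here depends on confidence, which under the standard definition is confidence(X → Y) = support(X ∪ Y) / support(X). The sketch below applies that definition to the support values given in the question; the names `SUPPORT_PCT`, `confidence`, and `is_strong` are hypothetical, and supports are stored as whole percentages to keep the arithmetic exact.

```python
# Support values from the question, in percent.
SUPPORT_PCT = {
    frozenset({"beer"}): 50,
    frozenset({"cheese"}): 50,
    frozenset({"eggs"}): 50,
    frozenset({"beer", "cheese"}): 40,
    frozenset({"beer", "eggs"}): 30,
    frozenset({"cheese", "eggs"}): 50,
    frozenset({"beer", "cheese", "eggs"}): 30,
}

MIN_CONFIDENCE = 0.70  # threshold stated in the question


def confidence(antecedent, consequent):
    """confidence(X -> Y) = support(X union Y) / support(X)."""
    x = frozenset(antecedent)
    xy = x | frozenset(consequent)
    return SUPPORT_PCT[xy] / SUPPORT_PCT[x]


def is_strong(antecedent, consequent):
    """A rule is treated as strong when its confidence meets the threshold."""
    return confidence(antecedent, consequent) >= MIN_CONFIDENCE


rules = [
    ({"beer", "cheese"}, {"eggs"}),
    ({"beer"}, {"cheese", "eggs"}),
]
for x, y in rules:
    print(sorted(x), "->", sorted(y), confidence(x, y), is_strong(x, y))
```

This check considers confidence only, since the question specifies no minimum support for the rules.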