ASSESSMENT GUIDE
COMM5000
Data Literacy
Vinho Verde
Milestone 2 information
CASE STUDY INFORMATION: Vinho Verde
Business context: In recent years, the growing interest in wine has fuelled the expansion of the wine industry. As a result, companies are investing in new technologies to enhance both wine production and sales. Quality certification plays a vital role in these processes and currently relies heavily on wine tasting by human experts.
Case/Scenario: You are consulting for a winery, helping it to predict or estimate human wine taste preferences at the certification step. Knowing the wine quality will leave the winery better positioned to predict available amounts and yearly sales. It will also support oenologists' wine tasting evaluations, potentially improving the quality and speed of their decisions, and improve wine production. Furthermore, similar techniques can help in target marketing by modelling consumer tastes from niche markets. To predict wine quality you will use a dataset consisting of 4898 white and 1599 red vinho verde samples from Portugal's northwest region, together with the statistical methods covered in this course.
MILESTONE 2: Case Study Project Proposal
Report details
Due date: Week 7, Friday, 5PM
Weighting: 20%
Format: Report. This is individual work. Reports will be checked for plagiarism.
Submission: Via Moodle course site, combining both the Answer Sheet and the dataset file used
Description of M2 assessment task
In M1, students investigated all the variables included in the dataset. M2 aims to use hypothesis testing to explore and further investigate from a more quantitative perspective some of the patterns students may have observed in their analysis in M1.
To address the question of whether wine quality can be predicted using any combination of the variables included in the dataset, students are asked in M2 to study whether there are statistically significant differences in alcohol content and wine quality between the two wine types, red and white. This will be achieved by conducting significance hypothesis tests to evaluate the evidence from the data that the mean for red wines is statistically different from the mean for white wines. These types of analysis are very helpful for understanding the meaning of the calculations performed, and enable students to critically evaluate the usefulness and value of these calculations.
Please note that calculations are required to be done in Excel. This assessment will be marked by the teaching team. Academic Integrity violations will not be tolerated in any shape or form.
Statistical Analysis Required for M2:
The task in M2 is to answer all the questions included in the Answer Sheet file: “COMM5000 - Milestone 2 Answer Sheet.xlsx”. This task is in essence about conducting significance hypothesis tests of null hypotheses of equal means between the two wine types, red and white. The means to be studied are again those of 1) wine quality and 2) alcohol content. It is assumed that these samples are independent samples from two normal distributions with unequal variances.
As instructed in the Answer Sheet, students are asked, among other tasks, to:
1. Formulate the null hypothesis and the alternative hypothesis for each test.
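2. State the assumptions under the null hypothesis, and consider a test of equal means given by:
$$ t = \frac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{s_X^2}{n} + \dfrac{s_Y^2}{m}}} $$
where X and Y refer to the white and red groups, $\bar{X}$ and $\bar{Y}$ are their sample means, $s_X^2$ and $s_Y^2$ are their sample variances, and n and m are the respective sample sizes.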
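The degrees of freedom can be computed by:
$$ \nu = \frac{\left(\dfrac{s_X^2}{n} + \dfrac{s_Y^2}{m}\right)^2}{\dfrac{(s_X^2/n)^2}{n-1} + \dfrac{(s_Y^2/m)^2}{m-1}} $$
with $s_X^2$ and $s_Y^2$ denoting the sample variances of the two groups, and n and m the sample sizes as above.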
Given the large sample sizes in this case, the test statistic above is approximately distributed as N(0,1) under the null hypothesis.
3. State the conclusion of the tests using the p-value method. Use both the 1% and the 5% significance levels to illustrate the test conclusions.
4. Use the Answer Sheet Excel file exclusively; it contains the questions to be answered and will contain the students’ answers.
5. Copy the Answer Sheet worksheet into the Excel file that contains your data, so that both data and answers are included in the same Excel file.
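As an illustration only (the assessed calculations must still be done in Excel), steps 1 to 3 can be sketched in Python. The sketch assumes the usual unequal-variance (Welch) form of the test statistic and its degrees of freedom, uses the N(0,1) approximation mentioned above for the two-sided p-value, and runs on made-up numbers rather than the wine dataset. The function names `welch_test` and `decide` are hypothetical helpers, not part of the course materials.

```python
import math
from statistics import NormalDist, mean, variance


def welch_test(x, y):
    """Two-sided test of equal means for independent samples, unequal variances.

    Returns (t, df, p), where p uses the large-sample N(0,1) approximation.
    """
    n, m = len(x), len(y)
    vx = variance(x) / n  # s_X^2 / n (sample variance, n - 1 denominator)
    vy = variance(y) / m  # s_Y^2 / m
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite degrees of freedom (assumed form of the df formula).
    df = (vx + vy) ** 2 / (vx ** 2 / (n - 1) + vy ** 2 / (m - 1))
    # Two-sided p-value from the standard normal distribution.
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p


def decide(p, alpha):
    """p-value method: reject H0 (equal means) when p is below alpha."""
    return "reject H0" if p < alpha else "fail to reject H0"


# Made-up illustrative values, NOT taken from the wine dataset.
white = [10.1, 10.5, 9.8, 11.2, 10.9, 10.4, 9.9, 10.7]
red = [10.0, 10.3, 9.7, 10.8, 10.2, 10.6]

t, df, p = welch_test(white, red)
for alpha in (0.01, 0.05):
    print(f"alpha={alpha}: t={t:.3f}, df={df:.1f}, p={p:.3f} -> {decide(p, alpha)}")
```

With samples this small the normal approximation is not appropriate; it is used here only because the actual dataset (4898 white and 1599 red samples) is large enough for it to apply. Reporting the decision at both significance levels shows that a test can reject at 5% while failing to reject at 1%.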
Structure of the report
Please follow the instructions contained in the Answer Sheet file: “COMM5000 - Milestone 2 Answer Sheet.xlsx”. Answers must be provided in the allocated Excel cells in column B. Please note that other cells are locked, to ensure that answers are provided only in the correct cells and that no other changes are made to the file.
Submission instructions
• Copy the Excel sheet containing your answers (before or after filling them in) into the data file that you also used for M1. You will then have one tab with your data and another tab with the answer sheet.
This will ensure that the data and the derived calculations/answers are all included in the same Excel file. The file containing both data and answers in two different tabs is the one to be submitted. See below what your final Excel file should look like:
• Before submitting, check in Excel -> File -> Print that no more than a combined grand total of 400 pages would be printed. The randomised dataset as provided to you results in some 278 pages (as shown in the screenshot) and the answer sheet results in 5 printed pages (for a combined total of 283 pages). If you add columns or other information to your Excel file, please ensure that fewer than 400 pages in total (data + answer sheet) would be printed. You can check how many pages would be printed for each tab using the commands Excel -> File -> Print.
• Submit via the Moodle course site.
UNSW Business School