BU.510.650
Data Analytics
Group Project Assignment
1 Group Project
Each group contains x students. Each group identifies a problem of interest and collects relevant data. The collected data set should contain at least 500 observations and 6 variables. The task is to develop a series of research hypotheses based on theory or past empirical evidence and then test them by applying techniques covered in class (or others not covered in class) to the data.
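Before committing to a data set, it can help to confirm that it meets the minimum size of 500 observations and 6 variables. The sketch below assumes the data are stored as a CSV file whose first row holds the variable names; the function names are hypothetical. The course itself uses R, where `nrow()` and `ncol()` on a data frame give the same two counts, so this Python version is only an illustration.

```python
import csv
from typing import Iterable, Tuple

# Minimum size stated in the assignment.
MIN_OBSERVATIONS = 500
MIN_VARIABLES = 6


def dataset_shape(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (observations, variables) for CSV text whose first row is a header.

    `lines` may be an open file object or any iterable of CSV-formatted strings.
    """
    reader = csv.reader(lines)
    header = next(reader, [])                  # first row: variable names (empty if no data)
    n_rows = sum(1 for row in reader if row)   # count non-blank data rows
    return n_rows, len(header)


def meets_minimum_size(lines: Iterable[str]) -> bool:
    """True if the data have at least MIN_OBSERVATIONS rows and MIN_VARIABLES columns."""
    n_rows, n_cols = dataset_shape(lines)
    return n_rows >= MIN_OBSERVATIONS and n_cols >= MIN_VARIABLES
```

For a file on disk (here a hypothetical `my_project_data.csv`), open it with `open("my_project_data.csv", newline="")` and pass the file object to `meets_minimum_size`.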
2 Data Sets
Students are encouraged to collect their own data. Data may come from any source, including but not limited to the following:
(1) UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/datasets.html
(2) Kaggle: https://www.kaggle.com/datasets
(3) World Bank: http://data.worldbank.org/
(4) U.S. open government data: http://www.data.gov/
(5) US Census Bureau: http://www.census.gov/main/www/access.html
(6) Data from your work
(7) Other sources
3 Analysis and Report
Group members should work closely together on data collection, data analysis, result interpretation, and report writing. In the project report, students should describe the results and conclusions of their analysis. Keep in mind that plots, tables, and other visual representations of data are useful for conveying your conclusions. In addition, you may want to include the following parts in your report.
(1) Questions/Hypotheses. Write one or more questions or hypotheses you want to explore with the data sets. After each question, state your expected answer. Because you form these expectations before analyzing the data, they may differ from what your analysis eventually shows.
(2) Data Description. Describe the data sets. What does the data contain, e.g., which variables and outcomes? How was the data collected? Briefly summarize the data, and provide the URL if available.
(3) Methodologies. Write a complete, clear description of the analysis you performed. It should be detailed enough for someone else to write an R program that reproduces your results, and it will also help people who read your code later. This section should tie your computations to your questions/hypotheses, indicating exactly which results would lead you to which conclusions. You may want to report key statistics such as t-statistics, z-statistics, p-values, R², and adjusted R².
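As an illustration of where these statistics come from, here is a minimal sketch for the one-predictor case, y = b0 + b1·x, fit by ordinary least squares. It computes R² = 1 − SSE/SST, adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1) with p predictors (p = 1 here), and the t-statistic for the null hypothesis b1 = 0. It is written in plain Python only for illustration, and the function name is hypothetical. In R, `summary(lm(y ~ x))` reports the same quantities (coefficient t values, `Pr(>|t|)`, Multiple R-squared, and Adjusted R-squared). The p-value, which needs the t distribution with n − p − 1 degrees of freedom, is left out to keep the sketch free of third-party libraries.

```python
import math
from typing import Dict, Sequence


def simple_ols_summary(x: Sequence[float], y: Sequence[float]) -> Dict[str, float]:
    """Fit y = b0 + b1*x by ordinary least squares and return key statistics."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length, with at least 3 points")
    p = 1  # number of predictors, excluding the intercept

    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary to estimate a slope")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = sxy / sxx            # slope estimate
    b0 = y_bar - b1 * x_bar   # intercept estimate

    # Residual (SSE) and total (SST) sums of squares
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)

    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

    # t-statistic for H0: b1 = 0, using the residual variance estimate
    df = n - p - 1
    se_b1 = math.sqrt((sse / df) / sxx)
    t_b1 = b1 / se_b1

    return {"b0": b0, "b1": b1, "r2": r2, "adj_r2": adj_r2, "t_b1": t_b1, "df": df}
```

Adjusted R² is never larger than R² because it penalizes each added predictor, which is why reporting both is useful when comparing models with different numbers of variables.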
(4) Results and Conclusion. Discuss your results, focusing in particular on those that are most interesting, surprising, or important, and discuss their consequences or implications. Interpret the results: if the answers are unexpected, see whether you can find an explanation for them, such as an external factor that your analysis did not account for. You may also want to make predictions for new scenarios.
(5) Appendix. Put plots, tables, technical details, or other results in an appendix if necessary. This part is optional.
You may want to give your project report a title. At the beginning of the report, summarize each group member’s contribution in one sentence.
4 Project Presentation and Competition
All group members are encouraged to present their project in class. Each group’s presentation should be no longer than 10 minutes. Slides (e.g., Microsoft PowerPoint) are encouraged; the slide deck should summarize the main points of your project, including motivation, research questions, and results.
During the presentations, all students from non-presenting groups, as well as the instructor and possibly teaching assistants, should actively ask questions; for students, this counts as part of class participation. Any member of the presenting group, not only the presenters, may answer questions or give comments.
Each student evaluates the other groups’ presentations. The evaluation link will be sent out before the presentations start. The peer evaluations will be used to choose the winning team for the project competition. More details will be shared later.
5 Project Submission
Each group submits only once. Your submission should include the project report, data set, R script, and slides. The suggested length of the project report is 6 to 10 pages (1.5 line spacing), excluding the appendix.
6 Important Dates
(1) 11:59pm, two days after the Session 6 class day: Each team should submit, via the Course Website, a short paragraph briefly describing the collected data sets. You may want to include a URL link if available.
(2) 11:59pm the day before the Session 7 class day: Submit your complete project via the Course Website, including the report, presentation slides, data sets, and R scripts. If your data set is too large or sensitive, a sample subset is acceptable. (For example, if your Session 7 class is on December 22, the submission is due at 11:59pm on December 21.)
(3) Class Day of Session 7: In-Class Project Presentation and Competition.