Part one: Project Data for Neural Networks
In this part of the course homework, you will work with predictions on a data set from the 2016 US Presidential election. This part of the homework requires some work in RStudio; use this document to answer questions about what you discover.
Q1. How well did your model predict the election results? (Hint: Split the data into two sets, training and testing. Fit the model with neuralnet(). Report the training result, and analyze it using a confusion matrix or other metrics.)
Q2. Do you think your model will generalize well to new data? Why or why not? (Hint: Show the test result. It may differ from the training result; briefly comment on the difference.)
Q3. What could be done to improve the model?
Q4. In the space below, describe your neural network structure: How many hidden layers were in the network? How many nodes were in each hidden layer? Which activation function did you use? Which independent variables did you include?
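The workflow hinted at in Q1 and Q2 (split the data, fit on the training set, compare confusion matrices for the training and test sets) can be sketched roughly as follows. This is only an illustration under stated assumptions: the data frame is synthetic, the column names income, education, and winner are hypothetical, and a logistic regression from base R stands in for the model so that the sketch runs without extra packages. In the homework, the model would be fit with neuralnet() instead.

```r
# Synthetic stand-in for the election data; all column names are hypothetical.
set.seed(42)
n <- 200
votes <- data.frame(
  income    = rnorm(n),
  education = rnorm(n)
)
# Binary outcome with noise, so the classes are not perfectly separable.
votes$winner <- as.integer(votes$income + votes$education + rnorm(n) > 0)

# 70/30 train/test split by random row index.
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- votes[train_idx, ]
test  <- votes[-train_idx, ]

# Stand-in model (assumption): logistic regression instead of neuralnet().
fit <- glm(winner ~ income + education, data = train, family = binomial)

# Confusion matrix and accuracy at a 0.5 probability cut-off.
evaluate <- function(model, data) {
  prob   <- predict(model, newdata = data, type = "response")
  pred   <- factor(as.integer(prob > 0.5), levels = c(0, 1))
  actual <- factor(data$winner, levels = c(0, 1))
  cm     <- table(Predicted = pred, Actual = actual)
  list(confusion = cm, accuracy = sum(diag(cm)) / sum(cm))
}

train_result <- evaluate(fit, train)
test_result  <- evaluate(fit, test)

print(train_result$confusion)
print(train_result$accuracy)
print(test_result$confusion)
print(test_result$accuracy)
```

A test accuracy clearly below the training accuracy would be one sign that the model may not generalize well, which is the comparison Q2 asks about.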
Part two: Use H2O and LIME for Neural Network Predictions
In this part of the course homework, you will use H2O and LIME for neural network predictions on a data set predicting the sale prices of houses. This part of the homework requires some work in RStudio; use this document to record your accuracy measures.
NOTE: Because of the amount of data in this assignment, you may experience delays while R runs the computations. If you encounter warnings that the H2O cluster node is behaving slowly, paste h2o.removeAll() into the Console and run it to free up as much memory as you can.
Q1. How well did your model predict the house prices?
Q2. Do you think your model will generalize well to new data? Why or why not? (Hint: Show the test result. It may differ from the training result; briefly comment on the difference.)
Q3. Which variables ended up being the most important? (Hint: Show the variable importances reported by H2O.)
Q4. What could be done to improve the model?
Q5. In the space below, describe your neural network structure: Which independent variables did you use? How many hidden layers were in the network? How many nodes were in each hidden layer? Which activation function did you use? Did you use an adaptive learning rate? How many epochs did the training algorithm run for?
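For Q1 and Q2 of this part, the outcome is a price rather than a class, so accuracy measures such as RMSE and MAE take the place of a confusion matrix. The sketch below shows one way to compute them side by side for the training and test sets in base R. The price vectors are made up purely for illustration, and the helper names rmse and mae are hypothetical; in the homework the actual and predicted values would come from your own H2O model.

```r
# Root mean squared error and mean absolute error for numeric predictions.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
mae  <- function(actual, predicted) mean(abs(actual - predicted))

# Made-up sale prices and predictions, for illustration only.
train_actual <- c(200000, 150000, 320000, 275000)
train_pred   <- c(195000, 155000, 310000, 280000)
test_actual  <- c(180000, 410000, 230000)
test_pred    <- c(200000, 370000, 215000)

# One row per data set, so the two sets can be compared at a glance.
metrics <- data.frame(
  set  = c("train", "test"),
  rmse = c(rmse(train_actual, train_pred), rmse(test_actual, test_pred)),
  mae  = c(mae(train_actual, train_pred), mae(test_actual, test_pred))
)
print(metrics)
```

In this made-up example the test errors are much larger than the training errors, which is the kind of gap the hint in Q2 asks you to comment on.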