COMP1002/8802 - Lab 9 Worksheet
You must complete this worksheet during your lab/workshop and have your answers for the (marked) parts checked by your tutor before the end of the session. Marks are awarded out of two, where completing only some of the tasks will award partial marks.
Part 1 - TensorFlow Playground
The TensorFlow playground (https://playground.tensorflow.org/) is a website where you can experiment with how neural networks can be trained to model different data patterns. When you first load the website in your browser you should see something similar to the following:
This website will allow you to try different neural network architectures, composed of several artificial neurons separated into layers, on a variety of datasets. Each neuron outputs either 0 or 1, based on the combined (weighted) sum of its inputs. For this worksheet, we are going to be looking at the problem of Classification. This means that we have a collection of training data that is labelled into categories; in this case, the dots in our data are labelled as either blue or orange. By looking at how the features of each dot (in this case, its x and y coordinates) correspond with its colour, our neural network will attempt to predict the colour of any new dots we provide it with. This is the same general idea as when we used the features of passengers on the Titanic to determine whether they died or survived, except this time we are using a dot’s position to try and predict its colour.
Part 2 - Neurons
Let’s start by looking at a very simple problem with a single neuron. Select the Gaussian dataset (bottom left) on the left-hand side of the website and reduce our neural network to a single hidden layer with a single neuron. Also select the “Discretize output” checkbox in the bottom right corner. Click the Play button in the top left corner to begin training our neural network.
After a short time, our neural network will converge on a model that accurately separates the orange and blue dots. Any dot that falls into the orange or blue areas of the output will be classified accordingly.
Hovering our mouse cursor over the lines in our network reveals their weights, and hovering over the small dot in the bottom left of our neuron reveals its bias. In this example case (your results may differ), the weight for x is 0.77, the weight for y is 0.75, and the bias (firing threshold) is 0.039.
As our network is just a single neuron, we can easily verify our neural network’s output for several example inputs (if the neuron fires then our prediction is blue, otherwise orange).
- Input: x = 4, y = 5 → (0.77 × 4) + (0.75 × 5) = 6.83 > 0.039 → BLUE
- Input: x = -2, y = -1 → (0.77 × -2) + (0.75 × -1) = -2.29 < 0.039 → ORANGE
- Input: x = 4, y = -4 → (0.77 × 4) + (0.75 × -4) = 0.08 > 0.039 → BLUE (but close to the boundary)
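We can sketch this check in Python, using the example weights above (a toy illustration; your trained values will differ):

```python
def neuron_predict(x, y, w_x=0.77, w_y=0.75, bias=0.039):
    """Single-neuron classifier: the neuron fires (BLUE) when the
    weighted sum of its inputs exceeds the bias threshold."""
    weighted_sum = w_x * x + w_y * y
    return "BLUE" if weighted_sum > bias else "ORANGE"

print(neuron_predict(4, 5))    # weighted sum 6.83 -> BLUE
print(neuron_predict(-2, -1))  # weighted sum -2.29 -> ORANGE
print(neuron_predict(4, -4))   # weighted sum 0.08 -> BLUE, close to the boundary
```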
Try running the same neural network on some of the other datasets. You should see that our single-neuron network is not able to accurately classify all the dots. This is because a single neuron with just the x and y coordinates as input can only produce a straight-line (linear) boundary of separation. To accurately classify more complex patterns, we will need to add additional neurons to our neural network.
Part 3 - Additional Neurons (marked)
While a single neuron is not enough to accurately classify the Circle and Exclusive Or datasets (the top two datasets), adding additional neurons will give a better result. Keeping just a single hidden layer, try adding neurons to our network and see if this improves our trained model’s accuracy. Like the final output, a picture is provided for each neuron representing its own predicted output for an input dot. While none of these neurons is individually able to predict more complex patterns on its own, they can be combined within a full neural network to produce a complete classification model.
- Question: How many neurons in a single layer are needed to accurately classify (test loss < 0.03) each of the following datasets:
o Circle (top left)
o Exclusive or (top right)
o Gaussian (bottom left)
Part 4 - Neuron Layers (marked)
We have now successfully created neural network models that can classify dots in the first three datasets (Circle, Exclusive or, and Gaussian) using just a single layer of neurons. However, the fourth spiral dataset requires more neuron layers to achieve an accurate result. Neurons within more complex neural networks are typically grouped into layers, where the outputs from every neuron on the previous layer is passed as input to the neurons in the next layer. Experiment with adding additional layers of neurons to our network and see if you can train a network to accurately classify the spiral dataset.
- Question: How many neurons in total (across multiple layers) are needed to accurately classify (test loss < 0.03) each of the following datasets:
o Spiral (bottom right)
Part 5 - Network Training (marked)
Neural networks are trained using a process called backpropagation. While we aren’t going to be covering the mathematics of how this process works, the general idea is that the edge weights and neuron biases within our network are repeatedly updated to better fit our training data. Each round of updates to our network values is called an “epoch”, and the amount by which we change the values is controlled by the “learning rate”. A higher learning rate means that we might find a reasonably accurate solution in fewer epochs, but also means that our network changes may be too large to settle on an optimal solution. A high learning rate could also lead to an unstable network that overfits to the latest training example. Finding the right learning rate for your network is a difficult balancing act.
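The effect of the learning rate can be sketched with a toy gradient-descent loop (our own simplified model with one weight, not Playground’s actual maths):

```python
# Toy sketch: one weight w is repeatedly pulled toward a target value.
# The learning rate scales how far each epoch's update moves w.
def train(learning_rate, epochs, w=0.0, target=1.0):
    for _ in range(epochs):
        gradient = 2 * (w - target)    # gradient of the squared error (w - target)^2
        w -= learning_rate * gradient  # one "epoch": step against the gradient
    return w

# A small rate (0.03) approaches the target slowly but steadily; a moderate
# rate (0.3) converges in far fewer epochs; a very large rate (3) makes each
# step overshoot the target, and w diverges instead of converging.
print(train(0.03, 100), train(0.3, 100), train(3, 10))
```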
Create a neural network with a single hidden layer containing 4 neurons. Using the default learning rate of 0.03, see how many epochs it takes for the model to find an accurate solution for the circle dataset. You can press the step button next to the play button to advance a single epoch. Reset the network and repeat this process several times to get a range of results.
- Question: Experiment with the following learning rates and comment on how long it takes for the model to converge on an accurate solution (test loss < 0.03):
o 0.003
o 0.03
o 0.3
o 3
- Is there a difference in the shape of the output pattern for each of these learning rates?
Part 6 - Feature Engineering (marked)
As well as the x & y coordinates of the dot, we can also apply feature engineering to introduce augmented/modified versions of these features into our set of inputs. This can help to improve the performance (accuracy and training time) of our neural network.
TensorFlow Playground provides the following augmented features:
- X1² The x coordinate squared.
- X2² The y coordinate squared.
- X1X2 The x coordinate multiplied by the y coordinate.
- Sin(X1) The sine function of the x coordinate.
- Sin(X2) The sine function of the y coordinate.
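A quick way to see what these extra inputs compute is a small Python sketch (the function and dictionary key names are ours, not Playground’s):

```python
import math

def augmented_features(x, y):
    """The extra inputs derived from the raw x (X1) and y (X2) coordinates."""
    return {
        "X1": x,
        "X2": y,
        "X1^2": x * x,           # x coordinate squared
        "X2^2": y * y,           # y coordinate squared
        "X1X2": x * y,           # x multiplied by y
        "sin(X1)": math.sin(x),  # sine of the x coordinate
        "sin(X2)": math.sin(y),  # sine of the y coordinate
    }
```

Note that X1² + X2² is the squared distance of a dot from the origin, which is why those two features make the Circle dataset much easier to separate.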
Return your learning rate to 0.03, and experiment with adding or removing different input features and see how this affects your network for different datasets.
- Question: Using these augmented features, how many neurons in a single layer are needed to accurately classify (test loss < 0.03) each of the following datasets:
o Circle (top left)
o Exclusive or (top right)
o Spiral (bottom right)
- Try removing some of the augmented input features to see which ones are most beneficial for accurately classifying each dataset.
Bonus 1 - Additional Neural Network Options
The bonus tasks for this worksheet will focus on experimenting with other neural network options provided by the TensorFlow Playground. A brief description of each option is given below, but you should take some time to experiment with these yourself.
Activation:
The activation function determines what value a neuron outputs when its threshold (bias) is surpassed. In the lecture slides we said that a neuron outputs 1 when it fires or 0 when it doesn’t. This is called a binary step function, but it is often too simple to model complex patterns. TensorFlow Playground instead provides several different activation functions, including ReLU, Tanh, Sigmoid and Linear. The following website provides an overview of these different activation functions:
https://www.v7labs.com/blog/neural-networks-activation-functions
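As a rough sketch (our own definitions, following the standard formulas for each function), each activation maps a neuron’s weighted input sum z to an output:

```python
import math

def binary_step(z):  # the lecture's fire/no-fire neuron
    return 1.0 if z > 0 else 0.0

def relu(z):         # 0 for negative inputs, identity for positive inputs
    return max(0.0, z)

def tanh(z):         # smooth squash into the range (-1, 1)
    return math.tanh(z)

def sigmoid(z):      # smooth squash into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def linear(z):       # no squashing at all
    return z
```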
Noise:
By default, no noise is added to our training data. This makes our pattern boundaries very clear but is also not representative of many real-world datasets. Adding noise to our dataset will add some more variance to our data points, blurring the pattern boundaries.
Ratio of training to test data:
By default, the ratio of training data (used to train the neural network) to test data (used to evaluate the neural network’s accuracy) is 50/50. It is important that we keep some of our available data separate for testing the network, otherwise we will be unable to tell if our model is overfitting. You should see that if the proportion of training data decreases to 10%, our test loss becomes significantly higher. This indicates that the model is overfitting to our limited training data and performing poorly on the test data. This effect is especially obvious for the circle dataset, where our model will often not form the ideal circular pattern.
Regularization:
The purpose of regularization is to reduce overfitting and help generalise the model. Overfitting is when a model works well for the data that it was trained on but performs poorly on test data it hasn’t seen before.
- L1 regularization (aka. Lasso) tries to push the majority of the edge weights to exactly 0, producing a sparser network.
- L2 regularization (aka. Ridge) tries to keep the combined (squared) size of the edge weights small.
In either case, a higher regularization rate makes the regularization effect stronger.
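Both penalties can be sketched as a term added to the training loss (a simplified illustration; the rate plays the role of Playground’s regularization rate):

```python
def l1_penalty(weights, rate):
    # Lasso: grows with absolute size, so small weights get pushed to exactly 0
    return rate * sum(abs(w) for w in weights)

def l2_penalty(weights, rate):
    # Ridge: grows with squared size, shrinking the largest weights most strongly
    return rate * sum(w * w for w in weights)
```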
Batch size:
The batch size indicates how many dots are considered when updating the network values each epoch. A batch size of 1 considers only a single dot, while a batch size of 10 considers 10 dots. A larger batch size leads to a faster solution but may get stuck in a local (non-optimal) minimum. A smaller batch size adds more randomness to the training process and slows convergence.
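The idea can be sketched with a toy batched-gradient estimate (our own simplified single-weight model with squared-error loss, not Playground’s actual code):

```python
import random

def batch_gradient(dots, w, batch_size, rng=random):
    """Estimate the gradient for one update from a random batch of dots.
    Each dot is an (x, label) pair; the model is a single weight w with
    squared-error loss (w*x - label)^2 per dot."""
    batch = rng.sample(dots, batch_size)
    # Average the per-dot gradients: a bigger batch gives a smoother, less
    # noisy estimate, while a batch of 1 follows a single random dot.
    return sum(2 * x * (w * x - label) for x, label in batch) / batch_size
```

With batch_size equal to the full dataset size, this reduces to the exact (full-batch) gradient, with no randomness at all.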