CMSC 421 assignment help, Python programming assignment help

CMSC 421 Assignment One
Neural Networks and Optimization
September 12, 2023

General Instructions. Please submit TWO (2) files to ELMS:
(1) a PDF file that is the report of your experimental results and answers to the questions.
(2) a codebase submission in the form of a zip file including only the code folders/files you modified and the Questions folder. Please do not submit the Data Folder we provided. The code should contain your implementations of the experiments and code for producing visualizations of the results.

The project is due at 11:59 pm on September 26 (Monday), 2023.

Please read through this document before starting your implementation and experiments. Your score will depend mostly on the completion of the experiments, the effectiveness of the reported results and visualizations, the consistency between the experimental results and the analysis, and the clarity of the report. Neatness and clarity count! Good visualization helps!

Because you will need PyTorch for the second half of the programming assignment (Convolutional Neural Networks - 15 Points), we have included links to some tutorials and documentation to help you get started with PyTorch:
• Official PyTorch Documentation
• Quickstart Guide
• Tensors
• Data Loading
• Building Models in PyTorch

Implementation Details
For each problem, you'll need to code both the training and application phases of the neural network. During training, you'll adjust the network's weights and biases using gradient descent. Use a single parameter, η, to control the step size during gradient descent. The updated weights and biases are the old values minus the gradient multiplied by the step size.
We will provide code snippets and datasets for some parts of the assignment. You will need to read the comments in the code files and fill in the missing pieces so that the files execute correctly. Please ensure that you read through all the code files we provide.
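The update rule described under Implementation Details can be sketched in a few lines of NumPy. This is only an illustration: the helper name `gradient_step`, the dictionary layout of the parameters, and the toy quadratic objective are assumptions made for this example and are not part of the provided code files.

```python
import numpy as np

def gradient_step(params, grads, eta):
    """Return updated parameters: old value minus step size times gradient.

    `params` and `grads` are dicts of NumPy arrays with matching keys
    (a hypothetical layout chosen for this sketch).
    """
    return {name: value - eta * grads[name] for name, value in params.items()}

# Toy objective (assumed for illustration): f(w, b) = 0.5*(w - 3)^2 + 0.5*(b + 1)^2
params = {"w": np.array([0.0]), "b": np.array([0.0])}
eta = 0.1  # step size

for _ in range(200):
    # Analytic gradients of the toy objective
    grads = {"w": params["w"] - 3.0, "b": params["b"] + 1.0}
    params = gradient_step(params, grads, eta)

print(params["w"], params["b"])  # both approach the minimizer (3, -1)
```

With a step size that is too large for the problem (here, anything above 2), the same loop diverges, which is one reason η needs tuning.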
The provided code files will be available in the CMSC421 - Fall2023 GitHub repository.

Part 1: Programming Task - (50 Points)

Objective
The goal of this assignment is to build a neural network from scratch, focusing on implementing the backpropagation algorithm. You'll apply your neural network to simple, synthetic datasets to gain hands-on experience in tuning network parameters.

Language and Libraries
Python is mandatory for this assignment. Use numpy for all linear algebra operations. Do not use machine learning libraries like PyTorch or TensorFlow for Questions 1, 2, and 3; only numpy, matplotlib, and Python built-in libraries are permitted.

1 Simple Linear Regression Model - (10 Points)

1.1 Network Architecture
• The network consists of an input layer, a hidden layer with one unit, a bias layer, and an output layer with one unit.
• The output is a linear combination of the input, represented as $a^1 = X w^0 + a^0 + b^1$.

1.2 Loss Function
Use a regression loss for training, defined as
$$\frac{1}{2} \sum_{i=1}^{n} \left( y_i - a^1(x_i) \right)^2$$

1.3 Implementation
Using the template_for_solitions file, write code to train this network and apply it to both 1D data (as q1_a) and higher-dimensional data (as q1_b).
• Data Preparation: Use the q1_<a/b> function from the Data.generator module to generate training and testing data. The data module has both a and b, so use the appropriate function call to fetch the right data for each experiment.
• Network Setup: Use the net_setup method in the Trainer class to initialize the network, loss layer, and optimizer.
• Training: Use the train method in the Trainer class to train the network. Plot the training loss over iterations.
• Testing: Use the test data to evaluate the model's performance. Plot the actual vs.
predicted values and compute evaluation metrics.

Tests and Experiments

1.4 Hyperparameters
• The main hyperparameters are the step size (η) and the number of gradient descent iterations.
• You may also have implicit hyperparameters like weight and bias initialization.

Hyperparameter Tuning
Discuss the difficulty level in finding an appropriate set of hyperparameters.

2 A Shallow Network - (10 Points)
The goal of this part of the assignment is to implement a fully connected neural network with a single hidden layer and a ReLU (Rectified Linear Unit) activation function. The network should be flexible enough to accommodate any number of units in the hidden layer and any size of input, while having just one output unit.

2.1 Network Architecture
The network consists of an input layer, a hidden layer with one unit, a bias layer, and an output layer with one unit.
• Input Layer: $a^0_1, a^0_2, \ldots, a^0_d$
• Hidden Layer: $z^1_j = \sum_{k=1}^{d} X w^1_k a^0_k + b^1_j$
• ReLU Activation: $a^1_j = \max(0, z^1_j)$
• Output Layer: $a^2 = \sum_{k=1}^{d} X w^2_k a^1_k + b^2$

2.2 Loss Function
Continue to use a regression loss for training the network, defined as
$$\sum_{i=1}^{n} \frac{1}{2} \left( y_i - a^2(x_i) \right)^2$$

2.3 Implementation
Using the template_for_solitions file, write code to train this network and apply it to both 1D data (as q2_a.py) and higher-dimensional data (as q2_b.py).
• Data Preparation: Use the q2_<a/b> function from the Data.generator module to generate training and testing data. The data module has both a and b, so use the appropriate function call to fetch the right data for each experiment.
• Network Setup: Use the net_setup method in the Trainer class to initialize the network, loss layer, and optimizer.
• Training: Use the train method in the Trainer class to train the network. Plot the training loss over iterations.
• Testing: Use the test data to evaluate the model's performance. Plot the actual vs.
predicted values and compute evaluation metrics.

Tests and Experiments

2.4 Hyperparameters
You now have an additional hyperparameter: the number of hidden units.

Hyperparameter Tuning:
• Discuss the difficulty in finding an appropriate set of hyperparameters.
• Compare the difficulty level between solving the 1D problem and the higher-dimensional problem.

3 General Deep Learning - (15 Points)
The goal of this section of the assignment is to extend your neural network to handle fully connected networks of arbitrary depth. It will be just like the network in Problem 2, but with more layers. Each layer will use a ReLU activation function, except for the final layer.

Tests and Experiments
• Test your network with the same training data that you used in Problem 2 (A Shallow Network - (10 Points)), using both 1D and higher-dimensional data. Experiment with using 3 and 5 hidden layers. Evaluate the accuracy of your solutions in the same way as in Problem 2.
• Conduct and report on experiments to determine whether the depth of a network has any significant effect on how quickly your network can converge to a good solution. Include at least one plot to justify your conclusions.
Again, ensure your files are saved as q3_a.py and q3_b.py.

EXTRA CREDIT (EC): Cross Entropy Loss (10 Points)
Modify your network from General Deep Learning - (15 Points) to perform classification tasks using a cross-entropy loss and a logistic activation function in the output layer.
If you are submitting the EC, save the code files as qec_a.py and qec_b.py.

3.1 Network Architecture
• Input Layer: Arbitrary size
• Hidden Layers: ReLU activation, arbitrary depth
• Output Layer: Logistic activation function defined as $a^L_1 = \frac{1}{1 + e^{-z^L_1}}$

3.2 Loss Function
Use a cross-entropy loss defined as:
$$-\sum_{i=1}^{n} \left[ y_i \log\left(a^L_1(x_i)\right) + (1 - y_i) \log\left(1 - a^L_1(x_i)\right) \right]$$
Here, $y_i$ is assumed to be a binary value (0 or 1).

3.3 Note on Numerical Stability
Be cautious when exponentiating numbers in the sigmoid function to avoid overflow.
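One way to guard against that overflow is to clip the pre-activation before exponentiating, and to clip the predicted probability before taking logarithms. The sketch below is a minimal illustration under assumptions: the function names, the clipping limit of 500, and the `tiny` constant are choices made for this example, not values prescribed by the assignment.

```python
import numpy as np

def stable_sigmoid(z, limit=500.0):
    """Logistic function with the input clipped to [-limit, limit].

    exp(500) is still representable in float64, so np.exp never overflows.
    The limit is an assumed value for this sketch.
    """
    z = np.asarray(z, dtype=np.float64)
    z_clipped = np.maximum(np.minimum(z, limit), -limit)
    return 1.0 / (1.0 + np.exp(-z_clipped))

def cross_entropy(y, a, tiny=1e-12):
    """Binary cross-entropy summed over samples, with a kept away from 0 and 1."""
    a = np.maximum(np.minimum(a, 1.0 - tiny), tiny)
    return -np.sum(y * np.log(a) + (1.0 - y) * np.log(1.0 - a))

z = np.array([-1e6, 0.0, 1e6])
print(stable_sigmoid(z))                        # finite values in [0, 1]
print(cross_entropy(np.array([1.0, 0.0]),
                    np.array([1.0, 0.0])))      # finite, close to zero
```

Clipping changes the output only for inputs whose sigmoid is already indistinguishable from 0 or 1 in double precision, so it does not affect training in any meaningful way.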
Utilize np.maximum and np.minimum for a concise implementation.

Tests and Experiments

3.4 Test Scenarios
1. 1D Data Tests:
• Linearly Separable Data:
– Vary the margin between points and the number of layers.
– Investigate the difficulty in finding hyperparameters based on the margin.
– Examine the speed of convergence based on the margin. Include plots.
• Non-Linearly Separable Data:
– Note the differences you observe when the data is not linearly separable.
2. Higher-Dimensional Data Tests:
• Repeat the experiments with higher-dimensional data.
• Use both linearly separable and non-linearly separable data sets.
• Include data to support your conclusions.

4 Convolutional Neural Networks - 15 Points
In this section, you are required to implement a Convolutional Neural Network (CNN) using PyTorch to classify images from the CINIC-10 dataset provided.

Requirements
Your CNN model should meet the following criteria:
(A) Utilize dropout for regularization. Mathematically, dropout sets a fraction p of the input units to 0 at each update during training time, which helps to prevent overfitting.
(B) Be trained with the RMSprop and ADAM optimizers separately. The update rule for RMSprop is given by:
$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{v_t} + \epsilon} \cdot g_t$$
where $\theta$ are the parameters, $\eta$ is the learning rate, $v_t$ is the moving average of the squared gradient, $\epsilon$ is a smoothing term to avoid division by zero, and $g_t$ is the gradient.
For ADAM, the update rule is:
$$\theta_{t+1} = \theta_t - \frac{\eta \cdot \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$
where $\hat{m}_t$ and $\hat{v}_t$ are bias-corrected estimates of the first and second moments of the gradients.
Report on how each optimizer performed.
(C) Include at least 3 convolutional layers and 2 fully connected layers.
The convolution operation can be represented as:
$$(f * g)(t) = \sum_{\tau} f(\tau) \cdot g(t - \tau)$$
(D) Use wandb for visualization of the training loss $L$, which could be the cross-entropy loss for classification:
$$L = -\sum_{i} y_i \log(\hat{y}_i)$$

Experimental Results
In addition to reporting the Test Accuracy and plotting the Training Loss over iterations, the following experimental results should also be reported for a comprehensive evaluation of the model's performance:
1. Validation Accuracy and Loss: Monitor and report the accuracy and loss on a separate validation set to assess the model's generalization capability.
2. Confusion Matrix: Include a confusion matrix to identify which classes the model is having difficulty distinguishing between.
3. Precision, Recall, and F1-Score: Calculate and report these metrics to provide a more nuanced view of the model's performance. The F1-Score is the harmonic mean of Precision and Recall and is defined as:
$$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
4. Model Size: Report the number of parameters and the memory footprint of the model.
5. Hyperparameter Tuning: If hyperparameter tuning is performed, report the performance under different hyperparameter settings, such as learning rate, batch size, etc.
6. Class-wise Accuracy: Report the accuracy for each individual class to show how well the model performs on different categories.

Part 2: Theoretical Questions - (50 Points + 3 Bonus Points)

1. Please answer the following questions about the activation function: - (9 Points)
(A) Why do we need activation functions in neural networks? (1 point)
(B) Write down the formula of the Sigmoid function and its derivative. What are the pros and cons of using the Sigmoid function in neural networks? (4 points)
(C) Write down the formula of the ReLU function and its derivative. What are the pros and cons of using the ReLU function in neural networks? (4 points)

2. When we optimize neural networks, we usually use gradient descent to update the weights of the network.
To obtain well-trained neural networks, one of the most important hyperparameters is the learning rate. Please answer the following questions about the learning rate: - (6 Points)
(A) What is the role of the learning rate in the gradient descent algorithm? (2 points)
(B) What happens to the neural network if the learning rate is too low or too high? (4 points)

3. After we train a neural network, we need to evaluate the model performance by determining if the model is underfitting or overfitting. Please answer the following questions about underfitting and overfitting: - (12 Points)
(A) Explain the concepts of underfitting and overfitting in your own words, and explain how to determine whether a model is overfitting or underfitting based on the model performance on the training set and validation set. (4 points)
(B) Please write down four methods that can be used to prevent the overfitting of a neural network. (4 points)
(C) Please write down four methods that can be used to prevent the underfitting of a neural network. (4 points)

4. Computer Vision (CV) and Natural Language Processing (NLP) are two primary application areas of neural networks. In CV, CNN models are often used to extract information from images and videos, while RNNs and Transformers are often used in NLP to handle text data. - (9 Points + 3 Bonus Points)
(A) The key components of a CNN architecture include convolutional layers, pooling layers, and fully connected layers. Provide a brief description of the function of each component. (4.5 points)
(B) Explain the concepts of Hidden State, Time Steps, and Weight Sharing in the design of an RNN. (4.5 points)
(C) Bonus Question: Batch Normalization (BN) is important in real-world practice. Please describe what BN does and explain why we need BN in neural networks. (3 points)

5. Convolutional to Multi-layer Perceptron - (14 Points)
A convolution operation is a linear operation, and therefore convolutional layers can be represented in the form of matrix multiplication, or in other words, represented by a multi-layer perceptron. More precisely, if we denote the convolution operation as $c(x, \theta_w, \theta_b, \gamma)$, where $\theta_w$ are the filter weights, $\theta_b$ are the filter biases, and $\gamma$ are the padding and stride parameters, we want to convert the filters to a weight matrix so that
$$\text{flatten}(c(x, \theta_w, \theta_b, \gamma)) = W \, \text{flatten}(x) + b, \qquad (1)$$
where flatten(·) takes in a tensor of size $(d_1, d_2, d_3)$ and outputs a 1-D vector of size $(d_1 \times d_2 \times d_3)$. For example,
$$\text{flatten}(\text{Filter1}) = (i_{1,1}, i_{1,2}, i_{1,3}, i_{2,1}, i_{2,2}, i_{2,3}, i_{3,1}, i_{3,2}, i_{3,3}, j_{1,1}, j_{1,2}, j_{1,3}, j_{2,1}, j_{2,2}, j_{2,3}, j_{3,1}, j_{3,2}, j_{3,3})$$
The converted weights and biases $W$ and $b$ depend on the convolution filters $\theta_w$, $\theta_b$ and also on $\gamma$ (paddings and strides).

Suppose the input is a 2 × 2 × 3 (C × H × W) image, and we have a convolutional layer with two filters as shown in Figure 1, where the filter size is 3 × 3, the padding is 1 (filled with zeros), and the stride is 1. The bias terms for the two convolutional filters in Filter 1 (Filter 2) are b1 (b3) and b2 (b4), respectively. For one filter, we convolve it with every sliding window of the input image, and every such convolve operation over one sliding window generates one output of this convolutional layer.

[Figure 1: Input image and filters. The panels show the 1st and 2nd channels of the input with a sliding window, Filter 1 (entries i and j), and Filter 2 (entries k and l). Note that the sliding window slides in row-major order, i.e., it first slides right and moves to the first position of the second row once it reaches the end of the first row. The white region around the input image is the zero padding.]

For one filter, there are 6 sliding windows in total, which correspond to the 6 outputs of that filter.
For every sliding window, we can think of the output as being generated by a dot product of a weight vector and the flattened input image, where the non-zero entries of the weight vector have exactly the same values as the filter, and their positions depend on the sliding window. Once we have the weight vector for each sliding window, we can simply stack them together to get the converted weight matrix W. The bias part is simple: for one filter, we add the same bias to every sliding window output. Write out the weight matrix W and bias b in terms of the filter weights and biases. Convince yourself that you get exactly the same output (flattened) as the original convolution.
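A numerical check can help with the "convince yourself" step. The sketch below builds W and b for random filters and compares W flatten(x) + b against a direct sliding-window computation. It rests on several assumptions that are not stated in the assignment: the function names are hypothetical, the "convolution" is taken to be the cross-correlation used in CNN layers (no filter flip), each filter has a single scalar bias, and the output is flattened in (filter, row, column) order.

```python
import numpy as np

def conv_direct(x, filters, biases, pad=1):
    """Stride-1 sliding-window (cross-correlation) layer with zero padding."""
    C, H, W = x.shape
    K, _, k, _ = filters.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero-pad H and W only
    H_out, W_out = H + 2 * pad - k + 1, W + 2 * pad - k + 1
    out = np.zeros((K, H_out, W_out))
    for f in range(K):
        for r in range(H_out):          # row-major sliding order
            for c in range(W_out):
                window = xp[:, r:r + k, c:c + k]
                out[f, r, c] = np.sum(window * filters[f]) + biases[f]
    return out

def conv_as_matrix(x_shape, filters, biases, pad=1):
    """Build W and b so that W @ flatten(x) + b equals the flattened output."""
    C, H, W = x_shape
    K, _, k, _ = filters.shape
    H_out, W_out = H + 2 * pad - k + 1, W + 2 * pad - k + 1
    W_mat = np.zeros((K * H_out * W_out, C * H * W))
    b_vec = np.repeat(biases, H_out * W_out)  # same bias for every window
    for f in range(K):
        for r in range(H_out):
            for c in range(W_out):
                row = (f * H_out + r) * W_out + c  # one row per sliding window
                for ch in range(C):
                    for u in range(k):
                        for v in range(k):
                            i, j = r + u - pad, c + v - pad  # input pixel hit
                            if 0 <= i < H and 0 <= j < W:    # skip the padding
                                W_mat[row, (ch * H + i) * W + j] = filters[f, ch, u, v]
    return W_mat, b_vec

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 2, 3))           # C x H x W input, as in the problem
filters = rng.normal(size=(2, 2, 3, 3))  # two 3 x 3 filters over two channels
biases = rng.normal(size=2)              # assumed: one scalar bias per filter

W_mat, b_vec = conv_as_matrix(x.shape, filters, biases)
lhs = conv_direct(x, filters, biases).reshape(-1)
rhs = W_mat @ x.reshape(-1) + b_vec
print(W_mat.shape, np.allclose(lhs, rhs))
```

Filter entries that fall on the zero padding never appear in a row of W, which is why each row has fewer non-zero entries than the filter has weights.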