2025-06-03

Assignment 2

Details

For this assignment, you will submit a README.md with your answers to the questions below, along with the code you used to produce your answers (including all sbatch files). Commit your Assignment 2 file(s) to your private “a2” GitHub repository (accept the GitHub Classroom invitation at https://classroom.github.com/a/vyTm0aet to access this repository) and submit a link to your repository on Canvas (click the “Submit Assignment” button to make your submission). You must work alone on this assignment. Before submitting, please review the tips that a previous TA for the course (Jinfei Zhu) compiled for writing a grader-friendly README file and organizing your assignment GitHub repository (https://github.com/lsc4ss-a21/assignment-submission-template), if you have not already done so.

Recall that you can interact with private GitHub repositories (like your assignment repositories) on the Midway Cluster by configuring Git on your remote terminal (https://canvas.uchicago.edu/courses/63368/pages/steps-to-configure-git-slash-github-for-use-on-midway-3).

Important Note: you should only run your code on large numbers of Midway CPU cores once you have debugged it and are confident it will run. We have a shared compute budget on the Midway Cluster and will quickly deplete it if you try to debug at scale. We recommend debugging code on small numbers of cores (e.g., two or three cores) and/or in Colab notebooks (as demonstrated in class) first.

1. (2 Points) This week’s assignment is an extension of what you did in Assignment 1 (https://canvas.uchicago.edu/courses/63368/assignments/749540?wrap=1). Your first task is to write an sbatch script that will compile your function from Question (1b) ahead of time so that you can run the simulation from Assignment 1 on the Midway Cluster. You should only need to request a single core in your sbatch script, and you should incorporate all of your numba precompilation code into a .py file (that can be executed in your sbatch script).

2. Now, let’s see if we can achieve even further speedup by parallelizing your simulation via mpi4py multi-processing and your parallelization strategy from Assignment 1, Question 2. Using time.time() (as demonstrated in lecture), you will time 20 different runs of the simulation above on the Midway Cluster – i.e., 20 different runs of 1,000 simulations each of the time series of 4,160 periods. You will complete one run in serial using only one core as in the numba-accelerated code you wrote in part (1) (1,000 simulations on one core), the second run using two cores (500 simulations per core), the third run using three cores (333 simulations per core), adding additional cores until you finish your last run using 20 cores (50 simulations per core).
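The run schedule described above can be sketched in a few lines. This is only an illustration, assuming the per-core count is obtained by floor division (which matches the 333 simulations per core quoted for three cores); the names `sims_per_core`, `time_call`, and `schedule` are hypothetical and not part of the assignment.

```python
import time

TOTAL_SIMS = 1000   # simulations per run (from the assignment text)
MAX_CORES = 20      # largest run (from the assignment text)


def sims_per_core(total_sims, n_cores):
    """Whole simulations assigned to each core; any remainder is dropped."""
    return total_sims // n_cores


def time_call(fn, *args):
    """Return (result, elapsed_seconds) for one call, timed with time.time()."""
    start = time.time()
    result = fn(*args)
    return result, time.time() - start


# Map each core count (1..20) to the number of simulations run on each core.
schedule = {n: sims_per_core(TOTAL_SIMS, n) for n in range(1, MAX_CORES + 1)}
```

Under floor division, core counts that do not divide 1,000 evenly simulate slightly fewer than 1,000 series in total (three cores at 333 each give 999), which is worth keeping in mind when comparing timings across runs.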

a. (4 Points) Write your parallelized simulation in a file named q2.py. You should incorporate the object file that you compiled ahead of time in (1) into your code. Assume that you’ve overcome the issue in Assignment 1, Question 2 that made generation of eps_mat a serial component that must occur on a single process, by recognizing that you can divide the S simulated lifetimes into equal subsets of size N, each generated independently on its own core. As such, your implementation should generate equal subsets of eps_mat on each process in parallel, with the random seed set to the rank of the process on which the epsilon matrix subset is being generated (e.g. rank 0 uses 0 as its random seed, rank 1 uses 1 as its random seed, etc.). Once all of your processes have finished running their portion of the simulation, be sure to collect all simulated arrays back onto a single process (i.e. rank 0) in anticipation of running further analyses on the full set of simulated data in a future study. You do not need to return anything from your simulation, but you should print the time it takes to complete the simulation and collect all of the simulated data onto a single process.
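To picture the rank-seeded generation described in part (a), here is a serial stand-in that loops over ranks instead of launching MPI processes. It is a sketch under stated assumptions: the `(periods, simulations)` layout of `eps_mat`, the value of `SIGMA`, the use of `np.random.default_rng`, and the function name `local_eps_mat` are all illustrative choices, not taken from the assignment.

```python
import numpy as np

T = 4160      # periods per simulated series (from the assignment text)
S = 1000      # total simulated lifetimes (from the assignment text)
SIGMA = 1.0   # hypothetical shock standard deviation; substitute your own


def local_eps_mat(rank, size, n_periods=T, total_sims=S, sigma=SIGMA):
    """Draw the eps_mat subset owned by one rank, seeded with that rank."""
    n_local = total_sims // size           # N simulations handled by this rank
    rng = np.random.default_rng(rank)      # the rank doubles as the seed
    return rng.normal(0.0, sigma, size=(n_periods, n_local))


size = 4  # pretend there are four processes
pieces = [local_eps_mat(rank, size) for rank in range(size)]

# Stand-in for collecting every rank's array onto rank 0.
full = np.concatenate(pieces, axis=1)
```

In an actual q2.py, `rank` and `size` would come from `MPI.COMM_WORLD.Get_rank()` and `MPI.COMM_WORLD.Get_size()`, and the concatenation would be replaced by a collective such as `comm.Gather` with `root=0`.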

b. (2 Points) Write an sbatch script (called q2.sbatch) to automate all of your simulation runs with differing numbers of cores and print the amount of time it takes to run each simulation to a file (called q2.out). Note that you should not need to set --ntasks to more than 20 to complete this task, nor should you need to set the --nodes parameter to anything greater than 1. You will lose credit for this entire question (Question 2) if you set either of these parameters above these thresholds at any point.

c. (2 Points) Produce a plot that displays the computation time for the 1,000 simulations on the y-axis against the number of cores for the particular run on the x-axis (based on the results in q2.out). Compare your results to your parallel speedup expectations in Assignment 1, Question 2 and discuss any discrepancies with reference to the Midway Cluster hardware configuration.
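For the comparison in part (c), it can help to put observed and expected speedup on the same scale. The sketch below assumes your Assignment 1 expectation can be expressed with Amdahl's law, which the assignment text does not confirm; the function names and the 0.95 parallel fraction are hypothetical placeholders.

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Speedup predicted by Amdahl's law for a given parallelizable fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


def observed_speedup(serial_seconds, parallel_seconds):
    """Speedup measured as one-core time divided by n-core time."""
    return serial_seconds / parallel_seconds


# Print the predicted speedup for a few core counts, using a placeholder
# parallel fraction of 0.95 (an assumption, not a measured value).
for n in (1, 2, 5, 10, 20):
    print(n, round(amdahl_speedup(0.95, n), 2))
```

Feeding the one-core and n-core times from q2.out into `observed_speedup` gives a curve that can be compared against the prediction; gaps between the two are the discrepancies the question asks you to discuss.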