代做program、代写Python程序设计
2023-09-28
The DragonGame AI Environment“Untitled Dragon Game”1 or simply DragonGame, is a 2.5D Platformer game in which the player mustcollect all of the gems in each level and reach the exit portal, making use of a jump-and-glide movementmechanic, and avoiding landing on lava tiles. DragonGame is inspired by the “Spyro the Dragon” gameseries from the original PlayStation. For this assignment, there are some changes to the game environmentindicated in pink font. In Assignment 2, actions may have non-deterministic outcomes!To optimally solve a level, your AI agent must find a policy (mapping from states to actions) which collectsall gems and reaches the exit while incurring the minimum possible expected action cost.Levels in DragonGame are composed of a 2D grid of tiles, where each tile contains a character representingthe tile type. An example game level is shown in Figure 1.Figure 1: Example game level of DragonGame, showing character-based and GUI visualiser representations1Assignment 2: DragonGame MDPKey information:• Due: 3pm, Wednesday 20 September• This assignment assesses your skills in developing algorithms for solving MDPs.• Assignment 2 contributes 20% to your final grade.• This assignment consists of two parts: (1) programming and (2) a report.• This is an individual assignment.https://www.foxitsoftware.cn/editmac/?MD=MshanchuAssignment 2: DragonGame MDPGame state representationEach game state is represented as a character array, representing the tiles and their position on the board.In the visualizer and interactive sessions, the tile descriptions are graphical assets, whereas in the input filethese are single characters.Levels can contain the tile types described in Table 1.Table 1: Table of tiles in DragonGame, their corresponding symbol and effectTile Symbol inInput FileSymbol inVisualiserEffectSolid ‘X’ The player cannot move into a Solid tile. Walk and jumpactions are valid when the player is directly above a Solid tile.Ladder ‘=’The player can move through Ladder tiles. Walk, jump, glideand drop actions are all valid when the player is directly abovea Ladder tile. When performing a walk action on top of aladder tile, there is a small chance to fall downwards by 2 tiles(when the position 2 tiles below is non-solid).Air ‘ ’ The player can move through Air tiles. Glide and drop actionsare all valid when the player is directly above an Air tile.Lava ‘*’Moving into a Lava tile results in Game Over and movingdirectly above a Lava tile results in Game Over on the nextturn (as there are no possible moves that can avoid falling intoLava on the next step).SuperJump‘J’When the player performs a ‘Jump’ action while on top of aSuper Jump tile, the player will move upwards by between 2and 5 tiles (random outcome - probabilities given in input file).If blocked by a solid tile, probabilities for moving to each ofthe blocked tiles roll over to the furthest non-blocked tile. Forall other actions, Super Jump tiles behave as ‘Solid’ tiles.SuperCharge‘C’When the player performs a ‘Walk Left’ or ‘Walk Right’ action while on top of a Super Charge tile, the player will move(left or right) until on top of the last adjoining Super Chargetile (see example in Table 2), then move in the same direction by between 2 and 5 tiles (random outcome - probabilitiesgiven in input file). If blocked by a solid tile, probabilities formoving to each of the blocked tiles roll over to the furthestnon-blocked tile. For all other actions, Super Charge tiles behave as ‘Solid’ tiles.Gem ‘G’Gems are collected (disappearing from view) when the playermoves onto the tile containing the gem. The player mustcollect all gems in order to complete the level. Gem tilesbehave as ‘Air’ tiles, and become ‘Air’ tiles after the gem iscollected.Exit ‘E’ Moving to the Exit tile after collecting all gems completes thelevel. Exit tiles behave as ‘Air’ tiles.Player ‘P’The player starts at the position in the input file where thistile occurs. The player always starts on an ‘Air’ tile. In thevisualiser, the type of tile behind the player is shown in placeof the spaces.Page 2https://www.foxitsoftware.cn/editmac/?MD=MshanchuAssignment 2: DragonGame MDPAn example of performing the Walk Right action on a Super Charge tile is shown in Table 2:Table 2: Example Walk Right action on a Super Charge tile>>> >>> >>> (P) (P) (P) (P)The ‘ ’ on the green tile represents the current player position, ‘>>>’ represents tiles which are skippedover, and each (P) represents a possible new position of the player.ActionsAt each time step, the player is prompted to select an action. Each action has an associated cost, representingthe amount of energy used by performing that action. Each action also has requirements which must besatisfied by the current state in order for the action to be valid. The set of available actions, costs andrequirements for each action are shown in Table 3.Table 3: Table of available actions, costs and requirementsAction Symbol Cost Description Validity RequirementsWalk Left wl 1.0Move left by 1 position; if abovea Ladder tile, random chance tomove down by 2; if above a Super Charge tile, move a variableamount (see example in Table 2)Walk Right wr 1.0Move right by 1 position; if abovea Ladder tile, random chance tomove down by 2; if above a Super Charge tile, move a variableamount (see example in Table 2)Jump j 2.0Move up by 1 position; if above aSuper Jump tile, move up by between 2 and 5 (random outcome)Current player must be above aSolid or Ladder tile, and new playerposition must not be a Solid tile.Glide Left 1 gl1 0.7 Move left by between 0 and 2 (random outcome) and down by 1.Glide Left 2 gl2 1.0 Move left by between 1 and 3 (random outcome) and down by 1Glide Left 3 gl3 1.2 Move left by between 2 and 4 (random outcome) and down by 1Glide Right 1 gr1 0.7 Move right by between 0 and 2(random outcome) and down by 1Glide Right 2 gr2 1.0 Move right by between 1 and 3(random outcome) and down by 1Glide Right 3 gr3 1.2 Move right by between 2 and 4(random outcome) and down by 1Current player must be above aLadder or Air tile, and all tiles in theaxis aligned rectangle enclosing boththe current position and newposition must be non-solid (i.e. Airor Ladder tile). See example below.Drop 1 d1 0.3 Move down by 1Drop 2 d2 0.4 Move down by 2Drop 3 d3 0.5 Move down by 3Current player must be above aLadder or Air tile, and all cells in theline between the current positionand new position must be non-solid(i.e. Air or Ladder tile).Page 3https://www.foxitsoftware.cn/editmac/?MD=MshanchuAssignment 2: DragonGame MDPExample of glide action validity requirements for GLIDE RIGHT 2 (‘gr2’):Current Position Must be Non-Solid Must be Non-SolidMust be Non-Solid Must be Non-Solid New PositionPage 4Interactive modeA good way to gain an understanding of the game is to play it. You can play the game to get a feel for howit works by launching an interactive game session from the terminal with the following command:$ python play_game.py <input_file>.txtwhere <input_file>.txt is a valid testcase file from the support code with path relative to the currentdirectory, e.g. testcases/L1.txtIn interactive mode, type the symbol for your chosen action and press enter to perform the action. Type ‘q’and press enter to quit the game.DragonGame as an MDPIn this assignment, you will write the components of a program to play DragonGame, with the objectiveof finding a high-quality solution to the problem using various sequential decision-making algorithms basedon the Markov decision process (MDP) framework. This assignment will test your skills in defining a MDPfor a practical problem and developing effective algorithms for large MDPs.What is provided to youWe provide supporting code in Python, in the form of:1. A class representing the DragonGame environment and a number of helper functions (in game env.py)– The constructor takes an input file (testcase) and converts it into a DragonGame map2. A class representing the DragonGame game state (in game state.py)3. A graphical user interface for visualising the game state (in gui.py)4. A solution file template (solution.py)5. A tester (in tester.py)6. Testcases to test and evaluate your solution (in ./testcases)https://www.foxitsoftware.cn/editmac/?MD=MshanchuAssignment 2: DragonGame MDP• vi initialise()• vi is converged()• vi iteration()• vi get state value()• vi select action()• pi initialise()• pi is converged()• pi iteration()• pi select action()You can add additional helper methods and classes (either in solution.py or in files you create) if you wish.To ensure your code is handled correctly by the autograder, you should avoid using any try-except blocks inyour implementation of the above methods (as this can interfere with our time-out handling). Refer to thedocumentation in solution.py for more details.If you are unable to solve certain testcases, you may specify which testcases to attempt in testcases to attempt.Page 5Your assignment taskYour task is to develop planning algorithms for determining the optimal policy (mapping from states toactions) for the agent (i.e. the Dragon), and to write a report on your algorithms’ performance. You will begraded on both your submitted program (Part 1, 50%) and the report (Part 2, 50%). These percentageswill be scaled to the 20% course weighting for this assessment item.The provided support code formulates DragonGame as an MDP, and your task is to submit code implementing the following MDP algorithms:1. Value Iteration (VI)2. Policy Iteration (PI)Once you have implemented and tested the algorithms above, you are to complete the questions listed in thesection “Part 2 - The Report” and submit the report to Gradescope.More detail of what is required for the programming and report parts are given below.Part 1 — The programming taskYour program will be graded using the Gradescope autograder, using testcases similar to those in the supportcode.Interaction with the testcases and autograderWe now provide you with some details explaining how your code will interact with the testcases and theautograder (with special thanks to Nick Collins for his efforts making this work seamlessly).Implement your solution using the supplied solution.py template file. You are required to fill in theconstructor and the following method stubs of the Solver class:https://www.foxitsoftware.cn/editmac/?MD=MshanchuAssignment 2: DragonGame MDPGrading rubric for the programming component (total marks: 50/100)For marking, we will use 5 different testcases to evaluate your solution. Each test case is scored out of 10.0marks in the following categories:a) Completion/reaching goal (0.5 pts pass/fail)b) Number of Iterations vs target (up to 1 pt with partial marks)c) Average time per iteration vs target (up to 1 pt with partial marks) [disqualified if max time per iterationis much greater than the mean]d) Convergence of values/policy (1pt pass/fail)e) Values - distance from reference solution values (up to 1 pt with partial marks) [assessed for VI only]f) Reward vs target (up to 1 pt with partial marks)This totals to 5.5 marks per testcase for VI, 4.5 marks per testcase for PI, resulting in 10 marks per testcase,and 50 marks total over 5 testcases.Part 2 — The reportThe report tests your understanding of MDP algorithms and the methods you have used in your code, andcontributes 50/100 of your assignment mark.Question 1. MDP problem definition (18 marks)a) Define the State space, Action space, Transition function, and Reward function components of theDragonGame MDP planning agent as well as where these are represented in your code. (10 marks)b) Describe the purpose of a discount factor in MDPs. (3 marks)c) State and briefly justify what the following dimensions of complexity are of your agent in the DragonGame MDP. (See https://artint.info/3e/html/ArtInt3e.Ch1.S5.html for definitions)(5 marks)• Planning Horizon• Sensing Uncertainty• Effect Uncertainty• Computational Limits• LearningQuestion 2. Comparison of algorithms (17 marks)This question requires comparison of your implementation of value iteration (VI) and policy iteration (PI).If you did not implement PI, you may receive partial marks for this question by providing insightful relevantcomments on your implementation of VI. For example, if you tried standard VI and asynchronous VI, you maycompare these two approaches for partial marks.a) Describe your implementations of Value Iteration and Policy Iteration in one sentence each. Includedetails such as whether you used asynchronous updates, and how you handled policy evaluation in PI.(2 marks)b) Pick three representative testcases to compare the performance of VI and PI, reporting the numericalvalues for the following performance measures: (6 marks)Page 6https://www.foxitsoftware.cn/editmac/?MD=MshanchuAssignment 2: DragonGame MDP• Time to converge to the solution.• Number of iterations to converge to the solution.c) Discuss the difference between the numbers you found for VI and PI. Explain and provide reasons forwhy the differences either make sense, or do not make sense. (9 marks)Question 3. Investigating optimal policy variation (15 marks)One consideration in the solution of a Markov Decision Process (i.e. the optimal policy) is the trade off betweena risky higher reward vs a lower risk lower reward, which depends on the probabilities of non-deterministicdynamics of the environment and the rewards associated with certain states and actions.Consider testcase L4.txt, and explore how the policy of the agent changes with ladder f all prob andgame over penalty. The agent must cross from the bottom left (to collect a gem) to the bottom right (toreach the exit). There are 3 possible paths from left to right (bottom, middle and top). Because of thechance to fall when performing a walk action while on top of a ladder, there is a risk of falling into the lavain the centre.Figure 2: Testcase L4.txt• The bottom path has the lowest cost (fewest jumps needed) and highest risk, as the lava tiles aboveprevent using jump+glide (which is more expensive than walking but has no fall risk).• The middle path also has lava tiles above preventing jump+glide, but if the agent falls once it canglide and land in a safe part of the bottom path for a second chance at making it across.• The top path has enough headroom to allow jump+glide (which eliminates the risk of falling), butrequires a large number of jumps to reach, so is expensive.The level of risk presented by the lower paths can be tuned by adjusting the ladder f all prob and thegame over penalty.If you did not implement PI, you may change the solver type to VI in order to answer this question.a) Describe how you expect the optimal path to change as the ladder f all prob and game over penaltyvalues change. Use facts about the algorithms or Bellman optimality equation to justify why you expectthese changes to have such effects. (7.5 marks)b) Picking three suitable values for ladder f all prob, and three suitable values for the game over penalty,explore how the optimal policy changes over the 9 combinations of these factors. You should presentthe results in a table, indicating whether the agent’s optimal policy is to traverse the top, middle orbottom path, or something else, using colours to denote the optimal behaviour for each combination.Do the experimental results align with what you thought should happen? If not, why?(7.5 marks)Page 7https://www.foxitsoftware.cn/editmac/?MD=MshanchuAssignment 2: DragonGame MDPAcademic MisconductIt is the responsibility of the student to ensure that you understand what constitutes Academic Misconductand to ensure that you do not break the rules. If you are unclear about what is required, please ask.It is also the responsibility of the student to take reasonable precautions to guard against unauthorised accessby others to his/her work, however stored in whatever format, both before and after assessment.In the coding part of assignments, you are allowed to draw on publicly-accessible resources and providedtutorial solutions, but you must make reference or attribution to its source, by doing the following:• All blocks of code that you take from public sources must be referenced in adjacent comments in yourcode or at the top of your code.• Please also include a list of references indicating code you have drawn on in your solution.pydocstring.If you have utilised Generative AI tools such as ChatGPT, you must cite the tool and version in your codeas well as describe in the report. A failure to reference generative AI use may constitute student misconductunder the Student Code of Conduct.However, you must not show your code to, or share your code with, any other student under anycircumstances. You must not post your code to public discussion forums (including Ed Discussion)or save your code in publicly accessible repositories (check your security settings). You must notlook at or copy code from any other student.All submitted files (code and report) will be subject to electronic plagiarism detection and misconduct proceedings will be instituted against students where plagiarism or collusion is suspected. The electronic plagiarismdetection can detect similarities in code structure even if comments, variable names, formatting etc. aremodified. If you collude to develop your code or answer your report questions, you will be caught.Late submissionStudents should not leave assignment preparation until the last minute and must plan their workloads to meetadvertised or notified deadlines. It is your responsibility to manage your time effectively.It may take the autograder up to an hour to grade your submission. It is your responsibility to ensure youare uploading your code early enough and often enough that you are able to resolve any issues that may berevealed by the autograder before the deadline. Submitting non-functional code just before the deadline, andnot allowing enough time to update your code in response to autograder feedback is not considered a validreason to submit late without penalty.Assessment submissions received after the due time (or any approved extended deadline) will be subject to a100% late penalty. A one-hour grace period will be applied to the due time after which time the 100% latePage 8https://www.foxitsoftware.cn/editmac/?MD=MshanchuAssignment 2: DragonGame MDPPage 9penalty will be imposed. This grace period is designed to deal with issues that might arise during submission(e.g. delays with Blackboard or Turnitin) and should not be considered a shift of the due time. Please keepa record of your submission time.https://www.foxitsoftware.cn/editmac/?MD=Mshanchu