Summary of the Project
Our project is based on Malmo and is similar to the tutorial in which the agent searches a map for a blue tile while avoiding lava, which kills the player. Because the tutorial task is simple and easy to implement, we will raise the difficulty by adding more variables to the environment. We will quadruple the area the agent can move in and add three more blue tiles to the map. Enemies such as zombies will stand on certain tiles; they will not move, since moving enemies might make the task too difficult. The agent is given a weapon that can be used once to eliminate an enemy, which can open a shorter path and lead to a higher score. With four blue tiles in total, the goal is to collect as many of them as possible while achieving the highest score. We will experiment with the scoring system, adjusting the points assigned to collecting blue tiles, moving around, and killing zombies.
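To make the scoring idea concrete, here is a minimal sketch of how an episode score could be computed from the events described above. The event names and every point value are placeholders chosen for illustration; the actual numbers are what we plan to experiment with.

```python
# Hypothetical point values; the real numbers are still to be tuned.
REWARDS = {
    "step": -1,         # small cost per move, to favor short paths
    "blue_tile": 100,   # collecting one of the four blue tiles
    "zombie_kill": 20,  # using the single-use weapon on a zombie
    "lava": -100,       # stepping into lava kills the player
}


def episode_score(events, rewards=REWARDS):
    """Sum the reward of every event the agent triggered in one episode."""
    return sum(rewards[event] for event in events)


# Three moves, one zombie kill, and one blue tile collected.
events = ["step", "step", "zombie_kill", "step", "blue_tile"]
print(episode_score(events))
```

Keeping the values in a single table makes it easy to rerun the same episodes under different scoring schemes and compare the results.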
Performance will be measured by the overall score the agent has achieved at a fixed stopping point, which may be defined by a limit on the number of moves or by a number of blue tiles collected. All of these metrics can be implemented with the functions Malmo provides. The time it takes to collect all the blue tiles and the training time are also important metrics. It would also help to visualize the loss and the agent's improvement during training, if such a visualization is available. We will design different environments in stages so that the difficulty rises over time. In the beginning, we will put the blue tiles at ground level, where they are easier for the agent to find. Later on, we will add zombies to the environment and put the blue tiles on the ground. The baseline might be a hard-coded agent that moves toward the rewards; we expect it to collect half of the blue tiles and survive the zombies. Our main method might be based on deep Q-learning, and we expect it to find all four blue tiles and kill zombies for a higher score. Ideally, our approach will reach the goal in a relatively short time and also take a short time to train.
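One possible form of the hard-coded baseline is a greedy agent that always walks toward the nearest remaining blue tile. The sketch below assumes a flat grid with positions given as (x, z) pairs and ignores lava and zombies entirely; the function names are hypothetical and it does not call Malmo.

```python
def manhattan(a, b):
    """Grid distance between two (x, z) positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def greedy_step(agent, tiles):
    """Return the next position, one cell closer to the nearest remaining tile."""
    target = min(tiles, key=lambda t: manhattan(agent, t))
    x, z = agent
    if x != target[0]:
        x += 1 if target[0] > x else -1
    elif z != target[1]:
        z += 1 if target[1] > z else -1
    return (x, z)


def run_baseline(start, tiles, max_moves):
    """Count how many tiles the greedy agent collects within max_moves."""
    agent, remaining, collected = start, set(tiles), 0
    for _ in range(max_moves):
        if not remaining:
            break
        agent = greedy_step(agent, remaining)
        if agent in remaining:
            remaining.remove(agent)
            collected += 1
    return collected


print(run_baseline((0, 0), [(2, 0), (0, 5)], max_moves=10))
```

Limiting `max_moves` corresponds to the move-limit metric mentioned above, so the same function can report the baseline's tile count under different move budgets.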
Train a simple Q-learning agent that acquires some of the blue tiles, with no limit on training time.
First milestone: design and implement the problem environment and implement the baseline agent before February 7.
Second milestone: train a simple Q-learning agent and observe its performance.
Train an agent, possibly based on another reinforcement learning algorithm, that acquires all the blue tiles and kills zombies in less than 4 days of training.
Third milestone: adjust the blue tile locations and add zombies.
Fourth milestone: the agent achieves a near-optimal score.
Train an agent to find the optimal solution to achieve a realistic goal.
Fifth milestone: add vertical blocks (walls) that cost points to destroy (tentatively -5 points).
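For the simple Q-learning agent in the second milestone, a tabular version could start from the sketch below. The action names, the (x, z) state encoding, and the values of alpha, gamma, and epsilon are all assumptions made for illustration, and the code is independent of Malmo.

```python
import random
from collections import defaultdict

ACTIONS = ["north", "south", "east", "west"]  # hypothetical action names


def choose_action(q, state, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, done, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update to the table q in place."""
    # A terminal transition has no future value to bootstrap from.
    best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])


# Unseen state-action pairs default to a value of 0.0.
q = defaultdict(float)
q_update(q, (0, 0), "east", reward=100, next_state=(1, 0), done=True)
print(q[((0, 0), "east")])
```

A table like this only scales to small maps; with the larger map and extra state (tiles collected, weapon used), a function approximator such as the deep Q-learning approach mentioned above may be needed.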
Milestone 2 Report