Last weekend we spent a day taking our first steps towards building a Pommerman agent.
In addition to a full game simulation environment, the team running the competition were kind enough to provide helpful documentation and some great examples to help people get started.
There are a few particularly useful things included:
- A few example implementations of agents. One just takes random actions, another is heuristic based, and a third uses a tensorforce implementation of PPO to learn to play the game.
- A Jupyter notebook with a few examples, including a step-by-step explanation of the tensorforce PPO agent implementation. (This is probably the best place to start.)
- A visual rendering of each game simulation.
Before we got anywhere, we hit a few small stumbling blocks.
- It took us a few attempts, installing different versions of Python, before we got TensorFlow running. Now we know that, at the time of writing, TensorFlow doesn’t support Python 3.7 or any 32-bit version of Python.
- The tensorforce library, which the included PPO example is based on, has been changing rapidly, and some of the notebook's calls to it no longer worked. While the code change required was minimal, it took at least an hour of digging through the tensorforce code before we knew exactly what needed to be changed. We committed a small fix to the notebook here, which now works with version 0.4.3 of tensorforce, available through pip. (I wouldn’t recommend using the latest version of tensorforce on GitHub, as we encountered a few bugs when trying that.)
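To save others the same trial and error, here is a minimal sketch of a pre-flight check you could run before installing anything. It only uses the standard library. The function name `tensorflow_compat_problems` is our own invention, and the two limits it checks (Python older than 3.7, 64-bit interpreter) are simply the ones we ran into at the time; newer TensorFlow releases may have different requirements.

```python
import struct
import sys


def tensorflow_compat_problems(version_info=sys.version_info,
                               pointer_bits=struct.calcsize("P") * 8):
    """Return a list of reasons this interpreter may not run TensorFlow.

    Hypothetical helper: the limits below are the ones described in this
    post, not an authoritative compatibility table.
    """
    problems = []
    # Assumption from this post: Python 3.7 and later were not supported.
    if tuple(version_info[:2]) >= (3, 7):
        problems.append("Python %d.%d is too new" % tuple(version_info[:2]))
    # Assumption from this post: only 64-bit interpreters were supported.
    if pointer_bits != 64:
        problems.append("%d-bit Python is not supported" % pointer_bits)
    return problems


if __name__ == "__main__":
    issues = tensorflow_compat_problems()
    if issues:
        print("Possible problems:", "; ".join(issues))
    else:
        print("This interpreter looks fine for the setup described here.")
```

For the tensorforce side, pinning the version when installing (`pip install tensorforce==0.4.3`) avoids picking up a newer release by accident.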
I was hoping we’d get to an agent which could beat the heuristics-based SimpleAgent at FFA, but we didn’t manage to get there. In the end, we managed to:
- Get the Jupyter notebook with examples running
- Understand how the basic tensorforce PPO agent works
- Set up a validation mechanism for running multiple episodes with different agents, and saving each game so we can replay it for debugging purposes.
- Train a tensorforce PPO agent (the training ran, but the agent hasn’t managed to beat the SimpleAgent in any game yet)
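The validation mechanism mentioned in the list above is roughly the following idea, shown here as a simplified sketch rather than our actual code. To keep it runnable without Pommerman installed, it uses a made-up stand-in environment (`CoinFlipEnv`) that only mimics a `reset()` / `step(actions)` interface; the names `run_validation`, `CoinFlipEnv` and `random_policy` are all hypothetical, as is the JSON file layout. With the real environment you would pass in a factory that builds the Pommerman game instead.

```python
import json
import random
from pathlib import Path


class CoinFlipEnv:
    """Hypothetical stand-in environment: ends after a fixed number of
    steps and then picks one random winner (reward 1, others -1)."""

    def __init__(self, num_agents=4, max_steps=5, seed=None):
        self.num_agents = num_agents
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.t = 0

    def _observations(self):
        return [{"step": self.t, "agent": i} for i in range(self.num_agents)]

    def reset(self):
        self.t = 0
        return self._observations()

    def step(self, actions):
        self.t += 1
        done = self.t >= self.max_steps
        rewards = [0] * self.num_agents
        if done:
            rewards = [-1] * self.num_agents
            rewards[self.rng.randrange(self.num_agents)] = 1
        return self._observations(), rewards, done, {}


def random_policy(observation):
    # Six discrete actions, matching Pommerman's stop/up/down/left/right/bomb.
    return random.randrange(6)


def run_validation(make_env, policies, num_episodes, out_dir):
    """Play num_episodes games, save one JSON trace per game, count wins."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    wins = [0] * len(policies)
    for episode in range(num_episodes):
        env = make_env(episode)
        obs = env.reset()
        trace, done, rewards = [], False, []
        while not done:
            actions = [policy(o) for policy, o in zip(policies, obs)]
            obs, rewards, done, _ = env.step(actions)
            trace.append({"actions": actions, "rewards": rewards})
        for i, reward in enumerate(rewards):
            if reward > 0:
                wins[i] += 1
        # One file per game, so a single bad episode can be replayed later.
        path = out_dir / ("episode_%03d.json" % episode)
        path.write_text(json.dumps(trace))
    return wins
```

Calling `run_validation(lambda ep: CoinFlipEnv(seed=ep), [random_policy] * 4, 10, "replays")` plays ten games and returns the number of wins per agent. Seeding the environment with the episode number is a design choice that makes a given game reproducible when you want to debug it.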
To be continued…