Ristinolla - Tic Tac Toe
About the Game
Objective: the player who first gets five symbols in a row (horizontally, vertically or diagonally) wins.
Computer player: By default, the computer player uses a simple policy: for each five-symbol-long vector (a line of five adjacent squares) it counts how many symbols are still missing, and assigns each empty square a value (0-9) based on that count. Needless to say, this strategy is fairly easy to beat.
Iteration: In the first round of policy iteration (see MDP page), each possible action is evaluated by playing the game to the end, using the standard policy for all further moves of both sides. The best initial action is then chosen based on the results of these games.
The resulting policy is never worse than the original, and in practice policy iteration often converges quickly to the optimum.
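One round of this procedure might look like the sketch below. To keep it short it uses a 3x3, three-in-a-row toy board instead of the five-in-a-row game, and the standard policy is replaced by a placeholder that simply takes the first empty square. The function names and the +1/0/-1 scoring of win, draw and loss are likewise assumptions made for illustration.

```python
# Hypothetical sketch of one policy-improvement round by playing games to the end.
# Toy 3x3 board (three in a row wins); the base policy is a stand-in, not the
# counting policy described on this page.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


def base_policy(board, player):
    """Placeholder standard policy: take the first empty square."""
    return board.index(".")


def play_out(board, player, policy):
    """Play to the end with `policy` for both sides; return the winner or None (draw)."""
    board = list(board)
    while winner(board) is None and "." in board:
        board[policy(board, player)] = player
        player = "O" if player == "X" else "X"
    return winner(board)


def improved_move(board, player, policy=base_policy):
    """Try each legal first move, finish the game with `policy`, keep the best move."""
    opponent = "O" if player == "X" else "X"

    def outcome(move):
        after = list(board)
        after[move] = player
        result = play_out(after, opponent, policy)
        return 1 if result == player else 0 if result is None else -1

    moves = [i for i, s in enumerate(board) if s == "."]
    return max(moves, key=outcome)


board = list("X..X.O..O")          # X to move
print(base_policy(board, "X"))     # base policy's choice
print(improved_move(board, "X"))   # choice after one improvement round
```

In this position the placeholder policy plays square 1 and goes on to lose, while the improved choice is square 6, which completes the left column at once. Because the improved policy evaluates the base policy's own move among the candidates, its choice can never score worse in these playouts.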
Esa Hyytiä, 2001.