We explore ultimate tic-tac-toe by building several agents and simulating games between them to compare their performance. We start with a Random Agent, which chooses at random among the possible actions. We then move toward a Minimax Pruning Agent and an Expectimax Agent, which search the game tree to a maximum depth before relying on a linear evaluation function learned through TD learning updates and Monte Carlo Tree Search.
More details are available in our paper.
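
As a rough illustration of the ideas above, here is a minimal sketch of a random agent, a depth-limited minimax agent with alpha-beta pruning, and a linear evaluation function with a one-step TD-style weight update. It is not the project's code. To stay short it uses plain 3x3 tic-tac-toe as a stand-in for ultimate tic-tac-toe, and every name in it (`features`, `evaluate`, `alphabeta`, `minimax_agent`, `td_update`), the three features, and the terminal score of 100 are assumptions made for the example. The Expectimax Agent and Monte Carlo Tree Search are not shown.

```python
import math
import random

# Stand-in game (assumption): plain 3x3 tic-tac-toe, not ultimate tic-tac-toe.
# A board is a tuple of 9 cells: +1 for X, -1 for O, 0 for empty.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def legal_actions(board):
    return [i for i, cell in enumerate(board) if cell == 0]

def play(board, action, player):
    return board[:action] + (player,) + board[action + 1:]

def winner(board):
    """Return +1 if X has three in a row, -1 if O does, else 0."""
    for a, b, c in LINES:
        total = board[a] + board[b] + board[c]
        if abs(total) == 3:
            return total // 3
    return 0

def features(board):
    """Hypothetical hand-picked features, all from X's point of view."""
    center = board[4]
    corners = board[0] + board[2] + board[6] + board[8]
    sums = [board[a] + board[b] + board[c] for a, b, c in LINES]
    # A line sum of +2 means two X and one empty cell; -2 is the same for O.
    open_twos = sums.count(2) - sums.count(-2)
    return [center, corners, open_twos]

def evaluate(board, weights):
    """Linear evaluation: dot product of weights and features."""
    return sum(w * f for w, f in zip(weights, features(board)))

def alphabeta(board, player, depth, alpha, beta, weights):
    """Depth-limited minimax with alpha-beta pruning; X maximizes, O minimizes."""
    won = winner(board)
    if won != 0:
        return 100.0 * won          # assumed terminal score
    actions = legal_actions(board)
    if not actions:
        return 0.0                  # draw
    if depth == 0:
        return evaluate(board, weights)  # fall back to the learned evaluation
    value = -math.inf if player == 1 else math.inf
    for action in actions:
        child = alphabeta(play(board, action, player), -player,
                          depth - 1, alpha, beta, weights)
        if player == 1:
            value = max(value, child)
            alpha = max(alpha, value)
        else:
            value = min(value, child)
            beta = min(beta, value)
        if alpha >= beta:
            break                   # remaining siblings cannot change the result
    return value

def random_agent(board, player, rng=random):
    return rng.choice(legal_actions(board))

def minimax_agent(board, player, depth, weights):
    """Pick the action with the best alpha-beta value (depth must be >= 1)."""
    pick = max if player == 1 else min
    return pick(legal_actions(board),
                key=lambda a: alphabeta(play(board, a, player), -player,
                                        depth - 1, -math.inf, math.inf, weights))

def td_update(weights, board, target, learning_rate):
    """One-step TD-style update for a linear value function (an assumed form):
    w <- w + learning_rate * (target - V(board)) * features(board)."""
    error = target - evaluate(board, weights)
    return [w + learning_rate * error * f
            for w, f in zip(weights, features(board))]
```

In this sketch, `target` would typically be the reward plus the evaluation of the next position, and the weights would be updated after each move of a self-play game. Calling `minimax_agent(board, 1, 2, weights)` searches two plies ahead for X before consulting `evaluate`.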