Version 1

Description:

I implemented my first attempt using a Monte Carlo simulation. At each turn, the computer would play a large number of games with random move sequences. It would sort the games by the first move and count the number of wins, losses, and ties in each group. A win counted +1, a loss -1, and a tie 0. Then it would select the move with the highest score.

For example: Suppose the human player begins by playing in column 3. Then, the computer would play a hypothetical continuation by selecting a random first move. Say it selects 6. It would then simulate its opponent's response with another random move, and so on. If the sequence ends in a win for the computer, it adds +1 to the score corresponding to the 6 column (6 being its initial move). And so on.

Why should this work? If, in theory, you could play out all possible games, you would want to select the next move with the most winning continuations. There are too many total games, so you select a certain number of random games, hoping for a representative cross section.

This technique is called "Monte Carlo" because Monte Carlo is known for its gambling, and that is exactly what you are doing - playing the odds.

The beauty of using a Monte Carlo technique is that you don't need to know how to play the game. You could in theory play any turn-based game this way: given a function that tells you when the game is over (and who won), and another function that enumerates all legal moves from a given position, you could apply the Monte Carlo algorithm to play the game - without ever knowing what the game was! This makes the programming incredibly simple. My two main routines totalled about 30 lines of code!

Screenshot:

 you win!
|         |
|         |
|         |
|         |
| OO      |
| XXXX  O |
+-1234567-+

(I am X, computer is O.)

Code:

get it! (gzip'ed)

Caveat: it's not pretty.

There is one class, "Board", which keeps track of the board position. it has functions to play a move, tell you if a move is legal, tell you if the board is full, tell you if there is a winner, and print itself on the screen.

Then there are two game playing routines. The first, "sim", plays a whole passel o' simulated games and chooses the initial move that leads to the most wins. The second, "random_game", is a helper for the first and plays one random game, returning the result (W, L, or T).

All this is driven by an I/O loop which prints out the board, asks for the player's move, calculates a response, prints out the board again, and repeats until the game is over.

Conclusion:

It sucked! Usually losing immediately. Making seemingly random moves.

Analysis:

There are many possible reasons it sucked, some obvious, some not.

Too few simulated games played, namely 1000, not giving a representative cross section of all games.
Ignoring obvious moves by the opponent, like when there is an immediate win.

Point #2 is the basic problem with a pure Monte Carlo technique, namely it ignores strategy. While the computer may select a move with many winning continuations, its opponent might not (in fact, will not) choose each possible continuation with equal likelihood.

To make the computer play better, we had to take the opponent's strategy into account.

We might also try increasing the number of simulated games, but the program was already slow, and it would probably have to be increased several orders of magnitude to play well (like from 1000 up to 1,000,000). Infeasible (except possibly in C with a highly tuned algorithm, which sounded like too much work).

Therefore, on to version 2...

Hosted by www.Geocities.ws