Topic: Hnefatafl AI tournament - code your way to glory (Read 10190 times)

Fishbreath · « **on:** March 01, 2016, 12:38:22 pm »

Hello, Bay12! One of the things I've been doing lately, instead of producing high-quality wargame AARs for your perusal, is writing a computer implementation of an Old Norse board game called hnefatafl. In doing so, I've ended up doing some research into the complexity of tafl games, and it turns out that they make for a very interesting AI challenge—many variants are significantly harder than chess for computers to play.

Tafl games are asymmetric abstract strategy board games, where the players have different forces and different aims, which makes them different from almost every modern abstract strategy board game. One side, the king's side, starts at the center of the board, and their goal is to get the king to an edge space or a corner space, depending on the rules variant. The other side, the besieging side, starts along the edges, and must, ultimately, capture the king. Although the players' ultimate objectives are the same, the way each side plays during a game is very different. The king's side must move with daring and enterprise to slip past the besieger's cordon, while the besieger must play cautiously and craftily to enclose the king in an impenetrable net.

Now, as far as I can tell, I'm the only person on the planet who's done any formal research into hnefatafl's complexity and AIs to play it. In the hopes of spurring development of computer players and expanding interest in both the game and its mathematical significance, I've decided to organize a tafl AI tournament, with a small prize to (very slightly) sweeten the pot. You can find tournament rules and more information behind the link. Here's a commented version of a game I played against a friend a while ago using an older version of OpenTafl, which should give you a better idea of the flow of the game.

Anyway, I hope this is interesting to some of you, and I hope to see an entry or two out of it!

Zekka · « **Reply #1 on:** March 06, 2016, 03:17:11 pm »

I don't intend to jump on this right now because I have another AI project that's currently sucking up all my time, but this is an interesting problem that anyone who's not already working on an AI project should be jumping on.

Fishbreath · « **Reply #2 on:** March 07, 2016, 10:57:53 am »

Thanks for the endorsement! Here's hoping you solve your problem before December, and have the time to jump in later in the year.

Zekka · « **Reply #3 on:** March 09, 2016, 09:26:10 pm »

Quote from: Fishbreath on March 07, 2016, 10:57:53 am

Thanks for the endorsement! Here's hoping you solve your problem before December, and have the time to jump in later in the year.

I have no idea! Last summer when I started it I thought to myself "I'll have this finished to present to my friends in September." In December I thought I almost had a working prototype -- then in Feb I made a significant model change since the old model didn't account for a few things I really cared about, and now in early March I think I almost have a working prototype again under the radically different model. Right now I predict I free up in May, but I don't trust myself.

It's probably going to be hard for me not to take a week off and explore something like this though.

Arx · « **Reply #4 on:** March 13, 2016, 06:28:06 am »

I've had a look at this. I might give it a shot, but I'd probably have to study hnefatafl theory relatively extensively, which seems like work...

Zekka · « **Reply #5 on:** March 14, 2016, 12:58:24 pm »

Quote from: Arx on March 13, 2016, 06:28:06 am

I've had a look at this. I might give it a shot, but I'd probably have to study hnefatafl theory relatively extensively, which seems like work...

You can probably try doing it with tree search over every possible move (all that requires is a complete implementation of the rules, which Fishbreath has already done) along with a hacky value-of-pieces scoring function -- then start pruning options that, based on your strategic insight, look obviously not relevant. Most of your rules to do this will probably take a lot more computation than just exploring the tree, mind, so you will probably need to be pretty sure you are exploring very accurately once you start cutting moves, or you'll lose overall depth.

Implementing this correctly will probably result in an AI that's not very good at macro plans but is pretty good at not losing pieces due to two-or-three-move-long gambits, and possibly decent at throwing gambits like those at you.

Another more experimental option might be to find a way to train a neural net to evaluate board states, like AlphaGo did -- I don't know exactly how AlphaGo did it, but I'm guessing it did so by taking a board state, tree-searching it to the end (using whatever rules you used for your base tree search), and then giving that board state (and most of the intermediate ones) a value based on the win-lose probabilities that tree search turned up.

Unfortunately there's not a large corpus of hnefetafl games out there, so you can't do what AlphaGo did in seeding the tree search algorithm with a ton of data on what interesting-to-pro moves look like. My guess is that not having that data means that your tree search is going to need to err on the side of exhaustive no matter what.

If you wanted to sit down and learn a little strategy, an implementation that uses hacky guesswork to try to set itself up for long-term play might be interesting. One of the first successful checkers-playing programs trained itself on a large number of metrics that the authors thought might be relevant to successful checkers play -- stuff like "number of diamonds" or "density of flanks vs density of middle board" -- then optimized using a genetic algorithm to find the winning combinations for those long-term strategy metrics -- then used a scoring function based on all that math. It probably wouldn't be too hard to come up with metrics like that in Hnefetafl, and that might get you a significantly stronger scoring function than just "count pieces and see who has more." (Of course, "count pieces and see who has more" is a stronger scoring function than you think -- combined with a lot of ability for short-term tactics, a lot of chess bots did pretty well with it, IIRC.)

Those are three pretty closely related takes on implementing hnefetafl that don't require you to know very much strategy in advance. I bet you could probably do all of them at once, and I bet your result would be a hnefetafl program with a really dogmatic view of long-term planning but a lot of versatility in the short term. (because evaluating short-term boards near-exhaustively means not being surprised hardly ever.)

Fishbreath actually knows a lot about AlphaGo IIRC (his blog has tons-of-fun coverage of the games it played, at least, which spurred me towards actually watching the games in their four-to-six-hour glory) so ask him about all the things I said "gee I dunno" about. I am not an AI expert!!!!!!

EDIT: Minor note -- it looks kind of like the algorithm I'm talking about when I say "tree search" is not strictly Monte Carlo tree search. After doing some quick reading it looks like it's typical to evaluate all the way to the end of the game making random choices all the way. I'm specifically saying you should evaluate not-too-deep, and substitute a better "evaluate board" function when you get there, so that when you make your random picks they're likely to be representative. There are enough possible moves from each hnefetafl state that evaluating six or seven moves down at near random will probably not give you a good valuation of the position AFAICT.

Zekka · « **Reply #6 on:** March 14, 2016, 01:31:29 pm »

FYI, I'm probably not going to stop longposting in this topic until I convince someone who isn't me to write an AI for this contest, or I convince myself to.

Arx · « **Reply #7 on:** March 14, 2016, 02:51:45 pm »

I'm familiar with some of the basic AI concepts I could implement with minimal understanding, but the goal here is to write a good one.

If I can hack together a basic AI, I might look at mixing in some machine learning to improve it.

Fishbreath · « **Reply #8 on:** March 15, 2016, 10:21:22 am »

That would be super-nifty.

A friend of mine has a general-purpose neural network he's aiming to try to teach tafl, but the small corpus of pro games is the issue, as it stands. (If you visit aagenielsen.dk, though, there is a history of past tournaments. Not all of the players in the tournaments are good, but there are a few hundred games, at least, to start with.)

Zekka, you're mostly on-point in your long post, but I was awake until (*counting on fingers*) six hours ago, and have been awake since, uh... (*more counting*) two and a half hours ago? So the points I want to clarify will have to wait until tonight, possibly, after I've had a nap, or more likely, tomorrow during the day.

Most of the things I've implemented in the built-in OpenTafl AI have close analogues in general AI and chess engines, and I went over the Mediocre Chess development blog to catch up on the current state of chess engines.

Anyway, it would be great to have either/both of you on board, and I'm happy to answer tafl or AI questions to the best of my (still somewhat limited) knowledge.

Fishbreath · « **Reply #9 on:** March 16, 2016, 08:28:33 am »

I have had an amount of sleep sufficient to elevate me above counting on fingers, so a reply appears.

Quote from: Zekka on March 14, 2016, 12:58:24 pm

<snip>

Implementing this correctly will probably result in an AI that's not very good at macro plans but is pretty good at not losing pieces due to two-or-three-move-long gambits, and possibly decent at throwing gambits like those at you.

One of the problems with this approach for tafl games is that simple evaluation functions like counting material on the board don't really work all that well: your average tafl game has very few captures, and barring accidentally bad play, the first capture usually happens beyond the search horizon.

Encoding some strategic knowledge into move generation is something I've considered, but that's one of the things that made go AI so hard—strategic knowledge is inherently situational, and working out rules for when to apply other rules can turn into rules-ception pretty quickly. I've had good results with good old-fashioned tree search with alpha-beta pruning, so far, along with 2-3 plies of extension search on the five-ish best moves to help catch any horizon effects; the hard part is a good board evaluation function, which I haven't yet hit on.

Quote

Another more experimental option might be to find a way to train a neural net to evaluate board states, like AlphaGo did -- I don't know exactly how AlphaGo did it, but I'm guessing it did so by taking a board state, tree-searching it to the end (using whatever rules you used for your base tree search), and then giving that board state (and most of the intermediate ones) a value based on the win-lose probabilities that tree search turned up.

One of the facts from AlphaGo's paper was that its board evaluation is really good: good enough that even just generating all possible board states one move ahead and evaluating those makes it as good as or better than every computer go engine to date. Perfect board evaluation is impossible, but great board evaluation is very possible with NNs.

The way AlphaGo did its training initially was to look at one board state per game (more or less randomly-selected, to avoid any bias toward a particular style), and rate that board state on whether it led to victory or defeat. Once it had a solid foundation, it began to improve itself by self-play, generating games and rating states in those based on victory or defeat.

The problem here is, as you say, the corpus of games. One of the reasons I'm running this tournament is to get some quality computer players so I can begin to generate such a corpus.

Quote

(Of course, "count pieces and see who has more" is a stronger scoring function than you think -- combined with a lot of ability for short-term tactics, a lot of chess bots did pretty well with it, IIRC.)

As I said, there's unfortunately no good analogue to this for tafl, as far as I'm aware. Evaluation functions are necessarily complex, more akin to the checkers stuff, and tafl has the added issue of lacking opening or endgame databases. (The former can be fixed, but the latter is basically impossible: tafl endgames are too complex to build a database for, although I'd love to build a database of solved corner states, sometime.)

Quote

EDIT: Minor note -- it looks kind of like the algorithm I'm talking about when I say "tree search" is not strictly Monte Carlo tree search. After doing some quick reading it looks like it's typical to evaluate all the way to the end of the game making random choices all the way. I'm specifically saying you should evaluate not-too-deep, and substitute a better "evaluate board" function when you get there, so that when you make your random picks they're likely to be representative. There are enough possible moves from each hnefetafl state that evaluating six or seven moves down at near random will probably not give you a good valuation of the position AFAICT.

You'd think that about go, too, which has a much bigger branching factor, but consider this: my tafl engine is relatively slow, but can still generate and evaluate 30,000 states per second on my laptop, and 50,000-100,000 on my much faster desktop. At their longest, tafl games end up being about 150 moves. In thirty seconds of searching, that's at least 6,000 playouts on my laptop (more, since OpenTafl's AI spends a lot of time sorting potential moves at each state by 'best'), if you make the assumption that random playouts longer than 150 moves are unlikely to be uniquely good. 6,000 is enough to start getting pretty good pathways, if you have a decent move generator.

The thing about MCTS is that it doesn't require an explicit evaluation function, which is a huge benefit for go, since go positions are so hard to evaluate (at least, without a fancy neural network). The hybrid you're describing is interesting, though, a sort of depth-limited MCTS which falls back to an evaluation if it can't do a random playout quickly enough. This might prove to be a useful innovation for tafl games—it isn't nearly as hard to write an evaluation function for tafl games as it is for go, and it gets a lot of depth cheaply, which lets your evaluation function be simpler. (Generally speaking, it's almost always better to search more deeply than to write a better evaluation function.)

Zekka · « **Reply #10 on:** March 18, 2016, 09:13:28 pm »

I'm a bit tired to longpost today. But I'll leave this note -- I greatly underestimated the number of playouts you typically evaluate out in MCTS, and overestimated how costly it was to finish the game once you start it. It's probably a pretty good bet if you assume immediate decisions (of which there are very few) are much more important than faraway decisions (of which there are very many) -- my guess is that it misses out occasionally on really good conditional plays in the distant future, but on average, for near decisions, probably does pretty well.

TheBiggerFish · « **Reply #11 on:** March 18, 2016, 10:31:48 pm »

PTW!

I have no idea what one would actually do to code any of this but I am nonetheless interested!

Bauglir · « **Reply #12 on:** March 20, 2016, 02:52:09 pm »

I suppose if the barren corpus is a problem, you could set up an NN to compete against itself to generate more games. It'll start out stupid and so take a lot longer (maybe so much so that it's impractical), and you run the very big risk of the durn thing learning to play around its own nonsense idiosyncracies, but it's something. Randomly mixing in real games might help, especially if you have time to play some against it.

at least you'd be able to laugh at it?

Fishbreath · « **Reply #13 on:** March 20, 2016, 05:06:17 pm »

Yeah, the concern with a NN would be finding a local maximum.

A possible option would be the wicked-awesomely-named neuroevolution, using a genetic algorithm to train neural networks, which would have the added bonus that you, at the end of writing your neuroevolution net, would have a neural network you can use for basically any board game.

Arx · « **Reply #14 on:** March 21, 2016, 10:23:28 am »

I've actually been thinking of doing something similar! Bauglir's idea kind of describes one of the things I've been bouncing around in my head, and then it occurred to me that I could run essentially a tournament of randomly weighted neural nets, and then use some genetics stuff to improve the top stock, and so on.

I'm currently looking at using a very shallow tree search based on board position weightings (determined by practice) as a starting point, so I at least have something. It's being a bit finicky, but I have reasonable hopes.

Bay 12 Games Forum

News:

Author Topic: Hnefatafl AI tournament - code your way to glory (Read 10190 times)

Fishbreath

Hnefatafl AI tournament - code your way to glory

Zekka

Re: Hnefatafl AI tournament - code your way to glory

Fishbreath

Re: Hnefatafl AI tournament - code your way to glory

Zekka

Re: Hnefatafl AI tournament - code your way to glory

Arx

Re: Hnefatafl AI tournament - code your way to glory

Zekka

Re: Hnefatafl AI tournament - code your way to glory

Zekka

Re: Hnefatafl AI tournament - code your way to glory

Arx

Re: Hnefatafl AI tournament - code your way to glory

Fishbreath

Re: Hnefatafl AI tournament - code your way to glory

Fishbreath

Re: Hnefatafl AI tournament - code your way to glory

Zekka

Re: Hnefatafl AI tournament - code your way to glory

TheBiggerFish

Re: Hnefatafl AI tournament - code your way to glory

Bauglir

Re: Hnefatafl AI tournament - code your way to glory

Fishbreath

Re: Hnefatafl AI tournament - code your way to glory

Arx

Re: Hnefatafl AI tournament - code your way to glory