The DeepMind team behind AlphaGo have now tried their hand at chess (and shogi), with a new paper out today. Learning entirely from self-play (i.e., starting with random moves and seeing which moves tend to win games), their program ends up stronger than Stockfish 8, winning a 100-game match 64-36 (28 wins, 3 of them with Black, and 72 draws; the time control was one minute per move). The paper says the program goes from no knowledge to better-than-Stockfish in less than a day, albeit using a lot of hardware.
Like AlphaGo, and unlike other top chess programs, AlphaZero uses a neural network and Monte Carlo Tree Search. Its search simulates lines of play and evaluates the resulting positions with the network, rather than exhaustively searching with a hand-crafted evaluation. My read (I'm not an expert!) is that each network evaluation is relatively expensive to compute, and as a result, AlphaZero only evaluates 80,000 positions per second, as compared to Stockfish's 70,000,000 positions per second. But the neural network guides the search so well that it more than offsets the slower speed.
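To give a feel for how the network can guide the search, here is a minimal, hypothetical sketch of a PUCT-style move-selection step (the rule AlphaZero's family of programs is reported to use): each candidate move's average value is combined with an exploration bonus weighted by the network's prior probability for that move. This is my own toy illustration, not DeepMind's code; the field names and the `c_puct` constant are assumptions.

```python
import math

def puct_select(children, c_puct=1.5):
    """Pick the child node maximizing Q + U, where Q is the average value
    seen so far and U is an exploration bonus scaled by the network's
    prior probability for that move (hypothetical data layout)."""
    total_visits = sum(ch["visits"] for ch in children)

    def score(ch):
        # Q: mean value of simulations through this move (0 if unvisited)
        q = ch["value_sum"] / ch["visits"] if ch["visits"] else 0.0
        # U: large when the prior is high and the move is under-explored
        u = c_puct * ch["prior"] * math.sqrt(total_visits + 1) / (1 + ch["visits"])
        return q + u

    return max(children, key=score)

# Example: a move the network strongly favors can be tried even before
# it has any visits, ahead of an already-explored move with decent results.
children = [
    {"prior": 0.6, "visits": 0,  "value_sum": 0.0},  # network likes this move
    {"prior": 0.1, "visits": 10, "value_sum": 4.0},  # explored, Q = 0.4
]
best = puct_select(children)
```

The point is that a strong prior lets the search spend its limited evaluations on promising lines, which is how (as I understand it) 80,000 well-chosen evaluations per second can compete with 70 million cruder ones.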
The paper gives ten of AlphaZero's wins over Stockfish, which Chess24 has uploaded here (complete with JavaScript Stockfish evaluations!).
One minor annoyance is that AlphaZero played on different hardware to Stockfish -- "a single machine with 4 TPUs", the tensor processing unit having been designed by Google for efficient machine learning. I don't know how comparable it is to Stockfish "using 64 threads and a hash size of 1GB". It looks to me like AlphaZero gets to use more processing power, having been written to take advantage of such hardware.
Still, going from random moves to perhaps the strongest chess player ever is certainly something. Some mildly interesting graphs are on page 6 of the paper, showing how frequently AlphaZero played various openings over the first eight hours of its training. It always liked the English; it picked up the Caro-Kann after 2 hours but abandoned it after 6 hours. Its principal variation for 1. e4 c5 2. Nf3 d6 is a Najdorf with 6. f3.