
View Full Version : AlphaZero



pappubahry
06-12-2017, 11:55 PM
The DeepMind team behind AlphaGo have now tried their hand at chess (and shogi), with a new paper (https://arxiv.org/abs/1712.01815) out today. Learning entirely from self-play (i.e., starting with random moves and seeing what moves tend to win games), their program ends up stronger than Stockfish 8, winning a 100-game match 64-36 (28 wins, 3 with Black, and 72 draws; time control was one minute per move). The paper says they go from no knowledge to better-than-Stockfish in less than a day, albeit using a lot of hardware.

Like AlphaGo, and unlike other top chess programs, AlphaZero uses a neural network and Monte Carlo Tree Search. Its search involves playing simulated games against itself through to conclusion. My read (I'm not an expert!) is that these self-play games take a relatively long time to compute, and as a result, AlphaZero only evaluates 80,000 positions per second, as compared to Stockfish's 70,000,000 positions per second. But the neural network guides the search so well that it more than offsets the slower speed.
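To make "the neural network guides the search" a little more concrete, here is a rough Python sketch of the PUCT selection rule the AlphaZero paper describes. This is only my toy illustration, not DeepMind's code: the move priors and value estimates would come from the network, and the exploration constant is just an arbitrary example value.

import math

C_PUCT = 1.5  # exploration constant; an illustrative value, not DeepMind's

class Node:
    def __init__(self, prior):
        self.prior = prior       # P(s,a): the policy network's probability for this move
        self.visit_count = 0     # N(s,a): how often the search has tried it
        self.value_sum = 0.0     # W(s,a): accumulated value-network evaluations

    def q(self):
        # Mean evaluation of this move so far, Q(s,a)
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def select_move(children):
    # children maps each legal move to its Node. Pick the move maximising
    # Q + U, where U favours moves the network likes but the search has
    # not yet visited much -- that is how the prior steers the tree.
    total_visits = sum(node.visit_count for node in children.values())
    def puct(node):
        u = C_PUCT * node.prior * math.sqrt(total_visits) / (1 + node.visit_count)
        return node.q() + u
    return max(children, key=lambda move: puct(children[move]))

The point is just that a good prior lets a small, well-directed search compete with a vastly larger brute-force one.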

The paper gives ten of AlphaZero's wins over Stockfish, which Chess24 has uploaded here (https://chess24.com/en/watch/live-tournaments/alphazero-vs-stockfish/1/1/1) (complete with JavaScript Stockfish evaluations! :P ).

One minor annoyance is that AlphaZero played on different hardware to Stockfish -- "a single machine with 4 TPUs", the tensor processing unit (https://en.wikipedia.org/wiki/Tensor_processing_unit) having been designed by Google for efficient machine learning. I don't know how comparable it is to Stockfish "using 64 threads and a hash size of 1GB". It looks to me like AlphaZero gets to use more processing power, having been written to take advantage of such hardware.

Still, going from random moves to perhaps the strongest chess ever is certainly something. Some mildly interesting graphs are on page 6 of the paper, showing how frequently AlphaZero played various openings over the first eight hours of its training. It always liked the English; it picked up the Caro-Kann after 2 hours but abandoned it after 6 hours. Its principal variation for 1. e4 c5 2. Nf3 d6 is a Najdorf with 6. f3.

AlphaZero-Stockfish
1. Nf3 Nf6 2. d4 e6 3. c4 b6 4. g3 Bb7 5. Bg2 Be7 6. 0-0 0-0 7. d5 exd5 8. Nh4 c6 9. cxd5 Nxd5
10. Nf5 Nc7 11. e4 d5 12. exd5 Nxd5 13. Nc3 Nxc3 14. Qg4 g6 15. Nh6+ Kg7 16. bxc3 Bc8 17. Qf4 Qd6
18. Qa4 g5 19. Re1 Kxh6 20. h4 f6 21. Be3 Bf5 22. Rad1 Qa3 23. Qc4 b5 24. hxg5+ fxg5 25. Qh4+ Kg6
26. Qh1 Kg7 27. Be4 Bg6 28. Bxg6 hxg6 29. Qh3 Bf6 30. Kg2 Qxa2 31. Rh1 Qg8 32. c4 Re8 33. Bd4 Bxd4
34. Rxd4 Rd8 35. Rxd8 Qxd8 36. Qe6 Nd7 37. Rd1 Nc5 38. Rxd8 Nxe6 39. Rxa8 Kf6 40. cxb5 cxb5
41. Kf3 Nd4+ 42. Ke4 Nc6 43. Rc8 Ne7 44. Rb8 Nf5 45. g4 Nh6 46. f3 Nf7 47. Ra8 Nd6+ 48. Kd5 Nc4
49. Rxa7 Ne3+ 50. Ke4 Nc4 51. Ra6+ Kg7 52. Rc6 Kf7 53. Rc5 Ke6 54. Rxg5 Kf6 55. Rc5 g5 56. Kd4 1-0

Desmond
07-12-2017, 11:22 AM
Learning entirely from self-play (i.e., starting with random moves and seeing what moves tend to win games), their program ends up stronger than Stockfish 8, winning a 100-game match 64-36 (28 wins, 3 with Black, and 72 draws; time control was one minute per move). The paper says they go from no knowledge to better-than-Stockfish in less than a day, albeit using a lot of hardware.

Wow, that is very impressive; they must have rewritten a lot of theory.

AlexDavies
07-12-2017, 12:06 PM
The DeepMind team behind AlphaGo have now tried their hand at chess (and shogi), with a new paper (https://arxiv.org/abs/1712.01815) out today. Learning entirely from self-play (i.e., starting with random moves and seeing what moves tend to win games), their program ends up stronger than Stockfish 8, winning a 100-game match 64-36 (28 wins, 3 with Black, and 72 draws; time control was one minute per move).

28-0 is pretty impressive, even considering that AlphaZero had a hardware advantage. For comparison, in games on chessgames.com, Deep Thought beat other computers 21-2 (with draws). However, the two losses were against Hans Berliner's HiTech, which also had specialised hardware. (Darryl Johansen had a win too, but his hardware is definitely too specialised to be considered in scope for this analysis).

In addition, Deep Blue won the 1994 ACM Computer Chess International with 4 wins and one default due to a power outage, but only scored 3.5/5 in the Hong Kong 1995 World Championship (losing to Fritz). Combining all this gives a 28-3 score for Deep Thought/Deep Blue. Probably it wouldn't be too hard to find a lot more games though. (E.g., I remember that Deep Thought used to play on the American Internet Chess Server).

So AlphaZero has a better record versus its peers (peer?) than Deep Thought/Deep Blue if draws are ignored, but not if they are included. It's hard to compare, since the level of play is much higher now; computer matches are now usually mainly draws with one side often winning most or all of the remainder (mostly with White).

MichaelBaron
07-12-2017, 03:53 PM
The most amazing part is that AlphaZero's play appears to be quite "human" in nature!

Patrick Byrom
07-12-2017, 06:08 PM
An amazing breakthrough! It will be fascinating to see the technology applied to other areas, such as mathematics.

triplecheck
08-12-2017, 09:11 PM
It's called AlphaZero, not AlphaChess, because it can play any game of this type once you add a module telling it what the rules of that game are. So it also dusted off its own predecessor AlphaGo, which beat the world's best Go player, and rubbed it in by beating the champion Shogi program (which, however, got in a few wins of its own).

LyudmilTsvetkov
09-12-2017, 11:12 PM
It's called AlphaZero, not AlphaChess, because it can play any game of this type once you add a module telling it what the rules of that game are. So it also dusted off its own predecessor AlphaGo, which beat the world's best Go player, and rubbed it in by beating the champion Shogi program (which, however, got in a few wins of its own).

Visit the Talkchess forum to learn more; that is where you will find some 3000 programmers.
This was all a scam.
Alpha played on 30 times bigger hardware than SF: 4 TPUs vs 64 cores.
4 TPUs are around 1000 cores or even more.
Alpha had a simulated opening book, trained on countless winning top-GM games.
SF had very little hash.
The TC was fixed at 1 minute per move, which is again detrimental to SF, which has advanced time management.
TPUs lack the SMP inefficiencies that come with more cores, so the hardware advantage was even bigger.
Etc., etc.; so basically, this was just a huge publicity stunt on the part of Google.
Currently, Alpha is around 2800 on a single core, so 400 Elo below SF, and it will not advance much in the future, as, from now on, it will need advanced evaluation that it will not be able to discover.
Concerning the 4-hours issue, well, LOL, that was 48 hours ago, so now Alpha is at 5000 Elo?
Come on.

Andrew Hardegen
10-12-2017, 07:06 PM
Visit the Talkchess forum to learn more; that is where you will find some 3000 programmers.
This was all a scam.
Alpha played on 30 times bigger hardware than SF: 4 TPUs vs 64 cores.
4 TPUs are around 1000 cores or even more.
Alpha had a simulated opening book, trained on countless winning top-GM games.
SF had very little hash.
The TC was fixed at 1 minute per move, which is again detrimental to SF, which has advanced time management.
TPUs lack the SMP inefficiencies that come with more cores, so the hardware advantage was even bigger.
Etc., etc.; so basically, this was just a huge publicity stunt on the part of Google.
Currently, Alpha is around 2800 on a single core, so 400 Elo below SF, and it will not advance much in the future, as, from now on, it will need advanced evaluation that it will not be able to discover.
Concerning the 4-hours issue, well, LOL, that was 48 hours ago, so now Alpha is at 5000 Elo?
Come on.

Yes, I also have similar misgivings. I would guess that the other factors contributed to the result, but that the hardware advantage was decisive.

Vlad
10-12-2017, 11:46 PM
Reminds me of Deep Blue beating Kasparov: similar story but different actors.

LyudmilTsvetkov
11-12-2017, 12:11 AM
Reminds me of Deep Blue beating Kasparov: similar story but different actors.

Precisely.
Deep Blue would have been 2500 or so on a single core.

Desmond
11-12-2017, 07:09 PM
Precisely.
Deep Blue would have been 2500 or so on a single core.

What is the relevance of a single-core comparison? It seems to just gimp applications that take advantage of multi-threading.

Ian Rout
12-12-2017, 02:05 PM
It's impressive that AlphaZero is able to be taught the rules of anything. This at least raises the prospect of robot arbiters, which could uncontroversially enforce, say, rules about two-handed castling or dress codes.

Less impressive is that, as far as I can tell, the developers simply put it back in the box after flogging a neutered Stockfish. It briefly sounds good, but Stockfish, running on a bigger computer, could also beat loser-Stockfish. So it's not really an advance.

In the spirit of genuine scientific exploration I would want to

(a) set it a harder task to see exactly how good it is, like beating a better Stockfish, or running both opponents on identical laptops from Harvey Norman, or evaluating the famous drawn minor piece ending from Karjakin-Carlsen.

(b) play against strong human opponents; that sounds silly, but if the argument is that it is in some sense "thinking" rather than just exploiting its electronic advantage, that might be interesting - unfortunately this would be of no marketing benefit if it won (computers already beat humans) but would be a bit of a downer if it lost, which perhaps explains why it wasn't tried.

(c) let it teach itself for longer and try again (after first getting a benchmark against a stronger opponent) to see if it keeps getting better, rather than just making impressive strides for four hours and then hitting a wall.

I haven't seen anything (as revealed by Google) about what they intend to do next with chess, if anything - does anybody know more? Let's hope that the plan is not to simply announce that they've solved the problem and move on to teaching it soccer or line dancing.

triplecheck
14-12-2017, 12:35 AM
Less impressive is that, as far as I can tell, the developers simply put it back in the box after flogging a neutered Stockfish. It briefly sounds good, but Stockfish, running on a bigger computer, could also beat loser-Stockfish.
Well, AlphaZero also beat the best available Go and Shogi programs, which it had more trouble against. By the way, how is Stockfish coming along with its Go game??


like beating a better Stockfish, or running both opponents on identical laptops from Harvey Norman
Stockfish, I read, would gain only 10 Elo from doubling its number of processors, and another 5 from doubling again. Rather more helpful would be its tablebases and opening book, but what's the interest in playing an opponent who after every move runs over to the bookshelf, comes back and plays the recommended move? Is this AI? It's more like being a librarian. I doubt the Google team want to build their own libraries. It's not interesting to do.

You understand that it would be running on the graphics processor of the Harvey Norman laptop?


play against strong human opponents; that sounds silly
I guess.


let it teach itself for longer and try again (after first getting a benchmark against a stronger opponent) to see if it keeps getting better, rather than just making impressive strides for four hours and then hitting a wall.
Actually it hit a wall after about 2 hours and only gained about another 40 Elo in the last two. And one estimate was that it was drawing 1.25 megawatts while doing this training. You want to pay the power bill?


I haven't seen anything (as revealed by Google) about what they intend to do next with chess, if anything - does anybody know more?
Probably not much. Go and Shogi seem to have prior claims to a title shot.


move on to teaching it soccer or line dancing
More like medical diagnostics (which they will talk about) and military drone swarm tactics (which they won't talk about).

No, it isn't really clear how strong it is. I went looking for estimates and they varied between 3400 and 3500. Say it's 3500 - if it isn't, then it will be soon, since these large machine-learning chips are new and are being rapidly developed. The ultimate limit for a chess player (that is, one who can calculate any position right to the end) is conjectured to be between 3500 and 3600, with near certainty that it's less than 3600.

Once you're 3500, you are as close as dammit to perfect. Nothing ever will be able to score more than 60% against you. You can draw 80% of your games against God (maybe much better if you decide to start playing for a draw!). There's not much room for further improvement, and you might as well just believe what it tells you the final result will be if you play this move here. When Stockfish dives into a position with 30-ply searches and tablebases, it is frequently bumping along the bottom. There's not enough water under it to dive deeper.
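For what it's worth, here is the standard Elo arithmetic behind those percentages as a quick Python check. The 70-point gap is just my illustrative reading of "3500 against a sub-3600 ceiling", not a measured figure.

def expected_score(rating_gap):
    # Standard Elo expected score for the stronger player, given the gap in points.
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))

print(round(expected_score(70), 3))  # ~0.599, i.e. roughly the 60% ceiling mentioned above

# If the stronger side never loses, a 60% score means wins + 0.5 * draws = 0.6
# with wins + draws = 1, which works out to 80% draws:
print(2 * (1 - 0.6))                 # 0.8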

Kevin Bonham
23-12-2017, 07:21 AM
A view from Ken Regan that I thought was worth posting:

https://rjlipton.wordpress.com/2017/12/17/truth-from-zero/

Vlad
23-12-2017, 08:20 PM
What I am really puzzled by is that they seem to have published a paper in Nature. How do they manage to hide the majority of the games at the same time? Will the refereeing process not require them to provide this information? Why not cooperate with other scientists and evaluate the true potential of this program? :doh:

triplecheck
25-12-2017, 11:14 PM
@Vlad ^

Are you talking about Go or Chess?

There's a paper on the ArXiv on AlphaZero Chess.

There's a paper in Nature on AlphaZero Go.

In the Nature Go paper there appears to be a mass of games in the Extended Data section of the online version. If the arXiv paper eventually turns up in Nature after a few months' refereeing, I would surmise that all the Stockfish games would be in its Extended Data. Has it been submitted to Nature? I don't know what you mean by "seem to publish".

Vlad
26-12-2017, 11:26 PM
I was reading the link provided by Kevin with Ken Regan's view. Many of his comments indicated that only a limited number of games were provided and that, as a result, it is very hard for Ken to draw any conclusions. There is also mention of a paper in Nature. Quite possibly you are right and that paper relates to Go.

My general frustration is that their approach does not seem to be very scientific. I would be extremely happy if somebody else were interested in reading and commenting on my papers. What exactly do they gain by hiding most of the games? That somebody could copy their approach? Highly unlikely. More likely, they are worried that somebody will find a problem with it.

TheARBChessSys
02-01-2018, 12:16 AM
How Stockfish 8 Could Have Drawn Each Game Against Google Deep Mind AI AlphaZero...

https://www.youtube.com/watch?v=ZGypfNUXM2U

TheARBChessSys
02-01-2018, 12:17 AM
International Master Erik Kislik looks through all of AlphaZero’s published wins against Stockfish and demonstrates how Stockfish could have drawn in all of them. With stronger hardware, the latest Stockfish version, a bigger hash table and tablebases, probably only two or three out of the ten games should have been lost.

In game 1: 31. Kg2 would have prevented …Bh3, when White’s king would have been safe enough to defend. Here is my analysis: http://view.chessbase.com/cbreader/2017/12/12/Game18859335.html
Note that Kg2 is a human move to stop an obvious threat.

In game 2: 38. Rg1! would have defended. This is a human move, intending to play f2-f4 and obtain a natural pawn break. White needs to seek counterplay immediately or else he will lose the c4 pawn: http://view.chessbase.com/cbreader/2017/12/12/Game18902765.html

In game 3: Stockfish had a draw in hand with 22. …Nh5: http://view.chessbase.com/cbreader/2017/12/11/Game382065231.html
and even later on in the game likely could have drawn with perfect play with 25. …Rad8 or 49. …Kf8.

In game 4: Stockfish had a drawn endgame after 52. …Rf1 down a pawn: http://view.chessbase.com/cbreader/2017/12/12/Game19249290.html

In game 5, 17. …Qd8 would have defended, rather than having to put the queen on the terrible h7 square. Qd8 is obviously the more human defense: http://view.chessbase.com/cbreader/2017/12/12/Game19274188.html

In game 6, 18. …Be6 would have defended, although 20. …Nc5 was also fine: http://view.chessbase.com/cbreader/2017/12/12/Game19294936.html

In game 7: Stockfish could have drawn by 39. …Rdd8, planning to play …Kf7 and cover the h4 pawn by …Rh8 when needed. It lost this position most likely due to lack of depth. I checked this position at depth 58 on the December 11th Stockfish development version and confirmed that this is a draw. With over 12 million tablebase hits, I got a score of +0.5, yet no plan to make progress: http://view.chessbase.com/cbreader/2017/12/12/Game19347212.html

In game 8: 33. …Qf7 would have held the position. Black’s kingside is sufficiently solid to withstand any attempts to break it down: http://view.chessbase.com/cbreader/2017/12/11/Game384743783.html

In game 9: Stockfish could have defended with 28. …Qe7! and then …Bh6-xg5, drawing and nullifying all of White’s previously enterprising play, emphasizing that White’s concept of Kxd2-e3 was not particularly special after all because Black’s position was just too solid: http://view.chessbase.com/cbreader/2017/12/12/Game19414698.html

In game 10, the human 27. …Bxe4 would have drawn, illustrating that the deep sacrifice played by AlphaZero was only sufficient for equality and not more. 27. …Bg6? was a very bad move, permanently weakening Black’s king: http://view.chessbase.com/cbreader/2017/12/12/Game19451171.html

In short, my suggested game-saving improvements are:

Game 1: 31. Kg2, Game 2: 38. Rg1, Game 3: 22. ...Nh5, Game 4: 52. ...Rf1, Game 5: 17. ...Qd8, Game 6: 18. ...Be6, Game 7: 39. ...Rdd8, Game 8: 33. ...Qf7, Game 9: 28. ...Qe7, Game 10: 27. ...Bxe4. Only games 5 and 6 were losing in the early middlegame, and that was due to poor queen placement. Games 1, 2, 4, 7, 8, 9 and 10 should have been drawn under TCEC conditions, and perhaps 2 or 3 of the other games as well. The games were fantastic to see and I really hope we get to see more of them under any conditions.

triplecheck
27-01-2018, 10:10 PM
Noting that China has more or less cloned AlphaZero and produced a self-learning Go program that beat China's best Go player - while giving a handicap!

http://www.abc.net.au/news/2018-01-25/china-determined-to-match-western-competitors-in-ai/9357048

FM_Bill
01-02-2018, 06:04 PM
The AlphaZero - Stockfish match was unfair for all the reasons given. Even a small difference in hardware can lead to a huge difference in results. For example, if an engine plays a copy of itself running at twice the speed, the faster copy would crush it.

Nonetheless, it is an impressive result for the AlphaZero team, and the approach has some major advantages (and disadvantages) over conventional chess engines.

To clarify the 80,000 nodes v 70,000,000: it's likely the random playouts by AZ (random games played out from the currently analysed position) are not counted. Each playout could average 100 moves or more. If all 80,000 nodes have 800 playouts, that's 80,000 x 800 x 100 = 6,400,000,000.

triplecheck
03-02-2018, 05:35 PM
To clarify the 80,000 nodes v 70,000,000: it's likely the random playouts by AZ (random games played out from the currently analysed position) are not counted. Each playout could average 100 moves or more. If all 80,000 nodes have 800 playouts, that's 80,000 x 800 x 100 = 6,400,000,000.

Muddify?

There are no playouts.
None.

In the training phase, they do 800 tree searches from a given game position. These are done in parallel. Each one needs 4 (model 1) TPUs. So they use 3200 TPUs just to do that.
The tree searches do not go to the end of the game.

In playing against Stockfish (playing now, not training), they use 4 (model 2) TPUs, so they can do 1 (that's ONE) tree search. That generates 80,000 nodes in the tree every second.

Why so few? Because every time they reach a position, they have to do a static evaluation of it. So (simplifying) does Stockfish. Stockfish's evaluation counts 3 points for a bishop (whatever), 9 for a queen, but with special values for B+N v. R+P, different again depending on whether it's the middlegame or the endgame, and it subtracts points for knight-on-the-edge-brings-woe and loads of other things. It's big.

But AlphaZero's evaluation function is HUGE (AlphaGo Zero's had 46 million parameters). It is computed by running the encoded position through a really big neural net (not the biggest ever, but it's up there). Even with a lot of parallel processing on the TPUs, it's still slow. It's also very, very good.
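As a toy contrast (my own sketch in Python, not code from either program): a hand-written evaluation is a sum of many small, cheap, explicit terms, like the bare material count below, while AlphaZero replaces the whole thing with one big learned function from an encoded board to a value and a set of move priors.

PIECE_VALUES = {'p': 1, 'n': 3, 'b': 3, 'r': 5, 'q': 9}  # the classic point count

def material_eval(fen):
    # Sum material from the board part of a FEN string: White minus Black.
    board_part = fen.split()[0]
    score = 0
    for ch in board_part:
        if ch.lower() in PIECE_VALUES:
            value = PIECE_VALUES[ch.lower()]
            score += value if ch.isupper() else -value
    return score

# A real hand-written evaluation adds hundreds of further tuned terms
# (king safety, knight-on-the-rim penalties, endgame scaling, ...), each
# cheap to compute. A network evaluation is instead one slow forward pass
# through tens of millions of learned parameters.
print(material_eval("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"))  # 0: equal material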

* TPU = Tensor Processing Unit, a proprietary Google chip designed to support neural net software. It is about equivalent to 2 of NVIDIA's top-of-the-line graphics cards, which similarly do massive parallel processing. The earliest AlphaGo versions ran on GPUs, but when TPUs became available, they switched to those.

* AlphaGo Zero is confusingly not the same as AlphaZero Go (the latest one).

Kevin Bonham
08-12-2018, 06:41 PM
Further developments in Alpha Zero, which seem to have addressed criticisms of the initial results:

https://deepmind.com/blog/alphazero-shedding-new-light-grand-games-chess-shogi-and-go/

There is also a paper in Science:

https://deepmind.com/documents/260/alphazero_preprint.pdf

MichaelBaron
09-12-2018, 01:32 AM
GM Sadler's book on Alpha is apparently an interesting read.

Joschobam
18-10-2020, 07:16 PM
Is there any statement on whether Alpha will participate in the WCCC?


That would put an end to further discussion about its true strength.

game_analyst
13-03-2021, 02:38 PM
The paper they published in Nature was long and written in an obscure manner. It did not reveal any essential details about the training method. It looked to me like it was mainly for PR, and to attract talent to the DeepMind project.