

We also analysed the relative performance of AlphaZero’s MCTS search compared to the state-of-the-art alpha-beta search engines used by Stockfish and Elmo. AlphaZero searches just 80 thousand positions per second in chess and 40 thousand in shogi, compared to 70 million for Stockfish and 35 million for Elmo. AlphaZero compensates for the lower number of evaluations by using its deep neural network to focus much more selectively on the most promising variations – arguably a more “human-like” approach to search, as originally proposed by Shannon. Figure 2 shows the scalability of each player with respect to thinking time, measured on an Elo scale, relative to Stockfish or Elmo with 40ms thinking time. AlphaZero’s MCTS scaled more effectively with thinking time than either Stockfish or Elmo, calling into question the widely held belief that alpha-beta search is inherently superior in these domains. (The prevalence of draws in high-level chess tends to compress the Elo scale, compared to shogi or Go.)

Figure 2: Scalability of AlphaZero with thinking time, measured on an Elo scale. (A) Performance of AlphaZero and Stockfish in chess, plotted against thinking time per move. (B) Performance of AlphaZero and Elmo in shogi, plotted against thinking time per move.

Finally, we analysed the chess knowledge discovered by AlphaZero. Table 2 analyses the most common human openings (those played more than 100,000 times in an online database of human chess games). Each of these openings is independently discovered and played frequently by AlphaZero during self-play training. When starting from each human opening, AlphaZero convincingly defeated Stockfish, suggesting that it has indeed mastered a wide spectrum of chess play; each program was given 1 minute of thinking time per move.
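The selective search described above can be sketched with the PUCT child-selection rule used in AlphaZero-style MCTS: each simulation descends the tree by maximising a mean value plus a prior-weighted exploration bonus, so the network's policy prior steers search toward promising variations. The exploration constant, node fields, and example moves below are illustrative assumptions, not values from the paper.

```python
import math

C_PUCT = 1.5  # exploration constant (hypothetical value, tuned in practice)

class Node:
    """One edge (s, a) of the search tree."""
    def __init__(self, prior):
        self.prior = prior        # P(s, a) from the policy network
        self.visit_count = 0      # N(s, a)
        self.value_sum = 0.0      # W(s, a)

    def q(self):
        # Mean action value Q(s, a); defined as 0 for unvisited edges.
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def select_child(children):
    """Pick the child maximising Q(s,a) + U(s,a), where
    U(s,a) = c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a))."""
    total_visits = sum(c.visit_count for c in children.values())
    def score(item):
        _, child = item
        u = C_PUCT * child.prior * math.sqrt(total_visits) / (1 + child.visit_count)
        return child.q() + u
    return max(children.items(), key=score)

# Illustrative position: e4 is unvisited but has a high prior, so its
# exploration bonus dominates and the search expands it next.
children = {"e4": Node(prior=0.6), "d4": Node(prior=0.3), "a4": Node(prior=0.1)}
children["d4"].visit_count, children["d4"].value_sum = 5, 1.0
children["a4"].visit_count, children["a4"].value_sum = 5, 1.0
move, _ = select_child(children)  # → "e4"
```

Because the bonus scales with the prior and shrinks with visits, most simulations concentrate on a handful of candidate moves rather than expanding the tree uniformly, which is how far fewer evaluations per second can remain competitive.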
Table 1: Tournament evaluation of AlphaZero in chess, shogi, and Go, as games won, drawn or lost from AlphaZero’s perspective, in 100 game matches against Stockfish, Elmo, and the previously published AlphaGo Zero after 3 days of training.

AlphaZero convincingly defeated all opponents, losing zero games to Stockfish and eight games to Elmo (see Supplementary Material for several example games), as well as defeating the previous version of AlphaGo Zero (see Table 1). Stockfish and Elmo played at their strongest skill level using 64 threads and a hash size of 1GB.
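For a sense of scale, a match score can be converted into an approximate Elo difference under the standard logistic rating model. The sketch below uses the eight losses to Elmo quoted above; the 90-win / 2-draw split of the remaining games is an illustrative assumption, and the function name is ours.

```python
import math

def elo_diff(wins, draws, losses):
    """Estimate the Elo difference implied by a match score, using the
    logistic model: expected score E = 1 / (1 + 10 ** (-diff / 400))."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games   # draws count as half a point
    return -400 * math.log10(1 / score - 1)

# Hypothetical 90-2-8 match outcome → roughly a 400 Elo gap.
diff = elo_diff(90, 2, 8)
```

Note that the estimate diverges as the score approaches 100%, so a zero-loss result with few draws only bounds the rating gap from below rather than pinning it down.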

We applied the AlphaZero algorithm to chess, shogi, and also Go. Unless otherwise specified, the same algorithm settings, network architecture, and hyper-parameters were used for all three games. We trained a separate instance of AlphaZero for each game. Training proceeded for 700,000 steps (mini-batches of size 4,096) starting from randomly initialised parameters, using 5,000 first-generation TPUs to generate self-play games and 64 second-generation TPUs to train the neural networks. (The original AlphaGo Zero paper used GPUs to train the neural networks.) Further details of the training procedure are provided in the Methods.

Figure 1: Elo ratings were computed from evaluation games between different players when given one second per move. (A) Performance of AlphaZero in chess, compared to the 2016 TCEC world-champion program Stockfish. (B) Performance of AlphaZero in shogi, compared to the 2017 CSA world-champion program Elmo. (C) Performance of AlphaZero in Go, compared to AlphaGo Lee and AlphaGo Zero (20 block / 3 day).

We evaluated the fully trained instances of AlphaZero against Stockfish, Elmo and the previous version of AlphaGo Zero (trained for 3 days) in chess, shogi and Go respectively, playing 100 game matches at tournament time controls of one minute per move. AlphaZero and the previous AlphaGo Zero used a single machine with 4 TPUs.
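The training figures quoted above imply a substantial volume of sampled data; a quick back-of-the-envelope check:

```python
# Training volume implied by the figures above:
# 700,000 optimisation steps with mini-batches of 4,096 positions.
steps = 700_000
batch_size = 4_096

# Positions are sampled with replacement from the self-play buffer,
# so this counts samples drawn, not distinct positions seen.
positions_sampled = steps * batch_size  # ≈ 2.87 billion
```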
