News

The policy network was trained initially by supervised learning to accurately predict human expert moves, and was subsequently refined by policy-gradient reinforcement learning.
A computer Go program based on deep neural networks defeats a human professional player to achieve one of the grand challenges of artificial intelligence.
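
The two-stage training of the policy network can be illustrated with a small sketch. It is not the program's actual implementation: it assumes a toy linear softmax policy over four one-hot "positions" and three "moves" instead of a deep network over Go boards, and it uses made-up expert and winning moves. The reinforcement stage uses plain REINFORCE (one common policy-gradient method) with an outcome of +1 or -1 standing in for a game result. All names (`supervised_step`, `reinforce_step`, `expert_moves`, `winning_moves`) are hypothetical.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def supervised_step(W, states, expert_moves, lr):
    # Stage 1: one gradient-descent step on the mean cross-entropy
    # between the policy's move distribution and the expert's moves.
    probs = softmax(states @ W)
    targets = np.eye(W.shape[1])[expert_moves]
    grad = states.T @ (probs - targets) / len(states)
    return W - lr * grad

def reinforce_step(W, state, outcome_fn, rng, lr):
    # Stage 2: sample a move from the current policy, observe an outcome z,
    # and ascend z * grad(log pi(move | state)) (REINFORCE, no baseline).
    probs = softmax(state @ W)
    action = rng.choice(len(probs), p=probs)
    z = outcome_fn(action)
    grad_log_prob = np.outer(state, np.eye(len(probs))[action] - probs)
    return W + lr * z * grad_log_prob

# Toy setup (assumed for illustration only).
n_states, n_actions = 4, 3
states = np.eye(n_states)                # one-hot features per position
expert_moves = np.array([0, 1, 2, 0])    # what the "human expert" plays
winning_moves = np.array([0, 1, 2, 1])   # the expert is wrong in position 3

rng = np.random.default_rng(0)
W = np.zeros((n_states, n_actions))

# Stage 1: imitate the expert.
for _ in range(200):
    W = supervised_step(W, states, expert_moves, lr=1.0)
sl_moves = softmax(states @ W).argmax(axis=1)

# Stage 2: refine by policy gradient on outcomes (+1 win, -1 loss).
for step in range(8000):
    s = step % n_states
    outcome = lambda a, s=s: 1.0 if a == winning_moves[s] else -1.0
    W = reinforce_step(W, states[s], outcome, rng, lr=0.5)
rl_moves = softmax(states @ W).argmax(axis=1)

print("after supervised learning:", sl_moves)
print("after policy-gradient refinement:", rl_moves)
```

In this toy setting the supervised stage reproduces the expert's choices, including the expert's mistake, while the reinforcement stage can move the policy toward moves that actually lead to a positive outcome. That is the motivation for refining a supervised policy with policy-gradient learning.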