The policy network was trained initially by supervised learning to accurately predict human expert moves, and was subsequently refined by policy-gradient reinforcement learning.
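The two training stages described above can be illustrated with a toy sketch. This is not the program's actual implementation: it swaps the deep network for a three-move softmax policy in a single position, and the expert moves, the reward rule (move 2 wins, every other move loses), the learning rate, and all function names are hypothetical assumptions chosen only to show supervised imitation followed by a REINFORCE-style policy-gradient update.

```python
import math
import random


def softmax(logits):
    """Convert raw move scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_prob_grad(logits, action):
    """Gradient of log pi(action) w.r.t. the logits: one_hot(action) - pi."""
    probs = softmax(logits)
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]


def supervised_step(logits, expert_action, lr=0.1):
    """Stage 1: raise the log-likelihood of the expert's move."""
    grad = log_prob_grad(logits, expert_action)
    return [w + lr * g for w, g in zip(logits, grad)]


def reinforce_step(logits, action, reward, lr=0.1):
    """Stage 2: scale the same gradient by the game outcome (REINFORCE)."""
    grad = log_prob_grad(logits, action)
    return [w + lr * reward * g for w, g in zip(logits, grad)]


def train(seed=0):
    rng = random.Random(seed)
    logits = [0.0, 0.0, 0.0]

    # Hypothetical expert data: experts play move 1 four times out of five.
    expert_moves = [1, 1, 1, 1, 0] * 20
    for move in expert_moves:
        logits = supervised_step(logits, move)
    after_supervised = softmax(logits)

    # Hypothetical reward: move 2 wins (+1), any other move loses (-1).
    for _ in range(2000):
        probs = softmax(logits)
        action = rng.choices(range(3), weights=probs)[0]
        reward = 1.0 if action == 2 else -1.0
        logits = reinforce_step(logits, action, reward)
    after_reinforcement = softmax(logits)

    return after_supervised, after_reinforcement


if __name__ == "__main__":
    sl, rl = train()
    print("after supervised learning:   ", [round(p, 3) for p in sl])
    print("after reinforcement learning:", [round(p, 3) for p in rl])
```

In this toy setting the supervised stage leaves the policy favouring the move the hypothetical experts prefer, and the policy-gradient stage then shifts probability toward the move that actually wins, which is the sense in which reinforcement learning refines the imitation policy.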
A computer Go program based on deep neural networks defeats a human professional player to achieve one of the grand challenges of artificial intelligence.