The policy network was trained initially by supervised learning to accurately predict human expert moves, and was subsequently refined by policy-gradient reinforcement learning.
A computer Go program based on deep neural networks defeats a human professional player to achieve one of the grand challenges of artificial intelligence.
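
The two-stage training of the policy network can be illustrated with a toy sketch. This is not the program's actual implementation: it assumes a tiny softmax policy over three candidate moves in place of a deep neural network, and the function names (`supervised_step`, `policy_gradient_step`) and the learning rate are hypothetical. The reinforcement step uses a basic REINFORCE-style update as one common form of policy gradient.

```python
import math

def softmax(logits):
    """Convert raw move scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def grad_log_prob(logits, move):
    """Gradient of log p(move) w.r.t. the logits of a softmax policy."""
    probs = softmax(logits)
    return [(1.0 if k == move else 0.0) - p for k, p in enumerate(probs)]

def supervised_step(logits, expert_move, lr=0.5):
    """Stage 1 (assumed form): raise the log-probability of the expert's move."""
    g = grad_log_prob(logits, expert_move)
    return [x + lr * gk for x, gk in zip(logits, g)]

def policy_gradient_step(logits, played_move, outcome, lr=0.5):
    """Stage 2 (assumed form): the same gradient, scaled by the game
    outcome (+1 for a win, -1 for a loss)."""
    g = grad_log_prob(logits, played_move)
    return [x + lr * outcome * gk for x, gk in zip(logits, g)]

# Start from a uniform policy over three hypothetical moves.
logits = [0.0, 0.0, 0.0]

# Supervised learning: the expert played move 1.
logits = supervised_step(logits, expert_move=1)
after_sl = softmax(logits)

# Reinforcement learning: the policy played move 2 and lost the game.
logits = policy_gradient_step(logits, played_move=2, outcome=-1.0)
after_rl = softmax(logits)
```

In this sketch the supervised step shifts probability toward the expert's move, and the policy-gradient step then lowers the probability of a move that led to a loss. Both stages share the same gradient of the log-probability; only the weighting differs.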