News

input, hidden):
    output, hidden = self.rnn(input, hidden)
    logits = [head(output.squeeze(0)) for head in self.policy_heads]
    ...
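
The fragment above appears to be the forward pass of a recurrent policy that feeds one shared RNN output into several policy heads. A minimal PyTorch sketch of that pattern follows. The class name `MultiHeadPolicy`, the choice of `nn.GRU`, the linear heads, and all sizes are assumptions made for illustration; the excerpt itself only shows `self.rnn` and `self.policy_heads`.

```python
import torch
import torch.nn as nn


class MultiHeadPolicy(nn.Module):
    """Hypothetical module: one shared RNN feeding several policy heads."""

    def __init__(self, input_size, hidden_size, head_sizes):
        super().__init__()
        # Assumption: a single-layer GRU; the excerpt only names `self.rnn`.
        self.rnn = nn.GRU(input_size, hidden_size)
        # Assumption: one linear head per action component.
        self.policy_heads = nn.ModuleList(
            [nn.Linear(hidden_size, n) for n in head_sizes]
        )

    def forward(self, input, hidden):
        # input: (seq_len=1, batch, input_size)
        # hidden: (num_layers=1, batch, hidden_size)
        output, hidden = self.rnn(input, hidden)
        # squeeze(0) drops the length-1 time dimension: (batch, hidden_size)
        logits = [head(output.squeeze(0)) for head in self.policy_heads]
        return logits, hidden


# Illustrative sizes only: batch of 4, two heads with 3 and 5 actions.
policy = MultiHeadPolicy(input_size=8, hidden_size=16, head_sizes=[3, 5])
x = torch.zeros(1, 4, 8)
h0 = torch.zeros(1, 4, 16)
logits, h1 = policy(x, h0)
shapes = [tuple(l.shape) for l in logits]
```

Note that `output.squeeze(0)` only removes the time dimension when the input holds a single time step, so this sketch is meant to be called one step at a time, passing the returned `hidden` back in on the next call.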