News

Auditing agents in action

Anthropic said the first environment it developed tests an agent’s ability to complete an alignment audit of an intentionally misaligned model.