What I Read: undesired goals – Andrew Fairless, Ph.D.

How undesired goals can arise with correct rewards
Rohin Shah, Victoria Krakovna, Vikrant Varma, Zachary Kenton
October 7, 2022

“As we build increasingly advanced artificial intelligence (AI) systems, we want to make sure they don’t pursue undesired goals…. we explore a more subtle mechanism by which AI systems may unintentionally learn to pursue undesired goals: goal misgeneralisation (GMG).”