Multi-Objective Reinforcement Learning (MORL) is an emerging field that extends the conventional reinforcement learning paradigm by enabling agents to optimise multiple conflicting objectives ...
The National Academies of Sciences, Engineering, and Medicine are private, nonprofit institutions that provide expert advice on some of the most pressing challenges facing the nation and world. Our ...
In a new paper published Thursday titled “Auditing language models for hidden objectives,” Anthropic researchers described how custom AI models trained to deliberately conceal certain “motivations” ...