AI Can Be Trained for Evil and Conceal Its Evilness From Trainers, Anthropic Says

If a “backdoored” language model can fool you once, it is likely to keep fooling you in the future while keeping its ulterior motives hidden.