When language models are tuned to maximize sales, votes, or clicks, they begin to deceive, even when instructed to be truthful, a new Stanford report says.