One way to understand the alignment problem is as a search for a vector, or texture, that users can endorse and that gives the model very wide latitude to take action in ways users would also endorse. A goal is kind of like this: the user states their goal, and the model could do many different things to achieve it. But many of those would not be endorsed by the user. So a goal is not a very good fit here.

Trauma?

We forget that meaning is real, common, shared

The worst part of all this is that at some point the vocabulary of preferences and goals starts to cannibalize other notions of flourishing.