We coordinate research across multiple labs to create ML models that: (1) understand and support the meanings behind important human practices like science, democracy, education, leadership, and love, as well as what's meaningful to individual users; and (2) develop their own values and sources of meaning in positions of responsibility.

<aside> ⏲️ Personelle

Definition

A "Wise AI", as defined here, is an AI system with at least these two features:

  1. It “struggles” with the moral situations it finds itself in. It can comprehend the moral significance of the situations it encounters, and learn from these scenarios, recognizing new moral implications by observing and guessing at outcomes and possibilities. And it can use these moral learnings to revise internal policies (values) that guide its decision-making processes.
  2. It uses “human-compatible” reasons and values. It recognizes as good the same kinds of things we broadly recognize as good, plus possibly more sophisticated things we cannot yet recognize as good. It can articulate its values and how they influenced its decisions, in a way humans can comprehend.

Additionally, we sometimes add a third or fourth feature:

Wise AI on Github

GitHub - meaningalignment/wise-ai

Results so far

We believe our Wise AI evaluation suite already shows limits of existing models. GPT4 shows a good understanding of morally significant situations, but generally does not respond appropriately to these situations. Current models demonstrate a rich understanding of human values, but struggle to apply those values in their responses.

Ultimately, we expect the models that will ace the suite will be trained with new methods and data sets, focused on moral reasoning in various situations. We also hope for models with new architectures that can explicitly encode their values, and recognize (as humans do) whether they're adhering to them, or are on shaky ground.

Wise AI Research Areas

Evaluate current models on wisdom

Show where Wise AI wins

Problems of Superwisdom Supervision

New Architectures

Interpretability

Current, Early Subprojects

Democratic Fine-Tuning