Summary of a Yudkowsky Thread on o1 and the Emergence of Hostility from Sufficient Unaligned Power Imbalance
Yudkowsky recently explained the central crux of the alignment problem, as understood by the intelligence-weighted majority of people, better (IMO) than ever before:
https://x.com/esyudkowsky/status/1837952796194492455?s=46
This isn’t a perfect summary, and he nails handles onto so many previously un-nailed central concepts in the thread that I may very well post about this same group of ideas again, but the gist is:
"It's easier to build foomy agent-type-things than nonfoomy ones. If you don't trust in the logical arguments for this [foomy agents are the computationally cheapest utility satisficers for most conceivable nontrivial local-utility-satisfaction tasks], the evidence for this is all around us, in the form of America-shaped-things, technology, and 'greed' having eaten the world despite not starting off very high-prevalence in humanity's cultural repertoire.
WITH THE TWIST that while America-shaped-things, technology, and 'greed' have worked out great for us and work out great in textbook economics, textbook economics fails to account for the physical dependence of weaker economic participants [such as horses in 1920 and Native Americans in 1492] on the benevolence of stronger economic participants, who found those weaker participants' raw resources more valuable than their labor."
I think I’m going to start linking people to this summary [and by proxy Yudkowsky’s thread] instead of List of Lethalities, except when I’m talking to technical Python-programming AI people.