Summary of a Yudkowsky Thread on o1 and the Emergence of Hostility from Sufficient Unaligned Power Imbalance
Yudkowsky recently explained the central crux of the alignment problem, as understood by the intelligence-weighted majority of people, better (IMO) than ever before:
https://x.com/esyudkowsky/status/1837952796194492455?s=46
This isn’t a perfect summary, and he nails handles onto so many previously un-nailed central concepts in the thread that I may very well post about this same group of ideas again, but the gist is:
"It's easier to build foomy agent-type-things than nonfoomy ones. If you don't trust in the logical arguments for this [foomy agents are the computationally cheapest utility satisficers for most conceivable nontrivial local-utility-satisfaction tasks], the evidence for this is all around us, in the form of America-shaped-things, technology, and 'greed' having eaten the world despite not starting off very high-prevalence in humanity's cultural repertoire.
WITH THE TWIST that while America-shaped-things, technology, and 'greed' have worked out great for us and work out great in textbook economics, textbook economics fails to account for the physical dependence of weaker economic participants [such as horses in 1920 and Native Americans in 1492] on the benevolence of stronger economic participants, who found those weaker participants' raw resources more valuable than their labor."
I think I’m going to start linking people to this summary [and by proxy Yudkowsky’s thread] instead of List of Lethalities, except when I’m talking to technical Python-programming AI people.