Cultural alignment of machine-vision representations

October 17, 2022

Deep neural network representations of visual entities have been used as inputs to computational models of human mental representations. Though these models have been increasingly successful in predicting behavioral and physiological responses, the implicit notion of “human” that they rely upon often glosses over individual- level differences in subjective beliefs, attitudes, and associations, as well as group- level cultural constructs. Here, we align machine-vision representations to the consensus among a group of respondents by extending Cultural Consensus Theory to include latent constructs structured as fine-tuned pretrained machine-vision systems. We apply the model to a large-scale dataset of people’s first impressions of others. We show that our method creates a robust mapping between machine- vision representations and culturally constructed human representations.