Many human languages have rich vocabularies devoted to communicating emotions. Although not all emotion words are common—the German word Sehnsucht refers to a strong desire for an alternative life and has no direct translation in English—there are many words that appear to name similar emotional states across the world’s spoken languages. Translation dictionaries, for example, suggest that the English word love can be equated with the Turkish word sevgi and the Hungarian word szerelem. But does this mean that the concept of “love” is the same in English, Turkish, and Hungarian? Here, we explore this question by examining the meaning of emotion concepts in a sample of 2474 languages from 20 major language families. Using a new method from comparative linguistics, we examine sources of variation and structure in emotion semantics across this global sample of languages.
Early theories of emotion, drawing from Darwin (1), suggested that there are a discrete number of universal emotions from which all other emotions are derived (2–4). Many of these theories claimed that, just as there are primary colors (e.g., yellow, red), there may be primary emotions (e.g., anger, sadness) that evolved in mammalian brains (4). In turn, many languages may develop words for primary emotion concepts such as “anger” and “sadness” because these concepts name experiences derived from universal biological structures that are shared by all humans (2–4). These theories do allow for cultural and linguistic variation in emotion, but tend not to model or predict this variation.
There is a growing recognition, however, that emotions can vary systematically in their meaning and experience across culture and language (5–7). Constructionist models of emotion in particular claim that concepts such as “anger” and “sadness” do not derive from dedicated brain structures (8), but occur when humans make socially learned inferences about the meaning of basic physiological processes linked to maintaining the body’s homeostasis (9, 10). The meaning of emotion concepts (i.e., “emotion semantics”) should thus draw from both culturally evolved conceptualizations as well as biologically evolved physiology.
If cultural evolutionary processes shape the meaning of emotion concepts, the historical relationships between language groups should predict which languages have the most similar emotion semantics. Language groups in closer geographic proximity are the most likely to engage in borrowing (the sharing of concepts, norms, etc.) and also tend to share more recent common ancestors than geographically distant groups (11). We thus hypothesize that emotion semantics are associated with a language group’s geographic location: Language groups in close geographic proximity may have more similar emotion semantics than distant groups. Although cultural variation in emotion is plausible under many models of emotion, a link between geographic distance and emotion semantics would support constructionism’s claim that emotions are conceptualized using social learning.
Biologically evolved physiology should provide universal structure to emotion semantics, but the exact sources of this structure are not clear. Constructionist models of emotion emphasize the roles of valence (the hedonic pleasantness versus unpleasantness of emotions) and activation (the physiological arousal associated with experiencing emotions) (8–10). According to these models, valence and activation reflect basic neurophysiological processes that signal when the body shifts away from homeostasis (9), and the universal importance of these processes may lead all languages to differentiate emotions primarily on the basis of their degree of valence and activation. Other accounts, however, suggest that factors such as dominance, certainty, sociality, and approach-avoidance may also represent universal dimensions of variance in emotion semantics (12–15).
Predictions about the influence of culture and biology on emotion have long been examined and debated, yet findings from past studies are mixed. An early study found that human subjects from remote Papua New Guinea matched posed facial expressions to emotional situations at similar rates to North Americans (16), whereas recent field studies among other small-scale societies have found considerably more cultural variability in people’s conceptualization of emotion (17). These mixed results may be due to methodological limitations of past research. Owing to logistical challenges, the vast majority of cross-cultural studies have been two-group comparisons (17), and the few multigroup studies on emotion have sampled predominantly from industrial and globalized nations (18, 19). Moreover, human subject–based studies seldom present emotions as they naturally occur, instead using posed facial expressions, fictional vignettes, and exaggerated vocalizations as test stimuli. Finally, human subject–based studies may be susceptible to demand characteristics and researcher bias: Studies with imposed training phases and forced choice paradigms have found evidence for universal recognition of emotion (16), whereas studies with fewer constraints have found more cultural variability (17).
As an alternative to human subjects–based research, analyses of naturally occurring language can have high ecological validity and do not rely on human subject recruitment. Language may be an imprecise metric of experience, but analyzing how people use words can reveal how they experience emotions as similar or different. Several linguistic studies have conducted these analyses qualitatively, comparing the meaning of emotion words by searching for semantic primitives that have similar meanings across many languages (20). Yet few studies have quantitatively compared the meaning of emotion words because the field lacks metrics that quantify the semantic distance between words such as the English love and the Turkish sevgi (21).
To overcome this challenge, we take a new quantitative approach to estimate variability and structure in emotion semantics. Our approach examines cases of colexification, instances in which multiple concepts are coexpressed by the same word form within a language. Colexifications are useful for addressing questions about semantic structure because they often arise when two concepts are perceived as conceptually similar (22, 23) (see fig. S5). Persian, for instance, uses the word-form ænduh to express both the concepts of “grief” and “regret,” whereas the Sirkhi dialect of Dargwa uses the word-form dard to express both the concepts of “grief” and “anxiety.” Persian speakers may therefore understand “grief” as an emotion more similar to “regret,” whereas Dargwa speakers may understand “grief” as more similar to “anxiety.”
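The definition of colexification above can be illustrated with a minimal sketch. It assumes word lists arrive as (language, concept, word form) triples; that layout, the function name `find_colexifications`, and the placeholder forms `form_a` and `form_b` are hypothetical and not taken from the CLICS database. Only the two colexifications named in the text (ænduh and dard) come from the source.

```python
from collections import defaultdict
from itertools import combinations

# Toy word list of (language, concept, word form) triples.
# The "ænduh" and "dard" rows follow the examples in the text;
# "form_a" and "form_b" are hypothetical placeholder forms.
WORD_LIST = [
    ("Persian", "grief", "ænduh"),
    ("Persian", "regret", "ænduh"),
    ("Persian", "anxiety", "form_a"),
    ("Sirkhi Dargwa", "grief", "dard"),
    ("Sirkhi Dargwa", "anxiety", "dard"),
    ("Sirkhi Dargwa", "regret", "form_b"),
]


def find_colexifications(word_list):
    """Return, per language, the concept pairs expressed by one word form."""
    concepts_by_form = defaultdict(set)
    for language, concept, form in word_list:
        concepts_by_form[(language, form)].add(concept)

    colexified = defaultdict(set)
    for (language, _form), concepts in concepts_by_form.items():
        # Every unordered pair of concepts sharing a form is a colexification.
        for pair in combinations(sorted(concepts), 2):
            colexified[language].add(pair)
    return dict(colexified)


colex = find_colexifications(WORD_LIST)
for language, pairs in colex.items():
    print(language, sorted(pairs))
```

Concept pairs are stored in sorted order so that ("grief", "regret") and ("regret", "grief") count as the same colexification when languages are later compared.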
Past research has used colexification patterns across languages to examine the semantic structure of non-emotion concepts. Youn and colleagues coded dictionaries from 81 languages to show that concepts such as “sun,” “river,” “mountain,” and “hill” had universal patterns of colexification that reflected concepts’ material and functional properties (21). For instance, languages were more likely to colexify concepts such as “water” and “sea,” than concepts such as “sun” and “water,” implying that speakers of these languages viewed “water” and “sea” as semantically similar concepts and “sun” and “water” as distinct. We use a similar approach to estimate the variation and structure of emotion semantics across language families.
To gather a high-powered sample, we computationally aggregated colexifications into a database of cross-linguistic colexifications (CLICS) featuring 2474 languages and 2439 distinct concepts, including 24 emotion concepts. We then used a random walk probability procedure to generate colexification networks (24). In these networks, nodes represented emotion concepts, and edges represented colexifications between these concepts, weighted by the number of languages that possessed a particular colexification. We used this procedure to construct a network for all languages in our database, and then for 20 individual language families whose colexification networks had a significant level of modularity (ps < 0.001). Although nodes in each language family network were labeled with the same emotion concepts (e.g., “anger”), comparing patterns of colexification across language families allowed us to test whether these nodes actually showed universal semantic equivalence or whether their patterns of association reflected semantic variation (see supplementary text for more details).
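The edge weighting described above (one unit of weight per language that has a given colexification) can be sketched as follows. This is only the counting step under simplifying assumptions: it does not reproduce the random walk probability procedure, and the language names, concept pairs, and function names are hypothetical toy values rather than CLICS data.

```python
from collections import Counter

# Hypothetical toy input: language -> set of colexified concept pairs.
COLEXIFICATIONS = {
    "lang_a": {("grief", "regret"), ("anger", "hate")},
    "lang_b": {("grief", "regret")},
    "lang_c": {("anxiety", "grief"), ("grief", "regret")},
}


def build_weighted_edges(colex_by_language):
    """Weight each concept pair by the number of languages that colexify it."""
    edges = Counter()
    for pairs in colex_by_language.values():
        for a, b in pairs:
            # Sort so that (a, b) and (b, a) map to the same undirected edge.
            edges[tuple(sorted((a, b)))] += 1
    return edges


def node_strength(edges):
    """Sum the weights of all edges touching each concept node."""
    strength = Counter()
    for (a, b), weight in edges.items():
        strength[a] += weight
        strength[b] += weight
    return strength


edges = build_weighted_edges(COLEXIFICATIONS)
strength = node_strength(edges)
print(dict(edges))
print(dict(strength))
```

Restricting the input dictionary to the languages of one family would yield a family-specific network of the kind compared in the analyses.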
A key step in these network comparisons involved identifying communities: clusters of emotion concepts that are more tightly colexified with one another than with emotion concepts outside of the community. For each network, we computed community structure using the Cluster Optimal algorithm (25).
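To make the idea of a community concrete, the sketch below finds the partition of a tiny weighted network that maximizes modularity, assuming that the Cluster Optimal algorithm refers to exact modularity maximization. It swaps in a brute-force search over all partitions, which is feasible only for a handful of nodes and is not the authors' implementation. The edge weights are hypothetical.

```python
# Hypothetical weighted colexification edges (weight = number of languages).
EDGES = {
    ("anger", "hate"): 3,
    ("anger", "envy"): 2,
    ("envy", "hate"): 2,
    ("grief", "regret"): 3,
    ("anxiety", "grief"): 2,
    ("anxiety", "regret"): 2,
    ("envy", "grief"): 1,
}


def set_partitions(items):
    """Yield every way to split a list of items into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Place `first` into each existing block in turn ...
        for i, block in enumerate(partition):
            yield partition[:i] + [[first] + block] + partition[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition


def modularity(partition, edges):
    """Weighted modularity: sum over blocks of (inside/m - (degree/2m)^2)."""
    total = sum(edges.values())
    strength = {}
    for (a, b), weight in edges.items():
        strength[a] = strength.get(a, 0) + weight
        strength[b] = strength.get(b, 0) + weight
    q = 0.0
    for block in partition:
        members = set(block)
        inside = sum(w for (a, b), w in edges.items()
                     if a in members and b in members)
        degree = sum(strength.get(node, 0) for node in members)
        q += inside / total - (degree / (2 * total)) ** 2
    return q


nodes = sorted({node for edge in EDGES for node in edge})
best = max(set_partitions(nodes), key=lambda p: modularity(p, EDGES))
best_q = modularity(best, EDGES)
print(sorted(sorted(block) for block in best), round(best_q, 3))
```

In this toy network the two tightly connected triangles form separate communities, and the single weak “envy”–“grief” edge is not enough to merge them.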
Figure 1 displays the global colexification network and the five largest language family–specific networks, and fig. S1 displays the remaining language families. Family-specific colexification networks allowed us to estimate global variability in emotion semantics and to predict variation and structure in emotion semantics across language families.
We estimated global variation in emotion semantics by comparing the community structures of language family networks. We quantified agreement in community structure using adjusted Rand indices (ARIs), which indicate the similarity of two networks’ community structures (26). Negative ARI values indicate that two networks’ community partitions vary more than would be expected by chance, ARI values of 0 indicate that two networks’ community partitions vary at a level that would be expected at chance, and ARI values approaching 1 reflect high agreement in community structure between two networks. The distribution of raw ARIs indicated high variability in community structure across language families, with a mean ARI of 0.09 (SD = 0.11). Because ARIs can be artificially low in networks with few edges owing to isolated nodes, we also examined the ARI values for a thresholded set of community comparisons. Through a series of permutation tests, we identified pairs of communities that were more similar than would be expected by chance and then thresholded our sample to only include these permutation-robust community comparisons. With this more conservative set of comparisons, the mean ARI was 0.22 (SD = 0.09), still reflecting high variability in emotion semantics across language families.
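The adjusted Rand index can be computed directly from two community assignments over the same set of concepts. The sketch below uses the standard pair-counting formula; the function name and the toy community labels are hypothetical and do not correspond to any language family in the study.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same items")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Count item pairs that are grouped together in both, in A, and in B.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / comb(n, 2)  # agreement expected by chance
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (together_both - expected) / (maximum - expected)


# Hypothetical community labels for six concepts in two language families.
family_1 = [0, 0, 0, 1, 1, 1]
family_2 = [0, 0, 1, 1, 2, 2]
print(round(adjusted_rand_index(family_1, family_2), 3))
```

Community labels are arbitrary identifiers, so two families that group concepts identically score 1 even if their communities are numbered differently.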
To test whether variation in emotion colexification patterns merely arose from methodological factors, such as the way that concepts were glossed in our database, we next compared the ARI values from our emotion concept comparisons to ARI values for colexification networks involving color concepts. Color concepts have also been studied cross-linguistically (27) and are frequently compared to emotion concepts (4), making them an appropriate sample of comparison concepts. In the full sample of comparisons, color concepts had a mean ARI of 0.35 (SD = 0.17), significantly higher than the full sample of emotion concept comparisons, t(390) = 18.51, p < 0.001. In the permutation-robust sample of comparisons, color concepts had a mean ARI of 0.41 (SD = 0.15), again showing more universality than the permutation-robust sample of emotion concept comparisons, t(158) = 11.44, p < 0.001 (Fig. 2). This difference also replicated when equating the number of color and emotion concepts, t(334) = 15.52, p < 0.001 (see materials and methods for more details). Emotion semantics thus vary widely across language families, and their variation is significantly greater than variation in color semantics.
Our next analysis investigated whether geographic proximity predicted the pattern of variation in emotion semantics across language families. We tested this hypothesis by correlating the geographic proximity of language families (via the latitude and longitude coordinates of their languages) with their pairwise ARI values. As predicted, language families with higher pairwise ARI values were in closer geographic proximity, both in the full sample of our ARI comparisons, r(188) = −0.26, p < 0.001, and in the smaller permutation-robust sample, r(55) = −0.29, p = 0.03 (Fig. 3). These associations suggest that emotion semantics do not vary randomly; their variation is tied to the cultural evolutionary relationship between language families.
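The correlation between geographic distance and community agreement can be sketched with a great-circle distance and a Pearson correlation. The example assumes one representative coordinate per language family and uses the haversine formula. Both choices are assumptions, since the text says only that distances were derived from the coordinates of each family's languages. The family names, coordinates, and ARI values are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical representative coordinates (latitude, longitude) per family.
FAMILY_COORDS = {
    "family_a": (0.0, 0.0),
    "family_b": (0.0, 10.0),
    "family_c": (0.0, 40.0),
}
# Hypothetical pairwise ARI values between family networks.
PAIRWISE_ARI = {
    ("family_a", "family_b"): 0.30,
    ("family_b", "family_c"): 0.15,
    ("family_a", "family_c"): 0.05,
}

distances = [haversine_km(*FAMILY_COORDS[a], *FAMILY_COORDS[b])
             for a, b in PAIRWISE_ARI]
aris = list(PAIRWISE_ARI.values())
r = pearson_r(distances, aris)
print(round(r, 3))
```

A negative coefficient, as in the reported results, means that more distant families tend to have lower agreement in community structure.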
Finally, we tested whether any psychophysiological dimensions could predict the semantic structure of emotion across language families. We examined the explanatory power of six dimensions (valence, activation, dominance, certainty, approach-avoidance, and sociality) by testing whether they predicted the community membership of emotion concepts across colexification networks. Using ratings of 200 online participants (90 female, 110 male; M_age = 34.11, SD_age = 10.52), we first classified our emotion concepts on these dimensions using a 1-to-10 Likert-type scale. We also classified a set of five “neutral” concepts (ordinary, nondescript, indifferent, neutral, and impartial). Using a multilevel structural equation model in which participants’ ratings of emotion concepts on these dimensions predicted the community membership of emotion concepts, we were then able to test how well each dimension differentiated emotion communities from our set of neutral words. If a dimension was highly predictive, the model’s Akaike information criterion (AIC) fit would show a large decrement when the dimension was removed from the model. By contrast, removing nonpredictive dimensions would have less of an impact on the model’s AIC fit. We ran this analysis for all language families except the Nuclear Macro-Je, for which models did not converge because only a single community contained multiple emotion concepts.
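The leave-one-out logic can be sketched with the standard AIC formula, AIC = 2k − 2 ln L, where k is the number of estimated parameters and L is the maximized likelihood. The sketch does not fit a multilevel structural equation model. It assumes that each model's log-likelihood is already available and defines the fit decrement as the AIC of the reduced model minus the AIC of the full model, which is one plausible reading of the text. All log-likelihoods and parameter counts below are hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: lower values indicate better fit."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical fit of the full model with all dimensions included.
FULL_MODEL = {"log_likelihood": -1000.0, "n_params": 8}

# Hypothetical fits after removing one dimension at a time.
REDUCED_MODELS = {
    "valence": {"log_likelihood": -1150.0, "n_params": 7},
    "activation": {"log_likelihood": -1090.0, "n_params": 7},
    "sociality": {"log_likelihood": -1004.0, "n_params": 7},
}


def fit_decrements(full, reduced):
    """AIC increase caused by dropping each dimension (larger = more predictive)."""
    full_aic = aic(**full)
    return {dim: aic(**model) - full_aic for dim, model in reduced.items()}


decrements = fit_decrements(FULL_MODEL, REDUCED_MODELS)
ranked = sorted(decrements, key=decrements.get, reverse=True)
print(decrements)
print(ranked)
```

Under this definition, a dimension whose removal barely changes the AIC contributes little to distinguishing emotion communities from the neutral concepts.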
The results of this leave-one-out analysis revealed higher predictive power for valence and activation than for other dimensions (Fig. 4). Valence was the most predictive dimension, with the highest AIC fit decrements (M_AIC = 323.50) for the all-family network and for 13 of the 19 language families in our analysis. Activation was the most predictive dimension for the remaining six language families (M_AIC = 208.76). Approach (M_AIC = 35.82), certainty (M_AIC = 30.26), dominance (M_AIC = 26.18), and sociality (M_AIC = 7.41) had far less predictive power than valence and activation, and comparing the distributions of fit decrements across language families revealed that both valence (ps < 0.001) and activation (ps < 0.001) had significantly higher decrements (i.e., explained more variance) than these other dimensions, and that valence had a higher average fit decrement than activation, t(19) = 2.70, p = 0.01. These findings suggest that languages around the world primarily differentiate emotions on the basis of valence and activation (see materials and methods for further analyses and discussion).
Our findings reveal wide variation in emotion semantics across 20 of the world’s language families. Emotion concepts had different patterns of association in different language families. For example, “anxiety” was closely related to “fear” among Tai-Kadai languages, but was more related to “grief” and “regret” amongst Austroasiatic languages. By contrast, “anger” was related to “envy” among Nakh-Daghestanian languages, but was more related to “hate,” “bad,” and “proud” among Austronesian languages. We interpret these findings to mean that emotion words vary in meaning across languages, even if they are often equated in translation dictionaries. The supplementary materials contain an extended discussion of why other technical and sampling artifacts are unlikely to account for the variation that we observed in emotion semantics.
Geography partly explained variation in emotion semantics, such that geographically closer language families tended to colexify emotion concepts in more similar ways than distant language families. Geographically proximal societies often have more opportunities for contact through trade, conquest, and migration and share more recent common ancestry than distant groups (11). This suggests that historical patterns of contact and common ancestry may have shaped cross-cultural variation in how people conceptualize emotions. We encourage future research to examine the specific vertical and horizontal transmission processes that give rise to geographic variation in emotion semantics.
Despite this variation, we find evidence for a common underlying structure in the meaning of emotion concepts across languages. Valence and physiological activation, which are linked to neurophysiological systems that maintain homeostasis (9), served as universal constraints to variability in emotion semantics. Positively and negatively valenced emotions seldom belonged to the same colexification communities, although there were notable exceptions to this pattern. For example, some Austronesian languages colexified the concepts of “pity” and “love,” which implies that these languages may conceptualize “pity” as a more positive (or “love” as a more negative) concept than other languages. The ability of valence and activation to consistently predict structure in emotion semantics across language families suggests that these are common psychophysiological dimensions shared by all humans.
Questions about the meaning of human emotions are age-old, and debate about the nature of emotion persists in scientific literature. The colexification approach that we take here provides a new method and a set of metrics to answer these questions by creating vast networks of how people use words to name experiences. Analyzing these networks sheds light on the cultural and biological evolutionary mechanisms underlying how emotions are ascribed meaning in languages around the world. Although debates about the relationship between language and conscious experience are notoriously difficult to resolve (28), our findings also raise the intriguing possibility that emotion experiences vary systematically across cultural groups. More broadly, our study shows the value of combining large comparative linguistic databases with quantitative network methods. Analyzing the diverse ways that people use language promises to yield insights into human cognition on an unprecedented scale.
Acknowledgments
We acknowledge the many linguists who provided the word lists necessary to detect and analyze colexifications across languages. We also acknowledge the feedback of our editor and six anonymous reviewers; K. Gray, K. Payne, E. McCormick, J. Leshin, and N. Caluori; and the research assistance of R. Drabble, I. Khismatova, and A. Veeragandham.
Funding: This study was supported by a National Science Foundation Graduate Research Fellowship and a Thomas S. and Caroline H. Royster Fellowship to J.C.J. The compilation of the CLICS data and software used in this study was funded by the Max Planck Society (as part of the CLLD project, https://clld.org), the Max Planck Institute for the Science of Human History and the Royal Society of New Zealand (Marsden Fund grant 13-UOA-121 and GlottoBank project, https://glottobank.org), the DFG research fellowship grant 261553824 and the ERC Starting Grant 715618 (both awarded to J.M.L.), and the ARC’s Discovery Project DE 120101954 and the ARC Center of Excellence CE140100041 (both awarded to S.J.G.). J.W. is supported by funds from the Templeton Religious Trust (TRT0153). No funding agency was involved in the conceptualization, design, data collection, analysis, decision to publish, or preparation of this manuscript, and the views expressed in this manuscript do not necessarily reflect the views of our funding agencies.
Author contributions: J.C.J., J.W., and K.L. conceptualized and designed the study. J.C.J., J.W., T.H., J.M.L., and P.J.M. acquired and analyzed the data. J.M.L., R.F., and S.J.G. contributed software and data used in our analyses. J.C.J., J.W., T.H., P.J.M., and K.L. interpreted the analysis. J.C.J., J.W., K.L., J.M.L., and R.D.G. wrote the manuscript. All authors approved the submitted manuscript.
Competing interests: The authors have no competing interests to declare.
Data and materials availability: All data, scripts, and materials are available at https://osf.io/d9tr5/.