This paper on happy words makes me angry

There’s a new paper out in the Journal of Positive Psychology: “Towards a positive cross-cultural lexicography: Enriching our emotional landscape through 216 ‘untranslatable’ words pertaining to well-being”, by Dr. Tim Lomas, in the Department of Psychology, at the University of East London. I don’t read this journal regularly, and I only heard about it through a Huffington Post article.

As you can tell from the title, the paper is about so-called “untranslatable” words. These are words from other languages which are extremely precise in meaning, and difficult to render into English. There have been plenty of books published on the topic, and lots of websites exist too. If you just google the phrase “untranslatable words” you’ll see what I mean, and actually you’ll be doing the same amount of research as Lomas did for his paper. More on that later.

It’s important to note from the very beginning that untranslatable words are, in fact, translatable. Otherwise, it would be impossible to write books about them and explain them to English speakers. Saying a word is “untranslatable” really means that it’s difficult to find an exact equivalent in English. In many cases, it takes a full English sentence to explain what a single word can convey. Lomas acknowledges this in the introduction to the paper:

Such words are not literally untranslatable, of course, since their meaning can be conveyed in a sentence. Rather, they are deemed ‘untranslatable’ to the extent that other languages lack a single word/phrase for the phenomenon.

The concept of “untranslatable” is slippery and ill-defined, made worse by the fact that Lomas doesn’t provide any formal criteria. It’s assumed that we will all share his opinion on this. Even when he starts presenting examples later in the paper, he rarely gives any reasons why those particular words should be deemed untranslatable. For example:

There are terms for ‘vicarious embarrassment’, in which one shares in sympathy the shame of others, including myötähäpeä (Finnish), Fremdschämen (German) and pena ajena (Spanish).

That’s it. A single sentence, three data items crammed in there, and no further explanation or argumentation. Why are these untranslatable? What’s wrong with the term “vicarious embarrassment” and what does that fail to convey? Are there other words in English that we might consider (“cringe-worthy” comes to mind)? What have professional translators done with works that contain these words? This is how most of the paper goes. It’s packed with examples of foreign words, but not much else.

The focus is specifically on untranslatable words that have to do with happiness or states of well-being. Lomas’ idea is that we should introduce English speakers to these words, because this could improve their own well-being by allowing them to label certain emotional states. That is, maybe we could all be happier if we had more words for talking about happiness. This would allow us to notice and focus on more happy moments in our lives, and experience more happiness overall. As he explains:

The existence of ‘untranslatable’ words pertaining to well-being implies that there are positive emotional states which have hitherto only been explicitly recognized by particular cultures. However, this does not mean that people in other cultures may not have had a comparable experience. Yet, lacking a specific term for it, such people have arguably not had the opportunity to specifically identify that particular state, which instead thus becomes just another un-conceptualised ripple in the on-going flux of subjective experience…the value of exploring ‘untranslatable’ words is that, if people are introduced to a foreign term, this may then be used to give voice to these hitherto unlabeled states.

I have no idea whether or not this would actually work. I’m not a psychologist, so I don’t know if learning to label emotional states can increase your well-being. But that doesn’t even matter. We are not even going to get as far as talking about this issue, because the data collected for this paper is completely untrustworthy.

Indeed, you might wonder how Lomas actually went about collecting data in the first place, given the lack of criteria for what counts as “untranslatable”. The answer is actually pretty amazing. Let’s head over to the Methods section…

So, with the aim of enriching the emotional vocabulary of the English language, this paper offers a quasi-systematic review of ‘untranslatable’ words pertaining to well-being. It is quasi-systematic since there was insufficient source material in academic journals, meaning that a true systematic review, utilizing conventional academic databases, was not possible.

Insufficient source material? Couldn’t use conventional academic databases? Quasi-systematic (i.e. non-systematic) search? Hmm… It sounds like Lomas couldn’t find anything to support his ideas about untranslatable words, but he really, really wanted to write this paper, so he just did it anyway.

Look, I understand that it’s difficult to locate information about the semantics of other languages. You have to do a lot of grunt work going through grammars, dictionaries, field work reports, and obscure conference proceedings. I expect that in a preliminary paper like this, there will be some simplifications.

Nothing could have prepared me for the actual research methodology.

This stage featured three main search strategies. First, I examined 20 websites and blogs devoted to ‘untranslatable words’. These were located by entering the phrase ‘untranslatable words’ into google, and picking the first 20 such websites and/or blogs…This search strategy generated 131 words.

Uh wut? Am I really reading something from an academic journal right now? Instead of even attempting to be marginally rigorous, Lomas just throws caution to the wind and literally decides to ask the internet. Research methods: “I just googled a phrase and took whatever I came across first”. Come on.

To be clear, this is not always a bad idea for linguistic research. For instance, if you’re looking at how language is used on the internet in different contexts, then googling is absolutely appropriate. If you’re looking to see roughly how frequent a particular phrase is, compared to another, then a search engine isn’t the worst thing (but the Google Books Ngram Viewer is probably better).

When you’re looking at a nebulous concept like “untranslatable words”, you don’t blindly trust the blogs you found on page 1 of a search. This is the kind of topic that any random yahoo can, and will, write about. Websites about untranslatable words are just as likely to contain fictitious words as they are real ones.

Second, I searched google one language at a time. This involved entering ‘_____ concept of’ and ‘well-being’ into the search engine, with a different language in the underlined space each time. I would proceed through the first ten pages for each search, looking for references to emotions or qualities relating to well-being that were presented as being unique to a particular culture. This strategy generated a further 77 words.

I’m not sure this is really any better than the first strategy. At least it’s not just restricted to blogs. And hey, he went all the way to the 10th page of a Google search. That’s impressive. I don’t think I usually go past page 3 or 4.

Third, I canvassed staff and students at my institution, as well as friends and acquaintances, which yielded another 8 words.

I think it’s a good idea to go talk to native speakers of other languages, but Lomas doesn’t actually specify that he did this. He just says he talked to friends and colleagues. We don’t know if these people are fluent, or if they are just repeating a story they heard somewhere. This third search technique depends on naïve intuitions about word meanings and the translations are provided by non-professionals. It’s strange that this turned up such a low number of words too, but maybe there was a lot of overlap with the words generated by the previous two searches.

As a result, 216 relevant terms were located. These words and their descriptions were checked for accuracy by consulting online dictionaries, as well as peer-reviewed academic sources (if such were available for a given word). Thus, I based my analysis on the definitions provided by dictionaries and academic sources (rather than the original websites/blogs where I first located some terms).

This is an attempt by Lomas to save himself a little bit. By double-checking in a dictionary later, he can claim that he didn’t actually do all his research by looking through blogs and chatting with friends. However, if Lomas really did this, he certainly wasn’t very rigorous. He frequently cites data without providing any reference at all, and I was able to find several instances of non-academic, non-dictionary sources. I’ll discuss this in a moment.

After collecting all these words, Lomas began to categorize them using something called “grounded theory”. It’s basically recursive grouping.

Having compiled a list of words, I analyzed these using a qualitative methodology known as grounded theory (GT) (Strauss & Corbin, 1998). In GT, the aim is to allow theory to ‘emerge’ inductively from the data. GT involves three main stages: open coding, axial coding and selective coding…For instance, I found five words which pertained to friendship (philotimo [φιλότιμο], cariño, confianza, nakama [仲間] and ah-un [阿吽]). I therefore grouped these words together under the label ‘friendship’. The next stage was axial coding, in which the themes themselves are clustered together into meta-themes. For example, I took the themes of ‘friendship’, ‘affection’, ‘desire’ and ‘love’, and grouped these into a meta-theme of ‘intimacy’.

This categorization process seems to contradict the basic premise of the paper. If these are untranslatable words, then you cannot group them using the English translations because those translations are, necessarily, imprecise. You end up selecting foreign words based on your understanding of the English words in the translation, rather than based on the original meaning. The final categorization will reflect the author’s own biases about how these concepts are connected in English, rather than reflecting anything about human languages in general.

It’s also completely unscientific. If ten different people are asked to code the same data, they’ll probably come up with ten different groupings, because it’s based on personal ideas about how the data is organized, not on objective criteria.
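To make that worry concrete, here is a minimal sketch of how agreement between two coders could be measured with Cohen’s kappa, a standard chance-corrected agreement statistic. Nothing like this appears in Lomas’ paper. The function name and the theme labels below are invented purely for illustration, and only the five words themselves come from the passage quoted above.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders labelling the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty list of items")
    n = len(coder_a)
    # Proportion of items on which the two coders chose the same label.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's own label frequencies.
    counts_a = Counter(coder_a)
    counts_b = Counter(coder_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both coders used one identical label for everything.
        return 1.0
    return (observed - expected) / (1 - expected)


# The five words are from the quoted passage; the labels are HYPOTHETICAL,
# made up here to show two people coding the same words differently.
words = ["philotimo", "cariño", "confianza", "nakama", "ah-un"]
coder_a = ["friendship", "affection", "friendship", "friendship", "friendship"]
coder_b = ["honour", "affection", "trust", "friendship", "friendship"]

kappa = cohens_kappa(coder_a, coder_b)
print(round(kappa, 3))  # 0.375
```

A kappa near 1 would mean the two coders largely agree. A value near 0 means they agree about as often as chance would predict. With a single coder, as in this paper, there is nothing to compare against at all.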

The whole Methods section of the paper is a mess, and to top it off, Lomas never provides a comprehensive list of the words and languages that he used. Instead, he provides examples sprinkled throughout the paper, without giving the global overview. Having a full list of the languages is important, because we need to know how many of them are related to each other. Related languages are, obviously, going to have words in common, so if you tend to sample from a particular family then your overall set of words is going to be biased in a particular direction. This problem crops up within a few paragraphs of the Results section when Lomas is discussing words for positive feelings.

Perhaps the dominant state in this regard is happiness, for which most languages have a translative [sic] equivalent. Interestingly, many of these derive etymologically from terms pertaining to luck (McMahon, 2004), including heureux (French), onni (Finnish), Gluck (German) and felicità (Italian). Indeed, the English term derives from the old Norse happ, which alludes to fate, as in ‘happenstance’.

It is not at all “interesting” that these words have something in common, because almost all of the examples in this paragraph are from Indo-European languages (French, German, Italian, English, and Old Norse), which descend from the same ancestor language. Only one example from another family is provided (Finnish belongs to the Finnic branch of the Uralic family). It’s also not true that these words all derive etymologically from terms for “luck”. The Italian word felicità is from Latin felicitas, which meant happiness but also fertility. The Latin word, in turn, comes from a Proto-Indo-European root *dhe(i) which carried a meaning of “suckle, produce, yield”.

There are also some inconsistencies in the way that language names are presented. It’s clear that Lomas doesn’t even have passing familiarity with language classification, and he never bothered to check with an expert. For example, look at this:

Further forms of merrymaking include: mbuki-mvuki (Bantu), to ‘shuck off one’s clothes in order to dance’ (Rheingold, 2000, p. 28)

Bantu is not a single language. It’s a family consisting of hundreds of languages. There is no way that Lomas did a fact-check on this one, because this is something you can find on the first page of Google, and I know he’s capable of looking there. That citation is not an academic source or a dictionary either. It’s a popular publication, “They Have a Word For It”. (I have a copy of this book, actually. It’s a fun read, if you take it with a grain of salt.)

This is not the only time that a family name is used where the name of an individual language should be. In addition to this “Bantu”, we also find a reference to “Nguni Bantu”, which is not a language, but a subgroup inside the bigger Bantu family. There’s also one word from “Gaelic”, which is a generic term for several Celtic languages. A language called “Chinese” is mentioned throughout the paper, without ever specifying which variety (although it appears to be Mandarin).

The Inuktitut language is called Inuit, which is incorrect. The term Inuit just means “people”. The Huron language is called a “Native American” language even though that’s a geographical region, not a valid language family (the correct classification is Iroquoian). Pintupi is referred to as “Aboriginal Pintupi”, which is not a helpful qualifier because aboriginal people live all over the world. Pintupi is actually a Pama–Nyungan language spoken in Australia. In one case, Lomas talks about “the Australian aboriginal term dadirri” without even naming the language it comes from.

Even when the language name is correct, the rest of the information might not be. Consider this claim about Tshiluba.

Finally, in the Tshiluba language of the Democratic Republic of Congo, ilunga – rated by linguists as the world’s most difficult word to translate (Conway, 2004) – refers to a person who is ready to forgive abuse the first time, and tolerate it a second time, but never a third time.

That side-note about the difficulty of translation strikes me as odd. This is not normally the kind of pronouncement made by linguists. Let’s check the source. Who is Conway?

It’s Oliver Conway, who writes for BBC News. He’s not a linguist. The BBC story was actually about translators and interpreters, but they were incorrectly identified as “linguists”. Moreover, the news article is only 11 sentences long, and it gives us no information about how or why the translators went about selecting this as a “difficult” word. In fact, the whole thing appears to be an advertisement for the company “Today Translations”, masquerading as a news article.

Do you recall when Lomas claimed “I based my analysis on the definitions provided by dictionaries and academic sources”? Yeah, I’d almost forgotten about that too.

Here is another extremely suspicious claim:

This sentiment is epitomised by the Cherokee battle-cry yutta-hey, which translates as ‘it is a good day to die’, embodying the feeling that one is leaving life at its zenith, departing in glory.

Note first of all that this comes without a reference. I did my own fancy google searching, and I can’t find any evidence that this is true Cherokee. I found a few places online with Cherokee-English translations (here and here) but neither the word “yutta” nor “hey”, nor a combination of the two, appears anywhere. I also looked up English words like “good”, “day”, and “die”, and none of the translations looks anything like “yutta” or “hey”.

The original source for this claim is probably this web page here. I’ll leave it to you to decide if that’s an appropriate academic source.

There’s one word provided from Yagán, which is also known as Yaghan and Yamana. This is a nearly extinct language spoken in Tierra del Fuego.

mamihlapinatapei [is] from the Chilean Yagán language (a look between people that expresses unspoken but mutual desire)

Again, there is no reference given here, but I think it probably comes from the book “They Have A Word For It” which I mentioned earlier. Since I have a copy, I checked the bibliography, and there is nothing. The book simply claims the word comes from a “native source”, whatever that means.

Using Google, I found this word in a few other places too, including the Guinness Book of World Records 1994, but there was never any primary source. Wikipedia gives a grammatical breakdown, and mentions that the word is based on the root ihlapi, which means “to be at a loss for what to do next”, but this is also unsourced. Eventually, I tracked down a copy of a Yagán-English dictionary (probably the only one ever written), and neither the full word nor the root ihlapi appears in there anywhere.

So, in summary…

  • There is no definition given for “untranslatable”
  • No arguments are made for why any specific examples should be accepted as untranslatable
  • The data was culled from blogs and web pages, not academic sources
  • Data is frequently presented without a source at all, making it impossible to verify
  • Names of languages are incorrectly or inappropriately used
  • There are several instances of false claims about etymology or meaning

What a mess. The goal of the paper was to see if we can increase well-being and happiness by introducing English speakers to these untranslatable words. I can tell you right now that I do not feel any happier after reading it.


I collected all the names of languages that I could find, and they are listed below. This might not be a complete list.

  1. Arabic
  2. Balinese
  3. Bantu
  4. Boro
  5. Catalan
  6. Cherokee
  7. Chinese
  8. Danish
  9. Dutch
  10. Farsi
  11. Fijian Hindi
  12. Finnish
  13. French
  14. Gaelic
  15. Georgian
  16. German
  17. Greek
  18. Hungarian
  19. Huron
  20. Icelandic
  21. Inuit
  22. Italian
  23. Japanese
  24. Javanese
  25. Korean
  26. Norwegian
  27. Pashto
  28. Pintupi
  29. Russian
  30. Sanskrit
  31. Spanish
  32. Swahili
  33. Swedish
  34. Tagalog
  35. Tshiluba
  36. Turkish
  37. Unnamed Australian language
  38. Urdu
  39. Welsh
  40. Yagán
  41. Yiddish



11 responses to “This paper on happy words makes me angry”

  1. Many years ago – long before Google – we were taught “How to read a [medical/surgical] Paper”. The paper you critique here wouldn’t get far in this process; there are so many papers in some areas that it’s impossible to read them all, so a filtering process is very necessary.

    A comment: these so-called untranslatable words can, of course, be rendered into a word or phrase in other languages. Thus ‘Schadenfreude’ in German becomes ‘taking enjoyment in another’s misfortune’ in English. Except that the concept of Schadenfreude supposedly doesn’t exist in English; that is, the word ‘Schadenfreude’ is also a short form for a certain philosophy. To fully understand a word in another language, it’s then necessary to understand the totality of the background.

    One further criticism of this paper: the author and you both used Google to search for items. You have assumed (I think) that what Google produces for you is the same as would have been produced for the article’s author. This may not be correct. I understand that Google searches are based, in part, on your own previous search history. Such searches aren’t necessarily neutral; they can be biased.

    • That’s a good point about the searches. You’re right that the results can depend on things like where you are physically located, which computer you’re using, your previous searches, and the popularity of the web sites, and so on.


  2. Great post. Made me think about the pre-Google days. Greetings from Germany

  3. The lack of critical rigor seems to be a sign of the times in far too many cases. At least this guy has some reference to facticity, even if it’s very tenuous.

    I have long believed there is some degree of truth to the idea that language can facilitate thought. Pick any profession, and you’ll find it filled with “terms of art” regarding that profession. Discussion and thought within that profession isn’t impossible without the terminology, but it’s very difficult, imprecise, and verbose (and possibly error-prone).

    (There was a cute NCIS episode recently in which two people fix a car engine because ‘that thingy isn’t connected to that whatchamacallit.’)


    • I agree that there’s some connection between language and thought. Personally, I like the view espoused by Dan Slobin, which is called “thinking for speaking”. Roughly, the idea is that learning a language is learning a way of thinking about the world. To say a sentence in a language, you have to learn to pick out characteristics of events and objects which are (a) considered salient and (b) easily encoded in the language.

      Not every language encodes the same kinds of information, so you have to learn to “think” differently about situations, depending on which language you are using to describe them.




  6. Sound job! This kind of shite makes me very angry too! When are people going to start teaching LINGUISTICS in schools instead of prescriptivist grammar?

