In addition to this blog I have a Twitter account called LinguaDiem, where the goal is to post about as many languages as possible. I generally do one per day. I’ve had this going for about a year and a half now, and I recently hit a landmark of 500 languages. They are all listed over here. For this blog post, I wanted to talk about LinguaDiem, and a few things I learned from 500 languages.
I’ll begin with the obvious: I do not have personal knowledge of 500 languages. I am fluent in two, and I have basic skills in a few others. I get my information by reading linguistics papers or descriptive grammars. I use Google Scholar and Google Books a lot. I just search for some grammatical terms plus the name of a language or language family, e.g. “grammar australia languages” or “morphology tone mixtec”. I have also frequently consulted the SIL and WALS. Sometimes I actually visit a real library and read actual books. It generally takes me less than 30 minutes to do the necessary research for a tweet, and I find it a nice thing to do with coffee in the morning. Most of the time, I need to find a new source for each language that I do, but occasionally I’ll come across a book about language families, or typology, and that can provide me with a week or more of material.
My aim for each tweet is to provide a fact about a language then follow it up with some data. Some examples taken at random:
The data is the key part for me. If I can’t easily show some examples of what I’m talking about, then it probably won’t qualify for a tweet. I think that giving data is important in a general scientific sense (I know it’s Twitter, but I still care). I also think this is what sets LinguaDiem apart from other language-related accounts on Twitter, like @aboutworldlangs or @Languagebandit. Not that I’m criticizing anyone: I follow both these accounts, I like their content, and they have a much bigger audience than I do. I’m just saying that I’m going for something different.
One thing that has surprised me about LinguaDiem is how easy it has been. I always hear about how little documentation exists for languages, but finding at least one source for 500 languages was not hard. I’m really wondering when this is going to slow down. On the other hand, I’m only looking for one paper to read, and I don’t care about the topic. If you’re a linguist who wants to do research on a specific language, or investigate a particular phenomenon, then things would be much harder.
I have only once had a request for a language. And that was the only time that I couldn’t dig up any information, damn it.
If there’s someone reading who knows about Kono, please leave a comment, or contact me on Twitter.
One of the things that I struggle with is how to write the name of a language. Languages have an enormous array of consonants and vowels, and the English alphabet is not well-suited to writing all of them. For this reason, there are often disagreements about how to transcribe names. For instance, there is a language spoken in China that can be spelled Akeu, Akheu, Akui, or Aki. The group that speaks this language is variously known as the Akha, the Aini, or the Kha Kaw.
I can’t possibly know the history of every culture and language, and I just don’t have the time to do careful verification in every case. I go with whatever language/culture name is given in the source material that I’m using. This is the only practical solution, but it has two downsides.
First, I might accidentally use a name that has become outdated or, perhaps, even offensive. I’ve only once been contacted about this, when I tweeted about Dogrib and @TlichoOnline sent me a message telling me that the name Tlicho was preferred.
Second, it’s possible that I have tweeted about two languages that I thought were distinct, but that are in fact the same thing. For example, there is a Bantu language called Kikuyu, but it is also sometimes called Gikuyu. The reason for this has to do with the phonology of the language. The ki- or gi- part at the beginning of the name is a prefix. The consonant of the prefix is normally pronounced as [k], but if the prefix is attached to a stem that begins with a /k/ then it is pronounced [ɣ], and the sound [ɣ] is written as “g” in the spelling system. Some people prefer to write “Gikuyu” because it better reflects the pronunciation. Others write “Kikuyu” to represent the underlying prefix, ignoring the predictable change in pronunciation.
In this particular case, I am familiar enough with the language not to get confused. Actually, this pattern in Kikuyu pronunciation is well-known, because it is used in a lot of undergraduate phonology courses. However, if you didn’t know about this rule, you might (reasonably) believe that Kikuyu and Gikuyu are different dialects, or perhaps different languages. Obviously, I can’t know about such pronunciation rules for every language, so it’s quite possible I’ve included the same language more than once on my list.
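For readers who think better in code, the Kikuyu/Gikuyu alternation described above can be sketched as a toy rule. This is only an illustration of the simplified description in this post (a stem-initial "k" triggers the change), not a full account of Kikuyu phonology. The function names are made up for this example, and the prefix is written in its anglicized form.

```python
def class_prefix_spelling(stem: str) -> str:
    """Spell the ki-/gi- prefix for a given stem (hypothetical helper).

    Assumption: following the simplified rule in the post, only a stem
    that begins with "k" makes the prefix consonant surface as [ɣ],
    which the spelling system writes as "g".
    """
    return "gi" if stem.lower().startswith("k") else "ki"


def language_name(stem: str) -> str:
    """Attach the prefix to the stem and capitalize the result."""
    return (class_prefix_spelling(stem) + stem).capitalize()


if __name__ == "__main__":
    # The stem of the language name begins with "k", so the rule applies.
    print(language_name("kuyu"))  # prints "Gikuyu"
```

Someone who writes “Kikuyu” is, in effect, skipping the `class_prefix_spelling` step and always writing the underlying prefix.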
If you spot an outdated name on my list, or duplicates, please let me know here or on Twitter. I’ll investigate and if necessary I’ll change my total count.
As a general rule, I try to stick to living, modern languages, but in a few cases I have chosen extinct tongues:
I also tend to stick to spoken languages, because there is more material available for them and because they have more-or-less standardized transcription systems. Sign languages, on the other hand, are far more poorly documented, and there is no easy way of transcribing them, so I have had to break my “always include data” rule in these cases.
I try to vary my stuff so that I’m posting about different aspects of language, but some are harder than others. Things that are really easy to put into tweet format include morphology, semantics, and the lexicon. Syntax is by far the hardest thing to do. Phonology falls somewhere in the middle.
Morphology has to do with the way the words are put together, for example the way that roots and suffixes are combined. This is something that’s easy to put into a table format and easy to convey to non-linguists. For this reason I often tweet about verb conjugations, noun inflection, derivation, etc. The causative seems especially popular.
The lexicon is the set of words in a language, and this is of course easy to tweet about, since it just involves translations, and people love seeing words in foreign languages that convey apparently weirdly specific information.
Syntax is hard because explaining it often involves long example sentences, and with only 140 characters I don’t have space to do this. I also generally need to do interlinear glossing for syntax, and I can’t really count on people knowing how to read the glosses (plus this makes it even harder to fit into the 140 character limit). Nonetheless, I still attempted syntax a few times:
Phonology and phonetics can be hard to tweet about because they inherently involve sound, which isn’t easy to demonstrate through writing. I have to use the IPA, but I can’t really count on people knowing that. There are exceptions, and sometimes a sound pattern is obvious enough from transcription that I think it could work. For instance, I have given a lot of examples of reduplication. This is a process whereby portions of a word are repeated to change the meaning of the word.
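To make the idea of reduplication concrete, here is a minimal sketch of two common patterns: repeating the whole word, and repeating only the beginning of the word. The example words (Indonesian "buku" and Tagalog "bili") are standard textbook cases that I am adding for illustration; they are not taken from the tweets. The function names and the vowel list are assumptions made for this sketch, and real languages have many more complications.

```python
VOWELS = set("aeiou")  # assumption: plain five-vowel spelling, no diacritics


def full_reduplication(word: str, sep: str = "-") -> str:
    """Repeat the whole word, e.g. to mark a plural."""
    return f"{word}{sep}{word}"


def cv_reduplication(word: str) -> str:
    """Copy everything up to and including the first vowel and prefix it."""
    for i, ch in enumerate(word):
        if ch in VOWELS:
            return word[: i + 1] + word
    return word  # no vowel found: leave the word unchanged


if __name__ == "__main__":
    print(full_reduplication("buku"))  # Indonesian "book" -> "buku-buku" ("books")
    print(cv_reduplication("bili"))    # Tagalog "buy" -> "bibili" ("will buy")
```

Because the pattern is visible in ordinary spelling, it survives the trip into a tweet without any need for IPA.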
Vowel and consonant length are also easy to explain and fit neatly into a tweet.
Finally, some miscellaneous stuff that doesn’t fit elsewhere in this post:
– Language names with certain punctuation marks break hashtags. In particular, the apostrophe is a problem. Many language names, such as Yup’ik or ‘Bëlï, use apostrophes to indicate glottal stops or ejectives (or other kinds of glottalization). Hyphens occasionally occur too. It was suggested by @AnaMBennasar that I use the underscore in place of the hyphen because that won’t break a tag. This is a good idea, and one that I’ve since adopted.
– The exclamation point symbol ! will break a hashtag. On the other hand, the IPA symbol for an alveolar click ǃ does not. Yes, I know they look exactly the same, but they are different characters as far as a computer is concerned. Interesting fact: the IPA symbol is not supposed to be an exclamation mark; it’s actually the pipe symbol | with a dot underneath.
– There is a Lakota version of the Berenstain Bears called “Matȟó Waúŋšila Thiwáhe”, which is pretty neat. Watch an episode on YouTube.
– I didn’t realize how much tone was used for grammatical processes in African languages. I feel stupid saying that, and it’s not exactly insightful for a linguist, but anyway.
– I think I understand serial verbs now. I’ve never made them the subject of a tweet because I can’t think of a good way of doing this, but I’ve seen enough examples by now to better understand them.
– Same thing with switch reference. This concept makes way more sense to me than it used to, although it’s still extremely hard to put into a tweet.
– I’m way faster at making charts and tables than I used to be. Tables are an easy way to explain something, and they can be attached as a picture so that I can cheat a bit on the 140 character limit for tweets.
– Lastly, I did a few silly things for #KittensAndLinguisticDiversity. The idea is to grab an example sentence from a grammar, and then find a cat picture you can match up with it.
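The hashtag points in the list above (apostrophes, hyphens, and the two look-alike "!" characters) can be checked with a short sketch. It rests on an assumption: that a hashtag may contain letters, combining marks, digits, and the underscore, and that anything else ends the tag. Twitter's real rule is not reproduced here, and the helper names are invented for this example. The names Ho-Chunk and ǃXóõ are real language names that I am adding as test cases; they do not come from the list in this post.

```python
import unicodedata


def is_tag_char(ch: str) -> bool:
    # Assumed rule: letters (L*), marks (M*), digits (N*) and "_" are tag-safe.
    return ch == "_" or unicodedata.category(ch)[0] in "LMN"


def breaking_chars(name: str) -> list:
    """Return the characters in a name that would end a hashtag early."""
    return [ch for ch in name if not is_tag_char(ch)]


def hyphen_to_underscore(name: str) -> str:
    """Apply the underscore-for-hyphen workaround mentioned above."""
    return name.replace("-", "_")


if __name__ == "__main__":
    # The two look-alikes are different code points in different categories.
    for ch in ("!", "\u01c3"):
        print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
    # U+0021 EXCLAMATION MARK Po
    # U+01C3 LATIN LETTER RETROFLEX CLICK Lo

    print(breaking_chars("Yup\u2019ik"))         # the apostrophe: ['’']
    print(breaking_chars("!X\u00f3\u00f5"))      # keyboard "!": ['!']
    print(breaking_chars("\u01c3X\u00f3\u00f5")) # click letter: []
    print(hyphen_to_underscore("Ho-Chunk"))      # Ho_Chunk
```

The click symbol is classified as a letter and the exclamation mark as punctuation, which is consistent with one surviving in a hashtag and the other not.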