How many words could English have?

How many words are there in English? A popular method for counting words in a language is to use the total number in a dictionary (or some other corpus). But counting words this way doesn’t tell us anything very interesting about “English”, because not every speaker of English knows every word in the dictionary.

For example, you might know the word beech refers to a tree, but not know how to identify one in the woods. You might know that some governments are jingoistic, but not know which ones or why. Maybe you confuse yams and sweet potatoes all the time.

So does beech count as “part of English” if not every speaker of English knows it, uses it, or understands it? The words you know depend on things like education, job, dialect, how much you read, where you grew up, your hobbies, how much you’ve traveled and so on.

You can’t really say that “English” has X words. At best, you might be able to say something like “an average Canadian with an undergraduate education knows X words”. And that’s not terribly interesting.

So instead of asking how many actual words there are in use, let’s ask how many possible words there could be. What’s a possible word? Whether something counts as an English word isn’t just a matter of people using it to mean something. The word also has to conform to certain sound patterns. Have a look at these made-up words:

(1) blick, thrich, gakt
(2) *bnick, *thkich, *gatk

The words in (1) are “possible” words because they conform to sound patterns found in other existing English words that are used to mean something. These are words that could be given meanings and put into circulation; they just haven’t been (so far as I know, and even if they have, that doesn’t really detract from the point).

The made-up words in (2) are marked with a star because they are not possible words of English. They contain sequences of sounds which do not occur at the start or end of a syllable in other English words. Recognizing that blick is a word, even though it doesn’t have a meaning, is a bit like recognizing that beech is a word even if you don’t know what it really refers to.

These patterns are known as “phonotactic rules”. Every language has these. Phonotactic rules describe which sounds can appear next to others: phono means ‘sound’ and tactic comes from the Greek for ‘arranging’ (cf. ‘syntax’).
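To make the idea of a phonotactic check concrete, here is a minimal sketch in Python. It is only an illustration under loud assumptions: the onset and coda lists are tiny and hand-picked, they use ordinary spelling rather than phonemic transcription, and the function name is made up for this example. A real analysis would use the full inventories and work on sounds, not letters.

```python
import re

# Toy, hand-picked inventories in ordinary spelling. These are assumptions
# for illustration only, not a full list of English onsets and codas.
ONSETS = {"", "b", "g", "th", "bl", "thr", "s", "st", "str"}
CODAS = {"", "k", "ck", "ch", "kt", "t", "ks"}

# One syllable: optional consonants, one group of vowel letters, optional consonants.
SYLLABLE = re.compile(r"^([^aeiou]*)([aeiou]+)([^aeiou]*)$")

def is_possible_monosyllable(word):
    """Return True if the word's onset and coda both appear in the toy lists."""
    match = SYLLABLE.match(word)
    if match is None:
        return False  # no vowel nucleus, or more than one vowel group
    onset, _nucleus, coda = match.groups()
    return onset in ONSETS and coda in CODAS

for word in ["blick", "thrich", "gakt", "bnick", "thkich", "gatk"]:
    print(word, is_possible_monosyllable(word))
```

With these toy lists the words in (1) pass and the words in (2) fail: bnick and thkich are rejected for their onsets, and gatk for its coda.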

The existence of phonotactic rules means there is a finite number of possible syllables in English, indeed in every language, since there is a finite number of sounds and not every combination of sounds is possible. And since words are made up of syllables, we can get a rough count of the number of possible English words by first figuring out how many monosyllables are possible, and then multiplying.

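So how do you count syllables? Syllables are normally represented as a tree, which can be written out in brackets like this:

[Syllable [Onset s] [Rhyme [Nucleus iː] [Coda ks]]]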

That’s the verb “seeks” in IPA transcription. Syllables contain at minimum a nucleus, which is generally a vowel. Consonants that come before that vowel are called the onset. Consonants that come after that vowel are called the coda. The nucleus and coda are grouped into a sub-unit called a Rhyme, but it doesn’t really matter why right now.

The number of syllables in a language reduces to the number of things that can go in each slot in the syllable tree. Multiply the number of onsets by the number of nuclei and then by the number of codas and you get the total number of monosyllables.

For English, the possible onsets and codas as listed in Wikipedia will do for this thought experiment. There are 20 possible onsets consisting of a single consonant, plus 59 complex onsets, which makes for 79 possible onsets. And if we include the fact that a word can have no onset at all (e.g. ice, out, over) then there are 80 possible onsets.

Let’s say there are 12 vowels that act as a nucleus. This will vary a lot according to dialect so treat this as a made-up number. Finally, Wikipedia gives 18 simple codas + 77 complex codas + no coda = 96 codas. So given these very rough numbers, this makes for 80*12*96 = 92,160 possible monosyllables.

To find out how many bisyllabic words there are, you just multiply the number by itself (thanks SynedraAcus!). Then add those numbers together to find out how many one- and two-syllable words are possible. And so on. If M is the number of possible monosyllables, the general formula for how many possible words there are up to N syllables is:

M + M^2 + M^3 + ... + M^N

So how big should N be? In theory, there’s no limit to how many syllables a word can have. In real life there are limits on how long things can be because of memory limits, the need to breathe, and so on. But consider a word like supercalifragilisticexpialidocious. That’s 14 syllables long. That’s way longer than just about any word you would normally use in conversation, it doesn’t really mean anything, and it still sounds totally natural, so our phonotactic intuitions scale up to some pretty big sizes.

And just for fun, let’s say that 14 syllables is the longest a word can be in English. Using the formula above you can find there are a massive 3.19 x 10^69 or so possible words in English. Treat this number with extreme caution. I’ve been pretty loose about counting possible onsets and codas, and some of the numbers change with dialect. I also didn’t include any effects of stress.

Now you can technically use this method to compare the “size” of different languages, since each has different phonotactic rules. Let’s take Hawaiian, for example, which has one of the smallest sound inventories in the world. This language has only 8 consonants which can serve as onsets, and there are no complex onsets. So including the fact that there can be no onset, that makes for 9. We’ll also give the most generous count for vowels and say there are 25 of them, which means 25 nuclei. Hawaiian syllables never have a coda, which means there is only one possibility there. (By the way, this is pretty normal cross-linguistically. Lots of languages ban codas, especially in the Polynesian family that Hawaiian belongs to. It’s also common for languages to ban anything except nasal consonants in coda.)

The math is a lot simpler than for English: 9*25*1 = 225 possible monosyllables. And using the formula above, there are a measly 8.560315144325182 x 10^32 possible words up to 14 syllables long. That is smaller than English, but certainly more than enough to have a word for just about anything you’d need to talk about day to day. Also, Hawaiian words can be very long, so 14 syllables might not be as unusual as in English, and there’s probably a little more homophony.
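The arithmetic for both languages can be checked with a few lines of Python. This is just a sketch of the sum M + M^2 + ... + M^N described earlier; the function and variable names are made up for the example, and the inventory sizes are the same rough figures used in the text.

```python
def count_possible_words(monosyllables, max_syllables):
    """Sum M**n for n = 1..max_syllables, using exact integer arithmetic."""
    return sum(monosyllables ** n for n in range(1, max_syllables + 1))

# onsets * nuclei * codas, using the rough counts from the text
english_syllables = 80 * 12 * 96   # 92,160
hawaiian_syllables = 9 * 25 * 1    # 225

english_total = count_possible_words(english_syllables, 14)
hawaiian_total = count_possible_words(hawaiian_syllables, 14)

# Python converts the big integers to floats for scientific notation here.
print(f"English:  {english_total:.4e}")
print(f"Hawaiian: {hawaiian_total:.4e}")
```

The total is extremely sensitive to the inputs. With 12 nuclei the English sum comes out at roughly 3.19 x 10^69, while bumping the made-up vowel count to 14 pushes it to roughly 2.76 x 10^70. That is one more reason not to take the final figure too seriously.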

It’s also worth mentioning that people don’t necessarily invent new words by just randomly gluing some sounds together. That happens (language is arbitrary after all) but people also coin new terms in other ways. For example, I haven’t considered compounding, which would up the word count again.

But as I said earlier, treat all these numbers with some skepticism. The final count doesn’t matter anyway. The number of words in a language isn’t nearly as interesting as the mental processes that generate those words in the first place.


A grammar book for you and I…oops me!

One serious problem with the available books on English grammar is that there are so many written by unqualified people. Take this one for instance: A grammar book for you and I…oops me!. The author is a lawyer. He has no special education related to grammar or language analysis. What makes him think he can write a book on the subject? And more to the point, why do people buy things like this? Could I write a book on law and get taken seriously? I should hope not. Why on earth would anyone expect a lawyer to know anything about grammar analysis?

I mean just look at what the first chapter is called: “the eight parts of speech”. This is a bad start. When the author says 8, he really means just 8.

every time you reach in, grab a word, and use it in a particular sentence the word comes from just one of these eight compartments.

Amazing, huh? You can compartmentalize the entire English language into just eight categories.

It’s not amazing. It’s wrong. English words don’t have inherent or fixed parts of speech. And eight parts is not enough to capture all of English. And interjections are damn near useless as a category. This post explains everything.

When you reach into your bag of words and plunk them down on paper, you organize them into chunks. This group of words does this thing, that group of words does that thing. Rarely do you pull out just one word and end the writing project right there

This is true, and a point worth making, although I’d like to clarify that this chunking happens in your brain first, and on paper second. The more technical term for these chunks is “constituents”. This is a concept that is basic to syntax. If you don’t understand what a constituent is, you can’t understand what a phrase is. And speaking of phrases…

All groups of words break down into two types: phrases and clauses. The sole distinguishing features of a clause is the presence of a conjugated verb.

One thing I like about this definition is that it’s clear and simple. But despite those positive attributes, it’s still wrong.

Consider this sentence:

I am eating apples with peanut butter.

Inside that sentence is the string eating apples with peanut butter. This string forms a constituent, meaning it behaves as “one unit” for certain purposes. One way you can show this is that you can replace the entire thing with so as in:

I am eating apples with peanut butter and so is Jenny.

As an aside, that word so would be called a pro-verb. This is like a pro-noun, except that it replaces verb phrases instead of noun phrases. More generally words of this type are called “pro-forms”. (This is something not enough grammar books mention.)

Anyway, since eating apples with peanut butter has a ‘conjugated’ verb, by the author’s rules this would either not be a phrase (which is incorrect, it’s a verb phrase) or it would necessarily be a clause (which is also incorrect). I think, judging from his examples, that he would not consider it a constituent at all, and instead call all of I am eating apples a clause. Or maybe the whole thing would be a clause? He doesn’t give a definition of “sentence”, so I’m not sure if there’s any level higher than clause. It’s all a little confusing. It’s a nice-looking definition, and the simple language makes it more appealing, but it needs some work. These misconceptions about phrases carry on throughout the chapter on nouns.

How do you make a noun possessive? For singular nouns, just add “apostrophe s”; for plural nouns ending in -s just add an apostrophe. The rule is easy to follow but trips up a lot of people

That’s because the rule is wrong. Indeed, I see that the author himself has just been tripped up by it. Possessive is not marked on nouns – it’s marked on noun phrases. For example:

The man in the white coat
The man in the white coat’s passport.
*The man’s in the white coat passport.

The possessive goes at the end of the phrase, on coat, even though the passport doesn’t belong to the coat. It’s ungrammatical to put the possessive right on man, even though the passport belongs to the man.

And the problems go deeper than phrases. Turns out the author can’t reliably identify some specific words either. In discussing the American national anthem, he takes the time to point out something about the line by the dawn’s early light.

Notice that the word dawn’s does not really serve a traditional function as a noun. It’s really acting as an adjective. You can see this feature of our language by recognizing that dawn’s is not the object of the preposition by.

This is just so wrong. So what if it isn’t the object of the preposition? That doesn’t mean it can’t also be a noun. Let’s diagram that phrase out.

But of course, that’s only a picture. You could just as easily write “Adjective” there and prove his analysis correct. The reason that dawn sits inside a noun phrase is that it has a possessive affix on it. And that’s the part that blows my mind – this appears after a section on noun possession. Did he read his own book?

Or maybe he is above his own advice. He does seem to endorse an authoritarian approach to grammar. You should just do whatever the authorities tell you, even if it turns out they are incompetent.

And people in power- bosses, teacher, lovers – know the rules by heart. Or pretend they do. They expect you to follow the rules.

Wait a minute…the author is someone in power, or at least presents himself as an authority. So was that just a tacit admission that he is only pretending to know the rules? This is a terrible argument anyway. It’s letting the crazies win.

This book is filled with appeals to authority and unjustified arrogance. I think the peak moment is his argument that the passive voice doesn’t exist in the future progressive. The reason? Because he wants to feel smugly superior to anecdotal college professors:

Then one day I got a rather nasty letter from some English professor at a small college in a state I won’t name. He tore into me, saying ‘Oh yeah? What’s wrong with the future progressive passive will be being shown?’ I soothingly wrote back:
‘There’s nothing wrong with the construction, sir. It simply doesn’t exist, except perhaps in [name of state].’ Whew. Writing a book on grammar takes some guts.

It should take brains. I don’t see why we should consider this guy’s personal opinion to be special. He is trained to be a lawyer and has no particular qualifications to be writing a book on grammar, whereas an English prof might actually know something about passives. An ad hominem attack on an anonymous college professor is not a persuasive argument, and does nothing to support his contention that the passive is not used in the future progressive. A quick search on Google turns up thousands of results for ‘will be being shown’, so it obviously exists in many places, other than [name of state].

At the time, I couldn’t find an authority for my statement. But now I can cite Thomas Katen.

At the time? So when he replied to the college professor, it was an argument based on…gut feeling? Personal whim? A desire to always be right? Why wouldn’t he do the research part first? After all, the professor wrote to him; it wasn’t a question in a live debate. There was time to go look up a supporting source before replying.

As for citing Thomas Katen, it never actually happens. There’s no actual citation or quote or even a hint of who Thomas Katen is. He just mentions the name, hoping that we’ll be impressed by this lame appeal to authority, and leaves it at that. It’s not really even a citation – it’s a name drop.

It’s no wonder we lament the public’s poor grasp of grammar. When people as ill-informed and pompous as this can write a book on language and get taken seriously, things really are in bad shape.


Tarzan and Jane’s Guide to Grammar

This is an unusual one. It’s a grammar book, but it’s written in a narrative style. And just look at that cover.

To give you a flavour of the book, here’s a little passage:

“You see, a noun is a word that – “
Just then my body froze because a truly enormous, hairy spider was crawling up the table leg. I let out a scream that must have been heard for miles around. Tarzan was momentarily startled, but when he spotted the creature he grabbed a knife and slit the thing in half. Then he carried both halves outside.
When he came back in and sat down I seemed to be breathing normally again, so I continued, “Basically a noun is a word that names something. It can name a person, place, an idea, or an action”.

The story is that Jane has arrived in Africa, meets Tarzan, and wants to write some newspaper articles about him. Jane’s brilliant idea is that she could get Tarzan to write some of the articles. But to do that, she’ll have to teach him grammar, and that’s supposed to be the backdoor way she teaches the reader about grammar. It could be clever. It could be funny. But instead it’s a premise with a hole so large you could squeeze an ungreased elephant through. The first thing that doesn’t smell right to me is when Tarzan responds to Jane’s idea by saying: “grammar boring”.

The whole point was that he didn’t know what grammar was, but this makes it sound like he gave it a shot at one point and doesn’t like it. When did that happen? Was there some kind of class in the treetops? How does he know it’s boring if he doesn’t know anything about it?

Jane’s expectations for the project are low:

“Judging from his speech, I didn’t have any doubt that he would make every grammatical error under the sun.”

What’s with the condescending attitude? Sorry lady, he’s only been raised in isolation by gorillas. It’s a miracle he can speak at all. I think you should view this as an opportunity to teach language to another human, rather than some burdensome task of correcting a deviant.

And besides, if you actually look at what Tarzan has said so far, even his crude two-word utterances carry some basics of English grammar, like the correct word-ordering. For instance, he seems to know that adjectives precede nouns. He knows that the subject precedes the verb. He can do question word movement in “how teach grammar” and he knows how to inflect for plural in “how many rules?”

But ignoring all that, Jane carries on with the monumental task and soon Tarzan is cured:

“I worked with Tarzan over the next couple of weeks on basic sentence structure. … Anyway, after a while, Tarzan was speaking pretty much the way that normal people do and we were able to carry on normal conversation.”

So in one paragraph he just learns all of English syntax, morphology, semantics and pragmatics. AND THE BOOK GIVES NONE OF THAT INFORMATION. Why do this? What could the next 19 chapters possibly contain, if not that? I don’t understand. This “couple of weeks” they gloss over is what the whole book should have been about: Jane teaching him about English grammar.

So if Tarzan can speak normally – what else is there to do? Jane makes this incoherent offer:

“OK Tarzan,” I said with a smile, “are you ready to start learning grammar?”

What? If she’s only going to start teaching him grammar now, then what the hell was she doing for the last couple of weeks? How can Tarzan possibly have learned about sentence structure without learning any grammar? And Tarzan responds with an equally baffling:

“Go ahead and teach me; I want to learn”.

What does he think he just spent the last couple of weeks learning? I mean, just look at that response he gave. It’s two complete sentences. From that alone we see he knows how to form imperatives, how to conjoin sentences, how to mark pronouns for case, when to use infinitival forms, and probably some other stuff. I think that shows a pretty solid understanding of ‘grammar’, don’t you? What is there to improve upon? I don’t see a single error in there. Does the author even know what “grammar” is?

That’s a rhetorical question of course. The answer is no. The term seems to mean some murky collection of ‘things the author personally can remember about language and thinks are important enough to include in a book’. Topics range from parts of speech to punctuation and there’s no thread holding it all together. It’s the same laundry list of topics you can find in any old traditionally-based grammar book.

This one mistake really undermines the book and the author’s credibility from here on in. This is such a basic misunderstanding, or misuse of terms, that it’s hard to believe the author has ever received any proper education in grammar. I hope it doesn’t seem like I’m over-reacting, but this is like a biology text calling a whale a fish. Would you really just let that slip?

There’s one novelty: Jane actually goes through the 9 parts of speech. Usually, there are only eight. Where does she get the extra one from? She splits nouns and pronouns into two categories. And I approve of this departure from tradition: nouns and pronouns behave in different ways, and it makes sense to present them as distinct but related categories. Still, her definition of “noun”, which I quoted earlier, is mostly semantic, which is not the right approach.

To break up the grammar lessons, the book also has a slightly awkward romance plot:

“Well,” I said, “in the sentence ‘I consider myself lucky to have met a nice man like Tarzan’, the word myself is a reflexive pronoun. It reflects back on the word I.”
Tarzan blushed. And I think under the table his foot moved back to where it was before.

Overall not the worst grammar book ever, but like so many grammar books it’s written by someone without a strong education in language analysis, so take it with a grain of salt. The novel format is cute, and maybe it will appeal to some people, but I personally found it distracting and hard to follow. And I just couldn’t get over the flawed premise and the nonsensical idea that you don’t know grammar if you don’t know explicit terminology.

The real problem with a novel format is that information is dispersed throughout and hard to find. Nothing gets summed up in a table, or explained in any logical order. It’s just whatever Jane thought of in whatever order. Priority goes to the storyline, rather than to organizing this book in a way a grammar learner would want. This makes the book useless as reference material, because you have to go searching through the chapter to find what you want. But if you liked it, and you haven’t yet had enough of classic story characters teaching you about English, maybe you would enjoy the author’s other book “The Wizard of Oz Vocabulary Builder”.


Spelling is not grammar

One of the things that really bugs me about the “grammar police” type is their inability to distinguish grammar from spelling. I am sick of reading “grammar” posts on your vs. you’re. It is not a grammar mistake – it’s just a spelling mistake. Native English speakers absolutely know the difference. I am 100% sure that the mental grammar of an English speaker, the only grammar that really matters, distinguishes the lexical items your and you’re, even if the speaker doesn’t always know how to spell them.

Here’s something I would accept: most people “can’t tell the difference” between who and whom. I feel fairly comfortable saying that in a non-judgemental descriptive way. If you look and listen to English speakers, you’ll find that many vary between the two in places where, traditionally, whom was used.

But it’s only a matter of “not knowing the difference” so long as there are speakers of both the traditional and modern dialect around. Eventually, the distinction will completely disappear, and everyone will just say who. It won’t be “not knowing the difference” so much as “this pronoun is not marked for accusative case in English”.

And any argument that this change leads to confusion is nonsense. All nouns in English used to show accusative case, and now no nouns at all except pronouns show accusative case, and that doesn’t seem to hamper communication at all. The idea that language change = language decay is purist tosh. (But if you want to argue that modern English is worse for wear because of this loss, I’d love to hear about it in the comments.)

Anyway, if people really couldn’t tell the difference between your and you’re, and if this really was a grammatical problem and not just a spelling mistake, there would be evidence beyond the occasional confusion in writing. Here’s an actual grammatical property to consider: you’re is a contraction of a verb and a pronoun, and in question formation, this verb gets moved before the subject, e.g. You’re happy becomes Are you happy? (I wrote about this movement in more detail recently.)

Note that when there’s a contraction, you have to “undo” it and move only the verb are to the front. The pronoun can stay put. If English speakers were truly, utterly confused about the difference between you’re and your, then we would expect them to occasionally “undo” the wrong one:

(1a) You’re walking your dog.
(1b) Are you walking your dog?
(1c) *Are your walking you dog?

(2a) Your parents are home.
(2b) Are your parents home?
(2c) *Are you parents are home?

No one would ever utter the (c) forms. Those are clearly ungrammatical in standard English. For everyone. The confusion between your and you’re is a spelling mistake. Plain and simple. Those words have really similar phonetic forms, and so people get confused about how to write them. Or they just have a “slip-of-the-fingers” when typing (cf. a “slip of the tongue”). Or maybe they lack the education to put into words what the difference is. But no one is fundamentally confused about it.

Now, I’m not saying it is therefore totally cool to just write whatever you like. Of course not. This kind of sloppiness suggests that the writer otherwise doesn’t care about detail or about the quality of her own work. I can maybe excuse one such mistake on an assignment or in formal correspondence (everyone screws up now and again), but too many and I start wondering.


Not a question?

Another post inspired by QI. In this episode, Stephen Fry asks the question “Why do the columns of the Parthenon look straight?” (YouTube clip is here.)

And the answer turns out to be:

“Because they are straight”

One of the other guests goes bananas about this. “That’s not a question!” he complains. The whole scene is actually pretty funny, with Fry crumbling under the pressure. But it’s a valid point. Is that really a question? Probably most people feel that something is at least a little wrong with it.

The question all by itself is, strictly speaking, a well-formed question. What I mean is that the syntax and phonology were all in order: the wh-word got fronted, there’s been do-insertion, and Fry uttered it with appropriate wh-question intonation.

The problem is not with the question, but with the question-answer pairing. If the answer to the question had been “it’s an optical illusion, they actually bulge”, then no one would have said anything.

To put it another way, the problem is pragmatic, not grammatical. The question wasn’t used properly. We expect questions to be requests for information. For instance, if someone asks Where is the milk?, a good answer is In the fridge or I drank it all because they provide the questioner with some useful information. An answer like Where it should be is not useful. (Although I find myself doing this all the time with my three year old: “Where is my doll?” “Where you left it.”)

A question like Why are the columns straight? isn’t really a request for information when the answer is because they are. The way the question is formed already takes for granted that the pillars are straight, so the answer provides nothing new. I think this is essentially the argument that the guest on QI was trying to make.

The oddness of the question is why it appeared on QI in the first place. The questions are often designed to trick guests into offering some commonly believed, but incorrect, information. As Fry goes on to explain, it was once thought that there was some visual illusion, but it turns out that the pillars are actually straight.

On the other hand, it seems just a little unfair to challenge this as “not a question” because it isn’t informative. It wouldn’t be the first question like that Fry has asked. In the context of events like game shows or school we normally suspend the requirement that questions be genuine requests for information. When a history teacher asks a student “when did Napoleon die?”, it’s not because she doesn’t know. The student’s answer won’t inform the teacher. Similarly, when Stephen Fry asks a question, it isn’t a genuine request for information because, as Sean Connery said on Celebrity Jeopardy, the guy’s just reading from a card. It still “counts” as a question because, presumably, not everyone on the panel or in the audience knows the answer. I guess this one just crossed the pragmatic threshold for that one guest.


The shape of sentences

Sometimes on this blog I do sentence diagrams, and they always have a tree-like structure to them like this:

I don’t just label all the parts of speech like this:

I thought it might be interesting to talk about why that’s done. Why draw upper and lower levels? Why can’t sentences be “flat”?

Intuitively, they are flat-looking things. We write out words in a pretty strictly linear order, and speech is necessarily linear. You have no choice but to articulate one sound after another. But this isn’t how language actually works. The way that we process language and the way that it’s mentally represented is as hierarchical structures. That means that there are some parts of the sentences which are “above” and “below” each other, and linguists even talk about certain parts of a sentence “dominating” and “commanding” others.

There are lots of patterns in English that demonstrate how this works, but the example I am going to use is question formation, specifically the pattern known as Subject Auxiliary Inversion. Let’s take a simple sentence:

The man in the corner is smiling.

How do you turn that into a question? You move the auxiliary (is) to the front:

Is the man in the corner __ smiling?

The underscores represent where “is” used to be.
That’s easy enough, but what happens when there is more than one auxiliary? For instance:

The man who is standing in the corner is smiling.

Now what do you move? Do you still move the first auxiliary? Nope, that makes for an ungrammatical sentence:

*Is the man who __ standing in the corner is smiling?

You have to move the other one:

Is the man who is standing in the corner __ smiling?

And you could, in principle, end up with any number of auxiliaries in a row:

The man who is standing in the corner which is shaded is smiling
*Is the man who __ standing in the corner which is shaded is smiling?
*Is the man who is standing in the corner which __ shaded is smiling?
Is the man who is standing in the corner which is shaded __ smiling?

So maybe the rule is that you always move the last auxiliary? That turns out not to work either:

The man is drinking a beer which is overflowing with foam.
*Is the man is drinking a beer which __ overflowing with foam?
Is the man __ drinking a beer which is overflowing with foam?

You can’t formulate a rule like “invert the first/second/last auxiliary”, because where the auxiliary falls in the linear word order has nothing to do with whether it should be moved to the front. What matters is the auxiliary’s structural relation to the subject. You have to move the auxiliary that is attached to the main clause, which is the “highest” clause. And you can only express these relationships with a tree-like structure. If you just label a string of words, then you are missing an important piece of English grammar, and you won’t be able to explain patterns like subject-auxiliary inversion.

Here is a syntax tree for “The man who is standing in the corner is smiling”:

Notice how there are two S nodes in the tree, representing the fact that there are two sentences here: “the man is smiling” and “who is standing in the corner”. (I’m simplifying by calling them sentences, feel free to argue in the comments.) The auxiliary that gets moved in question formation is the one that is directly attached to the top-most S. The other auxiliary, even though it comes first in the surface string of words, is attached to a different S, lower down, and should not move.
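For anyone who likes to see this sort of thing spelled out, here is a rough Python sketch of the difference between a structural rule and a linear one. Everything in it is an assumption made for illustration: the Node class, the node labels and the exact shape of the tree are a simplified stand-in, not a serious syntactic analysis. The only point is that the structural rule looks at which auxiliary hangs directly off the top-most S, while the linear rule just grabs the first “is” it finds.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                                    # e.g. "S", "NP", "Aux" (illustrative labels)
    children: list = field(default_factory=list)
    word: str = ""                                # filled in only for leaves

def leaf(label, word):
    return Node(label, word=word)

def words(node):
    """Read the words off the tree from left to right."""
    if node.word:
        return [node.word]
    return [w for child in node.children for w in words(child)]

# Simplified tree for "The man who is standing in the corner is smiling"
relative_clause = Node("S", [
    leaf("NP", "who"),
    leaf("Aux", "is"),
    Node("VP", [leaf("V", "standing"),
                Node("PP", [leaf("P", "in"),
                            Node("NP", [leaf("Det", "the"), leaf("N", "corner")])])]),
])
sentence = Node("S", [
    Node("NP", [leaf("Det", "The"), leaf("N", "man"), relative_clause]),
    leaf("Aux", "is"),
    Node("VP", [leaf("V", "smiling")]),
])

def structural_question(s):
    """Front the Aux that is a direct child of the top-most S."""
    aux = next(child for child in s.children if child.label == "Aux")
    rest = [w for child in s.children if child is not aux for w in words(child)]
    rest[0] = rest[0].lower()
    return " ".join([aux.word.capitalize()] + rest) + "?"

def linear_question(s):
    """Front the first "is" in the flat word string, ignoring structure."""
    flat = words(s)
    flat.remove("is")            # removes only the first occurrence
    flat[0] = flat[0].lower()
    return " ".join(["Is"] + flat) + "?"

print(structural_question(sentence))
print(linear_question(sentence))
```

The structural version prints “Is the man who is standing in the corner smiling?”, and the linear version prints the starred sentence “Is the man who standing in the corner is smiling?”.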

Syntactic structures are an example of what linguists refer to as your “mental grammar”. They don’t exist in the physical speech signal. You can’t look at a waveform and find relative clauses or noun phrases. Listeners and readers have to mentally create the structures that go “on top of” the linear sentences that they perceive. And it’s a pretty cool trick we can all perform.


Review of “Grammar Crammer”

Comma Sutra had some mistakes and poor arguments, but this book really takes it up a notch. The Grammar Crammer has some jaw-droppingly crazy material. I’ve organized this as replies to particular quotes from the book. Some of this book is available on Google Books, if you want to have a look at the larger context of any quote.

The basic problem with the book is that the authors haven’t got the slightest clue what linguistics is. And this has an effect on everything else in the book, because they’ve never learned how to make and support arguments about language. They instead adopt the classic prescriptive approach, which is just to make stuff up. They also misuse a number of technical terms that anyone doing grammar should know (like “person”). The confusion about linguistics, and generally about how to study language, comes up only three pages in:

“Right now the study of English grammar is undergoing a revolution. Experts like Noam Chomsky are rehabilitating our changing spoken language, making it respectable once more.” (p.3)

There is no rehabilitation going on. Linguistics is fundamentally descriptive: linguists study language as it is, and as it changes, without any value judgement whatsoever. This is something you would know if you read even a single work of Chomsky’s. On the second page of one book he wrote, he describes his work pretty clearly: “The basic concern is to determine and characterize the linguistic capacities of particular individuals”. In other words, the goal is to understand how humans use language, and most certainly NOT to impose on people a way of speaking.

And the follow-up doesn’t improve things much:

“They tell us that sentences can’t be separated into nameable parts, because much of what we say is condensed from longer, more involved thought.” (p.3)

It is an utter lie to claim that sentences can’t be separated into nameable parts, and worse than a lie to claim that linguists agree with this statement. Have the authors never heard of syntax? I guess not, since they think linguistics is about language reform.

And that “because” clause doesn’t follow from the first part. It’s just a non sequitur for me. How do “involved thoughts” have anything to do with how parts of speech work? Parts of speech exist in English, and you determine them based on where words can appear in a sentence, and what kinds of affixes you can put on the word. I’m so sick of grammar authors getting this wrong, I wrote a whole post on this topic.

What really confuses me is that the Grammar Crammer itself has an entire chapter on parts of speech, so I guess even the authors don’t agree with themselves. I do not understand why this sentence was printed.

“They admit now even parts of speech aren’t easy to classify – something students have known since they struggled with the first grammar books.” (p.3)

Yeah. That’s it. Chomsky’s been working on syntax for more than 50 years, and he’s just suddenly realized that it’s too hard to name parts of speech. And the kids (on their first grammar books!) are like ‘yeah, we knew that all along, you should have listened to us’. And Chomsky’s like ‘you kids are right, what have I wasted my time on!’

Of course students think it’s hard, they’re students. They don’t know the material yet. It gets easier after you finish a course on it. Linguists like Chomsky are experts on syntax; he fairly invented modern syntax theory himself. When he says something is “difficult” he doesn’t mean what students mean.

OK, let’s skip a few pages ahead, out of the introduction and into the middle of the book.

“In Spanish, German, Latin, and many other languages, lots of words change their endings to show such things as person (whether one or many males, females, or neuters), degree of comparison, and tense.” (p.31)

The term ‘person’ has a specific meaning in grammar, and this is not it. Grammatical ‘person’ refers to a participant in the conversation: first person (I, we) is the speaker(s), second person (you) is the listener(s), and third person (he, she, it, they) is anyone else. Nowhere is this mentioned. Instead, the authors mention three other categories: grammatical number (‘one or many’), grammatical gender (‘neuters’), and biological sex (‘males, females’).

Do they have any idea what they are talking about? I’ve never seen such a confused mess. Even students in my intro classes don’t write garbage like this. In the future, the correct order to do things in is to learn something about language, then write a book on it.

“In English, aside from a few nouns (person: waiter, waitress) and modifiers (degree of comparison: fast, faster, fastest), the only words that change are the verbs, and most of them have only two big changes, for third person singular and past tense.”

The only words that change are the verbs? Are they mad? Did they even take 10 seconds to think about what they wrote before spewing ink on a page?

Adjectives can become:
nouns (e.g. red-redness)
adverbs (e.g. quick-quickly)
verbs (e.g. public-publicize)

Nouns can become:
adjectives (e.g. accident-accidental)
verbs (e.g. friend – befriend)
adverbs (e.g. clock – clockwise)

And as for verbs…
English verbs have inflections for progressive (-ing), past (-ed, with many exceptions), perfect (-en, with many exceptions), and third person present (-s). There’s also the infinitive marker ‘to’ as in ‘to jump’, ‘to sing’, etc. That’s 5, and in the passive voice verbs take an inflection that looks like the perfect, so you could call that 6 different changes. Already three times as many as claimed in the Grammar Crammer.

But then, there’s a huge number of derivational affixes verbs can take. For example, there are at least 6 different ways that verbs can become adjectives that I can think of right now: agree-agreeable, rely-reliant, create-creative, relax-relaxing, confuse-confused, freeze-frozen. That’s now 12 different endings, six times as many as the grammar book claimed.

So to say that “most of them [verbs] have only two big changes” is a little misleading. Has either author ever taken a syntax class? Or did their education in “grammar” consist of reading and regurgitating this kind of tripe from other prescriptivists? Frankly, this is the kind of simple observation – words take affixes – that I would expect any competent grammar author to make even without having specifically learned about it in a class.

To finish off, some insanity about complex sentences from a little further along in the book:

“The complex sentence is a wonderful invention of modern writers.” (p.72)

This just makes no sense. My three-year-old utters complex sentences all the time. Just today she said “You made pancakes while I helped Mummy and we were in the living room”. I’m pretty sure that three-year-olds have been doing this for a long time now, and it’s totally absurd to pretend that complex sentences are a recent invention. But actually, this craziness about complex sentences goes back a few pages from here:

“Old English was similar to Biblical Hebrew in that it had ands, buts, and some either…ors, but complex sentences didn’t come into English until the Romans invaded England bringing in their complex Latin language. Even some of the more complicated compound sentences are relative newcomers to English….” (p.69)

Hebrew is a Semitic language, and English is an Indo-European language, and those are two completely different families, with no known connections. I can see no way in which Old English could be said to be “similar” to Hebrew, any more than any two random languages could be similar.

As for the rest on sentence structures… there are no words except WTF. How did this book get published? There’s no research here, no fact checking, just pure fiction.

The claim starts by saying that Old English had conjunctions (“ands, buts, and some either…ors”), which the authors say didn’t allow for complex sentences. But this is obvious, since by definition conjunctions are not used to make complex sentences. Conjunctions are used for compound sentences. You know, the kind that are claimed not to show up until the Romans arrived.

I’m pretty sure that the authors of this book don’t know what the words “complex” and “compound” mean. So let me do their job for them and explain the difference:

A complex sentence is an independent clause (a stand-alone sentence, like Frank ate a hotdog) with at least one dependent clause (something that isn’t stand-alone, like that his girlfriend bought him). The parts of the complex sentence are joined by words like that and which.

A compound sentence is two independent (stand-alone) sentences combined into one, as in “I took the bus and she drove”. Compound sentences are joined by conjunctions: and, or, etc.
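As a rough illustration of the distinction, here is a toy Python sketch that labels a sentence by the kind of joining word it contains. The two word lists and the function name are my own assumptions and are nowhere near complete. Keyword matching is also only a crude stand-in for real parsing: it would mislabel “Frank and Mary ate”, where “and” joins two nouns rather than two clauses.

```python
# Toy word lists; both are illustrative assumptions, not full inventories.
CONJUNCTIONS = {"and", "or", "but"}                     # join stand-alone clauses
SUBORDINATORS = {"that", "which", "while", "because"}   # introduce dependent clauses

def classify(sentence):
    """Label a sentence by the joining words it contains (crude heuristic)."""
    tokens = {w.strip(".,!?\"'").lower() for w in sentence.split()}
    has_dependent = bool(tokens & SUBORDINATORS)
    has_conjunction = bool(tokens & CONJUNCTIONS)
    if has_dependent and has_conjunction:
        return "both"
    if has_dependent:
        return "complex"
    if has_conjunction:
        return "compound"
    return "simple"

for s in [
    "Frank ate a hotdog that his girlfriend bought him.",
    "I took the bus and she drove.",
    "You made pancakes while I helped Mummy and we were in the living room.",
]:
    print(classify(s), "|", s)
```

The third example is my daughter’s sentence from above, which contains both kinds of joining word, so a single sentence can be compound and complex at the same time.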

As far as I know, both sentence types are attested in English for as long as we have records of English existing. But I’m intrigued by their claim that some types only showed up with the Romans. What evidence exists for this? This is an astounding claim, at odds with fundamentals of linguistic theory, and probably everything ever published in the history of English linguistics. If anyone can find a reference for these claims, please post it in the comments.

Overall, this book manages to get a few things right, but mostly I imagine this is through parroting information you can find in any grammar book. The authors clearly have no education in language or linguistics. They should stay away from writing another word about English until they take the time to learn the rudiments of it.
