When is a word not a word? The ‘dog debate’.

A post appeared recently on the TES Reading Theory and Practice blog about the relationship between synthetic phonics and comprehension. Its focus is England’s statutory phonics screening check, taken annually by eligible children in Years 1 and 2.

The post, written by thumbshrew (known on Twitter as @MarianRuthie), seemed to me well reasoned, informative and remarkably uncontroversial, considering how polarised the debate about decoding and comprehension has become. But it wasn’t long before the controversy got under way. Another blogger, @oldandrewuk, took issue not with the post as a whole, but with one comment about the categorisation of words. This is the sentence he questioned:

“The check is made up of nonwords and real words (although, whether any ‘word’ presented like this, in isolation, is genuinely a word is a debatable point).”

@oldandrewuk observed on Twitter:

Phonics denialist claims it’s “debatable” whether a word presented in isolation is actually a word: Beyond satire.

A lively discussion ensued, which @MarianRuthie referred to as the ‘dog debate’. @oldandrewuk attempted to refute thumbshrew’s claim using the letter string ‘d-o-g’. His reasoning was that if ‘d-o-g’ is indisputably a word, then the claim “whether any ‘word’ presented like this, in isolation, is genuinely a word is a debatable point” is invalid.

For everyday purposes, I think @oldandrewuk’s argument carries some weight. To the best of my knowledge, the reasonable people on the Clapham omnibus don’t spend time debating whether ‘dog’, ‘cat’ or ‘bus’ are words – they take that for granted and get on with using them. But thumbshrew’s post was discussing the status of words not for everyday purposes, but in an assessment of children’s facility with phonemes and graphemes. @oldandrewuk was questioning the criteria she used to determine whether a letter string was a word or not. And words are notoriously slippery customers.

When is a word not a word?

We’re all familiar with variability in the meaning of words, but whether something is a word at all isn’t always clear either. Take words presented in isolation. When we speak, the way we say a particular word is affected by the words that precede and follow it. If I say aloud “the dog barked”, the ‘g’ in ‘dog’ is barely audible, whereas if I say “the bark of the dog”, the ‘g’ is much clearer. If ‘dog’ were excised from the first phrase and presented in isolation, and people were asked what word it was, they’d be just as likely to respond with ‘dot’ or ‘doll’ as with ‘dog’. Or, given the option, they might say it isn’t a word at all, since ‘d’ followed only by a short ‘o’ isn’t a word in English.

Written English poses other challenges. Because English spelling is largely standardised, and because we leave a space fore and aft when writing words, ‘dog’ looks much the same in isolation as it does in a sentence. So people don’t have the same difficulty identifying written words taken from sentences as they do with spoken ones. But not all five-year-olds (or adults unfamiliar with the Roman alphabet) perceive ‘dog’ in a sans serif font as the same as ‘dog’ in a font with serifs, and some might not consider ‘dog’ in a gothic typeface to be a word at all.

Then there are heteronyms (words that have the same spelling but different meanings and pronunciation) where context determines what meaning and pronunciation should be used – ‘wind’ as in whistling or ‘wind’ as in bobbin. I once found myself having an argument with my mother, then in her 50s, about the word ‘digest’. For the noun, the stress is on the first syllable. For the verb it’s on the second. My mum insisted they were both pronounced like the verb and claimed she couldn’t hear any difference. I thought she was just being difficult until it emerged that my son had a very similar problem differentiating between similar speech sounds.

Words are an integral part of spoken and written English. They are relatively rarely encountered in isolation, and the context in which they occur can be crucial in determining their meaning and pronunciation. You could, as thumbshrew implies, define ‘word’ in terms of the role a sound or letter string plays in spoken or written language and argue that by definition any letter string presented in isolation isn’t a word.

But individuals’ perceptions of spoken and written English don’t determine whether a letter string is a genuine word or not. Unlike French, English doesn’t have an official body to make such decisions. Scrabble players might appeal to the OED, but then use words in the course of conversation that aren’t yet in the dictionary. How we treat novel words highlights the criteria we use to determine whether a sound or letter string is a word or not.

A new word coined by an academic might gain immediate acceptance as a genuine word because of the academic’s expertise and the need to label a newly discovered phenomenon. A street slang term, in contrast, might be used only briefly within a small community. Would the slang term qualify as a word? Or what about ‘prolly’, a contraction of ‘probably’ used on social media? Does ‘prolly’ qualify as a genuine word or not? Is ‘digest’ used as a noun but with the stress on the second syllable a genuine word? Or how about a toddler who calls a fridge a ‘sputich’? If her family understand and use the word ‘sputich’ in conversation, does that make it a genuine word?

Prototype theory

During the course of the ‘dog debate’ I attempted to shed some light on what makes a sound or letter string a word by appealing to prototype theory. In the 1970s Eleanor Rosch showed that people use the features of items to categorise them. Frequently occurring features are highly prototypical for particular categories; features that occur less frequently are less prototypical. For example, birds typically have beaks, wings and feathers, lay eggs and fly. But some birds can’t fly, so flight isn’t as highly prototypical a feature as the others. In a Venn diagram illustrating prototypicality in birds, robins would be near the middle of the circle because they show all the prototypical features. Ostriches and penguins would be nearer the edge. The circle wouldn’t have a clear boundary because it would blur into feathered reptiles.

Words are also things that can be categorised. One prototypical feature of words is frequency of usage. Another is the degree of agreement on whether it’s a word or not. If a word is used very frequently by all and sundry, it’s highly prototypical. ‘Dog’, @oldandrewuk’s example, would be near the centre of a Venn diagram representing the category ‘word’. Chances are ‘sputich’ would fall outside it. As for ‘prolly’, there would likely be differences of opinion over whether or not it was a word, indicating that the category ‘word’ also has fuzzy rather than crisp boundaries.

Because the criteria for whether something is a genuine word (usage and agreement) are on a scale that could range from 0% to 100%, deciding whether a letter string is a word isn’t a straightforward task. And, as @ded6ajd pointed out in the Twitter debate, a word isn’t necessarily a letter string. Words can take the form of sequences of speech sounds, patterns of marks or gestures, which implies that a word is actually an abstract construct.

So where does all that leave the ‘dog debate’?

Back to the ‘dog debate’

As I said, I think @oldandrewuk’s argument carries some weight for everyday purposes. However, in relation to the point he took issue with, I think there are two flaws in his challenge. If I’ve understood correctly, he’s saying that if there is a letter string whose status as a word is undisputed, that invalidates thumbshrew’s claim that it’s debatable whether any ‘word’ presented in isolation is genuinely a word. To demonstrate this, he need only cite one example, and he chooses ‘dog’. If no one is debating the status of ‘dog’, then the status of ‘dog’ isn’t debatable. Therefore, the claim that the status of ‘any’ word is debatable is false.

But @oldandrewuk’s challenge rests on an implicit assumption that not finding anyone who disputes the status of ‘dog’ is the same as the status of ‘dog’ being undisputed. It isn’t. That’s equivalent to claiming that if no one has ever seen a crow whose plumage isn’t black, we can safely conclude that all crows must be black. Of course, that conclusion isn’t safe, since we cannot possibly gather data on all crows.

The second flaw is this: even if we were able to interrogate every living English speaker about their opinions on the status of ‘dog’ and found universal agreement that ‘dog’ was a word, we still wouldn’t know whether all those people were using the same criteria for what counts as a word, or what their views might be about other words. On the face of it there might appear to be no debate about the status of ‘dog’. But put a random sample of those who agree about its status in a room together, get them to talk about their criteria for what constitutes a word, and it’s likely that a debate would start pretty quickly.

There is no universal standard for what constitutes a genuine word. If it were easy to establish one, the French, with their historic penchant for standardisation, would have come up with one by now. Words aren’t like weights or measures, where in principle you could cut a piece of metal to an arbitrary length, put it in a glass case and agree that it’s the international standard for a cubit. Words are more like populations of organisms, with new ones continually arising and old ones falling out of use and being forgotten. One person’s word might be another person’s non-word.

And that’s the nub of the problem. For @oldandrewuk, near-as-dammit universal agreement about a letter string’s status as a genuine word, in isolation or not, is good enough. For thumbshrew, it isn’t. She is aware of the different criteria people use for ‘word’, and concludes that the status of letter strings presented in isolation is debatable.

Personally, I think thumbshrew is right: it is debatable whether any ‘word’ presented in isolation is genuinely a word, and @oldandrewuk’s challenge is flawed. But because research shows that people identify written words in isolation as words more readily than they do spoken words, frankly I don’t think it would be much of a debate.

A more important point, which @oldandrewuk doesn’t take up, is why there are ‘genuine words’ in the phonics check at all. It would make far more sense for all the letter strings to be pseudo-words. This would give a more accurate picture of children’s phonemic and graphemic awareness, reduce the impact of confounding factors such as word recognition, and avoid the need for a debate about what constitutes a genuine word.