synthetic phonics, dyslexia and natural learning

Too intense a focus on the virtues of synthetic phonics (SP) can, it seems, result in related issues getting a bit blurred. I discovered that some whole language supporters do appear to have been ideologically motivated but that the whole language approach didn’t originate in ideology. And as far as I can tell we don’t know if SP can reduce adult functional illiteracy rates. But I wouldn’t have known either of those things from the way SP is framed by its supporters. SP proponents also make claims about how the brain is involved in reading. In this post I’ll look at two of them; dyslexia and natural learning.

Dyslexia

Dyslexia started life as a descriptive label for the reading difficulties adults can develop due to brain damage caused by a stroke or head injury. Some children were observed to have similar reading difficulties despite otherwise normal development. The adults’ dyslexia was acquired (they’d previously been able to read) but the children’s dyslexia was developmental (they’d never learned to read). The most obvious conclusion was that the children also had brain damage – but in the early 20th century when the research started in earnest there was no easy way to determine that.

Medically, developmental dyslexia is still only a descriptive label meaning ‘reading difficulties’ (causes unknown, might/might not be biological, might vary from child to child). However, dyslexia is now also used to denote a supposed medical condition that causes reading difficulties. This new usage is something that Diane McGuinness complains about in Why Children Can’t Read.

I completely agree with McGuinness that this use isn’t justified and has led to confusion and unintended and unwanted outcomes. But I think she muddies the water further by peppering her discussion of dyslexia (pp. 132-140) with debatable assertions such as:

“We call complex human traits ‘talents’”.

“Normal variation is on a continuum but people working from a medical or clinical model tend to think in dichotomies…”.

“Reading is definitely not a property of the human brain”.

“If reading is a biological property of the brain, transmitted genetically, then this must have occurred by Lamarckian evolution.”

Why debatable? Because complex human traits are not necessarily ‘talents’; clinicians tend to be more aware of normal variation than most people; reading must be a ‘property of the brain’ if we need a brain to read; and the research McGuinness refers to didn’t claim that ‘reading’ was transmitted genetically.

I can understand why McGuinness might be trying to move away from the idea that reading difficulties are caused by a biological impairment that we can’t fix. After all, the research suggests SP can improve the poor phonological awareness that’s strongly associated with reading difficulties. I get the distinct impression, however, that she’s uneasy with the whole idea of reading difficulties having biological causes. She concedes that phonological processing might be inherited (p.140) but then denies that a weakness in discriminating phonemes could be due to organic brain damage. She’s right that brain scans had revealed no structural brain differences between dyslexics and good readers. And in scans that show functional variations, the ability to read might be a cause, rather than an effect.

But as McGuinness herself points out, reading is a complex skill involving many brain areas, and biological mechanisms tend to vary between individuals. In a complex biological process there’s a lot of scope for variation. Poor phonological awareness might be a significant factor, but it might not be the only factor. A child with poor phonological awareness plus visual processing impairments plus limited working memory capacity plus slow processing speed – all factors known to be associated with reading difficulties – would be unlikely to find those difficulties eliminated by SP alone. The risk in conceding that reading difficulties might have biological origins is that using teaching methods to remediate them might then be called into question – just what McGuinness doesn’t want to happen, and for good reason.

Natural and unnatural abilities

McGuinness’s view of the role of biology in reading seems to be derived from her ideas about the origin of skills. She says;

“It is the natural abilities of people that are transmitted genetically, not unnatural abilities that depend upon instruction and involve the integration of many subskills”. (p.140, emphasis McGuinness)

This is a distinction often made by SP proponents. I’ve been told that children don’t need to be taught to walk or talk because these abilities are natural and so develop instinctively and effortlessly. Written language, in contrast, is a recent man-made invention; there hasn’t been time to evolve a natural mechanism for reading, so we need to be taught how to do it and have to work hard to master it. Steven Pinker, who wrote the foreword to Why Children Can’t Read, seems to agree. He says “More than a century ago, Charles Darwin got it right: language is a human instinct, but written language is not” (p.ix).

Although that’s a plausible model, what Pinker and McGuinness fail to mention is that it’s also a controversial one. The part played by nature and nurture in the development of language (and other abilities) has been the subject of heated debate for decades. The reason for the debate is that the relevant research findings can be interpreted in different ways. McGuinness is entitled to her interpretation but it’s disingenuous in a book aimed at a general readership not to tell readers that other researchers would disagree.

Research evidence suggests that the natural/unnatural skills model has got it wrong. The same natural/unnatural distinction was made recently in the case of part of the brain called the fusiform gyrus. In the fusiform gyrus, visual information about objects is categorised. Different types of objects, such as faces, places and small items like tools, have their own dedicated locations. Because those types of objects are naturally occurring, researchers initially thought their dedicated locations might be hard-wired.

But there’s also a word recognition area. And in experts, the faces area is also used for cars, chess positions, and specially invented items called greebles. To become an expert in any of those things you require some instruction – you’d need to learn the rules of chess or the names of cars or greebles. But your visual system can still learn to accurately recognise, discriminate between and categorise many thousands of items like faces, places, tools, cars, chess positions and greebles simply through hours and hours of visual exposure.

Practice makes perfect

What advocates of ‘natural’ skills also tend to overlook is how much rehearsal goes into them. Most parents don’t actively teach children to talk, but babies hear and rehearse speech for many months before they can say recognisable words. Most parents don’t teach toddlers to walk, but it takes young children years to become fully stable on their feet despite hours of daily practice.

There’s no evidence that, as far as the brain is concerned, there’s any difference between ‘natural’ and ‘unnatural’ knowledge and skills. How much instruction and practice knowledge or skills require will depend on their transparency and complexity. Walking and bike-riding are pretty transparent; you can see what’s involved by watching other people. But they take a while to learn because of the complexity of the motor co-ordination and balance involved. Speech and reading are less transparent and more complex than walking and bike-riding, so take much longer to master. But some children require intensive instruction in order to learn to speak, and many children learn to read with minimal input from adults. The natural/unnatural distinction is a false one and it’s as unhelpful as assuming that reading difficulties are caused by ‘dyslexia’.

Multiple causes

What underpins SP proponents’ reluctance to admit biological factors as causes for reading difficulties is, I suspect, an error often made when assessing cause and effect. It’s an easy one to make, but one that people advocating changes to public policy need to be aware of.

Let’s say for the sake of argument that we know, for sure, that reading difficulties have three major causes, A, B and C. The one that occurs most often is A. We can confidently predict that children showing A will have reading difficulties. What we can’t say, without further investigation, is whether a particular child’s reading difficulties are due to A. Or if A is involved, that it’s the only cause.
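To make that concrete, here’s a toy simulation in Python (the percentages are invented purely for illustration) showing why a cause that dominates at the population level can’t simply be read back onto an individual child:

```python
import random

random.seed(1)

# Hypothetical population of children with reading difficulties.
# Each child has one or more of three underlying causes, A, B or C.
# The probabilities are invented purely to illustrate the argument.
def sample_child():
    causes = set()
    if random.random() < 0.60: causes.add("A")   # the most common cause
    if random.random() < 0.25: causes.add("B")
    if random.random() < 0.20: causes.add("C")
    return causes or {random.choice("ABC")}      # every child here has at least one cause

children = [sample_child() for _ in range(10_000)]

share_with_A = sum("A" in c for c in children) / len(children)
only_A       = sum(c == {"A"} for c in children) / len(children)
no_A_at_all  = sum("A" not in c for c in children) / len(children)

print(f"difficulties involving A:        {share_with_A:.0%}")
print(f"difficulties due to A alone:     {only_A:.0%}")
print(f"difficulties not involving A:    {no_A_at_all:.0%}")
```

Even in this made-up population, an intervention that targets only A would leave the children whose difficulties don’t involve A – and partly those with A plus something else – still struggling.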

We know that poor phonological awareness is frequently associated with reading difficulties. Because SP trains children to be aware of phonological features in speech, and because that training improves word reading and spelling, it’s a safe bet that poor phonological awareness is also a cause of reading difficulties. But because reading is a complex skill, there are many possible causes for reading difficulties. We can’t assume that poor phonological awareness is the only cause, or that it’s a cause in all cases.

The evidence that SP improves children’s decoding ability is persuasive. However, the evidence also suggests that 12–15% of children will still struggle to learn to decode using SP. And that around 15% of children will struggle with reading comprehension. Having a method of reading instruction that works for most children is great, but education should benefit all children, and since the minority of children who struggle are the ones people keep complaining about, we need to pay attention to what causes reading difficulties for those children – as individuals. In education, one size might fit most, but it doesn’t fit all.

Reference

McGuinness, D. (1998). Why Children Can’t Read and What We Can Do About It. Penguin.

truth and knowledge

A couple of days ago I became embroiled in a long-running Twitter debate about the nature of truth and knowledge, during which at least one person fell asleep. @EdSacredProfane has asked me where I ‘sit’ on truth. So, for the record, here’s what I think about truth and knowledge.

1. I think it’s safe to assume that reality and truth are out there. Even if they’re not out there and we’re all experiencing a collective hallucination we might as well assume that reality is real and that truth is true because if we don’t, our experience – whether real or imagined – is likely to get pretty unpleasant.

2. I’m comfortable with the definition of knowledge as justified true belief. But that’s a definition of an abstract concept. The extent to which people can actually justify or demonstrate the truth of their beliefs (collectively or individually) varies considerably.

3. The reason for this is the way perception works. All incoming sensory information is interpreted by our brains, and brains aren’t entirely reliable when it comes to interpreting sensory information. So we’ve devised methods of cross-checking what our senses tell us to make sure we haven’t got it disastrously wrong. One approach is known as the scientific method.

4. Science works on the basis of probability. We can never say for sure that A or B exists or that C definitely causes D. But for the purposes of getting on with our lives if there’s enough evidence suggesting that A or B exists and that C causes D, we assume those things to be true and justified to varying extents.

5. Even though our perception is a bit flaky and we can’t be 100% sure of anything, it doesn’t follow that reality is flaky or not 100% real. Just that our knowledge about it isn’t 100% reliable. The more evidence we’ve gathered, the more consistent and predictable reality looks. Unfortunately it’s also complicated, which, coupled with our flaky and uncertain perceptions, makes life challenging.

seven myths about education: deep structure

deep structure and understanding

Extracting information from data is crucially important for learning; if we can’t spot patterns that enable us to identify changes and make connections and predictions, no amount of data will enable us to learn anything. Similarly, spotting patterns within and between facts – patterns that enable us to identify changes and connections and make predictions – helps us understand how the world works. Understanding is a concept that crops up a lot in information theory and education. Several of the proposed hierarchies of knowledge have included the concept of understanding – almost invariably at or above the knowledge level of the DIKW pyramid. Understanding is often equated with what’s referred to as the deep structure of knowledge. In this post I want to look at deep structure in two contexts; when it involves a small number of facts, and when it involves a very large number, as in an entire knowledge domain.

When I discussed the DIKW pyramid, I referred to information being extracted from a ‘lower’ level of abstraction to form a ‘higher’ one. Now I’m talking about ‘deep’ structure. What’s the difference, if any? The concept of deep structure comes from the field of linguistics. The idea is that you can say the same thing in different ways; the surface features of what you say might be different, but the deep structure of the statements could still be the same. So the sentences ‘the cat is on the mat’ and ‘the mat is under the cat’ have different surface features but the same deep structure. Similarly, ‘the dog is on the box’ and ‘the box is under the dog’ share the same deep structure. From an information-processing perspective the sentences about the dog and the cat share the same underlying schema.
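One rough way to see the difference is to reduce each sentence to a relation between entities. The sketch below is my own toy representation, not a linguistic formalism; it just shows different surface forms collapsing onto one underlying schema:

```python
# Toy illustration: different surface forms mapping onto the same relational schema.
# The 'schema' here is just ON(figure, ground); the parser is deliberately naive
# and only handles sentences of the form "the X is on/under the Y".
def deep_structure(sentence):
    words = sentence.lower().rstrip(".").split()
    a, relation, b = words[1], words[3], words[5]
    if relation == "on":
        return ("ON", a, b)       # X is on Y
    if relation == "under":
        return ("ON", b, a)       # if X is under Y, then Y is on X
    raise ValueError(relation)

print(deep_structure("the cat is on the mat"))     # ('ON', 'cat', 'mat')
print(deep_structure("the mat is under the cat"))  # ('ON', 'cat', 'mat')  same deep structure
print(deep_structure("the dog is on the box"))     # ('ON', 'dog', 'box')  same schema, new fillers
```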

In the DIKW knowledge hierarchy, extracted information is at a ‘higher’ level, not a ‘deeper’ one. The two different terminologies are used because the concepts of ‘higher’ level extraction of information and ‘deep’ structure have different origins, but essentially they are the same thing. All you need to remember is that in terms of information-processing ‘high’ and ‘deep’ both refer to the same vertical dimension – which term you use depends on your perspective. Higher-level abstractions, deep structure and schemata refer broadly to the same thing.

deep structure and small numbers of facts

Daniel Willingham devotes an entire chapter of his book Why don’t students like school? to the deep structure of knowledge when addressing students’ difficulty in understanding abstract ideas. Willingham describes mathematical problems presented in verbal form that have different surface features but the same deep structure – in his opening example they involve the calculation of the area of a table top and of a soccer pitch (Willingham, p.87). What he is referring to is clearly the concept of a schema, though he doesn’t call it that.

Willingham recognises that students often struggle with deep structure concepts and recommends providing them with many examples and using analogies they’re familiar with. These strategies would certainly help, but as we’ve seen previously, because the surface features of facts aren’t consistent in terms of sensory data, students’ brains are not going to spot patterns automatically and pre-consciously in the way they do with consistent low-level data and information. To the human brain, a cat on a mat is not the same as a dog on a box. And a couple trying to figure out whether a dining table would be big enough involves very different sensory data to that involved in a groundsman working out how much turf will be needed for a new football pitch.

Willingham’s problems involve several levels of abstraction. Note that the levels of abstraction only provide an overall framework – they’re not set in stone; I’ve had to split the information level into two to illustrate how information needs to be extracted at several successive levels before students can even begin to calculate the area of the table or the football pitch. The levels of abstraction are;

• data – the squiggles that make up letters and the sounds that make up speech
• first-order information – letters and words (chunked)
• second-order information – what the couple is trying to do and what the groundsman is trying to do (not chunked)
• knowledge – the deep structure/schema underlying each problem.

To anyone familiar with calculating area, the problems are simple ones; to anyone unfamiliar with the schema involved, they impose a high cognitive load because the brain is trying to juggle information about couples, tables, groundsmen and football pitches and can’t see the forest for the trees. Most brains would require quite a few examples before they had enough information to be able to spot the two patterns, so it’s not surprising that students who haven’t had much practical experience of buying tables, fitting carpets, painting walls or laying turf take a while to cotton on.
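For what it’s worth, the shared deep structure of Willingham’s two problems can be written down in a few lines (the measurements below are invented): strip away the couples and the groundsmen and both problems reduce to area = length × width:

```python
# Both problems share one deep structure: area of a rectangle = length * width.
def rectangle_area(length, width):
    return length * width

# Surface features 1: a couple checking whether a dining table will fit (figures invented).
table_area = rectangle_area(1.8, 0.9)        # metres
# Surface features 2: a groundsman ordering turf for a football pitch (figures invented).
pitch_area = rectangle_area(105, 68)         # metres

print(f"table top: {table_area:.2f} m^2")    # 1.62 m^2
print(f"pitch:     {pitch_area} m^2")        # 7140 m^2
```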

visual vs verbal representations

What might help students further is making explicit the deep structure of groups of facts with the help of visual representations. Visual representations have one huge advantage over verbal representations. Verbal representations, by definition, are processed sequentially – you can only say, hear or read one word at a time. Most people can process verbal information at the same rate at which they hear it or read it, so most students will be able to follow what a teacher is saying or what they are reading, even if it takes a while to figure out what the teacher or the book is getting at. However, if you can’t process verbal information quickly enough, can’t recall earlier sentences whilst processing the current one, miss a word, or don’t understand a crucial word or concept, it will be impossible to make sense of the whole thing. In visual representations, you can see all the key units of information at a glance, most of the information can be processed in parallel and the underlying schema is more obvious.

The concept of calculating area lends itself very well to visual representation; it is a geometry problem after all. Getting the students to draw a diagram of each problem would not only focus their attention on the deep structure rather than its surface features, it would also demonstrate clearly that problems with different surface features can have the same underlying deep structure.

It might not be so easy to make visual representations of the deep structure of other groups of facts, but it’s an approach worth trying because it makes explicit the deep structure of the relationship between the facts. In Seven Myths about Education, one of Daisy’s examples of a fact is the date of the battle of Waterloo. Battles are an excellent example of deep structure/schemata in action. There is a large but limited number of ways two opposing forces can position themselves in battle, whoever they are and whenever and wherever they are fighting, which is why ancient battles are studied by modern military strategists. The configurations of forces and what subsequent configurations are available to them are very similar to the configurations of pieces and next possible moves in chess. Of course chess began as a game of military strategy – as a visual representation of the deep structure of battles.

Deep structure/underlying schemata are a key factor in other domains too. Different atoms and different molecules can share the same deep structure in their bonding and reactions and chemists have developed formal notations for representing that visually; the deep structure of anatomy and physiology can be the same for many different animals – biologists rely heavily on diagrams to convey deep structure information. Historical events and the plots of plays can follow similar patterns even if the events occurred or the plays were written thousands of years apart. I don’t know how often history or English teachers use visual representations to illustrate the deep structure of concepts or groups of facts, but it might help students’ understanding.

deep structure of knowledge domains

It’s not just single facts or small groups of facts that have a deep structure or underlying schema. Entire knowledge domains have a deep structure too, although not necessarily in the form of a single schema; many connected schemata might be involved. How they are connected will depend on how experts arrange their knowledge or how much is known about a particular field.

Making students aware of the overall structure of a knowledge domain – especially if that’s via a visual representation so they can see the whole thing at once – could go a long way to improving their understanding of whatever they happen to be studying at any given time. It’s like the difference between Google Street View and Google Maps. Google Street View is invaluable if you’re going somewhere you’ve never been before and you want to see what it looks like. But Google Maps tells you where you are in relation to where you want to be – essential if you want to know how to get there. Having a mental map of an entire knowledge domain shows you how a particular fact or group of facts fits in to the big picture, and also tells you how much or how little you know.

Daisy’s model of cognition

Daisy doesn’t go into detail about deep structure or schemata. She touches on these concepts only a few times; once in reference to forming a chronological schema of historical events, then when referring to Joe Kirby’s double-helix metaphor for knowledge and skills and again when discussing curriculum design.

I don’t know if Daisy emphasises facts but downplays deep structure and schemata to highlight the point that the educational orthodoxy does essentially the opposite, or whether she doesn’t appreciate the importance of deep structure and schemata compared to surface features. I suspect it’s the latter. Daisy doesn’t provide any evidence to support her suggestion that simply memorising facts reduces cognitive load when she says;

“So when we commit facts to long-term memory, they actually become part of our thinking apparatus and have the ability to expand one of the biggest limitations of human cognition” (p.20).

The examples she refers to immediately prior to this assertion are multiplication facts that meet the criteria for chunking – they are simple and highly consistent and if they are chunked they’d be treated as one item by working memory. Whether facts like the dates of historical events meet the criteria for chunking or whether they occupy less space in working memory when memorised is debatable.

What’s more likely is that if more complex and less consistent facts are committed to memory, they are accessed more quickly and reliably than those that haven’t been memorised. Research evidence suggests that neural connections that are activated frequently become stronger and are accessed faster. Because information is carried in networks of neural connections, the more frequently we access facts or groups of facts, the faster and more reliably we will be able to access them. That’s a good thing. It doesn’t follow that those facts will occupy less space in working memory.

It certainly isn’t the case that simply committing to memory hundreds or thousands of facts will enable students to form a schema, or if they do, that it will be the schema their teacher would like them to form. Teachers might need to be explicit about the schemata that link facts. Since hundreds or thousands of facts tend to be linked by several different schemata – you can arrange the same facts in different ways – being explicit about the different ways they can be linked might be crucial to students’ understanding.

Essentially, deep structure schemata play an important role in three ways;

Firstly, students’ pre-existing schemata will affect their understanding of new information – they will interpret it in the light of the way they currently organise their knowledge. Teachers need to know about common misunderstandings as well as what they want students to understand.

Secondly, being able to identify the schema underlying one fact or small group of facts is the starting point for spotting similarities and differences between several groups of facts.

Thirdly, having a bird’s-eye view of the schemata involved in an entire knowledge domain increases students’ chances of understanding where a particular fact fits in to the grand scheme of things – and their awareness of what they don’t know.

Having a bird’s-eye view of the curriculum can help too, because it can show how different subject areas are linked. Subject areas and the curriculum are the subjects of the next post.

seven myths about education: facts and schemata

Knowledge occupies the bottom level of Bloom’s taxonomy of educational objectives. In the 1950s, Bloom and his colleagues would have known a good deal about the strategies teachers use to help students to acquire knowledge. What they couldn’t have known is how students formed their knowledge; how they extracted information from data and knowledge from information. At the time cognitive psychologists knew a fair amount about learning but had only a hazy idea about how it all fitted together. The DIKW pyramid I referred to in the previous post explains how the bottom layer of Bloom’s taxonomy works – how students extract information and knowledge during learning. Anderson’s simple theory of cognition explains how people extract low-level information. More recent research at the knowledge and wisdom levels is beginning to shed light on Bloom’s higher-level skills, why people organise the same body of knowledge in different ways and why they misunderstand and make mistakes.

Seven Myths about Education addresses the knowledge level of Bloom’s taxonomy. Daisy Christodoulou presents a model of cognition that she feels puts the higher-level skills in Bloom’s taxonomy firmly into context. Her model also forms the basis for a pedagogical approach and a structure for a curriculum, which I’ll discuss in another post. Facts are a core feature of Daisy’s model. I’ve mentioned previously that many disciplines find facts problematic because facts, by definition, have to be valid (true), and it’s often difficult to determine their validity. In this post I want to focus instead on the information processing entailed in learning facts.

a simple theory of cognition

Having explained the concept of chunking and the relationship between working and long-term memory, Daisy introduces Anderson’s paper;

“So when we commit facts to long-term memory, they actually become part of our thinking apparatus and have the ability to expand one of the biggest limitations of human cognition. Anderson puts it thus:

‘All that there is to intelligence is the simple accrual and tuning of many small units of knowledge that in total produce complex cognition. The whole is no more than the sum of its parts, but it has a lot of parts.’”

She then says “a lot is no exaggeration. Long-term memory is capable of storing thousands of facts, and when we have memorised thousands of facts on a specific topic, these facts together form what is known as a schema” (p. 20).

facts

This was one of the points where I began to lose track of Daisy’s argument. I think she’s saying this:

Anderson shows that low-level data can be chunked into a ‘unit of knowledge’ that is then treated as one item by working memory (WM) – in effect increasing the capacity of WM. In the same way, thousands of memorised facts can be chunked into a more complex unit (a schema) that is then treated as one item by WM – this essentially bypasses the limitations of WM.

I think Daisy assumes that the principle Anderson found pertaining to low-level ‘units of knowledge’ applies to all units of knowledge at whatever level of abstraction. It doesn’t. Before considering why it doesn’t, it’s worth noting a problem with the use of the word ‘facts’ when describing data. Some researchers have equated data with ‘raw facts’. The difficulty with defining data as ‘facts’ is that by definition a fact has to be valid (true) and not all data is valid, as the GIGO (garbage-in-garbage-out) principle that bedevils computer data processing and the human brain’s often flaky perception of sensory input demonstrate. In addition, ‘facts’ are more complex than raw (unprocessed) data or raw (unprocessed) sensory input.

It’s clear from Daisy’s examples of facts that she isn’t referring to raw data or raw sensory input. Her examples include the date of the battle of Waterloo, key facts about numerous historical events and ‘all of the twelve times tables’. She makes it clear in the rest of the book that in order to understand such facts, students need prior knowledge. In terms of the DIKW hierarchy, Daisy’s ‘facts’ are at a higher level than Anderson’s ‘units of knowledge’ and are unlikely to be processed automatically and pre-consciously in the same way as Anderson’s units. To understand why, we need to take another look at Anderson’s units of knowledge and why chunking happens.

chunking revisited

Data that can be chunked easily have two key characteristics; they involve small amounts of information and the patterns within them are highly consistent. As I mentioned in the previous post, one of Anderson’s examples of chunking is the visual features of upper case H. As far as the brain is concerned, the two parallel vertical lines and linking horizontal line that make up the letter H don’t involve much information. Also, although fonts and handwriting vary, the core features of all the Hs the brain perceives are highly consistent. So the brain soon starts perceiving all Hs as the same thing and chunks up the core features into a single unit – the letter H. If H could also be written Ĥ and Ħ in English, it would take a bit longer for the brain to chunk the three different configurations of lines and to learn the association between them, but not much longer, since the three variants involve little information and are still highly consistent.

understanding facts

But the letter H isn’t a fact, it’s a symbol. So are + and the numerals 1 and 2. ‘1+2’ isn’t a fact in the sense that Daisy uses the term, it’s a series of symbols. ‘1+2=3’ could be considered a fact because it consists of symbols representing two entities and the relationship between them. If you know what the symbols refer to, you can understand it. It could probably be chunked because it contains a small amount of information and has consistent visual features. Each multiplication fact in multiplication tables could probably be chunked, too, since they meet the same criteria. But that’s not true for all the facts that Daisy refers to, because they are more complex and less consistent.

‘The cat is on the mat’ is a fact, but in order to understand it, you need some prior knowledge about cats, mats and what ‘on’ means. These would be treated by working memory as different items. Most English-speaking 5 year-olds would understand the ‘cat is on the mat’ fact, but because there are different sorts of cats, different sorts of mats and different ways in which the cat could be on the mat, each child could have a different mental image of the cat on the mat. A particular child might conjure up a different mental image each time he or she encountered the fact, meaning that different sensory data were involved each time, the mental representations of the fact would be low in consistency, and the fact’s component parts couldn’t be chunked into a single unit in the same way as lower-level more consistent representations. Consequently the fact is less likely to be treated as one item in working memory.

Similarly, in order to understand a fact like ‘the battle of Waterloo was in 1815’ you’d need to know what a battle is, where Waterloo is (or at least that it’s a place), what 1815 means and how ‘of’ links a battle and a place name. If you’re learning about the Napoleonic wars, your perception of the battle is likely to keep changing and the components of the facts would have low consistency meaning that it couldn’t be chunked in the way Anderson describes.

The same problem involving inconsistency would prevent two or more facts being chunked into a single unit. But clearly people do mentally link facts and the components of facts. They do it using a schema, but not quite in the way Daisy describes.

schemata

Before discussing how people use schemata (schemas), a comment on the biological structures that enable us to form them. I mentioned in an earlier post that the neurons in the brain form complex networks a bit like the veins in a leaf. Physical connections are formed between neighbouring neurons when the neurons are activated simultaneously by incoming data. If the same or very similar data are encountered repeatedly, the same neurons are activated repeatedly, connections between them are strengthened and eventually networks of neurons are formed that can carry a vast amount of information in their patterns of connections. The patterns of connections between the neurons represent the individual’s perception of the patterns in the data.

So if I see a cat on a mat, or read a sentence about a cat on a mat, or imagine a cat on a mat, my networks of neurons carrying information about cats and mats will be activated. Facts and concepts about cats, mats and things related to them will readily spring to mind. But I won’t have access to all of those facts and concepts at once. That would completely overload my working memory. Instead, what I recall is a stream of facts and concepts about cats and mats that takes time to access. It’s only a short time, but it doesn’t happen all at once. Also, some facts and concepts will be activated immediately and strongly and others will take longer and might be a bit hazy. In essence, a schema is a network of related facts and concepts, not a chunked ‘unit of knowledge’.
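The sketch below is a toy spreading-activation model – not a claim about real neural wiring, and the weights are invented – in which a schema is a weighted network of associations rather than a single chunk, and recalling a cue retrieves its associates unevenly rather than all at once:

```python
# Toy spreading-activation sketch: a schema as a weighted network of associations,
# not a single chunked unit. Weights are invented; higher = more frequently co-activated.
associations = {
    "cat": {"mat": 0.8, "fur": 0.9, "dog": 0.5, "whiskers": 0.7, "Egypt": 0.1},
    "mat": {"cat": 0.8, "floor": 0.6, "doormat": 0.4},
}

def recall(cue, threshold=0.3):
    """Return associates of the cue, strongest first, ignoring weak links."""
    linked = associations.get(cue, {})
    ordered = sorted(linked.items(), key=lambda kv: kv[1], reverse=True)
    return [item for item, strength in ordered if strength >= threshold]

print(recall("cat"))   # ['fur', 'mat', 'whiskers', 'dog'] - a stream of associates, not one unit
print(recall("mat"))   # ['cat', 'floor', 'doormat']
```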

Daisy says “when we have memorised thousands of facts on a specific topic, these facts together form what is known as a schema” (p. 20). It doesn’t work quite like that, for several reasons.

the structure of a schema

A schema is what it sounds like – a schematic plan or framework. It doesn’t consist of facts or concepts, but it’s a representation of how someone mentally arranges facts or concepts. In the same way, the floor-plan of a building doesn’t consist of actual walls, doors and windows, but it does show you where those things are in the building in relation to each other. The importance of this apparently pedantic point will become clear when I discuss deep structure.

implicit and explicit schemata

Schemata can be implicit – the brain organises facts and concepts in a particular way but we’re not aware of what it is – or explicit – we actively organise facts and concepts in a particular way and we are aware of how they are organised.

the size of a schema

Schemata can vary in size and complexity. The configuration of the three lines that make up the letter H is a schema; so is the way a doctor organises his or her knowledge about the human circulatory system. A schema doesn’t have to represent all the facts or concepts it links together. If it did, a schema involving thousands of facts would be so complex it wouldn’t be much help in showing how the facts were related. And in order to encompass all the different relationships between thousands of facts, a single schema for them would need to be very simple.

For example, a simple schema for chemistry would be that different chemicals are formed from different configurations of the sub-atomic ‘particles’ that make up atoms and configurations of atoms that form molecules. Thousands of facts can be fitted into that schema. In order to have a good understanding of chemistry, students would need to know about schemata other than just that simple one, and would need to know thousands of facts about chemistry before they would qualify as experts, but the simple schema plus a few examples would give them a basic understanding of what chemistry was about.

experts’ schemata

Research into expertise (e.g. Chi et al, 1981) shows that experts don’t usually have one single schema for all the facts they know, but instead use different schemata for different aspects of their body of knowledge. Sometimes those schemata are explicitly linked, but sometimes they’re not. Sometimes they can’t be linked because no one knows how the linkage works yet.

chess experts

Daisy refers to research showing that expert chess players memorise thousands of different configurations of chess pieces (p.78). This is classic chunking; although in different chess sets specific pieces vary in appearance, their core visual features and the moves they can make are highly consistent, so frequently-encountered configurations of pieces are eventually treated by the brain as single units – the brain chunks the positions of the chess pieces in essentially the same way as it chunks letters into words.

De Groot’s work showed that chess experts initially identified the configurations of pieces that were possible as a next move, and then went through a process of eliminating the possibilities. The particular configuration of pieces on the board would activate several associated schemata involving possible next and subsequent moves.

So, each of the different configurations of chess pieces that are encountered so frequently they are chunked, has an underlying (simple) schema. Expert chess players then access more complex schemata for next and subsequent possible moves. Even if they have an underlying schema for chess as a whole, it doesn’t follow that they treat chess as a single unit or that they recall all possible configurations at once. Most people can reliably recognise thousands of faces and thousands of words and have schemata for organising them, but when thinking about faces or words, they don’t recall all faces or all words simultaneously. That would rapidly overload working memory.

Compared to most knowledge domains, chess is pretty simple. Chess expertise consists of memorising a large but limited number of configurations and having schemata that predict the likely outcomes from a selection of them. Because of the rules of chess, although lots of moves are possible, the possibilities are clearly defined and limited. Expertise in medicine, say, or history, is considerably more complex and less certain. A doctor might have many schemata for human biology; one for each of the skeletal, nervous, circulatory, respiratory and digestive systems, for cell metabolism, biochemistry and genetics etc. Not only is human biology more complex than chess, there’s also more uncertainty involved. Some of those schemata we’re pretty sure about, some we’re not so sure about and some we know very little about. There’s even more uncertainty involved in history. Evaluating evidence about how the human body works might be difficult, but the evidence itself is readily available in the form of human bodies. Historical evidence is often absent and likely to stay that way, which makes establishing facts and developing schemata more challenging.

To illustrate her point about schemata, Daisy claims that learning a couple of key facts about 150 historical events from 3000BC to the present will form “the fundamental chronological schema that is the basis of all historical understanding” (p.20). Chronological sequencing could certainly form a simple schema for history, but you don’t need to know about many events in order to grasp that principle – two or three would suffice. Again, although this simple schema would give students a basic understanding of what history was about, in order to have a good understanding of history, students would need to know not only thousands of facts, but to develop many schemata about how those facts were linked before they would qualify as experts. This brings us on to the deep structure of knowledge, the subject of the next post.

references
Chi, MTH, Feltovich, PJ & Glaser, R (1981). Categorization and Representation of Physics Problems by Experts and Novices. Cognitive Science, 5, 121-152.
de Groot, AD (1978). Thought and Choice in Chess. Mouton.

Edited for clarity 8/1/17.

seven myths about education: a knowledge framework

In Seven Myths about Education Daisy Christodoulou refers to Bloom’s taxonomy of educational objectives as a metaphor that leads to two false conclusions; that skills are separate from knowledge and that knowledge is ‘somehow less worthy and important’ (p.21). Bloom’s taxonomy was developed in the 1950s as a way of systematising what students need to do with their knowledge. At the time, quite a lot was known about what people did with knowledge because they usually process it actively and explicitly. Quite a lot less was known about how people acquire knowledge, because much of that process is implicit; students usually ‘just learned’ – or they didn’t. Daisy’s book focuses on how students acquire knowledge, but her framework is an implicit one; she doesn’t link up the various stages of acquiring knowledge in an explicit formal model like Bloom’s. Although I think Daisy makes some valid points about the educational orthodoxy, some features of her model lead to conclusions that are open to question. In this post, I compare the model of cognition that Daisy describes with an established framework for analysing knowledge with origins outside the education sector.

a framework for knowledge

Researchers from a variety of disciplines have proposed frameworks involving levels of abstraction in relation to how knowledge is acquired and organised. The frameworks are remarkably similar. Although there are differences of opinion about terminology and how knowledge is organised at higher levels, there’s general agreement that knowledge is processed along the lines of the catchily named DIKW pyramid – DIKW stands for data, information, knowledge and wisdom. The Wikipedia entry gives you a feel for the areas of agreement and disagreement involved. In the pyramid, each level except the data level involves the extraction of information from the level below. I’ll start at the bottom.



Data
As far as the brain is concerned, data don’t actually tell us anything except whether something is there or not. For computers, data are a series of 0s and 1s; for the brain, data are largely in the form of sensory input – light, dark and colour, sounds, tactile sensations, etc.

Information
It’s only when we spot patterns within data that the data can tell us anything. Information consists of patterns that enable us to identify changes, identify connections and make predictions. For computers, information involves detecting patterns in all the 0s and 1s. For the brain it involves detecting patterns in sensory input.

Knowledge
Knowledge has proved more difficult to define, but involves the organisation of information.

Wisdom
Although several researchers have suggested that knowledge is also organised at a meta-level, this hasn’t been extensively explored.

The processes involved in the lower levels of the hierarchy – data and information – are well-established thanks to both computer modelling and brain research. We know a fair bit about the knowledge level largely due to work on how experts and novices think, but how people organise knowledge at a meta-level isn’t so clear.

The key concept in this framework is information. Used in this context, ‘information’ tells you whether something has changed or not, whether two things are the same or not, and identifies patterns. The DIKW hierarchy is sometimes summarised as; information is information about data, knowledge is information about information, and wisdom is information about knowledge.

a simple theory of complex cognition

Daisy begins her exploration of cognitive psychology with a quote by John Anderson, from his paper ACT: A simple theory of complex cognition (p.20). Anderson’s paper tackles the mystique often attached to human intelligence when compared to that of other species. He demonstrates that it isn’t as sophisticated or as complex as it appears, but is derived from a simple underlying principle. He goes on to explain how people extract information from data, deduce production rules and make predictions about commonly occurring patterns, which suggests that the more examples of particular data the brain perceives, the more quickly and accurately it learns. He demonstrates the principle using examples from visual recognition, mathematical problem solving and prediction of word endings.
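Anderson’s ‘units of knowledge’ are condition-action production rules. The fragment below is a deliberately minimal sketch of that idea (it isn’t ACT-R, and the facts and rules are invented); it just shows a rule firing when its conditions are all present:

```python
# A much-simplified production system: working memory is a set of facts,
# each rule fires when its condition matches and adds a new fact.
# This is only a sketch of the condition-action idea, not ACT-R itself.
working_memory = {"line:vertical-left", "line:vertical-right", "line:horizontal-middle"}

productions = [
    # condition (facts that must all be present)          -> action (fact to add)
    ({"line:vertical-left", "line:vertical-right",
      "line:horizontal-middle"},                             "letter:H"),
    ({"letter:H", "letter:A", "letter:T"},                   "word:HAT"),
]

changed = True
while changed:                      # keep firing rules until nothing new is added
    changed = False
    for condition, action in productions:
        if condition <= working_memory and action not in working_memory:
            working_memory.add(action)
            changed = True

print(working_memory)   # the three lines have been chunked into 'letter:H'
```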

natural learning

What Anderson describes is how human beings learn naturally; the way brains automatically process any information that happens to come their way unless something interferes with that process. It’s the principle we use to recognise and categorise faces, places and things. It’s the one we use when we learn to talk, solve problems and associate cause with effect. Scattergrams provide a good example of how we extract information from data in this way.

Scatterplot of longitudinal measurements of total brain volume for males (N=475 scans, shown in dark blue) and females (N=354 scans, shown in red). From Lenroot et al (2007).

Although the image consists of a mass of dots and lines in two colours, we can see at a glance that the different coloured dots and lines form two clusters.
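As a rough machine analogue of that ‘seeing at a glance’, here’s a sketch that pulls the same kind of information – two clusters – out of raw numbers. The data are invented stand-ins for the scatterplot, and the method is a bare-bones k-means step, not anything the brain literally does:

```python
import random

random.seed(0)

# Invented 'brain volume' figures standing in for the scatterplot: two overlapping groups.
group_1 = [random.gauss(1260, 40) for _ in range(100)]   # e.g. male volumes, cm^3 (made up)
group_2 = [random.gauss(1130, 40) for _ in range(100)]   # e.g. female volumes, cm^3 (made up)
data = group_1 + group_2

# A bare-bones 1-D k-means: the 'information' extracted from the raw data
# is simply that the points fall into two clusters with different centres.
centres = [min(data), max(data)]
for _ in range(20):
    clusters = [[], []]
    for x in data:
        clusters[0 if abs(x - centres[0]) < abs(x - centres[1]) else 1].append(x)
    centres = [sum(c) / len(c) for c in clusters]

print([round(c) for c in sorted(centres)])   # two cluster centres, near 1130 and 1260
```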

Note that I’m not making the same distinction that Daisy makes between ‘natural’ and ‘not natural’ learning (p.36). Anderson is describing the way the brain learns, by default, when it encounters data. Daisy, in contrast, claims that we learn things like spoken language without visible effort because language is ‘natural’, whereas inventions like the alphabet and numbers need to be taught ‘formally and explicitly’. That distinction, although frequently made, isn’t necessarily a valid one. It’s based on an assumption that the brain has evolved mechanisms to process some types of data e.g. to recognise faces and understand speech, but can’t have had time to evolve mechanisms to process recent inventions like writing and mathematics.

This assumption about brain hardwiring is a contentious one, and the evidence about how brains learn (including the work that’s developed from Anderson’s theory) makes it look increasingly likely that it’s wrong. If formal and explicit instruction were necessary in order to learn man-made skills like writing and mathematics, that would raise the question of how these skills were invented in the first place, and Anderson would not have been able to use mathematical problem-solving and word prediction as his examples of the underlying mechanism of human learning. The theory that the brain is hardwired to process some types of information but not others, and the theory that the same mechanism processes all information, both explain how people appear to learn some things automatically and ‘naturally’. Which theory is right (or whether both are right) is still the subject of intense debate. I’ll return to the second theory later when I discuss schemata.

data, information and chunking

Chunking is a core concept in Daisy’s model of cognition. Chunking occurs when the brain links together several bits of data it encounters frequently and treats them as a single item – groups of letters that frequently co-occur are chunked into words. Anderson’s paper is about the information processing involved in chunking.

One of his examples is how the brain chunks the three lines that make up an upper case H. Although Anderson doesn’t make an explicit distinction between data and information, in his examples the three lines would be categorised as data in the DIKW framework, as would the curves and lines that make up numerals. When the brain figures out the production rule for the configuration of the lines in the letter H, it’s extracting information from the data – spotting a pattern. Because the pattern is highly consistent – H is almost always written using this configuration of lines – the brain can chunk the configuration of lines into the single unit we call the letter H. The letters A and Z also consist of three lines, but have different production rules for their configurations.

Anderson shows that chunking can also occur at a slightly higher level; letters (already chunked) can be chunked again into words that are processed as single units, and numerals (already chunked) can be chunked into numbers to which production rules can be applied to solve problems. Again, chunking can take place because the patterns of letters in the words, and the patterns of numerals in Anderson’s mathematical problems, are highly consistent. Anderson calls these chunked units and production rules ‘units of knowledge’. He doesn’t use the same nomenclature as the DIKW model, but it’s clear from his model that initial chunking occurs at the data level and further chunking can occur at the information level.
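A loose computational analogue of this kind of chunking – not Anderson’s model, just an illustration of the principle that consistently co-occurring symbols get merged into single units – is repeated pair-merging, the idea behind byte-pair encoding:

```python
from collections import Counter

# A loose analogue of chunking: repeatedly merge the most frequent adjacent pair
# of symbols into a single unit. Consistent, frequent patterns become 'chunks';
# rare or inconsistent combinations stay as separate items.
def chunk(symbols, merges=3):
    for _ in range(merges):
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:          # nothing occurs often enough to be worth chunking
            break
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == (a, b):
                merged.append(a + b)   # treat the pair as one unit from now on
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols

print(chunk(list("the cat sat on the mat")))
# frequent, consistent patterns such as 'at' and 'the' end up as single chunks
```

After a few merges the consistent, frequent patterns have become single units, while one-off combinations remain as separate symbols – which is roughly the point about why consistency matters for chunking.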

The brain chunks data and low-level units of information automatically; evidence for this comes from research showing that babies begin to identify and categorise objects using visual features and categorise speech sounds using auditory features by about the age of 9 months (Younger, 2003). Chunking also occurs pre-consciously (e.g. Lamme 2003); we know that people are often aware of changes to a chunked unit like a face, a landscape or a piece of music, but don’t know what has changed – someone has shaved off their moustache, a tree has been felled, the song is a cover version with different instrumentation. In addition, research into visual and auditory processing shows that sensory information initially feeds forward in the brain; a lot of processing occurs before the information reaches the location of working memory in the frontal lobes. So at this level, what we are talking about is an automatic, usually pre-conscious process that we use by default.

knowledge – the organisation of information

Anderson’s paper was written in 1995 – twenty years ago – at about the time the DIKW framework was first proposed, which explains why he doesn’t use the same terminology. He calls the chunked units and production rules ‘units of knowledge’ rather than ‘units of information’ because they are the fundamental low-level units from which higher-level knowledge is formed.

Although Anderson’s model of information processing for low-level units still holds true, what has puzzled researchers in the intervening couple of decades is why that process doesn’t scale up. The way people process low-level ‘units of knowledge’ is logical and rational enough to be accurately modelled using computer software, but when handling large amounts of information, such as the concepts involved in day-to-day life, or trying to comprehend, apply, analyse, synthesise or evaluate it, the human brain goes a bit haywire. People (including experts) exhibit a number of errors and biases in their thinking. These aren’t just occasional idiosyncrasies – everybody shows the same errors and biases to varying extents. Since complex information isn’t inherently different to simple information – there’s just more of it – researchers suspected that the errors and biases were due to the wiring of the brain. Work on judgement and decision-making and on the biological mechanisms involved in processing information at higher levels has demonstrated that brains are indeed wired up differently to computers. The reason is that what has shaped the evolution of the human brain isn’t the need to produce logical, rational solutions to problems, but the need to survive, and overall quick-and-dirty information processing tends to result in higher survival rates than slow, precise processing.

What this means is that Anderson’s information processing principle can be applied directly to low-level units of information, but might not be directly applicable to the way people process information at a higher-level, the way they process facts, for example. Facts are the subject of the next post.

References
Anderson, J (1996) ACT: A simple theory of complex cognition, American Psychologist, 51, 355-365.
Lamme, VAF (2003) Why visual attention and awareness are different, TRENDS in Cognitive Sciences, 7, 12-18.
Lenroot, RK, Gogtay, N, Greenstein, DK, Molloy, E, Wallace, GL, Clasen, LS, Blumenthal, JD, Lerch, J, Zijdenbos, AP, Evans, AC, Thompson, PM & Giedd, JN (2007). Sexual dimorphism of brain developmental trajectories during childhood and adolescence. NeuroImage, 36, 1065–1073.
Younger, B (2003). Parsing objects into categories: Infants’ perception and use of correlated attributes. In Rakison & Oakes (eds.) Early Category and Concept Development: Making Sense of the Blooming, Buzzing Confusion. Oxford University Press.

folk categorisation and implicit assumptions

In his second response to critics, Robert [Peal] tackles the issue of the false dichotomy. He says;

…categorisation invariably simplifies. This can be seen in all walks of life: music genres; architectural styles; political labels. However, though imprecise, categories are vital in allowing discussion to take place. Those who protest over their skinny lattes that they are far too sophisticated to use such un-nuanced language … are more often than not just trying to shut down debate.

Categorisation does indeed simplify. And it does allow discussion to take place. Grouping together things that have features in common and labelling the groups means we can refer to large numbers of things by their collective labels, rather than having to list all their common features every time we want to discuss them. Whether all categorisation is equally helpful is another matter.

folk categorisation

The human brain categorises things as if that was what it was built for; not surprising really, because grouping things according to their similarities and differences and referring to them by a label is a very effective way of reducing cognitive load.

The things we detect with our senses are categorised by our brains quickly, automatically and pre-verbally (e.g. Haxby, Gobbini & Montgomery, 2004; Greene & Fei-Fei, 2014) – by which I mean that language isn’t necessary in order to form the categories – although language is often involved in categorisation. We also categorise pre-verbally in the sense that babies start to categorise things visually (such as toy trucks and toy animals) at between 7 and 10 months of age, before they acquire language (Younger, 2003). And babies acquire language itself by forming categories.

Once we do start to get the hang of language, we learn about how things are categorised and labelled by the communities we live in; we develop shared ways of categorising things. All human communities have these shared ‘folk’ categorisations, but not all groups categorise the same things in the same way. Nettles and chickweed would have been categorised as vegetables in the Middle Ages, but to most modern suburban gardeners they are ‘weeds’.

Not all communities agree on the categorisations they use either; political and religious groups are notorious for disagreements about the core features of their categories, who adheres to them and who doesn’t. Nor are folk categorisations equally useful in all circumstances. Describing a politician’s views as ‘right wing’ gives us a rough idea of what her views are likely to be, but doesn’t tell us what she thinks about specific policies.

Biologists have run into problems with folk categorisations too. Mushrooms/toadstools, frogs/toads and horses/ponies are all folk classifications. Although biologists could distinguish between species of mushrooms/toadstools, grouping the species together as either mushrooms or toadstools was impossible because the differences between the folk categories ‘mushrooms’ and ‘toadstools’ aren’t clear enough, so biologists neatly sidestepped the problem by ignoring the folk category distinctions and grouping mushrooms and toadstools together as a phylum. The same principle applies to frogs/toads – so they form an order of their own. Horses and ponies, by contrast, are members of the same subspecies.

Incidentally, 18th and 19th century biologists weren’t categorising these organisms just because of an obsessive interest in taxonomy. Their classification had a very practical purpose – to differentiate between species and identify the relationships between them. In a Europe that was fast running out of natural resources, farmers, manufacturers and doctors all had a keen interest in the plants and animals being brought back from far-flung parts of the world by traders, and accurate identification of different species was vital.

In short, folk categories do allow discussion to take place, but they have limitations. They’re not so useful when one needs to get down to specifics – how are particular MPs likely to vote, or is this fungus toxic or not? The catch is in the two words Robert uses to describe categories – ‘though imprecise’. My complaint about his educational categorisation is not categorisation per se, but its imprecision.

‘though imprecise’

The categories people use for their own convenience don’t always have clear-cut boundaries, nor do they map neatly onto the real world. They don’t always map neatly onto other people’s categories either. Eleanor Rosch’s work on prototype theory shed some light on this. What she found was that people’s mental categories have prototypical features – features typically shared by members of the category – but not all members have all the prototypical features, and members can have them to different extents. For example, the prototypical features of most people’s category {birds} are a beak, wings, feathers and being able to fly. A robin has a beak, wings and feathers and is able to fly, so it’s strongly prototypical of the category {birds}. A penguin can’t fly but uses its wings for swimming, so it’s weakly prototypical, although still a bird.
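For what it’s worth, graded membership can be put in very concrete terms. The sketch below is a toy illustration of my own (not Rosch’s actual model, and the feature sets are invented): each item is scored by how many of a category’s prototypical features it has, so a robin comes out strongly prototypical of {birds} and a penguin weakly prototypical, while both still count as members.

```python
# Toy sketch of graded category membership (illustrative only, not Rosch's model).

BIRD_PROTOTYPE = {"beak", "wings", "feathers", "can fly"}  # invented feature set

def prototypicality(item_features, prototype):
    """Fraction of the prototype's features that the item possesses."""
    return len(item_features & prototype) / len(prototype)

robin = {"beak", "wings", "feathers", "can fly"}
penguin = {"beak", "wings", "feathers", "can swim"}  # wings, but no flying

print(prototypicality(robin, BIRD_PROTOTYPE))    # 1.0  - strongly prototypical
print(prototypicality(penguin, BIRD_PROTOTYPE))  # 0.75 - weakly prototypical, still a bird
```

The point of the toy is only that membership isn’t all-or-nothing: the penguin scores lower without dropping out of the category.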

Mushrooms and toadstools have several prototypical features in common, as do frogs and toads, horses and ponies. The prototypical features that differentiate mushrooms from toadstools, frogs from toads and horses from ponies are the ideas that toadstools are poisonous and often brightly coloured, toads have a warty skin sometimes containing toxins, and horses are much larger than ponies. Although these differential features are useful for conversational purposes, they are not helpful for more specific ones such as putting edible fungi on your restaurant menu, using a particular toxin for medicinal purposes or breeding characteristics in or out of horses.

traditional vs progressive education

Traditional and progressive education are both types of education, obviously, so they have some prototypical features in common – teachers, learners, knowledge, schools etc. Robert proposes some core features of progressive education that differentiate it from traditional education: it is child-centred, focuses on skills rather than knowledge, sees strict discipline and moral education as oppressive, and assumes that socio-economic background dictates success (pp. 5-8). He distilled these features from what’s been said and written about progressive education over the last fifty years, so it’s likely there’s a high degree of consensus on these core themes. The same might not be true for traditional education. Robert defines it only in terms of its core characteristics being the polar opposite of progressive education, although he appears to include in the category ‘traditional’ a list of other more peripheral features including blazers, badges and ties and class rankings.

Robert says “though imprecise, categories are vital in allowing discussion to take place.” No doubt about that, but if the categories are imprecise the discussion can be distinctly unfruitful. A lot of time and energy can be expended trying to figure out precise definitions and how accurately those definitions map onto the real world. Nor are imprecise categories helpful if we want to do something with them other than have a discussion. Categorising education as ‘traditional’ or ‘progressive’ is fine for referring conversationally to a particular teacher’s pedagogical approach or the type of educational philosophy favoured by a government minister, but those constructs are too complex and too imprecise to be of use in research.

implicit assumptions

An implicit assumption is, by definition, an assumption that isn’t made explicit. Implicit assumptions are sneaky things because if they are used in a discussion, people following the argument often overlook the fact that an assumption is being made at all, so one that’s completely wrong can easily slip by unnoticed. They get even sneakier; often the people making the argument aren’t aware of their own implicit assumptions. In the case of mushrooms and toadstools, any biologists who tried to group certain types of fungi into one or other of those folk categories would have been on a hiding to nothing because of the implicit, but wrong, assumption that fungi can be sorted that way.

Robert’s thesis appears to rest on an implicit assumption that because the state education system in the last fifty years has had shortcomings, some of them serious, and because progressive educational ideas have proliferated during the same period, it follows that progressive ideas must be the cause of the lack of effectiveness. This isn’t even the ever-popular ‘correlation equals causality’ error, because as far as I can see, Robert hasn’t actually established a correlation between progressive ideas and educational effectiveness. He can’t compare current traditional and progressive state schools because traditional state schools are a thing of the past. And he can’t compare current progressive state schools with historical traditional state schools because the relevant data isn’t available. Ironically, what data we do have suggest that numeracy and literacy rates have improved overall during this period. The reliability of the figures is questionable because of grade drift, but numeracy and literacy rates have clearly not plummeted.

What he does implicitly compare is state schools that he sees as broadly progressive, with independent schools that he sees as having “withstood the wilder extremes of the [progressive] movement”. The obvious problem with this comparison is that a progressive educational philosophy is not the only difference between the state and independent sectors.

In my previous post, I agreed with Robert that the education system in England leaves much to be desired, but making an implicit assumption that there’s only one cause and that other possible causes can be ignored is a risky approach to policy development. It would be instructive to compare schools that are effective (however you measure effectiveness) with schools that are less effective, to find out how the latter could be improved. But the differences between them could boil down to some very specific issues relating to the quality of teaching, classroom management, availability of additional support or allocation of budgets, rather than whether the schools take a ‘traditional’ or ‘progressive’ stance overall.

References
Greene, M.R. & Fei-Fei, L. (2014). Visual categorization is automatic and obligatory: Evidence from Stroop-like paradigm. Journal of Vision, 14, article 14.
Haxby, J.V., Gobbini, M.I. & Montgomery, K. (2004). Spatial and temporal distribution of face and object representations in the human brain. In M.S. Gazzaniga (Ed.), The Cognitive Neurosciences (3rd edn.). Cambridge, MA: MIT Press.
Kuhl, P. (2004). Early language acquisition: Cracking the speech code. Nature Reviews Neuroscience, 5, 831-843.
Younger, B. (2003). Parsing objects into categories: Infants’ perception and use of correlated attributes. In Rakison & Oakes (Eds.), Early Category and Concept Development: Making Sense of the Blooming, Buzzing Confusion. Oxford University Press.

mixed methods for teaching reading (1)

Many issues in education are treated as either/or options, and the Reading Wars have polarised opinion into synthetic phonics proponents on the one hand and those supporting the use of whole language (or ‘mixed methods’) on the other. I’ve been asked on Twitter what I think of ‘mixed methods’ for teaching reading. Apologies for the length of this reply, but I wanted to explain why I wouldn’t dismiss mixed methods outright and why I have some reservations about synthetic phonics (SP). I wholeheartedly support using SP to teach children to read; my reservations are about some of the assumptions SP proponents make about its effectiveness, and about the quality of the evidence used to justify its use.

the history of mixed methods

As far as I’m aware, when education became compulsory in England in the late 19th century, reading was taught predominantly via letter-sound correspondence and analytic phonics – ‘the cat sat on the mat’ etc. A common assumption was that if people couldn’t read it was usually because they’d never been taught. What was found was that a proportion of children didn’t learn to read despite being taught in the same way as others in the class. The Warnock committee reported that teachers in England at the time were surprised by the numbers of children turning up for school with disabilities or learning difficulties. That resulted in special schools being set up for those with the most significant difficulties with learning. In France, Alfred Binet was commissioned to devise a screening test to identify learning difficulties; that test later evolved into the ‘intelligence test’. In Italy, Maria Montessori adapted for mainstream education methods that had been used to teach hearing-impaired children.

Research into acquired reading difficulties in adults generated an interest in developmental problems with learning to read, pioneered by James Hinshelwood and Samuel Orton in the early 20th century. The term developmental dyslexia began as a descriptive label for a range of problems with reading and gradually became reified into a ‘disorder’. Because using the alphabetic principle and analytic phonics clearly wasn’t an effective approach for teaching all children to read, and because of an increased interest in child development, researchers began to look at what adults and children actually did when reading and learning to read, rather than what it had been thought they should do.

What they found was that people use a range of cues (‘mixed methods’) to decode unfamiliar words: letter-sound correspondence, analytic phonics, recognising words by their shape, using key letters, grammar, context and pictures, for example. Educators reasoned that if some children hadn’t learned to read using alphabetic principles and/or analytic phonics, applying the strategies that people actually used when reading new words might be a more effective approach.
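To make ‘a range of cues’ concrete, here’s a toy sketch of my own (the word lists and sentence frames are invented, and it isn’t a model of any actual reader or reading scheme): an unfamiliar word is tackled by trying one cue after another – letter-sound correspondence, then whole-word recognition, then context.

```python
# Toy sketch of cue-based word identification (purely schematic, not a psychological model).

LETTER_SOUNDS = {"c": "k", "a": "a", "t": "t"}        # tiny letter-sound table
SIGHT_WORDS = {"the", "said", "was"}                  # words recognised as wholes
CONTEXT_GUESSES = {"the ___ sat on the mat": "cat"}   # guesses from sentence frames

def read_word(word, sentence_frame=None):
    # Cue 1: letter-sound correspondence (only works if every letter is known)
    if all(ch in LETTER_SOUNDS for ch in word):
        return "".join(LETTER_SOUNDS[ch] for ch in word)
    # Cue 2: whole-word recognition
    if word in SIGHT_WORDS:
        return word
    # Cue 3: context (standing in here for grammar, pictures and so on)
    if sentence_frame in CONTEXT_GUESSES:
        return CONTEXT_GUESSES[sentence_frame]
    return "?"  # the reader is stuck

print(read_word("cat"))                            # 'kat' - decoded letter by letter
print(read_word("said"))                           # 'said' - recognised as a whole
print(read_word("dog", "the ___ sat on the mat"))  # 'cat' - a context guess, and a wrong one
```

The order of the cues here is arbitrary; the point is only that each cue works for some words and fails (or misleads) for others, which is roughly what the observational studies were picking up.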

This idea, coinciding with an increased interest in child-led pedagogy and a belief that a species-specific genetic blueprint meant that children would follow the same developmental trajectory but at different rates, resulted in the concept of ‘reading-readiness’. The upshot was that no one panicked if children couldn’t read by 7, 9 or 11; they often did learn to read when they were ‘ready’. It’s impossible to compare the long-term outcomes of analytic phonics and mixed methods because the relevant data aren’t available. We don’t know for instance, whether children’s educational attainment suffered more if they got left behind by whole-class analytic phonics, or if they got left alone in schools that waited for them to become ‘reading-ready’.

Eventually, as is often the case, the descriptive observations about how people tackle unfamiliar words became prescriptive. Whole word recognition began to supersede analytic phonics after WW2, and in the 1960s Ken Goodman formalised mixed methods in a ‘whole language’ approach. Goodman was strongly influenced by Noam Chomsky, who believes that the structure underpinning language is essentially ‘hard-wired’ in humans. Goodman’s ideas chimed with the growing social constructivist approach to education that emphasises the importance of meaning mediated by language.

At the same time as whole language approaches were gaining ground, the national curriculum and standardised testing were introduced in England. That meant children whose reading didn’t keep up with their peers were far more visible than they had been previously, and the complaints that had followed the introduction of whole language in the USA began to be heard here. In addition, the national curriculum appears to have focussed on the mechanics of understanding ‘texts’ rather than on reading books for enjoyment. And with the advent of multi-channel TV and electronic gadgets, reading has nowhere near the popularity it once had as a leisure activity amongst children, so they tend to get a lot less reading practice than children did in the past. These developments suggest that any decline in reading standards might have multiple causes, rather than ‘mixed methods’ being the only culprit.

what do I think about mixed methods?

I think Chomsky has drawn the wrong conclusions about his linguistic theory, so I don’t subscribe to Goodman’s reading theory either. Although meaning is undoubtedly a social construction, it’s more than that. Social constructivists tend to emphasise the mind at the expense of the brain. The mind is such a vague concept that you can say more or less what you like about it, but we’re very constrained by how our brains function. I think marginalising the brain is an oversight on the part of social constructivists, and I can’t see how a child can extract meaning from a text if they can’t read the words.

Patricia Kuhl’s work suggests that babies acquire language computationally, from the frequency of sound patterns within speech. This is an implicit process; the baby’s brain detects the sounds and learns the patterns, but the baby isn’t aware of the learning process, nor of phonemes. What synthetic phonics does is to make the speech sounds explicit, develop phonemic awareness and allow children to learn phoneme-grapheme correspondence and how words are constructed.
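To give a flavour of what ‘learning from the frequency of sound patterns’ can mean, here is a toy sketch of my own, loosely in the spirit of statistical-learning experiments rather than Kuhl’s actual data or analyses: count how often each syllable follows each other syllable in a made-up stream, and the frequently recurring transitions pick out word-like units.

```python
from collections import Counter

# Made-up stream of syllables with no word boundaries marked (illustrative only).
# The hidden "words" are bi-da-ku and pa-do-ti, occurring in varying order.
stream = "bi da ku pa do ti pa do ti bi da ku bi da ku pa do ti".split()

# Count how often each syllable is followed by each other syllable.
transitions = Counter(zip(stream, stream[1:]))
totals = Counter(a for a, _ in zip(stream, stream[1:]))

for (a, b), n in sorted(transitions.items()):
    print(f"{a} -> {b}: {n / totals[a]:.2f}")

# Transitions inside a "word" (bi->da, da->ku, pa->do, do->ti) come out at 1.00;
# transitions across word boundaries (ku->pa, ku->bi, ti->pa, ti->bi) come out lower.
```

Nothing in that sketch involves explicit instruction about phonemes or boundaries – which is roughly the sense in which the learning is implicit.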

My reservations about SP are not about the approach per se, but rather about how it’s applied and the reasons assumed to be responsible for its effectiveness. In cognitive terms, SP has three main components:

• phonemic and graphemic discrimination
• grapheme-phoneme correspondence
• building up phonemes/graphemes into words – blending

How efficient children become at these tasks is a function of the frequency of their exposure to the tasks and how easy they find them. Most children pick up the skills with little effort, but anyone who has problems with any or all of the tasks could need considerably more rehearsals. Problems with the cognitive components of SP aren’t necessarily a consequence of ineffective teaching or the child not trying hard enough. Specialist SP teachers will usually be aware of this, but policy-makers, parents, or schools that simply adopt a proprietary SP course might not.
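To make the three components concrete, here’s a toy sketch of my own (the grapheme-phoneme table is tiny and invented, and this isn’t any published SP programme): the graphemes in a word have to be discriminated, mapped to phonemes, and then blended.

```python
# Toy sketch of grapheme-phoneme correspondence and blending (illustrative only).

GPC = {"sh": "sh", "ch": "ch", "th": "th",   # two-letter graphemes
       "a": "a", "i": "i", "o": "o", "p": "p", "t": "t", "m": "m"}

def decode(word):
    """Split a word into known graphemes (two-letter ones first), look each up, then blend."""
    phonemes = []
    i = 0
    while i < len(word):
        if word[i:i + 2] in GPC:                 # graphemic discrimination: is this 'sh' or 's' + 'h'?
            phonemes.append(GPC[word[i:i + 2]])  # grapheme-phoneme correspondence
            i += 2
        elif word[i] in GPC:
            phonemes.append(GPC[word[i]])
            i += 1
        else:
            return None                          # a grapheme this decoder hasn't been taught
    return "-".join(phonemes), "".join(phonemes)  # sounded out, then blended

print(decode("ship"))  # ('sh-i-p', 'ship')
print(decode("chat"))  # ('ch-a-t', 'chat')
print(decode("mat"))   # ('m-a-t', 'mat')
```

A child who struggles with any one of those steps – telling similar graphemes apart, recalling the correspondences, or holding the sequence in mind long enough to blend it – will stall, which is why extra rehearsal of the weak component, rather than simply more of the same course, can matter.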

My son’s school taught reading using Jolly Phonics. Most of the children in his class learned to read reasonably quickly. He took 18 months over it. He had problems with each of the three elements of SP. He couldn’t tell the difference between similar-sounding phonemes – i/e or b/d, for example. He couldn’t tell the difference between similar-looking graphemes either – such as b/d, h/n or i/j. As a consequence, he struggled with some grapheme-phoneme correspondences. Even in words where his grapheme-phoneme correspondences were secure, he couldn’t blend more than three letters.

After 18 months of struggling and failing, he suddenly began to read using whole word recognition. I could tell he was doing this because of the errors he was making; he was using initial and final letters and word shape and length as cues. Recognising patterns is what the human brain does for a living, and once it’s recognised a pattern it’s extremely difficult to get it to unrecognise it. Brains are so good at recognising patterns that they often see patterns that aren’t really there – as in pareidolia or the behaviourists’ ‘superstition’. Once my son could recognise word-patterns, he was reading and there was no way he was going to be persuaded to carry on with all that tedious sounding-out business. He just wanted to get on with reading, and that’s what he did.

[Edited to add: I should point out that the reason the apparent failure of an SP programme to teach my son to read led to me supporting SP rather than dismissing it, was because after conversations with specialist SP teachers, I realised that he hadn’t had enough training in phonemic and graphemic discrimination. His school essentially put the children through the course, without identifying any specific problems or providing additional training that might have made a significant difference for him.]

When I trained as a teacher ‘mixed methods’ included a substantial phonics component – albeit as analytic phonics. I get the impression that the phonics component has diminished over time so ‘mixed methods’ aren’t what they once were. Even if they included phonics, I wouldn’t recommend ‘mixed methods’ prescriptively as an approach to teaching reading. Having said that, I think mixed methods have some validity descriptively, because they reflect the way adults/children actually read. I would recommend the use of SP for teaching reading, but I think some proponents of SP underestimate the way the human brain tends to cobble together its responses to challenges, rather than to follow a neat, straight pathway.

Advocacy of mixed methods and opposition to SP is often based on accurate observations of the strategies children use to read, not on evidence of what teaching methods are most effective. Our own personal observations tend to be far more salient to us than schools we’ve never visited reporting stunning SATs results. That’s why I think SP proponents need to ensure that the evidence they refer to as supporting SP is of a high enough quality to be convincing to sceptics.