the debating society

One of my concerns about the model of knowledge promoted by the Tiger Teachers is that it hasn’t been subjected to sufficient scrutiny. A couple of days ago on Twitter I said as much. Jonathan Porter, a teacher at the Michaela Community School, thought my criticism unfair because the school has invited critique by publishing a book and hosting two debating days. Another teacher recommended watching the debate between Guy Claxton and Daisy Christodoulou, ‘Sir Ken is right: traditional education kills creativity’. She said it may not address my concerns about theory. She was right, it didn’t. But it did suggest a constructive way to extend the Tiger Teachers’ model of knowledge.

the debate

Guy, speaking for the motion and defending Sir Ken Robinson’s views, highlights the importance of schools developing students’ creativity, and answers the question ‘what is creativity?’ by referring to the findings of an OECD study: that creativity emerges from six factors – curiosity, determination, imagination, discipline, craftsmanship and collaboration. Daisy, opposing the motion, says that although she and Guy agree on the importance of creativity and its definition, they differ over the methods used in schools to develop it.

Daisy says Guy’s model involves students learning to be creative by practising being creative, which doesn’t make sense. It’s a valid point. Guy says knowledge is a necessary but not sufficient condition for developing creativity; other factors are involved. Another valid point. Both Daisy and Guy debate the motion but they approach it from very different perspectives, so they don’t actually rigorously test each other’s arguments.

Daisy’s model of creativity is a bottom-up one. Her starting point is how people form their knowledge and how that develops into creativity. Guy’s model, in contrast, is a top-down one; he points out that creativity isn’t a single thing, but emerges from several factors. In this post, I propose that Daisy and Guy are using the same model of creativity, but because Daisy’s focus is on one part and Guy’s on another, their arguments shoot straight past each other, and that in isolation, both perspectives are problematic.

Creativity is a complex construct, as Guy points out. A problem with his perspective is that the factors he found to be associated with creativity are themselves complex constructs. How does ‘curiosity’ manifest itself? Is it the same in everyone or does it vary from person to person? Are there multiple component factors associated with curiosity too? Can we ask the same questions about ‘imagination’? Daisy, in contrast, claims a central role for knowledge and deliberate practice. A problem with Daisy’s perspective is, as I’ve pointed out elsewhere, that her model of knowledge peters out when it comes to the complex cognition Guy refers to. With a bit more information, Daisy and Guy could have done some joined-up thinking. To me, the two models look like the representation below, the grey words and arrows indicating concepts and connections referred to but not explained in detail.

[slide: Daisy’s and Guy’s models of creativity side by side]

cognition and expertise

If I’ve understood it correctly, Daisy’s model of creativity is essentially this: If knowledge is firmly embedded in long-term memory (LTM) via lots of deliberate practice and organised into schemas, it results in expertise. Experts can retrieve their knowledge from LTM instantly and can apply it flexibly. In short, creativity is a feature of expertise.

Daisy makes frequent references to research: what scientists think, half a century of research, what all the research has shown. She names names: Herb Simon, Anders Ericsson, Robert Bjork. She reports research showing that expert chess players, football players or musicians don’t practise whole games or entire musical works – they practise short sequences repeatedly until they’ve overlearned them. That’s what enables experts to be creative.

Daisy’s model of expertise is firmly rooted in an understanding of cognition that emerged from artificial intelligence (AI) research in the 1950s and 1960s. At the time, researchers were aware that human cognition was highly complex and often seemed illogical. Computer science offered an opportunity to find out more: by manipulating the data and rules fed into a computer, researchers could test different models of cognition that might explain how experts thought.

It was no good researchers starting with the most complex illogical thinking – because it was complex and illogical. It made more sense to begin with some simpler examples, which is why the AI researchers chose chess, sport and music as domains to explore. Expertise in these domains looks pretty complex, but the complexity has obvious limits because chess, sport and music have clear, explicit rules. There are thousands of ways you can configure chess pieces or football players and a ball during a game, but you can’t configure them any-old-how because chess and football have rules. Similarly, a musician can play a piece of music in many different ways, but they can’t play it any-old-how because then it wouldn’t be the same piece of music.

In chess, sport and music, experts have almost complete knowledge, clear explicit rules, and comparatively low levels of uncertainty.   Expert geneticists, doctors, sociologists, politicians and historians, in contrast, often work with incomplete knowledge, many of the domain ‘rules’ are unknown, and uncertainty can be very high. In those circumstances, expertise  involves more than simply overlearning a great many facts and applying them flexibly.

Daisy is right that expertise and creativity emerge from deliberate practice of short sequences – for those who play chess, sport or music. Chess, soccer and Beethoven’s piano concerto No. 5 haven’t changed much since the current rules were agreed and are unlikely to change much in future. But domains like medicine, economics and history still periodically undergo seismic shifts in the way whole areas of the domains are structured, as new knowledge comes to light.

This is the point at which Daisy’s and Guy’s models of creativity could be joined up.  I’m not suggesting some woolly compromise between the two. What I am suggesting is that research that followed the early AI work offers the missing link.

I think the missing link is the schema.   Daisy mentions schemata (or schemas if you prefer) but only in terms of arranging historical events chronologically. Joe Kirby in Battle Hymn of the Tiger Teachers also recognises that there can be an underlying schema in the way students are taught.  But the Tiger Teachers don’t explore the idea of the schema in any detail.

schemas, schemata

A schema is the way people mentally organise their knowledge. Some schemata are standardised and widely used – such as the periodic table or multiplication tables. Others are shared by many people, but are a bit variable – such as the Linnaean taxonomy of living organisms or the right/left political divide. But because schemata are constructed from the knowledge and experience of the individual, some are quite idiosyncratic. Many teachers will be familiar with students all taught the same material in the same way, but developing rather different understandings of it.
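Since so much of what follows turns on schemata, a concrete caricature may help. Here’s a minimal sketch in Python (the facts, link labels and the ‘two learners’ are all invented for illustration): the same facts, organised by two different schemata, yield different patterns of association.

```python
# A crude caricature of a schema: the same facts (nodes), linked
# differently by two learners, give two different organisations of
# identical knowledge. Entirely illustrative.

# Learner A organises the facts chronologically/causally.
schema_a = {
    ("Napoleon", "defeated_at", "Waterloo"),
    ("Waterloo", "occurred_in", "1815"),
    ("Wellington", "commanded_at", "Waterloo"),
}

# Learner B organises the same facts around personalities.
schema_b = {
    ("Napoleon", "opponent_of", "Wellington"),
    ("Wellington", "famous_for", "Waterloo"),
    ("Napoleon", "career_ended_in", "1815"),
}

def neighbours(schema, node):
    """Concepts directly linked to `node` under this organisation."""
    out = set()
    for tail, _, head in schema:
        if tail == node:
            out.add(head)
        if head == node:
            out.add(tail)
    return out

# Same facts, different structure, so different retrieval cues:
print(neighbours(schema_a, "Waterloo"))  # {'Napoleon', '1815', 'Wellington'}
print(neighbours(schema_b, "Waterloo"))  # {'Wellington'}
```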

There’s been a fair amount of research into schemata. The schema was first proposed as a psychological concept by Jean Piaget*. Frederic Bartlett carried out a series of experiments in the 1930s demonstrating that people use schemata, and in the heyday of AI the concept was explored further by, for example, David Rumelhart, Marvin Minsky and Robert Axelrod. It later extended into script theory (Roger Schank and Robert Abelson), and how people form prototypes and categories (e.g. Eleanor Rosch, George Lakoff). The schema might be the missing link between Daisy’s and Guy’s models of creativity, but both models stop before they get there. Here’s how the cognitive science research allows them to be joined up.

Last week I finally got round to reading Jerry Fodor’s book The Modularity of Mind, published in 1983. By that time, cognitive scientists had built up a substantial body of evidence related to cognitive architecture. Although the evidence itself was generally robust, what it was saying about the architecture was ambiguous. It appeared to indicate that cognitive processes were modular, with specific modules processing specific types of information, e.g. visual or linguistic. It also indicated that some cognitive processes, e.g. problem-solving or intelligence, operated across the board. The debate had tended to be rather polarised. What Fodor proposed was that cognition isn’t a case of either-or, but of both-and: perceptual and linguistic processing is modular, but higher-level, more complex cognition that draws on modular information is global. His prediction turned out to be pretty accurate, which is why Daisy’s and Guy’s models can be joined up.

Fodor was familiar enough with the evidence to know that he was very likely to be on the right track, but his model of cognition is a complex one, and he knew he could have been wrong about some bits of it. So he deliberately exposed his model to the criticism of cognitive scientists, philosophers and anyone else who cared to comment, because that’s how the scientific method works. A hypothesis is tested. People try to falsify it. If they can’t, then the hypothesis signposts a route worth exploring further. If they can, then researchers don’t need to waste any more time exploring a dead end.

joined-up thinking

Daisy’s model of creativity has emerged from a small sub-field of cognitive science – what AI researchers discovered about expertise in domains with clear, explicit rules. She doesn’t appear to see the need to explore schemata in detail because the schemata used in chess, sport and music are by definition highly codified and widely shared.  That’s why the AI researchers chose them.  The situation is different in the sciences, humanities and arts where schemata are of utmost importance, and differences between them can be the cause of significant conflict.  Guy’s model originates in a very different sub-field of cognitive science – the application of high-level cognitive processes to education. Schemata are a crucial component; although Guy doesn’t explore them in this debate, his previous work indicates he’s very familiar with the concept.

Since the 1950s, cognitive science has exploded into a vast research field, encompassing everything from the dyes used to stain brain tissue, through the statistical analysis of brain scans, to the errors and biases that affect judgement and decision-making by experts. Obviously it isn’t necessary to know everything about cognitive science before you can apply it to teaching, but if you’re proposing a particular model of cognition, having an overview of the field and inviting critique of the model would help avoid unnecessary errors and disagreements.  In this debate, I suggest schemata are noticeable by their absence.

*First use of schema as a psychological concept is widely attributed to Piaget, but I haven’t yet been able to find a reference.


The Tiger Teachers and cognitive science

Cognitive science is a key plank in the Tiger Teachers’ model of knowledge. If I’ve understood it properly the model looks something like this:

Cognitive science has discovered that working memory has limited capacity and duration, so pupils can’t process large amounts of novel information. If this information is secured in long-term memory via spaced, interleaved practice, students can recall it instantly whenever they need it, freeing up working memory for thinking.

What’s wrong with that? Nothing, as it stands. It’s what’s missing that’s the problem.
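To have something concrete to refer back to, here’s a minimal sketch of that model, assuming (purely for illustration) a toy working-memory capacity of four items and a long-term memory that supplies whole chunks; none of the values or vocabulary come from the Tiger Teachers themselves.

```python
# Toy version of the model: a fixed working-memory capacity, with
# long-term memory replacing familiar sequences with single chunks.
# Capacity, chunks and vocabulary are all arbitrary illustrations.

WM_CAPACITY = 4

# One consolidated chunk in long-term memory.
LTM_CHUNKS = {("battle", "of", "waterloo", "1815"): "Waterloo-1815"}

def load_into_working_memory(words):
    """Swap any leading sequence already chunked in LTM for its chunk,
    then report whether the result fits the toy capacity."""
    items = tuple(words)
    for sequence, label in LTM_CHUNKS.items():
        if items[:len(sequence)] == sequence:
            items = (label,) + items[len(sequence):]
    return items, len(items) <= WM_CAPACITY

# Novel material arrives as raw items and overflows:
print(load_into_working_memory(
    ["battle", "of", "hastings", "1066", "norman", "conquest"]))

# Familiar material arrives partly as one chunk and fits:
print(load_into_working_memory(
    ["battle", "of", "waterloo", "1815", "napoleon", "defeated"]))
```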

Subject knowledge

One of the Tiger Teachers’ beefs about the current education system is its emphasis on transferable skills. They point out that skills are not universally transferable, many are subject-specific, and in order to develop expertise in higher-level skills novices need a substantial amount of subject knowledge. Tiger Teachers’ pupils are expected to pay attention to experts (their teachers) and memorise a lot of facts before they can comprehend, apply, analyse, synthesise or evaluate. The model is broadly supported by cognitive science and the Tiger Teachers apply it rigorously to children. But not to themselves, it seems.

For most Tiger Teachers cognitive science will be an unfamiliar subject area. That makes them (like most of us) cognitive science novices. Obviously they don’t need to become experts in cognitive science to apply it to their educational practice, but they do need the key facts and concepts and a basic overview of the field. The overview is important because they need to know how the facts fit together and the limitations of how they can be applied. But with a few honourable exceptions (Daisy Christodoulou, David Didau and Greg Ashman spring to mind – apologies if I’ve missed anyone out), many Tiger Teachers don’t appear to have even thought about acquiring expertise, key facts and concepts or an overview. As a consequence, facts are misunderstood or overlooked, principles from other knowledge domains are applied inappropriately, and erroneous assumptions are made about how science works. Here are some examples (page numbers refer to Battle Hymn of the Tiger Teachers):

It’s a fact…

“Teachers’ brains work exactly the same way as pupils’” (p.177). No they don’t. Cognitive science (ironically) thinks that children’s brains begin by forming trillions of connections (synapses). Then through to early adulthood, synapses that aren’t used get pruned, which makes information processing more efficient. (There’s a good summary here.)  Pupils’ brains are as different to teachers’ brains as children’s bodies are different to adults’ bodies. Similarities don’t mean they’re identical.

Then there’s working memory. “As the cognitive scientist Daniel Willingham explains, we learn by transferring knowledge from the short-term memory to the long term memory” (p.177). Well, kind of – if you assume that what Willingham explicitly describes as “just about the simplest model of the mind possible” is an exhaustive model of memory. If you think that, you might conclude, wrongly, “the more knowledge we have in long-term memory, the more space we have in our working memory to process new information” (p.177). Or that “information cannot accumulate into long-term memory while working memory is being used” (p.36).

Long-term memory takes centre stage in the Tiger Teachers’ model of cognition. The only downside attributed to it is our tendency to forget things if we don’t revisit them (p.22). Other well-established characteristics of long-term memory – its unreliability, errors and biases – are simply overlooked, despite Daisy Christodoulou’s frequent citation of Daniel Kahneman whose work focused on those flaws.

With regard to transferable skills we’re told “cognitive scientist Herb Simon and his colleagues have cast doubt on the idea that there are any general or transferable cognitive skills” (p.17), when what they actually cast doubt on is the ideas that all skills are transferable or that none are.

The Michaela cognitive model is distinctly reductionist: “all there is to intelligence is the simple accrual and tuning of many small units of knowledge that in total produce complex cognition” (p.19). Then there’s “skills are simply just a composite of sequential knowledge – all skills can be broken down to irreducible pieces of knowledge” (p.161).

The statement about intelligence is a direct quote from John Anderson’s paper ‘A Simple Theory of Complex Cognition’ but Anderson isn’t credited, so you might not know he was talking about simple encodings of objects and transformations, and that by ‘intelligence’ he means how ants behave rather than IQ. I’ve looked at Daisy Christodoulou’s interpretation of Anderson’s model here.

The idea that intelligence and skills consist ‘simply just’ of units of knowledge ignores Anderson’s procedural rules and marginalises the role of the schema – the way people configure their knowledge. Joe Kirby mentions “procedural and substantive schemata” (p. 17), but seems to see them only in terms of how units of knowledge are configured for teaching purposes; “subject content knowledge is best organised into the most memorable schemata … chronological, cumulative schemata help pupils remember subject knowledge in the long term” (p.21). The concept of schemata as the way individuals, groups or entire academic disciplines configure their knowledge, that the same knowledge can be configured in different ways resulting in different meanings, or that configurations sometimes turn out to be profoundly wrong, doesn’t appear to feature in the Tiger Teachers’ model.

Skills: to transfer or not to transfer?

Tiger Teachers see higher-level skills as subject-specific. That hasn’t stopped them applying higher-level skills from one domain inappropriately to another. In her critique of Bloom’s taxonomy, Daisy Christodoulou describes it as a ‘metaphor’ for the relationship between knowledge and skills. She refers to two other metaphors: E.D. Hirsch’s scrambled egg and Joe Kirby’s double helix (Seven Myths, p.21). Daisy, Joe and E.D. teach English, and metaphors are an important feature in English literature. Scientists do use metaphors, but they use analogies more often, because in the natural world patterns often repeat themselves at different levels of abstraction. Daisy, Joe and E.D. are right to complain about Bloom’s taxonomy being used to justify divorcing skills from knowledge. And the taxonomy itself might be wrong or misleading. But it is a taxonomy and it is based on an important scientific concept – levels of abstraction – so it should be critiqued as such, not as if it were a device used by a novelist.

Not all evidence is equal

A major challenge for novices is what criteria they can use to decide whether or not factual information is valid. They can’t use their overview of a subject area if they don’t have one. They can’t weigh up one set of facts against another if they don’t know enough facts. So Tiger Teachers who are cognitive science novices have to fall back on the criteria E.D. Hirsch uses to evaluate psychology – the reputation of researchers and consensus. Those might be key criteria in evaluating English literature, but they’re secondary issues for scientific research, and for good reason.

Novices then have to figure out how to evaluate the reputation of researchers and consensus. The Tiger Teachers struggle with reputation. Daniel Willingham and Paul Kirschner are cited more frequently than Herb Simon, but with all due respect to Willingham and Kirschner, they’re not quite in the same league. Other key figures don’t get a mention.  When asked what was missing from the Tiger Teachers’ presentations at ResearchEd, I suggested, for starters, Baddeley and Hitch’s model of working memory. It’s been a dominant model for 40 years and has the rare distinction of being supported by later biological research. But it’s mentioned only in an endnote in Willingham’s Why Don’t Students Like School and in Daisy’s Seven Myths about Education. I recommended inviting Alan Baddeley to speak at ResearchEd – he’s a leading authority on memory after all.   One of the teachers said he’d never even heard of him. So why was that teacher doing a presentation on memory at a national education conference?

The Tiger Teachers also struggle with consensus. Joe Kirby emphasises the length of time an idea has been around and the number of studies that support it (pp.22-3), overlooking the fact that some ideas can dominate a field for decades, be supported by hundreds of studies and then turn out to be profoundly wrong; theories about how brains work are a case in point.   Scientific theory doesn’t rely on the quantity of supporting evidence; it relies on an evaluation of all relevant evidence – supporting and contradictory – and takes into account the quality of that evidence as well.  That’s why you need a substantial body of knowledge before you can evaluate it.

The big picture

For me, Battle Hymn painted a clearer picture of the Michaela Community School than I’d been able to put together from blog posts and visitors’ descriptions. It persuaded me that Michaela’s approach to behaviour management is about being explicit and consistent, rather than simply being ‘strict’. I think having a week’s induction for new students and staff (‘bootcamp’) is a great idea. A systematic, rigorous approach to knowledge is vital and learning by rote can be jolly useful. But for me, those positives were all undermined by the Tiger Teachers’ approach to their own knowledge. Omitting key issues in discussions of Rousseau’s ideas, professional qualifications or the special circumstances of schools in coastal and rural areas is one thing. Pontificating about cognitive science and then ignoring what it says is quite another.

I can understand why Tiger Teachers want to share concepts like the limited capacity of working memory and skills not being divorced from knowledge. Those concepts make sense of problems and have transformed their teaching. But for many Tiger Teachers, their knowledge of cognitive science appears to be based on a handful of poorly understood factoids acquired second or third hand from other teachers who don’t have a good grasp of the field either. Most teachers aren’t going to know much about cognitive science, but that’s why most teachers don’t do presentations about it at national conferences or go into print to share their flimsy knowledge about it. Failing to acquire a substantial body of knowledge about cognitive science makes its comprehension, application, analysis, synthesis and evaluation impossible. The Tiger Teachers’ disregard for principles they claim are crucial is inconsistent, disingenuous, likely to lead to significant problems, and sets a really bad example for pupils. The Tiger Teachers need to rewrite some of the lyrics of their Battle Hymn.

References

Birbalsingh, K (2016).  Battle Hymn of the Tiger Teachers: The Michaela Way.  John Catt Educational.

Christodoulou, D (2014).  Seven Myths about Education.  Routledge.

learning styles: how does Daniel Willingham see them?

In 2005, Daniel Willingham used his “Ask the cognitive scientist” column in American Educator to answer the question “What does cognitive science tell us about the existence of visual, auditory, and kinesthetic learners and the best way to teach them?”

The question refers to the learning styles model used in many schools which assumes that children learn best using their preferred sensory modality – visual, auditory or kinaesthetic. Fleming’s VARK model, and the more common VAK variant, frame learning styles in terms of preferences for learning in a particular sensory modality. Other learning styles models are framed in terms of individuals having other stable traits in respect of the way they learn. Willingham frames the VAK model in terms of abilities.

He summarises the relevant cognitive science research like this: “children do differ in their abilities with different modalities, but teaching the child in his best modality doesn’t affect his educational achievement”, and goes on to discuss what cognitive science has to say about sensory modalities and memory. Willingham’s response is informative about the relevant research, but I think it could be misleading, for two reasons: he doesn’t differentiate between groups and individuals, and he doesn’t adequately explain the role of sensory modalities in memory.

groups and individuals

In the previous post I mentioned the challenge to researchers posed by differences at the population, group and individual levels. Willingham’s summary of the research begins at the population level “children do differ in their abilities with different modalities” but then shifts to the individual level “but teaching the child in his best modality doesn’t affect his educational achievement” [my emphasis].

Even if Willingham’s choice of words is merely a matter of style, it inadvertently conflates findings at the group and individual levels. Group averages tell you what you need to know if you’re interested in broad pedagogical approaches or educational policy; in the case of learning styles, there’s no robust evidence warranting their use as a general approach in teaching. It doesn’t follow that individual children don’t have a ‘best’ (or more likely ‘worst’) modality, nor that they can’t benefit from learning in a particular modality. For example, Picture Exchange Communication System (PECS) and sign languages are the only way some children can communicate effectively, and ‘talking books’ give others access to literature that would otherwise be out of their reach. On his learning styles FAQ page, Willingham claims this is a matter of ‘ability’ rather than ‘style’; but ability is likely to have an impact on preference.

memory and modality

Willingham goes on to explain “a few things that cognitive scientists know about modalities”. His first claim is that “memory is usually stored independent of any modality” [Willingham’s emphasis]. “You typically store memories in terms of meaning — not in terms of whether you saw, heard, or physically interacted with the information”.

He supports this assertion with a finding from research into episodic memory – that whilst people are good at remembering the gist of a story, they tend to be hazy when it comes to specific details. His claim appears to be further supported by research into witness testimony. People might accurately remember a car crashing into a lamppost, but misremember the colour of the car; they correctly recall the driver behaving in an aggressive manner, but are wrong about the words she uttered.

Willingham then extends the role of meaning to the facet of memory that deals with facts and knowledge – semantic memory. He says “the vast majority of educational content is stored in terms of meaning and does not rely on visual, auditory, or kinesthetic memory” and “teachers almost always want students to remember what things mean, not what they look like or sound like”. He uses the example ‘a fire requires oxygen to burn’ and says “the initial experience by which you learned this fact may have been visual (watching a flame go out under a glass) or auditory (hearing an explanation), but the resulting representation of that knowledge in your mind is neither visual nor auditory.” Certainly the idea of a fire requiring oxygen to burn might be neither visual nor auditory, but how many students will not visualise flames being extinguished under a glass when they recall this fact?

substitute modalities

Willingham’s second assertion about memory and sensory modalities is that “the different visual, auditory, and meaning-based representations in our minds cannot serve as substitutes for one another”. He cites a set of experiments reported by Dodson and Shimamura (2000). In the experiments a list of words was read to participants by either a man or a woman. Participants then listened to a second list and were asked to judge which of the words had been in the first list. They were also asked whether a man or woman had spoken the word the first time round. People were five times better at remembering who spoke an item if a test word was read by the same voice than if it was read by the alternative voice. But mismatching the voices didn’t make a difference to the number of words that were recognised.

Dodson and Shimamura see the study as demonstrating that memory is highly susceptible to sensory cues. But Willingham’s conclusion is different; “this experiment indicates that subjects do store auditory information, but it only helps them remember the part of the memory that is auditory — the sound of the voice — and not the word itself, which is stored in terms of its meaning.” This is a rather odd conclusion, given that almost all the words in the experiments were spoken, so auditory memory must have been involved in recognising the words as well as identifying the gender of the speaker. I couldn’t see how the study supported Willingham’s assertion about substitute modalities. And substitute modalities are widely used and used very effectively; writing, sign language and lip-reading are all visual/kinaesthetic substitutes for speech in the auditory modality.

little difference in the classroom

Willingham’s third assertion is “children probably do differ in how good their visual and auditory memories are, but in most situations, it makes little difference in the classroom”. That’s a fair conclusion given the findings of reviews of learning styles studies. He also points out that studies of mental imagery suggest that paying attention to the modality best suited to the content of what’s being taught, rather than the student’s ‘best’ modality, is more likely to help students understand and remember.

the meaning of meaning

Meaning is one of those rather fuzzy words that people use in different ways. It’s widely used to denote the relationship between a symbol and the entity the symbol represents. You could justify talking about memory in terms of meaning in the sense that memory consists of our representations of entities rather than the entities themselves, but I don’t think that’s what Willingham is getting at. I think when he uses the term meaning he’s referring to schemas.

The sequence of a series of events, the gist of a story and the connections between groups of facts are all schemas. There’s no doubt that in the case of complex memories, most people focus on the schema rather than the detail. And teachers do want students to remember the deep structure schemas linking facts rather than just the surface level details. But our memories of chains of events, the plots of stories and factual information are quite clearly not “independent of any modality”. Witnesses who saw a car careering down a road at high speed, collide with a lamppost and the driver emerge swearing at shocked onlookers, might focus on the meaning of that series of events, but they must have some sensory representation of the car and the driver’s voice in order to recall those meaningful events. And how could we recall the narrative of Hansel and Gretel without a sensory representation of two children in a forest, or think about a fire ceasing to burn in the absence of oxygen without a sensory representation of flames and then no flames?

I found it difficult to get a clear picture of Willingham’s conceptual model of memory. When he says “the mind is capable of storing memories in a number of different formats”, and “some memories are stored visually, some auditorily, and some in terms of meaning”, one could easily get the impression that memory is neatly compartmentalised, with ‘meaning’ as one of the compartments. That impression wouldn’t be accurate.

mechanisms of memory

In the brain, sensory information (our only source of information about the outside world) is carried in networks of neurons – brain cells. The pattern of activation in the neural networks forms the representations of both real-time sensory input and of what we remember. It’s like the way an almost infinite number of images can be displayed on a computer screen using a limited number of pixels. It’s true that sensory information is initially processed in areas of the brain dedicated to specific sensory modalities. But those streams of information begin to be integrated quite near the beginning of their journey through the brain, and are rapidly brought together to form a bigger picture of what’s happening that can be compared to representations we’ve formed previously – what we call memory.

The underlying biological mechanism appears to be essentially the same for all sensory modalities and for all types of memory – whether they are of stories, sequences of events, facts about fire, or, to cite Willingham’s examples, of Christmas trees, peas, or Clinton’s or Bush’s voice. ‘Meaning’ as far as the brain is concerned, is about associations – which neurons are activating which other neurons and therefore which representations are being activated. Whether we remember the gist of a story, a fact about fire, or what a Christmas tree or frozen pea looks like, we’re activating patterns of neurons that represent information associated with those events, facts or objects.
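The pixel analogy can be made concrete. Here’s a minimal sketch, assuming we caricature a memory as an activation pattern over one fixed set of units and ‘association’ as overlap between patterns; the memories, unit count and noise level are all invented for illustration.

```python
import numpy as np

# Caricature of distributed representation: one fixed set of units,
# many memories, each a different activation pattern across all of
# them. A partial cue reactivates whichever stored pattern it
# overlaps with most. Entirely illustrative.

rng = np.random.default_rng(0)
n_units = 50

christmas_tree = rng.standard_normal(n_units)  # one stored pattern
frozen_pea = rng.standard_normal(n_units)      # another, same units

# A degraded cue: the Christmas-tree pattern plus noise.
cue = christmas_tree + 0.3 * rng.standard_normal(n_units)

def overlap(a, b):
    """Cosine similarity between two activation patterns."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(overlap(cue, christmas_tree))  # high: the cue recalls this memory
print(overlap(cue, frozen_pea))      # near zero: unrelated pattern
```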

Real life experiences usually involve incoming information in multiple sensory modalities. We very rarely encounter the world via only one sensory domain and never in terms of ‘meaning’ only – how would we construct that meaning without our senses being involved? Having several sensory channels increases the amount of information we get from the outside world, and increases the likelihood of our accessing memories. A whiff of perfume or a fragment of music can remind us vividly of a particular event or can trigger a chain of factual associations. Teachers are indeed focused on the ‘meaning’ of what they teach, but meaning isn’t divorced from sensory modalities. Indeed, what things look like is vitally important in biology, chemistry and art. And what they sound like is crucial for drama, poetry or modern foreign languages.

In his American Educator piece, Willingham agrees that “children do differ in their abilities with different modalities”. But by 2008 he was claiming in a video presentation that Learning Styles Don’t Exist. The video made a big impression on teacher Tom Bennett. He says it “explains the problems with the theory so clearly that even dopey old me can get my head around it”.

Tom’s view of learning styles is the subject of the next post.

References
Dodson, C.S. and Shimamura, A.P. (2000). Differential effects of cue dependency on item and source memory. Journal of Experimental Psychology: Learning, Memory and Cognition, 26, 1023-1044.

Willingham, D (2005). Ask the cognitive scientist: Do visual, auditory, and kinesthetic learners need visual, auditory, and kinesthetic instruction? American Educator, Summer.

seven myths about education – what’s missing?

Old Andrew has raised a number of objections to my critique of Seven Myths about Education. In his most recent comment on my previous (and I had hoped, last) post about it, he says I should be able to easily identify evidence that shows ‘what in the cognitive psychology Daisy references won’t scale up’.

One response would be to provide a list of references showing step-by-step the problems that artificial intelligence researchers ran into. That would take me hours, if not days, because I would have to trawl through references I haven’t looked at for over 20 years. Most of them are not online anyway because of their age, which means Old Andrew would be unlikely to be able to access them.

What is more readily accessible is information about concepts that have emerged from those problems, for example: personal construct theory, schema theory, heuristics and biases, bounded rationality and indexing, connectionist models of cognition and neuroconstructivism. Unfortunately, none of the researchers says “incidentally, this means that students might not develop the right schemata when they commit facts to long-term memory” or “the implications for a curriculum derived from cultural references are obvious”, because they are researching cognition, not education, and probably wouldn’t have anticipated anyone suggesting either of these ideas. Whether Old Andrew sees the relevance of these emergent issues or not is secondary, in my view, to how Daisy handles evidence in her book.

concepts and evidence

In the last section of her chapter on Myth 1, Daisy takes us through the concepts of the limited capacity of working memory and chunking. These are well-established, well-tested hypotheses and she cites evidence to support them.

concepts but no evidence

Daisy also appears to introduce two hypotheses of her own. The first is that “we can summon up the information from long-term memory to working memory without imposing a cognitive load” (p.19). The second is that the characteristics of chunking can be extrapolated to all facts, regardless of how complex or inconsistent they might be; “So, when we commit facts to long-term memory they actually become part of our thinking apparatus and have the ability to expand one of the biggest limitations of human cognition” (p.20). The evidence she cites to support this extrapolation is Anderson’s paper – the one about simple, consistent information. I couldn’t find any other evidence cited to support either idea.

evidence but no concepts

Daisy does cite Frantz’s paper about Simon’s work on intuition. Two important concepts of Simon’s that Daisy doesn’t mention, but Frantz does, are bounded rationality and the idea of indexing.

Bounded rationality refers to the fact that people can only make sense of the information they have. This supports Daisy’s premise that knowledge is necessary for understanding. But it also supports Freire’s complaint about which facts were being presented to Brazilian schoolchildren. Bounded rationality is also relevant to the idea of the breadth of a curriculum being determined by the frequency of cultural references. Simon used it to challenge economic and political theory.

Simon also pointed out that not only do experts have access to more information than novices do, they can access it more quickly because of their mental cross-indexing, i.e. the schemata that link relevant information. Rapid speed of access reduces cognitive load, but it doesn’t eliminate it. Chess experts can determine the best next move within seconds, but for most other experts, their knowledge is considerably more complex and less well-defined. A surgeon or an engineer is likely to take days rather than seconds to decide on the best procedure or design to resolve a difficult problem. That implies that quite a heavy cognitive load is involved.
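Simon’s indexing point has a straightforward computational analogue. A minimal sketch (the medical ‘facts’ are invented): the novice scans the same knowledge item by item, while the expert’s index – the schema – goes straight to it. Fast access, note, is not zero processing: deciding what to do with what’s retrieved still carries a load.

```python
# Caricature of cross-indexing: identical facts, accessed by linear
# scan (novice) or via a pre-built index (expert). Illustrative only.

facts = [
    ("angina", "chest pain on exertion"),
    ("pneumonia", "fever with productive cough"),
    ("migraine", "unilateral throbbing headache"),
]

def novice_lookup(presentation):
    """No index: every retrieval scans the whole list."""
    for condition, signs in facts:
        if signs == presentation:
            return condition
    return None

# Expert: the same facts, pre-indexed by presentation.
expert_index = {signs: condition for condition, signs in facts}

def expert_lookup(presentation):
    """Indexed: one step, however many facts are stored."""
    return expert_index.get(presentation)

print(novice_lookup("fever with productive cough"))  # pneumonia, eventually
print(expert_lookup("fever with productive cough"))  # pneumonia, at once
```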

Daisy does mention schemata but doesn’t go into detail about how they are formed or how they influence thinking and understanding. She refers to deep learning in passing but doesn’t tackle the issue Willingham raises about students’ problems with deep structure.

burden of proof

Old Andrew appears to be suggesting that I should assume that Daisy’s assertions are valid unless I can produce evidence to refute them. The burden of proof for a theory usually rests with the person making the claims, for obvious reasons. Daisy cites evidence to support some of her claims, but not all of them. She doesn’t evaluate that evidence by considering its reliability or validity or by taking into account contradictory evidence.

If Daisy had written a book about her musings on cognitive psychology and education, or about how findings from cognitive psychology had helped her teaching, I wouldn’t be writing this. But that’s not what she’s done. She’s used theory from one knowledge domain to challenge theory in another. That can be a very fruitful strategy; the application of game theory and ecological systems theory has transformed several fields. But it’s not helpful simply to take a few concepts out of context from one domain and apply them out of context to another domain.

The reason is that theoretical concepts aren’t free-standing; they are embedded in a conceptual framework. If you’re challenging theory with theory, you need to take a long hard look at both knowledge domains first to get an idea of where particular concepts fit in. You can’t just say “I’m going to apply the concepts of chunking and the limited capacity of working memory to education, but I shan’t bother with schema theory or bounded rationality or heuristics and biases because I don’t think they’re relevant.” Well, you can say that, but it’s not a helpful way to approach problems with learning, because all of these concepts are integral to human cognition. Students don’t leave some of them in the cloakroom when they come into class.

On top of that, the model for pedagogy and the curriculum that Daisy supports is currently influencing international educational policy. If the DfE considers the way evidence has been presented by Hirsch, Willingham and presumably Daisy, as ‘rigorous’, as Michael Gove clearly did, then we’re in trouble.

For Old Andrew’s benefit, I’ve listed some references. Most of them are about things that Daisy doesn’t mention. That’s the point.

references

Axelrod, R (1973). Schema Theory: An Information Processing Model of Perception and Cognition, The American Political Science Review, 67, 1248-1266.
Elman, J et al (1998). Rethinking Innateness: Connectionist Perspective on Development. MIT Press.
Frantz, R (2003). Herbert Simon. Artificial intelligence as a framework for understanding intuition, Journal of Economic Psychology, 24, 265–277.
Kahneman, D., Slovic, P & Tversky A (1982). Judgement under Uncertainty: Heuristics and Biases. Cambridge University Press.
Karmiloff-Smith, A (2009). Nativism Versus Neuroconstructivism: Rethinking the Study of Developmental Disorders. Developmental Psychology, 45, 56–63.
Kelly, GA (1955). The Psychology of Personal Constructs. New York: Norton.

seven myths about education: deep structure

deep structure and understanding

Extracting information from data is crucially important for learning; if we can’t spot patterns that enable us to identify changes and make connections and predictions, no amount of data will enable us to learn anything. Similarly, spotting patterns within and between facts enables us to identify changes and connections, make predictions, and understand how the world works. Understanding is a concept that crops up a lot in information theory and education. Several of the proposed hierarchies of knowledge have included the concept of understanding – almost invariably at or above the knowledge level of the DIKW pyramid. Understanding is often equated with what’s referred to as the deep structure of knowledge. In this post I want to look at deep structure in two contexts: when it involves a small number of facts, and when it involves a very large number, as in an entire knowledge domain.

When I discussed the DIKW pyramid, I referred to information being extracted from a ‘lower’ level of abstraction to form a ‘higher’ one. Now I’m talking about ‘deep’ structure. What’s the difference, if any? The concept of deep structure comes from the field of linguistics. The idea is that you can say the same thing in different ways; the surface features of what you say might be different, but the deep structure of the statements could still be the same. So the sentences ‘the cat is on the mat’ and ‘the mat is under the cat’ have different surface features but the same deep structure. Similarly, ‘the dog is on the box’ and ‘the box is under the dog’ share the same deep structure. From an information-processing perspective the sentences about the dog and the cat share the same underlying schema.
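A minimal sketch of that point, assuming we caricature deep structure as a canonical relation that different surface forms normalise to (the crude fixed-position ‘parse’ is the assumption doing all the work here):

```python
# Two surface forms, one deep structure: normalise 'X is on Y' and
# 'Y is under X' to a single canonical relation. Illustrative only;
# the parsing is deliberately naive.

def deep_structure(sentence):
    """Map 'the A is on the B' / 'the B is under the A' to ('on', A, B)."""
    words = sentence.lower().rstrip(".").split()
    a, relation, b = words[1], words[3], words[5]
    if relation == "under":
        a, b = b, a  # 'B is under A' means 'A is on B'
    return ("on", a, b)

print(deep_structure("The cat is on the mat"))     # ('on', 'cat', 'mat')
print(deep_structure("The mat is under the cat"))  # ('on', 'cat', 'mat')
print(deep_structure("The dog is on the box"))     # ('on', 'dog', 'box')
```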

In the DIKW knowledge hierarchy, extracted information is at a ‘higher’ level, not a ‘deeper’ one. The two different terminologies are used because the concepts of ‘higher’ level extraction of information and ‘deep’ structure have different origins, but essentially they are the same thing. All you need to remember is that in terms of information-processing ‘high’ and ‘deep’ both refer to the same vertical dimension – which term you use depends on your perspective. Higher-level abstractions, deep structure and schemata refer broadly to the same thing.

deep structure and small numbers of facts

Daniel Willingham devotes an entire chapter of his book Why don’t students like school? to the deep structure of knowledge when addressing students’ difficulty in understanding abstract ideas. Willingham describes mathematical problems presented in verbal form that have different surface features but the same deep structure – in his opening example they involve the calculation of the area of a table top and of a soccer pitch (Willingham, p.87). What he is referring to is clearly the concept of a schema, though he doesn’t call it that.

Willingham recognises that students often struggle with deep structure concepts and recommends providing them with many examples and using analogies they’re familiar with. These strategies would certainly help, but as we’ve seen previously, because the surface features of facts aren’t consistent in terms of sensory data, students’ brains are not going to spot patterns automatically and pre-consciously in the way they do with consistent low-level data and information. To the human brain, a cat on a mat is not the same as a dog on a box. And a couple trying to figure out whether a dining table would be big enough involves very different sensory data to that involved in a groundsman working out how much turf will be needed for a new football pitch.

Willingham’s problems involve several levels of abstraction. Note that the levels of abstraction only provide an overall framework; they’re not set in stone. I’ve had to split the information level into two to illustrate how information needs to be extracted at several successive levels before students can even begin to calculate the area of the table or the football pitch (a worked sketch follows the list). The levels of abstraction are:

• data – the squiggles that make up letters and the sounds that make up speech
• first-order information – letters and words (chunked)
• second-order information – what the couple is trying to do and what the groundsman is trying to do (not chunked)
• knowledge – the deep structure/schema underlying each problem.
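Here’s the worked sketch promised above, assuming we jump straight from the second-order information to the knowledge level; the point is that one schema (area = length × width) solves both problems once the story-specific details have been stripped away. The numbers are invented.

```python
# Different surface features (a dining table, a soccer pitch), one
# deep structure: area = length * width. Illustrative values only.

def area_schema(length, width):
    """The knowledge-level schema both problems share."""
    return length * width

# Second-order information extracted from two different stories:
table_problem = {"length": 2.0, "width": 1.0}     # metres
pitch_problem = {"length": 105.0, "width": 68.0}  # metres

for problem in (table_problem, pitch_problem):
    print(area_schema(problem["length"], problem["width"]))
# 2.0 and 7140.0: different answers, identical schema.
```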

To anyone familiar with calculating area, the problems are simple ones; to anyone unfamiliar with the schema involved, they impose a high cognitive load because the brain is trying to juggle information about couples, tables, groundsmen and football pitches and can’t see the forest for the trees. Most brains would require quite a few examples before they had enough information to be able to spot the two patterns, so it’s not surprising that students who haven’t had much practical experience of buying tables, fitting carpets, painting walls or laying turf take a while to cotton on.

visual vs verbal representations

What might help students further is making explicit the deep structure of groups of facts with the help of visual representations. Visual representations have one huge advantage over verbal representations. Verbal representations, by definition, are processed sequentially – you can only say, hear or read one word at a time. Most people can process verbal information at the same rate at which they hear it or read it, so most students will be able to follow what a teacher is saying or what they are reading, even if it takes a while to figure out what the teacher or the book is getting at. However, if you can’t process verbal information quickly enough, can’t recall earlier sentences whilst processing the current one, miss a word, or don’t understand a crucial word or concept, it will be impossible to make sense of the whole thing. In visual representations, you can see all the key units of information at a glance, most of the information can be processed in parallel and the underlying schema is more obvious.

The concept of calculating area lends itself very well to visual representation; it is a geometry problem after all. Getting the students to draw a diagram of each problem would not only focus their attention on the deep structure rather than its surface features, it would also demonstrate clearly that problems with different surface features can have the same underlying deep structure.

It might not be so easy to make visual representations of the deep structure of other groups of facts, but it’s an approach worth trying because it makes explicit the deep structure of the relationship between the facts. In Seven Myths about Education, one of Daisy’s examples of a fact is the date of the battle of Waterloo. Battles are an excellent example of deep structure/schemata in action. There is a large but limited number of ways two opposing forces can position themselves in battle, whoever they are and whenever and wherever they are fighting, which is why ancient battles are studied by modern military strategists. The configurations of forces and what subsequent configurations are available to them are very similar to the configurations of pieces and next possible moves in chess. Of course chess began as a game of military strategy – as a visual representation of the deep structure of battles.

Deep structure/underlying schemata are a key factor in other domains too. Different atoms and different molecules can share the same deep structure in their bonding and reactions and chemists have developed formal notations for representing that visually; the deep structure of anatomy and physiology can be the same for many different animals – biologists rely heavily on diagrams to convey deep structure information. Historical events and the plots of plays can follow similar patterns even if the events occurred or the plays were written thousands of years apart. I don’t know how often history or English teachers use visual representations to illustrate the deep structure of concepts or groups of facts, but it might help students’ understanding.

deep structure of knowledge domains

It’s not just single facts or small groups of facts that have a deep structure or underlying schema. Entire knowledge domains have a deep structure too, although not necessarily in the form of a single schema; many connected schemata might be involved. How they are connected will depend on how experts arrange their knowledge or how much is known about a particular field.

Making students aware of the overall structure of a knowledge domain – especially if that’s via a visual representation so they can see the whole thing at once – could go a long way to improving their understanding of whatever they happen to be studying at any given time. It’s like the difference between Google Street View and Google Maps. Google Street View is invaluable if you’re going somewhere you’ve never been before and you want to see what it looks like. But Google Maps tells you where you are in relation to where you want to be – essential if you want to know how to get there. Having a mental map of an entire knowledge domain shows you how a particular fact or group of facts fits in to the big picture, and also tells you how much or how little you know.

Daisy’s model of cognition

Daisy doesn’t go into detail about deep structure or schemata. She touches on these concepts only a few times; once in reference to forming a chronological schema of historical events, then when referring to Joe Kirby’s double-helix metaphor for knowledge and skills and again when discussing curriculum design.

I don’t know if Daisy emphasises facts but downplays deep structure and schemata to highlight the point that the educational orthodoxy does essentially the opposite, or whether she doesn’t appreciate the importance of deep structure and schemata compared to surface features. I suspect it’s the latter. Daisy doesn’t provide any evidence to support her suggestion that simply memorising facts reduces cognitive load when she says:

“So when we commit facts to long-term memory, they actually become part of our thinking apparatus and have the ability to expand one of the biggest limitations of human cognition”(p.20).

The examples she refers to immediately prior to this assertion are multiplication facts that meet the criteria for chunking – they are simple and highly consistent and if they are chunked they’d be treated as one item by working memory. Whether facts like the dates of historical events meet the criteria for chunking or whether they occupy less space in working memory when memorised is debatable.

What’s more likely is that if more complex and less consistent facts are committed to memory, they are accessed more quickly and reliably than those that haven’t been memorised. Research evidence suggests that neural connections that are activated frequently become stronger and are accessed faster. Because information is carried in networks of neural connections, the more frequently we access facts or groups of facts, the faster and more reliably we will be able to access them. That’s a good thing. It doesn’t follow that those facts will occupy less space in working memory.
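That speed-up is often summarised as the power law of practice: retrieval time falls as a power function of the number of retrievals. Here’s a minimal sketch with arbitrary constants; the shape of the curve, not the numbers, is the point.

```python
# Power law of practice: RT = a * n**(-b), where n is the number of
# times a fact has been retrieved. Constants here are arbitrary.

A, B = 2.0, 0.5

def retrieval_time(n):
    """Toy retrieval time (seconds) after n practice retrievals."""
    return A * n ** (-B)

for n in (1, 4, 16, 64):
    print(n, round(retrieval_time(n), 2))
# 1 -> 2.0, 4 -> 1.0, 16 -> 0.5, 64 -> 0.25: access keeps getting
# faster, but each retrieval still takes time -- faster access,
# not zero working-memory load.
```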

It certainly isn’t the case that simply committing to memory hundreds or thousands of facts will enable students to form a schema, or if they do, that it will be the schema their teacher would like them to form. Teachers might need to be explicit about the schemata that link facts. Since hundreds or thousands of facts tend to be linked by several different schemata – you can arrange the same facts in different ways – being explicit about the different ways they can be linked might be crucial to students’ understanding.

Essentially, deep structure schemata play an important role in three ways:

First, students’ pre-existing schemata will affect their understanding of new information – they will interpret it in the light of the way they currently organise their knowledge. Teachers need to know about common misunderstandings as well as what they want students to understand.

Secondly, being able to identify the schema underlying one fact or small group of facts is the starting point for spotting similarities and differences between several groups of facts.

Thirdly, having a bird’s-eye view of the schemata involved in an entire knowledge domain increases students’ chances of understanding where a particular fact fits in to the grand scheme of things – and their awareness of what they don’t know.

Having a bird’s-eye view of the curriculum can help too, because it can show how different subject areas are linked. Subject areas and the curriculum are the subjects of the next post.

seven myths about education: facts and schemata

Knowledge occupies the bottom level of Bloom’s taxonomy of educational objectives. In the 1950s, Bloom and his colleagues would have known a good deal about the strategies teachers use to help students to acquire knowledge. What they couldn’t have known is how students formed their knowledge; how they extracted information from data and knowledge from information. At the time cognitive psychologists knew a fair amount about learning but had only a hazy idea about how it all fitted together. The DIKW pyramid I referred to in the previous post explains how the bottom layer of Bloom’s taxonomy works – how students extract information and knowledge during learning. Anderson’s simple theory of cognition explains how people extract low-level information. More recent research at the knowledge and wisdom levels is beginning to shed light on Bloom’s higher-level skills, why people organise the same body of knowledge in different ways and why they misunderstand and make mistakes.

Seven Myths about Education addresses the knowledge level of Bloom’s taxonomy. Daisy Christodoulou presents a model of cognition that she feels puts the higher-level skills in Bloom’s taxonomy firmly into context. Her model also forms the basis for a pedagogical approach and a structure for a curriculum, which I’ll discuss in another post. Facts are a core feature of Daisy’s model. I’ve mentioned previously that many disciplines find facts problematic because facts, by definition, have to be valid (true), and it’s often difficult to determine their validity. In this post I want to focus instead on the information processing entailed in learning facts.

a simple theory of cognition

Having explained the concept of chunking and the relationship between working and long-term memory, Daisy introduces Anderson’s paper:

“So when we commit facts to long-term memory, they actually become part of our thinking apparatus and have the ability to expand one of the biggest limitations of human cognition. Anderson puts it thus:

‘All that there is to intelligence is the simple accrual and tuning of many small units of knowledge that in total produce complex cognition. The whole is no more than the sum of its parts, but it has a lot of parts.’”

She then says “a lot is no exaggeration. Long-term memory is capable of storing thousands of facts, and when we have memorised thousands of facts on a specific topic, these facts together form what is known as a schema” (p. 20).

facts

This was one of the points where I began to lose track of Daisy’s argument. I think she’s saying this:

Anderson shows that low-level data can be chunked into a ‘unit of knowledge’ that is then treated as one item by WM – in effect increasing the capacity of WM. In the same way, thousands of memorised facts can be chunked into a more complex unit (a schema) that is then treated as one item by WM – this essentially bypasses the limitations of WM.

I think Daisy assumes that the principle Anderson found pertaining to low-level ‘units of knowledge’ applies to all units of knowledge at whatever level of abstraction. It doesn’t. Before considering why it doesn’t, it’s worth noting a problem with the use of the word ‘facts’ when describing data. Some researchers have equated data with ‘raw facts’. The difficulty with defining data as ‘facts’ is that by definition a fact has to be valid (true) and not all data is valid, as the GIGO (garbage-in-garbage-out) principle that bedevils computer data processing and the human brain’s often flaky perception of sensory input demonstrate. In addition, ‘facts’ are more complex than raw (unprocessed) data or raw (unprocessed) sensory input.

It’s clear from Daisy’s examples of facts that she isn’t referring to raw data or raw sensory input. Her examples include the date of the battle of Waterloo, key facts about numerous historical events and ‘all of the twelve times tables’. She makes it clear in the rest of the book that in order to understand such facts, students need prior knowledge. In terms of the DIKW hierarchy, Daisy’s ‘facts’ are at a higher level than Anderson’s ‘units of knowledge’ and are unlikely to be processed automatically and pre-consciously in the same way as Anderson’s units. To understand why, we need to take another look at Anderson’s units of knowledge and why chunking happens.

chunking revisited

Data that can be chunked easily have two key characteristics: they involve small amounts of information, and the patterns within them are highly consistent. As I mentioned in the previous post, one of Anderson’s examples of chunking is the visual features of upper case H. As far as the brain is concerned, the two parallel vertical lines and linking horizontal line that make up the letter H don’t involve much information. Also, although fonts and handwriting vary, the core features of all the Hs the brain perceives are highly consistent. So the brain soon starts perceiving all Hs as the same thing and chunks up the core features into a single unit – the letter H. If H could also be written Ĥ and Ħ in English, it would take a bit longer for the brain to chunk the three different configurations of lines and to learn the association between them, but not much longer, since the three variants involve little information and are still highly consistent.
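
The difference is easier to see in code. Here’s a minimal sketch of threshold-based chunking in Python – the threshold, the feature labels and the ToyChunker class are all invented for illustration, and Anderson’s actual theory is far more sophisticated:

```python
from collections import Counter

CHUNK_THRESHOLD = 5   # invented: exposures needed before a pattern is chunked

class ToyChunker:
    def __init__(self):
        self.exposures = Counter()
        self.chunks = {}            # pattern -> chunk label

    def perceive(self, features):
        """features: a tuple of low-level features, e.g. the strokes of a letter.
        Returns the items working memory would have to hold."""
        pattern = tuple(features)
        if pattern in self.chunks:
            return [self.chunks[pattern]]     # a chunked pattern is one item
        self.exposures[pattern] += 1
        if self.exposures[pattern] >= CHUNK_THRESHOLD:
            self.chunks[pattern] = "chunk:" + "+".join(pattern)
        return list(pattern)                  # otherwise, one item per feature

chunker = ToyChunker()
h = ("vertical line", "vertical line", "horizontal line")   # the letter H
for _ in range(5):
    chunker.perceive(h)        # five consistent exposures...
print(chunker.perceive(h))     # ...and the three strokes are now a single unit
```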

understanding facts

But the letter H isn’t a fact, it’s a symbol. So are + and the numerals 1 and 2. ‘1+2’ isn’t a fact in the sense that Daisy uses the term, it’s a series of symbols. ‘1+2=3’ could be considered a fact because it consists of symbols representing two entities and the relationship between them. If you know what the symbols refer to, you can understand it. It could probably be chunked because it contains a small amount of information and has consistent visual features. Each multiplication fact in multiplication tables could probably be chunked too, since it meets the same criteria. But that’s not true for all the facts that Daisy refers to, because they are more complex and less consistent.

‘The cat is on the mat’ is a fact, but in order to understand it, you need some prior knowledge about cats, mats and what ‘on’ means, and these would be treated by working memory as different items. Most English-speaking 5 year-olds would understand the ‘cat is on the mat’ fact, but because there are different sorts of cats, different sorts of mats and different ways in which the cat could be on the mat, each child could have a different mental image of the cat on the mat. A particular child might even conjure up a different mental image each time he or she encountered the fact. Different sensory data would be involved each time, so the mental representations of the fact would be low in consistency, and the fact’s component parts couldn’t be chunked into a single unit in the way that lower-level, more consistent representations can. Consequently the fact is less likely to be treated as one item in working memory.

Similarly, in order to understand a fact like ‘the battle of Waterloo was in 1815’ you’d need to know what a battle is, where Waterloo is (or at least that it’s a place), what 1815 means and how ‘of’ links a battle and a place name. If you’re learning about the Napoleonic wars, your perception of the battle is likely to keep changing, so the components of the fact would have low consistency, meaning it couldn’t be chunked in the way Anderson describes.
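
Using the same toy model (and the same invented threshold), a fact whose mental encoding varies from encounter to encounter never accumulates enough identical exposures to be chunked:

```python
from collections import Counter
from itertools import cycle, islice

CHUNK_THRESHOLD = 5                # the same invented threshold as above
exposures = Counter()

cats = ["tabby", "black cat", "kitten"]
mats = ["doormat", "bath mat", "rug"]
encodings = cycle([(cat, "on", mat) for cat in cats for mat in mats])

# thirty encounters with 'the cat is on the mat', each conjuring one of
# nine different mental images
for pattern in islice(encodings, 30):
    exposures[pattern] += 1

chunked = [p for p, n in exposures.items() if n >= CHUNK_THRESHOLD]
print(chunked)    # [] – no single encoding recurred often enough to chunk
```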

The same problem involving inconsistency would prevent two or more facts being chunked into a single unit. But clearly people do mentally link facts and the components of facts. They do it using a schema, but not quite in the way Daisy describes.

schemata

Before discussing how people use schemata (schemas), a comment on the biological structures that enable us to form them. I mentioned in an earlier post that the neurons in the brain form complex networks a bit like the veins in a leaf. Physical connections are formed between neighbouring neurons when the neurons are activated simultaneously by incoming data. If the same or very similar data are encountered repeatedly, the same neurons are activated repeatedly, connections between them are strengthened and eventually networks of neurons are formed that can carry a vast amount of information in their patterns of connections. The patterns of connections between the neurons represent the individual’s perception of the patterns in the data.
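
Here’s a cartoon of that strengthening process (‘neurons that fire together wire together’). The learning rate and the node names are invented, and real synaptic plasticity is far more intricate than a single incrementing weight:

```python
from collections import defaultdict
from itertools import combinations

weights = defaultdict(float)    # (neuron, neuron) -> connection strength
LEARNING_RATE = 0.1             # invented value, purely for illustration

def encounter(active_neurons):
    """Strengthen the connection between every pair of co-active neurons."""
    for pair in combinations(sorted(active_neurons), 2):
        weights[pair] += LEARNING_RATE

for _ in range(20):                       # the same scene, seen repeatedly...
    encounter({"cat", "mat", "whiskers"})
encounter({"cat", "dog"})                 # ...and a rarer pairing, seen once

print(weights[("cat", "mat")])    # ~2.0 – a strong, well-worn connection
print(weights[("cat", "dog")])    # 0.1 – a weak one
```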

So if I see a cat on a mat, or read a sentence about a cat on a mat, or imagine a cat on a mat, my networks of neurons carrying information about cats and mats will be activated. Facts and concepts about cats, mats and things related to them will readily spring to mind. But I won’t have access to all of those facts and concepts at once. That would completely overload my working memory. Instead, what I recall is a stream of facts and concepts about cats and mats that takes time to access. It’s only a short time, but it doesn’t happen all at once. Also, some facts and concepts will be activated immediately and strongly and others will take longer and might be a bit hazy. In essence, a schema is a network of related facts and concepts, not a chunked ‘unit of knowledge’.
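
That kind of gradual, graded retrieval can be sketched as spreading activation over a network. The network, connection strengths and decay factor below are all invented for illustration:

```python
network = {                         # node -> [(neighbour, connection strength)]
    "cat":   [("mat", 0.9), ("fur", 0.8), ("dog", 0.5)],
    "mat":   [("cat", 0.9), ("floor", 0.7)],
    "fur":   [("soft", 0.6)],
    "dog":   [("bark", 0.7)],
    "floor": [], "soft": [], "bark": [],
}

def spread(seeds, steps=2, decay=0.5):
    """Activate the seed nodes, then let activation leak outwards,
    weakening with distance."""
    activation = {node: 1.0 for node in seeds}
    for _ in range(steps):
        updated = dict(activation)
        for node, level in activation.items():
            for neighbour, strength in network[node]:
                boost = level * strength * decay
                updated[neighbour] = max(updated.get(neighbour, 0.0), boost)
        activation = updated
    return sorted(activation.items(), key=lambda item: -item[1])

print(spread({"cat", "mat"}))
# cat and mat come up at full strength, fur and floor more weakly,
# soft and bark later and weaker still – a stream, not a single chunk
```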

Daisy says “when we have memorised thousands of facts on a specific topic, these facts together form what is known as a schema” (p. 20). It doesn’t work quite like that, for several reasons.

the structure of a schema A schema is what it sounds like – a schematic plan or framework. It doesn’t consist of facts or concepts; it’s a representation of how someone mentally arranges facts or concepts. In the same way, the floor-plan of a building doesn’t consist of actual walls, doors and windows, but it does show you where those things are in the building in relation to each other. The importance of this apparently pedantic point will become clear when I discuss deep structure.

implicit and explicit schemata Schemata can be implicit – the brain organises facts and concepts in a particular way but we’re not aware of what that way is – or explicit – we actively organise facts and concepts in a particular way and we’re aware of how they are organised.

the size of a schema Schemata can vary in size and complexity. The configuration of the three lines that make up the letter H is a schema, so is the way a doctor organises his or her knowledge about the human circulatory system. A schema doesn’t have to represent all the facts or concepts it links together. If it did, a schema involving thousands of facts would be so complex it wouldn’t be much help in showing how the facts were related. And in order to encompass all the different relationships between thousands of facts, a single schema for them would need to be very simple.

For example, a simple schema for chemistry would be that different chemicals are formed from different configurations of the sub-atomic ‘particles’ that make up atoms and configurations of atoms that form molecules. Thousands of facts can be fitted into that schema. In order to have a good understanding of chemistry, students would need to know about schemata other than just that simple one, and would need to know thousands of facts about chemistry before they would qualify as experts, but the simple schema plus a few examples would give them a basic understanding of what chemistry was about.
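
The floor-plan idea can be sketched in code: the schema stores only the permitted relations, not the facts, and individual facts either fit the framework or don’t. The classification table and the facts are invented examples:

```python
KIND = {                       # which level of the schema an entity sits at
    "electron": "particle", "proton": "particle",
    "hydrogen": "atom",     "oxygen": "atom",
    "water":    "molecule", "ozone":  "molecule",
}

# the schema itself: two permitted relations and no facts at all
CHEMISTRY_SCHEMA = {("particle", "atom"), ("atom", "molecule")}

def fits(part, whole):
    """Does 'part makes up whole' instantiate a link in the schema?"""
    return (KIND.get(part), KIND.get(whole)) in CHEMISTRY_SCHEMA

print(fits("proton", "hydrogen"))   # True  – particles make up atoms
print(fits("oxygen", "ozone"))      # True  – atoms make up molecules
print(fits("water", "electron"))    # False – doesn't fit the framework
```

The schema is two relations; the thousands of chemistry facts live in (an enormously extended version of) the classification table, not in the schema itself.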

experts’ schemata Research into expertise (e.g. Chi et al, 1981) shows that experts don’t usually have one single schema for all the facts they know, but instead use different schemata for different aspects of their body of knowledge. Sometimes those schemata are explicitly linked, but sometimes they’re not. Sometimes they can’t be linked because no one knows how the linkage works yet.

chess experts

Daisy refers to research showing that expert chess players memorise thousands of different configurations of chess pieces (p.78). This is classic chunking; although in different chess sets specific pieces vary in appearance, their core visual features and the moves they can make are highly consistent, so frequently-encountered configurations of pieces are eventually treated by the brain as single units – the brain chunks the positions of the chess pieces in essentially the same way as it chunks letters into words.

De Groot’s work showed that chess experts initially identified the configurations of pieces that were possible as a next move, and then went through a process of eliminating the possibilities. The particular configuration of pieces on the board would activate several associated schemata involving possible next and subsequent moves.

So each of the configurations of chess pieces encountered frequently enough to be chunked has an underlying (simple) schema. Expert chess players then access more complex schemata for next and subsequent possible moves. Even if they have an underlying schema for chess as a whole, it doesn’t follow that they treat chess as a single unit or that they recall all possible configurations at once. Most people can reliably recognise thousands of faces and thousands of words and have schemata for organising them, but when thinking about faces or words, they don’t recall all faces or all words simultaneously. That would rapidly overload working memory.
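
A toy version of that two-stage process might look like this – chunked configurations recognised as single units, each keying into a schema of plausible continuations. The positions, labels and continuations are invented, not real chess theory:

```python
CHUNKED_PATTERNS = {   # each memorised configuration is one retrievable unit
    frozenset({("K", "g1"), ("R", "f1"), ("P", "f2"), ("P", "g2"), ("P", "h2")}):
        "castled kingside",
    frozenset({("P", "d4"), ("P", "e4")}):
        "classical pawn centre",
}

CONTINUATIONS = {      # each unit keys into a schema of plausible follow-ups
    "castled kingside": ["rook lift", "defend against a pawn storm"],
    "classical pawn centre": ["push e5", "support with c3"],
}

def recognise(position):
    """position: a frozenset of (piece, square) pairs.
    Return the chunk labels the position contains."""
    return [label for pattern, label in CHUNKED_PATTERNS.items()
            if pattern <= position]

position = frozenset({("K", "g1"), ("R", "f1"), ("P", "f2"), ("P", "g2"),
                      ("P", "h2"), ("P", "d4"), ("P", "e4")})
for label in recognise(position):
    print(label, "->", CONTINUATIONS[label])
```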

Compared to most knowledge domains, chess is pretty simple. Chess expertise consists of memorising a large but limited number of configurations and having schemata that predict the likely outcomes from a selection of them. Because of the rules of chess, although lots of moves are possible, the possibilities are clearly defined and limited. Expertise in medicine, say, or history, is considerably more complex and less certain. A doctor might have many schemata for human biology: one for each of the skeletal, nervous, circulatory, respiratory and digestive systems, for cell metabolism, biochemistry and genetics etc. Not only is human biology more complex than chess, there’s also more uncertainty involved. Some of those schemata we’re pretty sure about, some we’re not so sure about and some we know very little about. There’s even more uncertainty involved in history. Evaluating evidence about how the human body works might be difficult, but the evidence itself is readily available in the form of human bodies. Historical evidence is often absent and likely to stay that way, which makes establishing facts and developing schemata more challenging.

To illustrate her point about schemata, Daisy claims that learning a couple of key facts about each of 150 historical events from 3000 BC to the present will form “the fundamental chronological schema that is the basis of all historical understanding” (p.20). Chronological sequencing could certainly form a simple schema for history, but you don’t need to know about many events in order to grasp that principle – two or three would suffice. Again, although this simple schema would give students a basic understanding of what history was about, in order to have a good understanding of history students would need not only to know thousands of facts, but to develop many schemata for how those facts were linked, before they would qualify as experts. This brings us on to the deep structure of knowledge, the subject of the next post.

references
Chi, MTH, Feltovich, PJ & Glaser, R (1981). Categorization and Representation of Physics Problems by Experts and Novices. Cognitive Science, 5, 121-152.
de Groot, AD (1978). Thought and Choice in Chess. Mouton.

Edited for clarity 8/1/17.

progressively worse

‘Let the data speak for themselves’ is a principle applied by researchers in a wide range of knowledge domains, from particle physics through molecular biology to sociology and economics. The converse would be ‘make the data say what you want them to say’, a human tendency that different knowledge domains have developed various ways of counteracting, such as experimental design, statistical analysis, peer review and being explicit about one’s own epistemological framework.

Cognitive science has explored several of the ways in which our evaluation of data can be flawed; Kahneman, Slovic & Tversky (1982), for example, examine in detail some of the errors and biases inherent in human reasoning. Findings from cognitive science have been embraced with enthusiasm by the new traditionalists, but they appear to have applied those findings only to teaching and learning, not to the thinking of the people who design education systems or pedagogical methods – or of those who write books about those things. In Progressively Worse Robert Peal succumbs to some of those errors and biases – notably the oversimplification of complex phenomena, confirmation bias and attribution errors – and as a consequence he draws conclusions that are open to question.

The ‘furious debate’

Peal opens Progressively Worse with a question he says has been the subject of half a century of ‘furious debate’: ‘how should children learn?’ He exemplifies the debate as a series of dichotomies – an authoritative teacher vs independent learning, knowledge vs skills etc. – representing differences between traditional and progressive educational approaches. He then provides an historical overview of changes to the British (or, more accurately, English – they do things differently in Scotland) education system between 1960 and 2010, notes their impact on pedagogy and concludes that only the freedom to innovate will rescue the country from the ‘damaging doctrine’ of progressive education to which the educational establishment is firmly wedded (p.1).

Progressive or traditional

For Peal, progressive education has four core themes:

• education should be child-centred
• knowledge is not central to education
• strict discipline and moral education are oppressive
• socio-economic background dictates success (pp.5-7).

He’s not explicit about the core themes of traditional education, but the features he mentions include:

• learning from the wisdom of an authoritative teacher
• an academic curriculum
• a structure of rewards and examinations
• sanctions for misbehaving and not working (p.1).

He also gives favourable mention to;

• subject divisions
• the house system
• smart blazers, badges and ties
• lots of sport
• academic streaming
• prize-giving
• prefects
• pupil duties
• short hair
• silent study
• homework
• testing
• times tables
• grammar, spelling and punctuation
• school song, colours and motto
• whole-class teaching, explanation and questioning
• the difference between right and wrong, good and evil
• class rankings

I claimed that Peal’s analysis of the English education system is subject to three principal cognitive errors or biases. Here are some examples:

Oversimplification

For the new traditionalists, cognitive load theory – derived from the fact that working memory has limited capacity – has important implications for pedagogy. But people don’t seek to minimise cognitive load only when learning new concepts in school. We also do it when handling complex ideas. On a day-to-day level, oversimplification can be advantageous because it enables rapid, flexible thinking; when devising public policy it can be catastrophic because the detail of policy is often as important as the overarching principle.

Education is a relatively simple idea in principle, but in practice it’s fiendishly complex, involving political and philosophical frameworks, socio-economic factors, systems pressures, teacher recruitment, training and practice, and children’s health and development. Categorising education as ‘progressive’ or ‘traditional’ doesn’t make it any simpler. Each of Peal’s four core themes of progressive education is complex and could be decomposed into many elements. In classrooms, the elements that make up progressive education are frequently interspersed with elements of traditional education, so although I agree with him that some elements of progressive education taken to extremes have had a damaging influence, it’s by no means clear that they have been the only causes of damage, nor that other elements of progressive education have not been beneficial.

Peal backs up his claim that the British education system is experiencing ‘enduring educational failure’ (p.4) with numbers. He says the ‘bare figures are hard to ignore’. Indeed they are; what he doesn’t seem to realise is that ‘bare figures’ are also sometimes ambiguous. For example, the UK coming a third of the way down the PISA rankings is not an indication of educational ‘failure’ – unless your definition of success is a pretty narrow one. And the fact that in all countries except the UK the literacy and numeracy levels of 16-24 year-olds are better than those of 55-65 year-olds might be telling us more about the resilience of the UK education system in the post-war period than about current literacy standards in other countries. ‘Bare figures’ rarely tell the whole story.

Confirmation bias

Another concept from cognitive science important to the new traditionalists is the schema – the way related information is organised in long-term memory. Schemata are seen as useful because they aid recall. But our own schemata aren’t always an accurate representation of the real world. Peal overlooks the role schemata play in confirmation bias; we tend to construe evidence that confirms the structure of one of our own existing schemata as having higher validity than evidence that contradicts it, even if the evidence overall shows that our schema is inaccurate.

Research usually begins with a carefully worded research question; the question has to be one that can have an answer, and the way the question is framed will determine what data are gathered and how they are analysed to provide an answer. The data don’t always confirm researchers’ expectations; what the data say is sometimes surprising and occasionally counterintuitive. Peal opens with the question ‘how should children learn?’, but because it’s framed in terms of an imperative it’s not a question that data alone could answer. That’s not an issue for Peal, because he doesn’t use his data to answer the question; he starts with his answer and marshals the data to support it. He’s entitled to do this, of course. Whether it’s an appropriate way to tackle an important area of public policy is another matter. The big pitfall in this approach is that it’s all too easy to overlook data that don’t confirm one’s thesis, and Peal overlooks data relating to the effectiveness of traditional educational methods.

Peal’s focus on the history of progressive education during the last 50 years means he doesn’t cover the history of traditional education in the preceding centuries. If Peal’s account of British education is the only one you’ve read, you could be forgiven for thinking that traditional education was getting along just fine until the pesky progressives arrived with their political ideology that happened to gain traction because of the counter-cultural zeitgeist in the 1960s and 1970s. But other accounts paint a different picture.

Traditional education has had plenty of opportunities to demonstrate its effectiveness; Prussia had introduced a centralised, compulsory education system by the late 18th century – one that was widely emulated. But traditional methods weren’t without their critics. It wasn’t uncommon for a school to consist of one class with one teacher in charge. Children (sometimes hundreds) were seated in order of age on benches (‘forms’) and learned by rote not just multiplication tables and the alphabet, but entire lessons, which they then recited to older children or ‘monitors’ (Cubberley, 1920). This was an approach derived from the catechetical method used for centuries by religious groups, and it was understandable when funding was tight and pupils didn’t have access to books. But a common complaint about rote learning was that children might memorise the lessons but often didn’t understand them.

Another problem was the number of children with learning difficulties and disabilities who enrolled in schools once education became compulsory; the Warnock committee reported that teachers were surprised by how many there were. In England, such children were often hived off into special schools where those deemed ‘educable’ were trained for work. In France, by contrast, Braille, Itard and Seguin developed ways of supporting the learning of children with sensory impairments, and Binet was commissioned to develop an assessment for learning difficulties that eventually evolved into the Stanford-Binet Intelligence Scale.

Corporal punishment for misdemeanours or failure to learn ‘lessons’ wasn’t uncommon either, especially after payment by results was introduced through ‘Lowe’s code’ in 1862. In The Lost Elementary Schools of Victorian England, Philip Gardner draws attention to the reasons why ‘dame schools’ – small schools in private houses – persisted up until WW2; these included meeting the needs of children terrified of corporal punishment and of parents sceptical of the quality of teaching in state schools – often the result of their own experiences.

Not all schools were like this of course, and I don’t imagine for a moment that that’s what the new traditionalists would advocate. But it’s important to bear in mind that just as progressive methods taken to extremes can damage children’s educational prospects, traditional methods taken to extremes can do the same. It’s difficult to make an objective comparison of the outcomes of traditional and progressive education in the early days of the English state education system because comparable data aren’t available for the period prior to WW2, but it’s clear that the drawbacks of rote learning, whole class teaching and teacher authority made a significant contribution to progressive educational ideas being well-received by a generation of adults whose personal experience of school was often negative.

Attribution errors

Not only is the structure of some things complex, but their causes can be too. Confirmation bias can lead to some causes being considered but others being prematurely dismissed – in other words, to wrong causal attributions being made. One common attribution error is to assume that a positive correlation between two factors indicates that one causes the other.
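
A minimal simulation shows how easily that error arises: two variables correlate strongly because both depend on a third, even though neither causes the other. The numbers are simulated, not real data:

```python
import random

random.seed(1)
confounder = [random.gauss(0, 1) for _ in range(1000)]
a = [c + random.gauss(0, 0.5) for c in confounder]   # driven by the confounder
b = [c + random.gauss(0, 0.5) for c in confounder]   # also driven by it

def correlation(x, y):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x) / n
    var_y = sum((yi - mean_y) ** 2 for yi in y) / n
    return cov / (var_x * var_y) ** 0.5

print(round(correlation(a, b), 2))   # ~0.8, yet neither a nor b causes the other
```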

Peal attributes the origins of progressive education to Rousseau and the Romantic movement, presumably following ED Hirsch, a former professor of English literature whose specialism was the Romantic poets and who re-frames the nature/nurture debate as Romantic/Classical. Peal also claims that “progressive education seeks to apply political principles such as individual freedom and an aversion to authority to the realm of education” (p.4), supporting the new traditionalists’ view of progressive education as ideologically motivated. Although the pedagogical methods advocated by Pestalozzi, Froebel, Montessori and Dewey resemble Rousseau’s philosophy, a closer look at their ideas suggests his influence was limited. Pestalozzi became involved in developing Rousseau’s ideas when Rousseau’s books were banned in Switzerland, but he was also influenced by Herbart, a philosopher intrigued by perception and consciousness – topics that preoccupied early psychologists such as William James, a significant influence on John Dewey. Froebel was a pupil of Pestalozzi with an interest in early learning who set up the original Kindergärten. Maria Montessori trained as a doctor and applied the findings of Itard and Seguin, who worked with deaf-mute children, to education in general. The founders of progressive education were influenced as much by psychology and medicine as by the Romantics.

Peal doesn’t appear to have considered the possibility of convergence – that people with very different worldviews, including Romantics, Marxists, social reformers, educators and those working with children with disabilities – might espouse similar educational approaches for very different reasons; or of divergence – that they might adopt some aspects of progressive education but not others.

Peal and traditional education

Peal’s model of the education system certainly fits his data, but that’s not surprising, since he explicitly begins with a model and selects data to fit it. Although he implies that he would like to see a return to traditional approaches, he doesn’t say exactly what they would look like. Several of the characteristics of traditional education Peal refers to are the superficial trappings of long-established independent schools – bells, blazers and haircuts, for example. And although some of the other features he mentions might have educational impacts, he doesn’t cite any evidence to show what those impacts might be.

I suspect that Peal has fallen into the trap of assuming that because long-established independent schools have a good track record of providing a high quality academic education, it follows that if all schools emulated them in all respects, all students would get a good education. What this view overlooks is that independent schools are, and have always been, selective, even those set up specifically to provide an education for children from poor families. Providing a good academic education to an intellectually able, academically-inclined child from a family motivated enough to take on additional work to be able to afford the school uniform is a relatively straightforward task. Providing the same for a child with learning difficulties, interested only in football and motor mechanics whose dysfunctional family lives in poverty in a neighbourhood with a high crime rate is significantly more challenging, and might not be appropriate.

The way forward

The new traditionalists argue that the problems with the education system are the result of a ‘hands off’ approach by government and the educational establishment being allowed to get on with it. Peal depicts government, from Jim Callaghan’s administration onward, as struggling (and failing) to mitigate the worst excesses of progressive education propagated by the educational establishment. That’s a popular view, but not necessarily an accurate one, and Peal’s data don’t support that conclusion. The data could equally well indicate that the more government intervenes in education, the worse things get. The post-war period has witnessed a long series of expensive disasters since government got more ‘hands on’ with education: the social divisiveness of the 11+, pressure on schools to adopt particular pedagogical approaches, enforced comprehensivisation, a change to a three-tier system followed by a change back to a two-tier one, a constantly changing compulsory national curriculum, standardised testing focused on short-term rather than long-term outcomes, a local inspectorate replaced by a centralised one, accountability to local people replaced by accountability to central government, a constant stream of ‘initiatives’, constantly changing legislation and regulation, and increasing micro-management.

A state education system has to be able to provide a suitable education for all children – a challenging task for teachers. The most effective approach found to date for occupations required to apply expertise to highly variable situations is the professional one. Although ‘professional’ is often used simply to denote good practice, it has a more specific meaning for occupations: professionals are practitioners who have acquired high-level expertise to the point where they are authorised to practise without supervision. Regulation and accountability come via professional bodies and independent adjudicators. This model, used in occupations ranging from doctors, lawyers and architects to builders and landscape gardeners, although not foolproof, has worked well for centuries.

Teaching is an obvious candidate for professional status, but teachers in England have never been treated as true professionals. Initial teacher training has often been shortened or set aside entirely in times of economic downturn or shortages of teachers in specific subject areas, and it’s debatable whether a PGCE provides a sufficient grounding for subject-specialist secondary teachers, never mind for the range of skills required in primary education. Increasing micromanagement by local authorities and more recently by central government has undermined the professional status of teachers further.

I see no evidence to suggest that the university lecturers and researchers, civil servants, local authorities, school inspectors, teaching unions, educational psychologists and teachers themselves that make up the so-called ‘educational establishment’ are any less able than government to design a workable and effective education system – indeed by Peal’s own reckoning, during the period when they actually did that the education system functioned much better.

Despite providing some useful information about recent educational policy, Peal’s strategy of starting with a belief and using evidence to support it is unhelpful and possibly counterproductive because it overlooks alternative explanations for why there might be problems with the English education system. This isn’t the kind of evidence-based approach to policy that government needs to use. Let the data speak for themselves.

References
Cubberley, EP (1920). The History of Education. Cambridge, MA: Riverside Press.
Gardner, P (1984). The Lost Elementary Schools of Victorian England: The People’s Education. Routledge.
Kahneman, D, Slovic, P & Tversky, A (1982). Judgment under Uncertainty: Heuristics and Biases. Cambridge University Press.
Peal, R (2014). Progressively Worse: The Burden of Bad Ideas in British Schools. Civitas.