apprentice without a sorcerer

Cummings’ essay Some Thoughts on Education and Political Priorities highlights his admiration for experts, notably scientists, but this doesn’t prevent him making several classic novice errors. These errors, not surprisingly, lead Cummings to some conclusions contradicted by evidence he hasn’t considered. I’ve focused on four of them.

oversimplifying systems

Cummings knows that systems operate differently at different levels, and although all systems, as part of the physical world, involve maths and physics, you can’t reduce all systems to maths and physics (p.18). But his preoccupation with maths and physics, and his lack of attention to the higher levels of systems, suggest he can’t resist doing just that. In his essay maths is mentioned 473 times (almost 2 mentions per page) and physics 179 times. Science gets 507 references and quantum 238. In contrast, the arts get 8 mentions and humanities 16. Ironically, given his emphasis on complex systems, Cummings seems determined to view complex knowledge domains like education, politics, the humanities and the arts only through the lenses of maths, physics and linear scales.

Cummings’ first degree is in history, but he knows a lot of scientific facts. How deep his understanding goes is another matter. He opens the section on a scientific approach to teaching practice with the famous ‘Cargo Cult’ speech in which Richard Feynman accused educational and psychological studies of mimicking the surface features of science but not applying the deep structure of the scientific method (p.70). Cummings’ criticism is well-founded; evidence has always influenced educational practice in the UK, but the level of rigour involved has varied considerably. Ironically, Cummings’ appeal to scientific evidence then itself sets off down the cargo-cult route.

misunderstanding key concepts: chunking vs schemata

Cummings claims “experts do better because they ‘chunk’ together lots of individual things in higher level concepts – networks of abstractions – which have a lot of compressed information and allow them to make sense of new information (experts can also use their networks to piece together things they have forgotten)” (p.71).

‘Chunking’ occurs when several distinct items of information are perceived and processed as one item. The research, e.g. Miller (1956), de Groot (1965) and Anderson (1996), shows it happens automatically after groups of low-level (simple) items with strongly similar features have been encountered very frequently, e.g. Morse code, words, faces, chess positions. I’ve not seen any research that shows the same phenomenon happening with information that’s associated but complex and dissimilar. And Cummings doesn’t cite any.

Information that’s complex and dissimilar but frequently encountered together (e.g. the Periodic Table, biological taxonomy, the battle of Hastings) forms strong cognitive associations that are configured into a schema. What Cummings describes isn’t chunking; it’s the formation of a high-level schema. Chunks are schemata, but not all schemata are chunks.

Cummings is right that experts abstract information to form high-level schemata, but the information isn’t compressed as he claims. The abstractions are key features of aspects of the schema, e.g. the key features of transition metals, birds or invasions. I can just about hold all the key features of birds in my working memory at once, but not at the same time as exceptions (e.g. ostrich, penguin) or features of different bird species. The prototypical features make it easier to retrieve associated information, but it isn’t retrieved all at once. If I think about the key features of birds, many facts about birds and their features spring to mind, but they do so sequentially, not at the same time. The limitations of working memory still apply.
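
To make the distinction concrete, here’s a toy sketch (my own illustration, not a model from the literature; the capacity figure and the ‘bird’ features are placeholders). A chunk behaves as a single working-memory item however much it contains, whereas a schema is a network of associated items that would quickly exceed working memory if you tried to hold it all at once:

```python
# Toy sketch (my own illustration, not a model from the literature).
WM_CAPACITY = 4  # a commonly cited rough limit on working-memory items

# A chunk: for a fluent reader the letter string "cat" is one item,
# however many letters it contains.
chunk = "cat"  # occupies one slot

# A schema: key features of 'bird' plus associated exceptions and examples.
bird_schema = {
    "key_features": ["feathers", "beak", "lays eggs", "flies"],
    "exceptions": ["ostrich", "penguin"],
    "examples": ["robin", "eagle", "sparrow"],
}

def items_if_held_at_once(schema):
    """How many items the schema would occupy if we tried to hold
    all of it in working memory at the same time."""
    return sum(len(items) for items in schema.values())

print("chunk occupies 1 slot")
print("whole bird schema would occupy", items_if_held_at_once(bird_schema), "slots")
print("fits at once?", items_if_held_at_once(bird_schema) <= WM_CAPACITY)  # False
# In practice the associated items come to mind sequentially, a few at a
# time, so the working-memory limit still applies.
```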

The distinction between chunking and schema formation is important because schemata play a big part in expertise, e.g. Schank & Abelson (1977) and Rumelhart (1980). Despite their importance, Cummings refers to schemata only once, when he’s describing how his essay is structured (p.7). The omission is a significant one with implications for Cummings’ model of how experts structure their knowledge.

experts vs novices

Experts in a particular field derive their expertise from a body of knowledge that’s been found to be valid and reliable. They construct that knowledge into schemata, or mental models. New knowledge can then be incorporated into the schemata, which might then need to be configured differently. Sometimes experts disagree strongly, not about the content of their schemata, but about how the content is configured.

The ensuing debates can go on for decades. A classic example is the debate between those who think correlations between intelligence test scores indicate that intelligence is a ‘something’ that ‘really exists’, and those who think the assumption that there’s a ‘something’ called intelligence shapes the choice of items in intelligence tests, so correlations should come as no surprise (see previous post). Another long-standing debate involves those who think universal patterns in the structure of language mean that language is hard-wired in the brain, versus others who think the patterns emerge from the way networks of neurons compute information.

Acquiring key information about an unfamiliar knowledge domain takes time and effort, and Cummings has obviously put in the hours. What’s more challenging is finding out how domain experts configure their knowledge – experts often take their schemata for granted and don’t make them explicit. Sometimes you need to ask directly (or be told) why knowledge is organized in a certain way, and if there are any crucial differences of opinion in the field.

Cummings doesn’t seem to have asked how experts structure their knowledge. Instead, he appears to have squeezed knowledge new to him (e.g. chunking) into his own pre-existing schema without checking whether his schema is right or wrong. Or, he’s adopted the first schema he’s agreed with (e.g. genes and IQ). He admits to basing his genes/IQ model largely on Robert Plomin’s Behavioural Genetics and talks by Stephen Hsu. He dismisses the controversies and takes Plomin and Hsu’s models for granted.

evaluating evidence

There are references to the scientific method in Cummings’ essay but they’re about data analysis, not the scientific method as such. A crucial step in the scientific method is evaluating evidence – analysing data for sure, but also testing hypotheses by weighing up the evidence for and against. This process isn’t about ‘balance’ – it’s about finding flaws in methods and reasoning in order to avoid confirmation bias.

But Cummings repeatedly accepts evidence in support of one thing or against another, without questioning it. I’d suggest he can’t question much of it because he doesn’t know enough about the field. Some that caught my eye are:

  • Assuming hunter-gatherers’ knowledge is “based on superstition (almost total ignorance of complex systems)” (p.1). Anthropology, which might claim otherwise, is, like other social sciences, summarily dismissed by Cummings.
  • Unsubstantiated claims such as “Aeronautics was confined to qualitative stories (like Icarus) until the 1880s when people started making careful observations and experiments about the principles of flight” (p.21). Da Vinci, Bacon, the Montgolfiers, Cayley? No mention.
  • Attributing European economic development between the 14th and 19th centuries to ‘markets and science’ and omitting the role of the Reformation, the French Revolution or the Enclosure Acts (p.108).
  • Uncritical acceptance of Smith’s and Hayek’s speculative claims about the benefits of markets (p.106).
  • Overlooking systems constraints on growth – in corn yields, computing power etc. (pp.46, 231-2). No mention of the ubiquitous sigmoid curve.
  • Overlooking the Club of Rome’s Limits to Growth when discussing shortage and innovation (p.112).
  • Emphasising the importance of complex systems with no mention of systems theory as such (e.g. Bertalanffy’s general systems theory).
  • Ignoring important debates about construct validity e.g. intelligence and personality (p.49).

not just wrong

People are often wrong about things and usually a few minor errors don’t matter. In Cummings’ case they matter a great deal, partly because he’s so influential, but also because even tiny errors can have huge consequences. I chose the example of chunking because Cummings’ interpretation of it has been disproportionately influential in recent English education policy.

Daisy Christodoulou in Seven Myths about Education (2014) takes the assumption about chunking a step further. She’s right that chunking low-level associations such as times tables allows us to ‘cheat’ the limitations of working memory, but wrong to assume (like Cummings) that high-level schemata do the same. And flat-out wrong to claim “we can summon up the information from long-term memory to working memory without imposing a cognitive load.” (Christodoulou p.19, my emphasis). Her own example (23,322 x 42) contradicts her claim.
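
To spell out the arithmetic (my working, not Christodoulou’s): even with every times-table fact securely stored in long-term memory, the intermediate results still have to be held in working memory while the rest of the calculation is carried out:

23,322 × 42 = (23,322 × 40) + (23,322 × 2) = 932,880 + 46,644 = 979,524

Holding 932,880 in mind while working out 46,644, and then adding the two, is precisely the kind of cognitive load the claim says isn’t imposed.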

Christodoulou’s claim is based on Kirschner, Sweller & Clark’s 2006 paper ‘Why minimal guidance during instruction does not work’. The authors say: “The limitations of working memory only apply to new, yet to be learned information that has not been stored in long-term memory. New information such as new combinations of numbers or letters can only be stored for brief periods with severe limitations on the amount of such information that can be dealt with. In contrast, when dealing with previously learned information stored in long-term memory, these limitations disappear.” (Kirschner et al., p.77). The only evidence they cite is a 1995 review paper proposing an additional cognitive mechanism, ‘long-term working memory’.

I have yet to read a proponent of Kirschner, Sweller & Clark’s model discuss the well-known limitations of long-term memory, summarised here. Greg Ashman, for example, following on from a useful summary of schemata, says:

“One way of thinking about the role of long-term memory in solving problems or dealing with new information is that entire schema can be brought readily into working memory and manipulated as a single element alongside any new elements that we need to process. The normal limits imposed on working memory fall away almost entirely when dealing with schemas retrieved from long-term memory – a key idea of cognitive load theory. This illustrates both the power of having robust schemas in long-term memory and the effortlessness of deploying them; an effortlessness that fools so many of us into neglecting the critical role long-term memory plays in learning”.

Many with expertise as varied as English, history, physics or politics have enthusiastically embraced findings from cognitive science that could improve the effectiveness of teaching. Or more accurately, they’ve embraced Kirschner, Sweller and Clark’s model of memory and learning. Some of the ‘cog sci’ enthusiasts have gone further. They’ve taken a handful of facts out of context, squeezed them into their own pre-existing schemata, and drawn conclusions that are at odds with the research. They’ve also assumed that if an expert in ‘cog sci’ makes a plausible claim it must be true, but haven’t evaluated the evidence cited by the expert – because they don’t have the relevant expertise; cognitive science is a knowledge domain unfamiliar to them.

Nevertheless, objections to the Kirschner, Sweller and Clark model are often dismissed as originating either in ideology or ignorance. Ironic, as despite emphasising the importance of knowledge, evidence and expertise, many of the proponents of ‘cog sci’ are patently novices selecting evidence to support a model that doesn’t stand up to scrutiny. Murray Gell-Mann is right that we need people who can take a crude look at the whole of knowledge (p.5), but the crude look should be one informed by a good grasp of the domains in question.

In 1797, Goethe published a poem entitled Der Zauberlehrling (Sorcerer’s Apprentice). It was a popular work, and became even more popular in 1940 when animated as part of Disney’s Fantasia, with Mickey Mouse playing the part of the apprentice who started something he couldn’t stop. The moral of the story is that a little knowledge can be a dangerous thing. Cummings has been portrayed as a brilliant eccentric and/or an evil genius. I think he’s an apprentice without a sorcerer.

references

Anderson, J (1996) ACT: A simple theory of complex cognition, American Psychologist, 51, 355-365.

Christodoulou, D (2014).  Seven Myths about Education.  Routledge.

de Groot, A D (1965).  Thought and Choice in Chess.  Mouton.

Kirschner, PA, Sweller, J & Clark, RE (2006). Why Minimal Guidance During Instruction Does Not Work: An Analysis of the Failure of Constructivist, Discovery, Problem-Based, Experiential, and Inquiry-Based Teaching. Educational Psychologist, 41, 75-86.

Miller, G (1956). The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information, Psychological Review, 63, 81-97.

Rumelhart, DE (1980). Schemata: the building blocks of cognition. In R.J. Spiro et al. (eds) Theoretical Issues in Reading Comprehension.  Lawrence Erlbaum: Hillsdale, NJ.

Schank, RC & Abelson, RP (1977). Scripts, Plans, Goals and Understanding: an Inquiry into Human Knowledge Structures.  Lawrence Erlbaum: Hillsdale, NJ.


the MUSEC briefings and Direct Instruction

Yesterday, I got involved in a discussion on Twitter about Direct Instruction (DI). The discussion was largely about what I had or hadn’t said about DI. Twitter isn’t the best medium for discussing anything remotely complex, but there’s something about DI that brings out the pedant in people, me included.

The discussion, if you can call it that, was triggered by a tweet about the most recent MUSEC briefing. The briefings, from Macquarie University Special Education Centre, are a great idea. A one-page round-up of the evidence relating to a particular mode of teaching or treatment used in special education is exactly the sort of resource I’d use often. So why the discussion about this one?

the MUSEC briefings

I’ve bumped into the briefings before. I read one a couple of years ago on the recommendation of a synthetic phonics (SP) advocate. It was briefing no.18, Explicit instruction for students with special learning needs. At the time, I wasn’t aware that ‘explicit instruction’ had any particular significance in education – other than denoting instruction that was explicit. And that could involve anything from a teacher walking round the room checking that students understood what they were doing, to ‘talk and chalk’, reading a book or computer-aided learning. The briefing left me feeling bemused. It was packed with implicit assumptions, and the references, presented online presumably for reasons of space, included one self-citation, a report that reached a different conclusion to the briefing, a 400-page book by John Hattie that doesn’t appear to reach the same conclusion either, and a paper by Kirschner, Sweller and Clark that doesn’t mention children with special educational needs. The references form a useful reading list for teachers, but hardly constitute robust evidence to support the briefing’s conclusions.

My curiosity piqued, I took a look at another briefing, no.33 on behavioural optometry. I chose it because the SP advocates I’d encountered tended to be sceptical about visual impairments being a causal factor in reading difficulties, and I wondered what evidence they were relying on. I knew a bit about visual problems because of my son’s experiences. The briefing repeatedly lumped together things that should have been kept distinct and came to conclusions at odds with the evidence it cited. I think I was probably unlucky with these first two because some of the other briefings look fine. So what about the one on Direct Instruction, briefing no.39?

Direct Instruction and Project Follow Through

Direct Instruction (capitalised) is a scripted learning programme, now commercially available, developed by Siegfried Engelmann and Wesley Becker in the US in the 1960s, which performed outstandingly well in Project Follow Through (PFT).

The DI programme involved the scripted teaching of reading, arithmetic, and language to children between kindergarten and third grade. The PFT evaluation of DI showed significant gains in basic skills (word knowledge, spelling, language and math computation); in cognitive-conceptual skills (reading comprehension, math concepts, math problem solving); and in affect measures (co-operation, self-esteem, intellectual achievement, responsibility). A high school follow-up study by the sponsors of the DI programme showed that it was associated with positive long-term outcomes.

The Twitter discussion revolved around what I meant by ‘basic’ and ‘skills’. To clarify, as I understand it the DI programme itself involved teaching basic skills (reading, arithmetic, language) to quite young children (K-3). The evaluation assessed basic skills, cognitive-conceptual skills and affect measures. There is no indication in the evidence I’ve been able to access of how sophisticated the cognitive-conceptual skills or affect measures were. One would expect them to be typical of children in the K-3 age range. And we don’t know how long those outcomes persisted. The only evidence for long-term positive outcomes is from a study by the programme sponsors – not to be discounted, but not reliable enough to form the basis for a pedagogical method.

In other words, the PFT evaluation tells us that there were several robust positive outcomes from the DI programme. What it doesn’t tell us is whether the DI approach has the same robust outcomes if applied to other areas of the curriculum and/or with older children. Because the results of the evaluation are aggregated, it doesn’t tell us whether the DI programme benefitted all children or only some, or if it had any negative effects, or what the outcomes were for children with specific special educational needs or learning difficulties – the focus of MUSEC. Nor does it tell us anything about the use of direct instruction in general – what the briefing describes as a “generic overarching concept, with DI as a more specific exemplar”.

the evidence

The briefing refers to “a large body of research evidence stretching back over four decades testifying to the efficacy of explicit/direct instruction methods including the specific DI programs.” So what is the evidence?

The briefing itself refers only to the PFT evaluation of the DI programme. The references, available online, consist of:

• a summary of findings written by the authors of the DI programme, Becker & Engelmann,
• a book about DI – the first two authors were Engelmann’s students and worked on the original DI programme,
• an excerpt from the same book on a commercial site called education.com,
• an editorial from a journal called Effective School Practices, previously known as Direct Instruction News and published by the National Institute for Direct Instruction (Chairman S Engelmann)
• a paper about the different ways in which direct instruction is understood, published by the Center on Innovation and Improvement which is administered by the Academic Development Institute, one of whose partners is Little Planet Learning,
• the 400-page book referenced by briefing 18,
• the peer-reviewed paper also referenced by briefing 18.

The references, which I think most people would construe as evidence, include only one peer-reviewed paper. It cites research findings supporting the use of direct instruction in relation to particular types of material, but doesn’t mention children with special needs or learning difficulties. Another reference is a synthesis of peer-reviewed studies. All the other references involve organisations with a commercial interest in educational methods – not the sort of evidence I’d expect to see in a briefing published by a university.

My recommendation for the MUSEC briefings? Approach with caution.

the new traditionalists: there’s more to d.i. than meets the eye, too

A few years ago, mystified by the way my son’s school was tackling his reading difficulties, I joined the TES forum and discovered I’d missed The Reading Wars. Well, not quite. They began before I started school and show no sign of ending any time soon. But I’d been blissfully unaware that they’d been raging around me.

On one side in the Reading Wars are advocates of a ‘whole language’ approach to learning to read – focusing on reading strategies and meaning – and on the other are advocates of teaching reading using phonics. Phonics advocates see their approach as evidence-based, and frequently refer to the whole language approach (using ‘mixed methods’) as based on ideology.

mixed methods

Most members of my family learned to read successfully using mixed methods. I was trained to teach reading using mixed methods and all the children I taught learned to read. My son, taught using synthetic phonics, struggled with reading and eventually figured it out for himself using whole word recognition. Hence my initial scepticism about SP. I’ve since changed my mind, having discovered that my son’s SP programme wasn’t properly implemented and after learning more about how the process of reading works. If I’d relied only on the scientific evidence cited as supporting SP, I wouldn’t have been convinced. Although it clearly supports SP as an approach to decoding, the impact on literacy in general isn’t so clear-cut.

ideology

I’ve also found it difficult to pin down the ideology purported to be at the root of whole language approaches. An ideology is a set of abstract ideas or values based on beliefs rather than on evidence, but the reasons given for the use of mixed methods when I was learning to read and when I was being trained to teach reading were pragmatic ones. In both instances, mixed methods were advocated explicitly because (analytic) phonics alone hadn’t been effective for some children, and children had been observed to use several different strategies during reading acquisition.

The nearest I’ve got to identifying an ideology are the ideas that language frames and informs people’s worldviews and that social and economic power plays a significant part in determining who teaches what to whom. The implication is that teachers, schools, school boards, local authorities or government don’t have a right to impose on children the way they construct their knowledge. To me, the whole language position looks more like a theoretical framework than an ideology, even if the theory is debatable.

the Teaching Wars

The Reading Wars appear to be but a series of battles in a much bigger war over what’s often referred to as traditional vs progressive teaching methods. The new traditionalists frequently characterise the Teaching Wars along the same lines as SP proponents characterise the Reading Wars; claiming that traditional methods are supported by scientific evidence, but ideology is the driving force behind progressive methods. Even a cursory examination of this claim suggests it’s a caricature of the situation rather than an accurate summary.

the progressives’ ideology

Rousseau is often cited as the originator of progressive education and indeed, progressive methods sometimes resemble the approach he advocated. However, many key figures in progressive education such as Herbert Spencer, John Dewey and Jean Piaget derived their methods from what was then state-of-the-art scientific theory and empirical observation, not from 18th century Romanticism.

the traditionalists’ scientific evidence

The evidence cited by the new traditionalists appears to consist of a handful of findings from cognitive psychology and information science. They’re important findings, they should form part of teacher training and they might have transformed the practice of some teachers, but teaching and learning involve more than cognition. Children’s developing brains and bodies, their emotional and social background, the social, economic and political factors shaping the expectations on teachers and students in schools, and the philosophical frameworks of everybody involved suggest that evidence from many other scientific fields should also be informing educational theory, and that it might be risky to apply a few findings out of context.

I can understand the new traditionalists’ frustration. One has to ask why education theory hasn’t kept up to date with research in many fields that are directly relevant to teaching, learning, child development and the structure of the education system itself. However, dissatisfaction with progressive methods appears to originate, not so much with the methods themselves, as with the content of the curriculum and with progressive methods being taken to extremes.

keeping it simple

The limited capacity of working memory is the feature of human cognitive architecture that underpins Kirschner, Sweller and Clark’s argument in favour of direct instruction. One outcome of that limitation is a human tendency to oversimplify information by focusing on the prototypical features of phenomena – a tendency that often leads to inaccurate stereotyping. Kirschner, Sweller and Clark present their hypothesis in terms of a dispute between two ‘sides’, one advocating minimal guidance and the other a full explanation of concepts, procedures and strategies (p.75).

Although it’s appropriate in experimental work to use extreme examples of these approaches in order to test a hypothesis, the authors themselves point out that in a classroom setting most teachers using progressive methods provide students with considerable guidance anyway (p.79). Their conclusion that the most effective way to teach novices is through “direct, strong, instructional guidance” might be valid, but in respect of the oversimplified way they frame the dispute, they appear to have fallen victim to the very limitations of human cognitive architecture to which they draw our attention.

The presentation of the Teaching Wars in this polarised manner goes some way to explaining why direct instruction seems like such a big deal for the new traditionalists. Direct instruction shouldn’t be confused with Direct Instruction (capitalised) – the scripted teaching used in Engelmann & Becker’s DISTAR programme – although a recent BBC Radio 4 programme suggests that might be exactly what’s happening in some quarters.

direct instruction

The Radio 4 programme How do children learn history? is presented by Adam Smith, a senior lecturer in history at University College London, who has blogged about the programme here. He’s carefully non-committal about the methods he describes – it is the BBC after all.

A frequent complaint about the way the current national curriculum approaches history is what’s included, what’s excluded, what’s emphasised and what’s not. At home, we’ve had to do some work on timelines because although both my children have been required to put themselves into the shoes of various characters throughout history (an exercise my son has grown to loathe), neither of them knew how the Ancient Egyptians, Greeks, Romans, Vikings or Victorians related to each other – a pretty basic historical concept. But those are curriculum issues, rather than methods issues. As well as providing a background to the history curriculum debate, the broadcast featured two lessons that used different pedagogical approaches.

During an ‘inquiry’ lesson on Vikings, presented as a good example of current practice, groups of children were asked to gather information about different aspects of Viking life. A ‘direct instruction’ lesson on Greek religious beliefs, by contrast, involved the teacher reading from a textbook whilst the children followed the text in their own books with their finger, then discussed the text and answered comprehension questions on it. The highlight of the lesson appeared to be the inclusion of an exclamation mark in the text.

It’s possible that the way the programme was edited oversimplified the lesson on Greek religious beliefs, or that the children in the Viking lesson were older than those in the Greek lesson and better able to cope with ‘inquiry’, but there are clearly some possible pitfalls awaiting those who learn by relying on the content of a single textbook. The first is that whoever publishes the textbook controls the knowledge – that’s a powerful position to be in. The second is that you don’t need much training to be able to read from a textbook or lead a discussion about what’s in it – that has implications for who is going to be teaching our children. The third is how children will learn to question what they’re told. I’m not trying to undermine discipline in the classroom, just pointing out that textbooks can be, and sometimes are, wrong. The sooner children learn that authority lies in evidence rather than in authority figures, the better. Lastly, as a primary school pupil I would have found following a teacher reading from a textbook tedious in the extreme. As a secondary school pupil it was a teacher reading from a textbook for twenty minutes that clinched my decision to drop history as soon as possible. I don’t think I’d be alone in that.

who are the new traditionalists?

The Greek religions lesson was part of a project funded by the Education Endowment Foundation (EEF), a charity developed by the Sutton Trust and the Impetus Trust in 2011 with a grant from the DfE. The EEF’s remit is to fund research into interventions aimed at improving the attainment of pupils receiving free school meals. The intervention featured in How do children learn history? is being implemented in Future Academies in central London. I think the project might be the one outlined here, although this one is evaluating the use of Hirsch’s Core Knowledge framework in literacy, rather than in history, which might explain the focus on extracting meaning from the text.

My first impression of the traditionalists was that they were a group of teachers disillusioned by the ineffectiveness of the pedagogical methods they were trained to use, who’d stumbled across some principles of cognitive science they’d found invaluable and were understandably keen to publicise them. Several of the teachers are Teach First graduates and work in academies or free schools – not surprising if they want freedom to innovate. They also want to see pedagogical methods rigorously evaluated, and the most effective ones implemented in schools. But those teachers aren’t the only parties involved.

Religious groups have welcomed the opportunities to open faith schools and develop their own curricula – a venture supported by previous and current governments despite past complications resulting from significant numbers of schools in England being run by churches, and the current investigation into the alleged ‘Trojan Horse’ plot in Birmingham.

Future, the sponsors of Future Academies and the Curriculum Centre, was founded by John and Caroline Nash, a former private equity specialist and stockbroker respectively. Both are reported to have made significant donations to the Conservative party. John Nash was appointed Parliamentary Under Secretary of State for Schools in January 2013. The Nashes are co-chairs of the board of governors of Pimlico Academy and Caroline Nash is chair of The Curriculum Centre. All four trustees of the Future group are from the finance industry.

Many well-established independent schools, notably residential schools for children with special educational needs and disabilities, are now controlled by finance companies. This isn’t modern philanthropy in action; the profits made from selling on the school chains, the magnitude of the fees charged to local authorities, and the fact that the schools are described as an ‘investment’ suggest that another motivation is at work.

A number of publishers of textbooks got some free product placement in a recent speech by Elizabeth Truss, currently Parliamentary Under Secretary of State for Education and Childcare.

Educational reform might have teachers in the vanguard, but there appear to be some powerful bodies with religious, political and financial interests who might want to ensure they benefit from the outcomes, and have a say in what those outcomes are. The new traditionalist teachers might indeed be on to something with their focus on direct instruction, but if direct instruction boils down in practice to teachers using scripted texts or reading from textbooks, they will find plenty of other players willing to jump on the bandwagon and cash in on this simplistic and risky approach to educating the country’s most vulnerable children. Oversimplification can lead to unwanted complications.

direct instruction: the evidence

A discussion on Twitter raised a lot of questions about working memory and the evidence supporting direct instruction cited by Kirschner, Sweller and Clark. I couldn’t answer in 140 characters, so here’s my response. I hope it covers all the questions.

Kirschner Sweller & Clark’s thesis is;

• working memory capacity is limited
• constructivist, discovery, problem-based, experiential, and inquiry-based teaching (minimal guidance) all overload working memory and
• evidence from studies investigating the efficacy of different methods supports the superiority of direct instruction.
Therefore, “In so far as there is any evidence from controlled studies, it almost uniformly supports direct, strong instructional guidance rather than constructivist-based minimal guidance during the instruction of novice to intermediate learners.” (p.83)

Sounds pretty unambiguous – but it isn’t.

1. Working memory (WM) isn’t simple. It includes several ‘dissociable’ sensory buffers and a central executive that monitors, attends to and responds to sensory information, information from the body and information from long term memory (LTM) (Wagner, Bunge & Badre, 2004; Damasio, 2006).

2. Studies comparing minimal guidance with direct instruction are based on ‘pure’ methods. Sweller’s work on cognitive load theory (CLT) (Sweller, 1988) was based on problems involving the use of a single buffer/loop, e.g. mazes, algebra. New items coming into the buffer displace older items, so buffer capacity would be the limiting factor. But real-world problems tend to involve different buffers, so items in the buffers can be easily maintained while they are manipulated by the central executive. For example, I can’t write something complex and listen to Radio 4 at the same time because my phonological loop can’t cope. But I can write and listen to music, or listen to Radio 4 whilst I cook a new recipe, because I’m using different buffers. Discovery, problem-based, experiential, and inquiry-based teaching in classrooms tends to more closely resemble real-world situations than the single-buffer problems used by Sweller to demonstrate the concept of cognitive load, so the impact of the buffer limit would be lessened.
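
A toy sketch of the point (my own illustration; the buffer names follow Baddeley’s terminology, but the capacity and load figures are invented for the example). Two concurrent tasks only collide when they load the same buffer:

```python
# Toy sketch (my own illustration; capacities and load figures are invented).
BUFFER_CAPACITY = 4  # rough per-buffer limit, for illustration only

# Approximate load each task places on each buffer.
tasks = {
    "write complex prose": {"phonological": 3, "visuospatial": 1},
    "listen to Radio 4":   {"phonological": 3, "visuospatial": 0},
    "listen to music":     {"phonological": 1, "visuospatial": 0},
    "follow a new recipe": {"phonological": 1, "visuospatial": 3},
}

def overloaded_buffers(task_a, task_b):
    """Return the buffers whose combined load exceeds capacity when the
    two tasks are attempted at the same time."""
    a, b = tasks[task_a], tasks[task_b]
    return [buf for buf in a if a[buf] + b[buf] > BUFFER_CAPACITY]

print(overloaded_buffers("write complex prose", "listen to Radio 4"))  # ['phonological']
print(overloaded_buffers("write complex prose", "listen to music"))    # []
print(overloaded_buffers("listen to Radio 4", "follow a new recipe"))  # []
```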

3. For example, Klahr & Nigam (2004) point out that because there’s no clear definition of discovery learning, in their experiment involving a scientific concept they ‘magnified the difference between the two instructional treatments’ – i.e. used an ‘extreme type’ of both methods – something that’s unlikely to occur in any classroom. Essentially they disproved the hypothesis that children always learn better by discovering things for themselves; but children are unlikely to ‘discover things for themselves’ in circumstances like those in the Klahr & Nigam study.

It’s worth noting that 8 of the children in their study figured out what to do at the outset, so were excluded from the results. And 23% of the direct instruction children didn’t master the concept well enough to transfer it.

That finding – that some learners failed to learn even when direct instruction was used, and that some learners might benefit from less direct instruction – comes up time and again in the evidence cited by Kirschner, Sweller and Clark, but gets overlooked in their conclusion.

I can quite see why educational methods using ‘minimal instruction’ might fail, and agree that proponents of such methods don’t appear to have taken much notice of such research findings as there are. But the findings are not unambiguous. It might be true that the evidence ‘almost uniformly supports direct, strong instructional guidance rather than constructivist-based minimal guidance during the instruction of novice to intermediate learners’ [my emphasis] but teachers aren’t faced with that forced choice. Also the evidence doesn’t show that direct, strong instructional guidance is always effective for all learners. I’m still not convinced that Kirschner, Sweller & Clark’s conclusion is justified.


References

Damasio, A (2006). Descartes’ Error. Vintage Books.

Klahr, D & Nigam, M (2004). The equivalence of learning paths in early science instruction: Effects of direct instruction and discovery learning. Psychological Science, 15, 661-667.

Sweller, J (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12, 257-285.

Wagner, AD, Bunge, SA & Badre, D (2004). Cognitive control, semantic memory and priming: Contributions from prefrontal cortex. In MS Gazzaniga (Ed.) The Cognitive Neurosciences (3rd edn.). Cambridge, MA: MIT Press.

a tale of two Blobs

The think-tank Civitas has just published a 53-page pamphlet written by Toby Young and entitled ‘Prisoners of The Blob’. ‘The Blob’, for the uninitiated, is the name applied by the UK’s Secretary of State for Education, Michael Gove, to ‘leaders of the teaching unions, local authority officials, academic experts and university education departments’ described by Young as ‘opponents of educational reform’. The name’s not original. Young says it was coined by William J Bennett, a former US Education Secretary; it was also used by Chris Woodhead, first Chief Inspector of Ofsted, in his book Class War.

It’s difficult to tell whether ‘The Blob’ is actually an amorphous fog-like mass whose members embrace an identical approach to education as Young claims, or whether such a diverse range of people espouse such a diverse range of views that it’s difficult for people who would like life to be nice and straightforward to understand all the differences.

Young says:

“They all believe that skills like ‘problem-solving’ and ‘critical thinking’ are more important than subject knowledge; that education should be ‘child-centred’ rather than ‘didactic’ or ‘teacher-led’; that ‘group work’ and ‘independent learning’ are superior to ‘direct instruction’; that the way to interest children in a subject is to make it ‘relevant’; that ‘rote-learning’ and ‘regurgitating facts’ is bad, along with discipline, hierarchy, routine and anything else that involves treating the teacher as an authority figure. The list goes on.” (p.3)

It’s obvious that this is a literary device rather than a scientific analysis, but that’s what bothers me about it.

Initially, I had some sympathy with the advocates of ‘educational reform’. The national curriculum had a distinctly woolly appearance in places, enforced group-work and being required to imagine how historical figures must have felt drove my children to distraction, and the approach to behaviour management at their school seemed incoherent. So when I started to come across references to educational reform based on evidence, the importance of knowledge and skills being domain-specific, I was relieved. When I found that applying findings from cognitive science to education was being advocated, I got quite excited.

My excitement was short-lived. I had imagined that a community of researchers had been busily applying cognitive science findings to education, that the literatures on learning and expertise were being thoroughly mined and that an evidence-based route-map was beginning to emerge. Instead, I kept finding references to the same small group of people.

Most fields of discourse are dominated by a few individuals. Usually they are researchers responsible for significant findings or major theories. A new or specialist field might be dominated by only two or three people. The difference here is that education straddles many different fields of discourse (biology, psychology, sociology, philosophy and politics, plus a range of subject areas) so I found it a bit odd that the same handful of names kept cropping up. I would have expected a major reform of the education system to have had a wider evidence base.

Evaluating the evidence

And then there was the evidence itself. I might be looking in the wrong place, but so far, although I’ve found a few references, I’ve uncovered no attempts by proponents of educational reform to evaluate the evidence they cite.

A major flaw in human thinking is confirmation bias. To represent a particular set of ideas, we develop a mental schema. Every time we encounter the same set of ideas, the neural network that carries the schema is activated. The more it’s activated, the more readily it’s activated in future. This means that any configuration of ideas that contradicts a pre-existing schema has, almost literally, to swim against the electrochemical tide. It’s going to take a good few reiterations of the new idea set before a strongly embedded pre-existing schema is likely to be overridden by a new one. Consequently we tend to favour evidence that confirms our existing views, and find it difficult to see things in a different way.

The best way we’ve found to counteract confirmation bias in the way we evaluate evidence is through hypothesis testing. Essentially you come up with a hypothesis and then try to disprove it. If you can’t, it doesn’t mean your hypothesis is right, it just means you can’t yet rule it out. Hypothesis testing as such is mainly used in the sciences, but the same principle underlies formal debating, the adversarial approach in courts of law, and having an opposition to government in parliament. The last two examples are often viewed as needlessly combative, when actually their job is to spot flaws in what other people are saying. How well they do that job is another matter.
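
As a minimal illustration of that logic (a toy example of my own, not anything from the education literature): suppose the hypothesis is ‘this coin is fair’. We look for evidence that would let us reject it, and if we can’t, that is not the same as having shown it to be true:

```python
# Toy one-sided binomial test (my own illustration; the data are hypothetical).
from math import comb

def p_at_least(k, n, p=0.5):
    """Probability of k or more heads in n tosses if the coin really is fair."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

heads, tosses = 14, 20               # hypothetical data
p_value = p_at_least(heads, tosses)  # ~0.058

if p_value < 0.05:
    print(f"Evidence against 'the coin is fair' (p = {p_value:.3f}): reject it.")
else:
    print(f"Can't rule out 'the coin is fair' (p = {p_value:.3f}) - "
          "which is not the same as showing the coin is fair.")
```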

It’s impossible to tell at first glance whether a small number of researchers have made a breakthrough in education theory, or whether their work is simply being cited to affirm a set of beliefs. My suspicion that it might be the latter was strengthened when I checked out the evidence.

The evidence

John Hattie conducted a synthesis of over 800 meta-analyses of studies of student achievement. My immediate thought when I came across his work was of the well-documented problems associated with meta-analyses. Hattie does discuss these, but I’m not convinced he disposed of one key issue: the garbage-in-garbage-out problem. A major difficulty with meta-analyses is ensuring that all the studies involved use the same definitions for the constructs they are measuring; and I couldn’t find a discussion of what Hattie (or other researchers) mean by ‘achievement’. I assume that Hattie uses test scores as a proxy measure of achievement. This is fine if you think the job of schools is to ensure that children learn what somebody has decided they should learn. But that assumption poses problems. One is who determines what students should learn. Another is what happens to students who, for whatever reason, can’t learn at the same rate as the majority. And a third is how the achievement measured in Hattie’s study maps on to achievement in later life. What’s noticeable about the biographies of many ‘great thinkers’ – Darwin and Einstein are prominent examples – is how many of them didn’t do very well in school. It doesn’t follow that Hattie is wrong – Darwin and Einstein might have been even greater thinkers if their schools had adopted his recommendations – but it’s an outcome Hattie doesn’t appear to address.

Siegfried Engelmann and Wesley C Becker developed a system called Direct Instruction System for Teaching Arithmetic and Reading (DISTAR) that was shown to be effective in Project Follow Through – an evaluation of a number of educational approaches in the US education system over a 30-year period starting in the 1960s. There’s little doubt that Direct Instruction is more effective than many other systems at raising academic achievement and self-esteem. The problem is, again, who decides what students learn, what happens to students who don’t benefit as much as others, and what’s meant by ‘achievement’.

ED Hirsch developed the Core Knowledge sequence – essentially an off-the-shelf curriculum that’s been adapted for the UK and is available from Civitas. The US Core Knowledge sequence has a pretty obvious underlying rationale even if some might question its stance on some points. The same can’t be said of the UK version. Compare, for example, the content of US Grade 1 History and Geography with that of the UK version for Year 1. The US version includes Early People and Civilisations and the History of World Religion – all important for understanding how human geography and cultures have developed over time. The UK version focuses on British Pre-history and History (with an emphasis on the importance of literacy) followed by Kings and Queens, Prime ministers then Symbols and figures – namely the Union Jack, Buckingham Palace, 10 Downing Street and the Houses of Parliament – despite the fact that few children in Y1 are likely to understand how or why these people or symbols came to be important. Although the strands of world history and British history are broadly chronological, Y4s study Ancient Rome alongside the Stuarts, and Y6s the American Civil War potentially before the Industrial Revolution.

Daniel Willingham is a cognitive psychologist and the author of Why don’t students like school? A cognitive scientist answers questions about how the mind works and what it means for the classroom and When can you trust the experts? How to tell good science from bad in education. He also writes a column for American Educator magazine. I found Willingham informative on cognitive psychology. However, I felt his view of education was a rather narrow one. There’s nothing wrong with applying cognitive psychology to how teachers teach the curriculum in schools – it’s just that learning and education involve considerably more than that.

Kirschner, Sweller and Clark have written several papers about the limitations of working memory and its implications for education. In my view, their analysis has three key weaknesses; they arbitrarily lump together a range of education methods as if they were essentially the same, they base their theory on an outdated and incomplete model of memory, and they conclude that only one teaching approach is effective – explicit, direct instruction – ignoring the fact that knowledge comes in different forms.

Conclusions

I agree with some of the points made by the reformers:
• I agree with the idea of evidence-based education – the more evidence the better, in my view.
• I have no problem with children being taught knowledge. I don’t subscribe to a constructivist view of education – in the sense that we each develop a unique understanding of the world and everybody’s worldview is as valid as everybody else’s – although cognitive science has shown that everybody’s construction of knowledge is unique. We know that some knowledge is more valid and/or more reliable than other knowledge and we’ve developed some quite sophisticated ways of figuring out what’s more certain and what’s less certain.
• The application of findings from cognitive science to education is long overdue.
• I have no problem with direct instruction (as distinct from Direct Instruction) per se.

However, some of what I read gave me cause for concern:
• The evidence-base presented by the reformers is limited and parts of it are weak and flawed. It’s vital to evaluate evidence, not just to cite evidence that at face-value appears to support what you already think. And a body of evidence isn’t a unitary thing; some parts of it can be sound whilst other parts are distinctly dodgy. It’s important to be able to sift through it and weigh up the pros and cons. Ignoring contradictory evidence can be catastrophic.
• Knowledge, likewise, isn’t a unitary thing; it can vary in terms of validity and reliability.
• The evidence from cognitive science also needs to be evaluated. It isn’t OK to assume that just because cognitive scientists say something it must be right; cognitive scientists certainly don’t do that. Being able to evaluate cognitive science might entail learning a fair bit about cognitive science first.
• Direct instruction, like any other educational method, is appropriate for acquiring some types of knowledge. It isn’t appropriate for acquiring all types of knowledge. The problem with approaches such as discovery learning and child-led learning is not that there’s anything inherently wrong with the approaches themselves, but that they’re not suitable for acquiring all types of knowledge.

What has struck me most forcibly about my exploration of the evidence cited by the education reformers is that, although I agree with some of the reformers’ reservations about what’s been termed ‘minimal instruction’ approaches to education, the reformers appear to be ignoring their own advice. They don’t have extensive knowledge of the relevant subject areas, they don’t evaluate the relevant evidence, and the direct instruction framework they are advocating – certainly the one Civitas is advocating – doesn’t appear to have a structure derived from the relevant knowledge domains.

Rather than a rational, evidence-based approach to education, the ‘educational reform’ movement has all the hallmarks of a belief system that’s using evidence selectively to support its cause; and that’s what worries me. This new Blob is beginning to look suspiciously like the old one.

Kirschner, Sweller & Clark: a summary of my critique

It’s important not just to know things, but to understand them, which is why I took three posts to explain my unease about the paper by Kirschner, Sweller & Clark. From the responses I’ve received I appear to have overstated my explanation but understated my key points, so for the benefit of anybody unable or unwilling to read all the words, here’s a summary.

1. I have not said that Kirschner, Sweller & Clark are wrong to claim that working memory has a limited capacity. I’ve never come across any evidence that says otherwise. My concerns are about other things.

2. The complex issue of approaches to learning and teaching is presented as a two-sided argument. Presenting complex issues in an oversimplified way invariably obscures rather than clarifies the debate.

3. The authors appeal to a model of working memory that’s almost half a century old, rather than one revised six years before their paper came out and widely accepted as more accurate. Why would they do that?

4. They give the distinct impression that long-term memory isn’t subject to working memory constraints, when it is very much subject to them.

5. They completely omit any mention of the biological mechanisms involved in processing information. Understanding the mechanisms is key if you want to understand how people learn.

6. They conclude that explicit, direct instruction is the only viable teaching approach based on the existence of a single constraining factor – the capacity of working memory to process yet-to-be learned information (though exactly what they mean by yet-to-be learned isn’t explained). In a process as complex as learning, it’s unlikely that there will be only one constraining factor.

Kirschner, Sweller & Clark appear to have based their conclusion on a model of memory that was current in the 1970s (I know because that’s when I first learned about it), to have ignored subsequent research, and to have oversimplified the picture at every available opportunity.

What also concerns me is that some teachers appear to be taking what Kirschner, Sweller & Clark say at face value, without making any attempt to check the accuracy of their model, to question their presentation of the problem or the validity of their conclusion. There’s been much discussion recently about ‘neuromyths’. Not much point replacing one set of neuromyths with another.

Reference
Kirschner, PA, Sweller, J & Clark, RE (2006). Why Minimal Guidance During Instruction Does Not Work: An Analysis of the Failure of Constructivist, Discovery, Problem-Based, Experiential, and Inquiry-Based Teaching. Educational Psychologist, 41, 75-86.

cognitive load and learning

In the previous two posts I discussed the model of working memory used by Kirschner, Sweller & Clark and how working memory and long-term memory function. The authors emphasise that their rejection of minimal guidance approaches to teaching is based on the limited capacity of working memory in respect of novel information, and that even if experts might not need much guidance “…nearly everyone else thrives when provided with full, explicit instructional guidance (and should not be asked to discover any essential content or skills)” (Clark, Kirschner & Sweller, p.6). Whether they are right or not depends on what they mean by ‘novel’ information.

So what’s new?

Kirschner, Sweller & Clark define novel information as ‘new, yet to be learned’ information that has not been stored in long-term memory (p.77). But novelty isn’t a simple case of information either being yet–to-be-learned or stored-in-long-term memory. If I see a Russian sentence written in Cyrillic script, its novelty value to me on a scale of 1-10 would be about 9. I can recognise some Cyrillic letters and know a few Russian words, but my working memory would be overloaded after about the third letter because of the multiple operations involved in decoding, blending and translating. A random string of Arabic numerals would have a novelty value of about 4, however, because I am very familiar with Arabic numerals; the only novelty would be in their order in the string. The sentence ‘the cat sat on the mat’ would have a novelty value close to zero because I’m an expert at chunking the letter patterns in English and I’ve encountered that sentence so many times.

Because novelty isn’t an either/or thing but sits on a sliding scale, and because the information coming into working memory can vary between simple and complex, ‘new, yet to be learned’ information can vary in both complexity and novelty.

You could map it on a 2×2 matrix, with complexity on one axis and novelty on the other:

[Figure: a 2×2 matrix with information ranging from simple to complex on one axis and from familiar to novel on the other]

A sentence such as ‘the monopsonistic equilibrium at M should now be contrasted with the equilibrium that would obtain under competitive conditions’ is complex (it contains many bits of information) but its novelty content would depend on the prior knowledge of the reader. It would score high on both the novelty and complexity scales for the average 5 year-old. I don’t understand what the sentence means, but I do understand many of the words, so it would be mid-range in both novelty and complexity for me. An economist would probably give it a 3 for complexity but 0 for novelty. Trying to teach a 5 year-old what the sentence meant would completely overload their working memory. But it would be a manageable challenge for mine, and an economist would probably feel bored.
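
One way to make that concrete (a rough sketch of my own, not anything Kirschner, Sweller & Clark propose): treat complexity as the number of elements in the material and novelty as the share of those elements the reader hasn’t already chunked:

```python
# Rough sketch (my own toy scoring, not from the paper).
def load_profile(elements, familiar):
    """Complexity = number of elements; novelty = share not yet chunked."""
    complexity = len(elements)
    novelty = len([e for e in elements if e not in familiar]) / complexity
    return complexity, round(novelty, 2)

sentence = ["monopsonistic", "equilibrium", "contrasted", "competitive", "conditions"]

five_year_old = set()                                # none of it chunked
me = {"contrasted", "competitive", "conditions"}     # only the everyday words
economist = set(sentence)                            # all of it chunked

for reader, known in [("5 year-old", five_year_old), ("me", me), ("economist", economist)]:
    print(reader, load_profile(sentence, known))
# 5 year-old (5, 1.0)   me (5, 0.4)   economist (5, 0.0)
```

The same sentence sits in different cells of the matrix for different readers: complexity stays the same, but novelty – and with it the load on working memory – falls as more of the material is already chunked.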

Kirschner, Sweller & Clark reject ‘constructivist, discovery, problem-based, experiential and inquiry-based approaches’ on the basis that they overload working memory and the excessive cognitive load means that learners don’t learn as efficiently as they would using explicit direct instruction. If only it were that simple.

‘Constructivist, discovery, problem-based, experiential and inquiry-based approaches’ were adopted initially not because teachers preferred them or because philosophers thought they were a good idea, but because by the end of the 19th century explicit, direct instruction – the only game in town for fledgling mass education systems – clearly wasn’t as effective as people had thought it would be. Alternative approaches were derived from three strategies that young children apply when learning ‘naturally’.

How young children learn

Human beings are mammals and young mammals learn by applying three key learning strategies which I’ll call ‘immersion’, trial-and-error and modelling (imitating the behaviour of other members of their species). By ‘strategy’, I mean an approach that they use, not that the baby mammals sit down and figure things out from first principles; all three strategies are outcomes of how mammals’ brains work.

Immersion

Most young children learn to walk, talk, feed and dress themselves and acquire a vast amount of information about their environment with very little explicit, direct instruction. And they acquire those skills pretty quickly and apparently effortlessly. The theory was that if you put school-age children in a suitable environment, they would pick up other skills and knowledge equally effortlessly, without the boredom of rote-learning and the grief of repeated testing. Unfortunately, what advocates of discovery, problem-based, experiential and inquiry-based learning overlooked was the sheer amount of repetition involved in young children learning ‘naturally’.

Although babies’ learning is kick-started by some hard-wired processes such as reflexes, babies have to learn to do almost everything. They repeatedly rehearse their gross motor skills, fine motor skills and sensory processing. They practise babbling, crawling, toddling and making associations at every available opportunity. They observe things and detect patterns. A relatively simple skill like face-recognition, grasping an object or rolling over might only take a few attempts. More complex skills like using a spoon, crawling or walking take more. Very complex skills like using language require many thousands of rehearsals; it’s no coincidence that children’s speech and reading ability take several years to mature and their writing ability (an even more complex skill) doesn’t usually mature until adulthood.

The reason why children don’t learn to read, do maths or learn foreign languages as ‘effortlessly’ as they learn to walk or speak in their native tongue is largely because of the number of opportunities they have to rehearse those skills. An hour a day of reading or maths and a couple of French lessons a week bear no resemblance to the ‘immersion’ in motor development and their native language that children are exposed to. Inevitably, it will take them longer to acquire those skills. And if they take an unusually long time, it’s the child, the parent, the teacher or the method that tends to be blamed, not the mechanism by which the skill is acquired.

Trial-and-error

The second strategy is trial-and-error. It plays a key role in the rehearsals involved in immersion, because it provides feedback to the brain about how the skill or knowledge is developing. Some skills, like walking, talking or handwriting, can only be acquired through trial-and-error because of the fine-grained motor feedback that’s required. Learning by trial-and-error can offer very vivid, never-forgotten experiences, regardless of whether the initial outcome is success or failure.

Modelling

The third strategy is modelling – imitating the behaviour of other members of the species (and sometimes other species or inanimate objects). In some cases, modelling is the most effective way of teaching because it’s difficult to explain (or understand) a series of actions in verbal terms.

Cognitive load

This brings us back to the issue of cognitive load. It isn’t the case that immersion, trial-and-error and modelling or discovery, problem-based, experiential and inquiry-based approaches always impose a high cognitive load, and that explicit direct instruction doesn’t. If that were true, young children would have to be actively taught to walk and talk and older ones would never forget anything. The problem with all these educational approaches is that each has initially been seen as appropriate for teaching all knowledge and skills, and has subsequently been rejected as ineffective. That’s not at all surprising, because different types of knowledge and skill require different strategies for effective learning.

Cognitive load is also affected by the complexity of incoming information and how novel it is to the learner. Nor is cognitive load confined to the capacity of working memory. Forty minutes of explicit, direct instruction in novel material, even if presented in well-paced, working-memory-sized chunks, would pose a significant challenge to most brains. The reason, as I pointed out previously, is that the transfer of information from working memory to long-term memory is a biological process that takes time, resources and energy. Research into changes in the motor cortex suggests that the time involved might be as little as a few hours, but even that has implications for the pace at which students are expected to learn and how much new information they can process. There’s a reason why someone would find acquiring large amounts of new information tiring – their brain uses up a considerable amount of glucose getting that information embedded in the form of neural connections. The inevitable delay between information coming into the brain and being embedded in long-term memory suggests that down-time is as important as learning time – calling into question the assumption that the longer children spend actively ‘learning’ the more they will know.

Final thoughts

If I were forced to choose between constructivist, discovery, problem-based, experiential and inquiry-based approaches to learning and explicit, direct instruction, I’d plump for explicit, direct instruction, because the world we live in works according to discoverable principles and it makes sense to teach kids what those principles are, rather than to expect them to figure them out for themselves. However, it would have to be a forced choice, because we do learn through constructing our knowledge and through discovery, problem-solving, experiencing and inquiring as well as by explicit, direct instruction. The most appropriate learning strategy will depend on the knowledge or skill being learned.

The Kirschner, Sweller & Clark paper left me feeling perplexed and rather uneasy. I couldn’t understand why the authors frame the debate about educational approaches in terms of minimal guidance ‘on one side’ and direct instructional guidance ‘on the other’, when self-evidently the debate is more complex than that. Nor why they refer to Atkinson & Shiffrin’s model of working memory when Baddeley & Hitch’s more complex model is so widely accepted as more accurate. Nor why they omit any mention of the biological mechanisms involved in learning; not only are the biological mechanisms responsible for the way working memory and long-term memory operate, they also shed light on why any single educational approach doesn’t work for all knowledge, all skills – or even all students.

I felt it was ironic that the authors place so much emphasis on the way novices think but present a highly complex debate in binary terms – a classic feature of the way novices organise their knowledge. What was also ironic was that despite their emphasis on explicit, direct instruction, they failed to mention several important features of memory that would have helped a lay readership understand how memory works. This is all the more puzzling because some of these omissions (and a more nuanced model of instruction) are referred to in a paper on cognitive load by Paul Kirschner published four years earlier.

In order to fully understand what Kirschner, Sweller & Clark are saying, and to decide whether they were right or not, you’d need to have a fair amount of background knowledge about how brains work. To explain that clearly to a lay readership, and to address possible objections to their thesis, the authors would have had to extend the paper’s length by at least 50%. Their paper is just over 10,000 words long, so word-count constraints might explain some of the omissions; on the other hand, Educational Psychologist doesn’t currently apply a word limit, so maybe the authors were simply trying to keep the concepts as simple as possible.

Simplifying complex concepts for the benefit of a lay readership can certainly make things clearer, but over-simplifying them runs the risk of giving the wrong impression, and I think there’s a big risk of that happening here. Although the authors make it clear that explicit direct instruction can take many forms, they do appear to be proposing a one-size-fits-all approach that might not be appropriate for all knowledge, all skills or all students.

References

Clark, RE, Kirschner, PA & Sweller, J (2012). Putting students on the path to learning: The case for fully guided instruction, American Educator, Spring.

Kirschner, PA (2002). Cognitive load theory: implications of cognitive load theory on the design of learning, Learning and Instruction, 12 1–10.

Kirschner, PA, Sweller, J & Clark, RE (2006). Why Minimal Guidance During Instruction Does Not Work: An Analysis of the Failure of Constructivist, Discovery, Problem-Based, Experiential, and Inquiry-Based Teaching. Educational Psychologist, 41, 75-86.

how working memory works

In my previous post I wondered why Kirschner, Sweller & Clark based their objections to minimal guidance in education on Atkinson & Shiffrin’s 1968 model of memory; it’s a model that assumes a mechanism for memory that’s now considerably out of date. A key factor in Kirschner, Sweller & Clark’s advocacy of direct instructional guidance is the limited capacity of working memory, and that’s what I want to look at in this post.

Other models are available

Atkinson & Shiffrin describe working memory as a ‘short-term store’. It has a limited capacity (around four to nine items of information) and can retain information for only a few seconds. It’s also a ‘buffer’; unless information in the short-term store is actively maintained, by rehearsal for example, it will be displaced by incoming information. Kirschner, Sweller & Clark note that ‘two well-known characteristics’ of working memory are its limited duration and capacity when ‘processing novel information’ (p.77), suggesting that their model of working memory is very similar to Atkinson & Shiffrin’s short-term store.

[Slide: Atkinson & Shiffrin’s sensory memory–working memory–long-term memory model]

In 1974 Alan Baddeley and Graham Hitch proposed a more sophisticated model for working memory that included dedicated auditory and visual information processing components. Their model has been revised in the light of more recent discoveries relating to the function of the prefrontal areas of the brain – the location of ‘working memory’. The Baddeley and Hitch model now looks a bit more complex than Atkinson & Shiffrin’s.

[Slide: the Baddeley & Hitch model of working memory]

You could argue that it doesn’t matter how complex working memory is, or how the prefrontal areas of the brain work; neither alters the fact that the capacity of working memory is limited. Kirschner, Sweller & Clark question the effectiveness of educational methods involving minimal guidance because they increase cognitive load beyond the capacity of working memory. But Kirschner, Sweller & Clark’s model of working memory appears to be oversimplified and doesn’t take into account the biological mechanisms involved in learning.

Biological mechanisms involved in learning

Making connections

Learning is about associating one thing with another, and making associations is what the human brain does for a living. Associations are represented in the brain by connections formed between neurons; the ‘information’ is carried in the pattern of connections. A particular stimulus will trigger a series of electrical impulses through a particular network of connected neurons. So, if I spot my cat in the garden, that sight will trigger a series of electrical impulses that activates a particular network of neurons; the connections between the neurons represent all the information I’ve ever acquired about my cat. If I see my neighbour’s cat, much of the same neural pathway will be triggered because both cats are cats, but it will then diverge slightly because I’ve acquired different information about each cat.

Novelty value

Neurons make connections with other neurons via synapses. Our current understanding of the role of synapses in information storage and retrieval suggests that new information triggers the formation of new synapses between neurons. If the same associations are encountered repeatedly, the relevant synapses are used repeatedly and those connections between neurons are strengthened, but if synapses aren’t active for a while, they are ‘pruned’. Toddlers form huge numbers of new synapses, but from the age of three through to adulthood, the number reduces dramatically as pruning takes place. It’s not clear whether synapse formation and pruning are pre-determined developmental phases or whether they happen in response to the kind of information that the brain is processing. Toddlers are exposed to vast amounts of novel information, but novelty rapidly tails off as they get older. Older adults tend to encounter very little novel information, often complaining that they’ve ‘seen it all before’.
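As a very rough sketch of that form-strengthen-prune cycle (the class, the numbers and the pruning rule are all mine, invented purely for illustration, not a biological model):

from collections import defaultdict

class ToyNetwork:
    """A caricature of synapse formation, strengthening and pruning."""

    def __init__(self, prune_after=5):
        self.strength = defaultdict(int)   # (a, b) -> connection strength
        self.last_used = {}                # (a, b) -> tick when last activated
        self.tick = 0
        self.prune_after = prune_after     # arbitrary 'use it or lose it' window

    def encounter(self, a, b):
        """Encountering a and b together forms or strengthens a connection."""
        self.tick += 1
        key = tuple(sorted((a, b)))
        self.strength[key] += 1
        self.last_used[key] = self.tick
        self._prune()

    def _prune(self):
        """Connections that haven't been activated recently are removed."""
        stale = [k for k, t in self.last_used.items()
                 if self.tick - t > self.prune_after]
        for k in stale:
            del self.strength[k], self.last_used[k]

net = ToyNetwork()
net.encounter("cat", "unicycle")        # a one-off pairing
for _ in range(8):
    net.encounter("cat", "purr")        # a frequently repeated pairing
print(dict(net.strength))               # {('cat', 'purr'): 8} - the unused connection has been pruned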

The way working memory works

Most of the associations made by the brain occur in the cortex, the outer layer of the brain. Sensory information processed in specialised areas of cortex is ‘chunked’ into coherent wholes – what we call ‘perception’. Perceptual information is further chunked in the frontal areas of the brain to form an integrated picture of what’s going on around and within us. The picture that’s emerging from studies of prefrontal cortex is that this area receives, attends to, evaluates and responds to information from many other areas of the brain. It can do this because patterns of the electrical activity from other brain areas are maintained in prefrontal areas for a short time whilst evaluation takes place. As Antonio Damasio points out in Descartes’ Error, the evaluation isn’t always an active, or even a conscious process; there’s no little homunculus sitting at the front of the brain figuring out what information should take priority. What does happen is that streams of incoming information compete for attention. What gets attention depends on what information is coming in at any one time. If something happens that makes you angry during a maths lesson, you’re more likely to pay attention to that than to solving equations. During an exam, you might be concentrating so hard that you are unaware of anything happening around you.

The information coming into prefrontal cortex varies considerably. There’s a constant inflow from three main sources:

• real-time information from the environment via the sense organs;
• information about the physiological state of the body, including emotional responses to incoming information;
• information from the neural pathways formed by previous experience and activated by that sensory and physiological input (Kirschner, Sweller & Clark would call this long-term memory).
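A toy way to picture the competition for attention (the streams are the three listed above; the salience scores, the items and the capacity of four are all invented for illustration):

import heapq

CAPACITY = 4   # a stand-in for working memory's limited capacity

incoming = [
    # (salience, source, item)
    (9, "environment",      "a classmate shouting"),
    (3, "environment",      "the equation on the whiteboard"),
    (7, "body state",       "feeling angry about the shouting"),
    (2, "long-term memory", "how to rearrange an equation"),
    (1, "long-term memory", "what I had for breakfast"),
]

# only the most salient items win the competition for the limited slots
for salience, source, item in heapq.nlargest(CAPACITY, incoming):
    print(f"attended ({source}): {item}")

The point of the caricature is simply that nothing decides in advance which stream ‘wins’; whatever is most salient at any one moment gets the limited capacity.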

Working memory and long-term memory

‘Information’ and models of information processing are abstract concepts. You can’t pick them up or weigh them, so it’s tempting to think of information processing in the brain as an abstract process, involving rather abstract forces like electrical impulses. It would be easy to form the impression from Kirschner, Sweller & Clark’s model that well-paced, bite-sized chunks of novel information will flow smoothly from working memory to long-term memory, like water between two tanks. But the human brain is a biological organ, and it retains and accesses information using some very biological processes. Developing new synapses involves physical changes to the structure of neurons, and those changes take time, resources and energy. I’ll return to that point later, but first I want to focus on something that Kirschner, Sweller & Clark say about the relationship between working memory and long-term memory that struck me as a bit odd;

“The limitations of working memory only apply to new, yet to be learned information that has not been stored in long-term memory. New information such as new combinations of numbers or letters can only be stored for brief periods with severe limitations on the amount of such information that can be dealt with. In contrast, when dealing with previously learned information stored in long-term memory, these limitations disappear.” (p.77)

This statement is odd because it doesn’t tally with Atkinson & Shiffrin’s concept of the short-term store, and isn’t supported by decades of experimental work showing that capacity limitations apply to all information in working memory, regardless of its source. But Kirschner, Sweller & Clark go on to qualify their claim;

“In the sense that information can be brought back from long-term memory to working memory over indefinite periods of time, the temporal limits of working memory become irrelevant.” (p.77)

I think I can see what they’re getting at; because information is stored permanently in long-term memory it doesn’t rapidly fade away and you can access it any time you need to. But you have to access it via working memory, so it’s still subject to working memory constraints. I think the authors are referring implicitly to two ways in which the brain organises information that effectively increase the capacity of working memory – chunking and schemata.

Chunking

If the brain frequently encounters small items of information that are usually associated with each other, it eventually ‘chunks’ them together and then processes them automatically as single units. George Miller, who in the 1950s did some pioneering research into working memory capacity, noted that people familiar with the binary notation then in widespread use by computer programmers didn’t memorise random lists of 1s and 0s digit by digit, but recoded them as numbers in the decimal system. So 10 would be remembered as 2, 100 as 4, 101 as 5 and so on. In this way, very long strings of 1s and 0s could be held in working memory in the form of decimal numbers that would automatically be translated back into 1s and 0s when the people taking part in the experiments were asked to recall the list. Morse code experts do the same; they don’t read messages as a series of dots and dashes, but chunk up the patterns of dots and dashes into letters and then into words. Exactly the same process occurs in reading, but we don’t call it chunking, we call it learning to read. Chunking effectively increases the capacity of working memory – but it doesn’t increase it by very much. Curiously, although Kirschner, Sweller & Clark refer to a paper by Egan and Schwartz that’s explicitly about chunking, they don’t mention chunking as such.
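The recoding itself is easy to sketch. In this toy version (the string of digits and the choice of three-digit groups are mine, for illustration), eighteen individual digits collapse into six recoded ones:

def recode_binary(bits: str, group: int = 3) -> list:
    """Recode a binary string into one decimal digit per group of binary digits."""
    return [int(bits[i:i + group], 2) for i in range(0, len(bits), group)]

bits = "101000111010110100"    # 18 items if held digit by digit
print(recode_binary(bits))     # [5, 0, 7, 2, 6, 4] - 6 items if held as recoded digits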

Schemata

What they do mention is the concept of the schema, particularly those of chess players. In the 1940s Adriaan de Groot discovered that expert chess players memorise a vast number of configurations of chess pieces on a board; he called each particular configuration a schema. I get the impression that Kirschner, Sweller & Clark see schemata and chunking as synonymous, even though a schema usually refers to a meta-level way of organising information, like a life-script or an overview, rather than an automatic processing of several bits of information as one unit. It’s quite possible that expert chess players do automatically read each configuration of chess pieces as one unit, but de Groot didn’t call it ‘chunking’ because his research was carried out a decade before George Miller coined the term.

Thinking about everything at once

Whether you call them chunks or schemata, what’s clear is that the brain has ways of increasing the amount of information held in working memory. Expert chess players aren’t limited to thinking about the four or five possible moves for one piece, but can think about four or five possible configurations for all pieces. But it doesn’t follow that the limitations of working memory in relation to long-term memory disappear as a result.

I mentioned in my previous post what information is made accessible via my neural networks if I see an apple. If I free-associate, I think of apples – apple trees – should we cover our apple trees if it’s wet and windy after they blossom? – will there be any bees to pollinate them? – bee viruses – viruses in ancient bodies found in melted permafrost – bodies of climbers found in melted glaciers, and so on. Because my neural connections represent multiple associations I can indeed access vast amounts of information stored in my brain. But I don’t access it all simultaneously. That’s just as well, because if I could access all that information at once my attempts to decide what to do with our remaining windfall apples would be thwarted by totally irrelevant thoughts about mountain rescue teams and St Bernard dogs. In short, if information stored in long-term memory weren’t subject to the capacity constraints of working memory, we’d never get anything done.
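To make the ‘sequential, not simultaneous’ point concrete, here’s a deliberately crude sketch. The associations are a rough version of the chain above, and the function name, the capacity of four and the step count are all invented for illustration:

from collections import deque

associations = {
    "apples": ["apple trees"],
    "apple trees": ["blossom", "bees"],
    "bees": ["bee viruses"],
    "bee viruses": ["viruses in permafrost"],
    "viruses in permafrost": ["bodies in glaciers"],
    "bodies in glaciers": ["mountain rescue", "St Bernard dogs"],
}

def free_associate(cue, capacity=4, steps=6):
    """Follow associations one step at a time; only `capacity` items are in mind at once."""
    in_mind = deque([cue], maxlen=capacity)   # oldest items drop out as new ones arrive
    frontier = [cue]
    for _ in range(steps):
        frontier = [n for item in frontier for n in associations.get(item, [])]
        if not frontier:
            break
        for n in frontier:
            in_mind.append(n)                 # retrieved sequentially, not all at once
        print(list(in_mind))

free_associate("apples")    # by the last step, 'apples' itself has long since dropped out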

Chess masters (or ornithologists or brain surgeons) have access to vast amounts of information, but in any given situation they don’t need to access it all at once. In fact, accessing it all at once would be disastrous because it would take forever to eliminate information they didn’t need. At any point in any chess game, only a few configurations of pieces are possible, and that number is unlikely to exceed the capacity of working memory. Similarly, even if an ornithologist/brain surgeon can recognise thousands of species of birds/types of brain injury, in any given environment, most of those species/injuries are likely to be irrelevant, so don’t even need to be considered. There’s a good reason for working memory’s limited capacity and why all the information we process is subject to that limit.

In the next post, I want to look at how the limits of working memory impact on learning.

References

Atkinson, R, & Shiffrin, R (1968). Human memory: A proposed system and its control processes. In K. Spence & J. Spence (Eds.), The psychology of learning and motivation (Vol. 2, pp. 89–195). New York: Academic Press.
Damasio, A (1994). Descartes’ Error, Vintage Books.
Kirschner, PA, Sweller, J & Clark, RE (2006). Why Minimal Guidance During Instruction Does Not Work: An Analysis of the Failure of Constructivist, Discovery, Problem-Based, Experiential, and Inquiry-Based Teaching. Educational Psychologist, 41, 75-86.

memories are made of this

Education theory appears to be dominated by polarised debates. I’ve just come across another; minimal guidance vs direct instruction. Harry Webb has helpfully brought together what he calls the Kirschner, Sweller & Clark cycle of papers that seem to encapsulate it. The cycle consists of papers by these authors and responses to them, mostly published in Educational Psychologist during 2006-7.

Kirschner, Sweller & Clark are opposed to minimal guidance approaches in education and base their case on the structure of human cognitive architecture. As they rightly observe “Any instructional procedure that ignores the structures that constitute human cognitive architecture is not likely to be effective” (p.76). I agree completely, so let’s have a look at the structures of human cognitive architecture they’re referring to.

Older models

Kirschner, Sweller & Clark claim that “Most modern treatments of human cognitive architecture use the Atkinson and Shiffrin (1968) sensory memory–working memory–long-term memory model as their base” (p.76).

That depends on how you define ‘using a model as a base’. Atkinson and Shiffrin’s model is 45 years old. 45 years is a long time in the fast-developing field of brain research, so claiming that modern treatments use it as their base is a bit like claiming that modern treatments of blood circulation are based on William Harvey’s work (1628) or that modern biological classification is based on Carl Linnaeus’ system (1735). It would be true to say that modern treatments are derived from those models, but our understanding of circulation and biological classification has changed significantly since then, so the early models are almost invariably referred to only in an historical context. A modern treatment of cognitive architecture might mention Atkinson & Shiffrin if describing the history of memory research, but I couldn’t see why anyone would use it as a base for an educational theory – because the reality has turned out to be a lot more complicated than Atkinson and Shiffrin could have known at the time.

Atkinson and Shiffrin’s model was influential because it provided a coherent account of some apparently contradictory research findings about the characteristics of human memory. It was also based on the idea that features of information processing systems could be universally applied; that computers worked according to the same principles as did the nervous systems of sea slugs or the human brain. That idea wasn’t wrong, but the features of information processing systems have turned out to be a bit more complex than was first imagined.

The ups and downs of analogies

Theoretical models are rather like analogies; they are useful in explaining a concept that might otherwise be difficult for people to grasp. Atkinson and Shiffrin’s model essentially made the point that human memory wasn’t a single thing that behaved in puzzlingly different ways in different circumstances, but that it could have three components, each of which behaved consistently but differently.

But there’s a downside to analogies (and theoretical models); sometimes people forget that analogies are for illustrative purposes only, and that models show what hypotheses need to be tested. So they remember the analogy/model and forget what it’s illustrating, or they assume the analogy/model is an exact parallel of the reality, or, as I think has happened in this case, the analogy/model takes on a life of its own.

You can read most of Atkinson & Shiffrin’s chapter about their model here. There’s a diagram on p.113. Atkinson and Shiffrin’s model is depicted as consisting of three boxes. One box is the ‘sensory register’ – sensory memory that persists for a very short time and then fades away. The second box is a short-term store with a very limited capacity (5-9 items of information) that it can retain for a few seconds. The third box is a long-term store, where information is retained indefinitely. The short-term and long-term stores are connected to each other and information can be transferred between them in both directions. The model is based on what was known in 1968 about how memory behaved, but Atkinson and Shiffrin were quite explicit that there was a lot that wasn’t known.
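The three boxes can be caricatured in a few lines of Python. The class name, the capacity of seven and the behaviour are deliberately crude illustrations of the description above, not a faithful implementation of the model:

from collections import deque

class ThreeBoxModel:
    def __init__(self, short_term_capacity=7):
        self.sensory_register = None          # fades almost immediately
        self.short_term = deque(maxlen=short_term_capacity)   # new items displace old ones
        self.long_term = set()                # retained indefinitely

    def perceive(self, item):
        self.sensory_register = item          # overwrites whatever was there before
        self.short_term.append(item)          # the oldest item is displaced if the store is full

    def rehearse_and_store(self, item):
        if item in self.short_term:           # only what's still being maintained gets transferred
            self.long_term.add(item)

model = ThreeBoxModel()
for digit in "583920174":                     # nine digits into a seven-item store
    model.perceive(digit)
print(list(model.short_term))                 # the first two digits have already been displaced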

Memories are made of this

Anyone looking at Atkinson & Shiffrin’s model for the first time could be forgiven for thinking that the long-term memory ‘store’ is like a library where memories are kept. That was certainly how many people thought about memory at the time. One of the problems with that way of thinking about memory is that the capacity required to store all the memories that people clearly do store would exceed the number of cells in the brain, and that accessing the memories by systematically searching through them would take a very long time – which it often doesn’t.

This puzzle was solved by the gradual realisation that the brain didn’t store individual memories in one place as if they were photographs in a huge album, but that ‘memories’ were activated via a vast network of interconnected neurons. A particular stimulus would activate a particular part of the neural network and that activation is the ‘memory’.

For example, if I see an apple, the pattern of light falling on my retina will trigger a chain of electrical impulses that activates all the neurons that have previously been activated in response to my seeing an apple. Or hearing about or reading about or eating apples. I will recall other apples I’ve seen, how they smell and taste, recipes that use apples, what the word ‘apple’ sounds like, how it’s spelled and written, ‘apple’ in other languages etc. That’s why memories can (usually) be retrieved so quickly. You don’t have to search through all memories to find the one you want. As Antonio Damasio puts it;

“Images are not stored as facsimile pictures of things, or events or words, or sentences…In brief, there seem to be no permanently held pictures of anything, even miniaturized, no microfiches or microfilms, no hard copies… as the British psychologist Frederic Bartlett noted several decades ago, when he first proposed that memory is essentially reconstructive.” (p.100)

But Atkinson and Shiffrin don’t appear to have thought of memory in this way when they developed their model. Their references to ‘store’ and ‘search’ suggest they saw memory as more of a library than a network. That’s also how Kirschner, Sweller & Clark seem to view it. Although they say “our understanding of the role of long-term memory in human cognition has altered dramatically over the last few decades” (p.76), they repeatedly refer to long-term memory as a ‘store’ containing ‘huge amounts of information’. I think that description is misleading. Long-term memory is a property of neural networks – if any information is ‘stored’ it’s stored in the pattern and strength of the connections between neurons.

This is especially noticeable in the article the authors published in 2012 in American Educator, from which it’s difficult not to draw the conclusion that long-term memory is a store that contains many thousands of schemas, rather than a highly flexible network of connections that can be linked in an almost infinite number of ways.
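The difference between the two pictures is easy to caricature. In this sketch (the cues and ‘memories’ are invented examples), retrieval is just activation of whatever is connected to the cue; nothing has to be searched:

connections = {
    "apple": {"apples I've eaten", "apple recipes", "how 'apple' is spelled",
              "'pomme' and 'Apfel'"},
    "cat":   {"my cat", "my neighbour's cat", "cats in the garden"},
}

def recall(cue):
    # the cue activates its own connections; there is no scan through a store of records
    return connections.get(cue, set())

print(recall("apple"))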

Where did I put my memory?

In the first paper I mentioned, Kirschner, Sweller & Clark also refer to long-term memory and working memory as ‘structures’. Although they could mean ‘configurations’, the use of ‘structures’ does give the impression that there’s a bit of the brain dedicated to storing information long-term and another where it’s just passing through. Although some parts of the brain do have dedicated functions, those localities should be thought of as localities within a network of neurons. Information isn’t stored in particular locations in the brain; it’s distributed across the network, even though particular connections are physically located in particular places.

Theories having a life of their own

Atkinson and Shiffrin’s model isn’t exactly wrong; human memory does encompass short-lived sensory traces, short-term buffering and information that’s retained indefinitely. But implicit in their model are some assumptions about the way memory functions that have been superseded by later research.

At first I couldn’t figure out why anyone would base an educational theory on an out-dated conceptual model. Then it occurred to me that that’s exactly what’s happened in respect of theories about child development and autism. In both cases, someone has come up with a theory based on Freud’s ideas about children. Freud’s ideas in turn were based on his understanding of genetics and how the brain worked. Freud died in 1939, over a decade before the structure of DNA was discovered, and two decades before we began to get a detailed understanding of how brains process information. But what happened to the theories of child development and autism based on Freud’s understanding of genetics and brain function is that they developed an independent existence and carried on regardless, instead of constantly being revised in the light of new understandings of genetics and brain function. Theories dominating autism research are finally being presented with a serious challenge from geneticists, but child development theories still have some way to go. Freud did a superb job with the knowledge available to him, but that doesn’t mean it’s a good idea to base a theory on his ideas as if new understandings of genetics and brain function haven’t happened.

Again I completely agree with Kirschner, Sweller & Clark that “any instructional procedure that ignores the structures that constitute human cognitive architecture is not likely to be effective”, but basing an educational theory on one aspect of human cognitive architecture – memory – and on an outdated concept of memory at that, is likely to be counterproductive.

A Twitter discussion of the Kirschner, Sweller & Clark model centred on the role of working memory, which is what I plan to tackle in my next post.

References

Atkinson, R, & Shiffrin, R (1968). Human memory: A proposed system and its control processes. In K. Spence & J. Spence (Eds.), The psychology of learning and motivation (Vol. 2, pp. 89–195). New York: Academic Press.
Clark, RE, Kirschner, PA & Sweller, J (2012). Putting students on the path to learning: The case for fully guided instruction, American Educator, Spring.
Damasio, A (1994). Descartes’ Error, Vintage Books.
Kirschner, PA, Sweller, J & Clark, RE (2006). Why Minimal Guidance During Instruction Does Not Work: An Analysis of the Failure of Constructivist, Discovery, Problem-Based, Experiential, and Inquiry-Based Teaching. Educational Psychologist, 41, 75-86.