the MUSEC briefings and Direct Instruction

Yesterday, I got involved in a discussion on Twitter about Direct Instruction (DI). The discussion was largely about what I had or hadn’t said about DI. Twitter isn’t the best medium for discussing anything remotely complex, but there’s something about DI that brings out the pedant in people, me included.

The discussion, if you can call it that, was triggered by a tweet about the most recent MUSEC briefing. The briefings, from Macquarie University Special Education Centre, are a great idea. A one-page round-up of the evidence relating to a particular mode of teaching or treatment used in special education is exactly the sort of resource I’d use often. So why the discussion about this one?

the MUSEC briefings

I’ve bumped into the briefings before. I read one a couple of years ago on the recommendation of a synthetic phonics advocate. It was briefing no.18, Explicit instruction for students with special learning needs. At the time, I wasn’t aware that ‘explicit instruction’ had any particular significance in education – other than denoting instruction that was explicit. And that could involve anything from a teacher walking round the room checking that students understood what they were doing, to ‘talk and chalk’, reading a book or computer-aided learning. The briefing left me feeling bemused. It was packed with implicit assumptions, and the references, presented online presumably for reasons of space, included one self-citation, a report that reached a different conclusion to the briefing, a 400-page book by John Hattie that doesn’t appear to reach the same conclusion either, and a paper by Kirschner, Sweller and Clark that doesn’t mention children with special educational needs. The references form a useful reading list for teachers, but hardly constitute robust evidence supporting the briefing’s conclusions.

My curiosity piqued, I took a look at another briefing, no.33 on behavioural optometry. I chose it because the SP advocates I’d encountered tended to be sceptical about visual impairments being a causal factor in reading difficulties, and I wondered what evidence they were relying on. I knew a bit about visual problems because of my son’s experiences. The briefing repeatedly lumped together things that should have been kept distinct and drew conclusions at odds with the evidence it cited. I think I was probably unlucky with these first two because some of the other briefings look fine. So what about the one on Direct Instruction, briefing no.39?

Direct Instruction and Project Follow Through

Direct Instruction (capitalised) is a scripted learning programme, now commercially available, developed by Siegfried Engelmann and Wesley Becker in the US in the 1960s, which performed outstandingly well in Project Follow Through (PFT).

The DI programme involved the scripted teaching of reading, arithmetic and language to children between kindergarten and third grade. The PFT evaluation of DI showed significant gains in basic skills (word knowledge, spelling, language and math computation); in cognitive-conceptual skills (reading comprehension, math concepts, math problem solving); and in affect measures (co-operation, self-esteem, intellectual achievement, responsibility). A high school follow-up study by the sponsors of the DI programme showed that it was associated with positive long-term outcomes.

The Twitter discussion revolved around what I meant by ‘basic’ and ‘skills’. To clarify, as I understand it the DI programme itself involved teaching basic skills (reading, arithmetic, language) to quite young children (K-3). The evaluation assessed basic skills, cognitive-conceptual skills and affect measures. There is no indication in the evidence I’ve been able to access of how sophisticated the cognitive-conceptual skills or affect measures were. One would expect them to be typical of children in the K-3 age range. And we don’t know how long those outcomes persisted. The only evidence for long-term positive outcomes is from a study by the programme sponsors – not to be discounted, but not reliable enough to form the basis for a pedagogical method.

In other words, the PFT evaluation tells us that there were several robust positive outcomes from the DI programme. What it doesn’t tell us is whether the DI approach has the same robust outcomes if applied to other areas of the curriculum and/or with older children. Because the results of the evaluation are aggregated, it doesn’t tell us whether the DI programme benefitted all children or only some, or if it had any negative effects, or what the outcomes were for children with specific special educational needs or learning difficulties – the focus of MUSEC. Nor does it tell us anything about the use of direct instruction in general – what the briefing describes as a “generic overarching concept, with DI as a more specific exemplar”.

the evidence

The briefing refers to “a large body of research evidence stretching back over four decades testifying to the efficacy of explicit/direct instruction methods including the specific DI programs.” So what is the evidence?

The briefing itself refers only to the PFT evaluation of the DI programme. The references, available online, consist of:

• a summary of findings written by the authors of the DI programme, Becker & Engelmann,
• a book about DI – the first two authors were Engelmann’s students and worked on the original DI programme,
• an excerpt from the same book on a commercial site called education.com,
• an editorial from a journal called Effective School Practices, previously known as Direct Instruction News and published by the National Institute for Direct Instruction (Chairman S Engelmann),
• a paper about the different ways in which direct instruction is understood, published by the Center on Innovation and Improvement which is administered by the Academic Development Institute, one of whose partners is Little Planet Learning,
• the 400-page book referenced by briefing 18,
• the peer-reviewed paper also referenced by briefing 18.

The references, which I think most people would construe as evidence, include only one peer-reviewed paper. It cites research findings supporting the use of direct instruction in relation to particular types of material, but doesn’t mention children with special needs or learning difficulties. Another reference is a synthesis of peer-reviewed studies. All the other references involve organisations with a commercial interest in educational methods – not the sort of evidence I’d expect to see in a briefing published by a university.

My recommendation for the MUSEC briefings? Approach with caution.

the new traditionalists: there’s more to d.i. than meets the eye, too

A few years ago, mystified by the way my son’s school was tackling his reading difficulties, I joined the TES forum and discovered I’d missed The Reading Wars. Well, not quite. They began before I started school and show no sign of ending any time soon. But I’d been blissfully unaware that they’d been raging around me.

On one side in the Reading Wars are advocates of a ‘whole language’ approach to learning to read – focusing on reading strategies and meaning – and on the other are advocates of teaching reading using phonics. Phonics advocates see their approach as evidence-based, and frequently refer to the whole language approach (using ‘mixed methods’) as based on ideology.

mixed methods

Most members of my family learned to read successfully using mixed methods. I was trained to teach reading using mixed methods and all the children I taught learned to read. My son, taught using synthetic phonics, struggled with reading and eventually figured it out for himself using whole word recognition. Hence my initial scepticism about SP. I’ve since changed my mind, having discovered that my son’s SP programme wasn’t properly implemented and after learning more about how the process of reading works. If I’d relied only on the scientific evidence cited as supporting SP, I wouldn’t have been convinced. Although it clearly supports SP as an approach to decoding, the impact on literacy in general isn’t so clear-cut.

ideology

I’ve also found it difficult to pin down the ideology purported to be at the root of whole language approaches. An ideology is a set of abstract ideas or values based on beliefs rather than on evidence, but the reasons given for the use of mixed methods when I was learning to read and when I was being trained to teach reading were pragmatic ones. In both instances, mixed methods were advocated explicitly because (analytic) phonics alone hadn’t been effective for some children, and children had been observed to use several different strategies during reading acquisition.

The nearest I’ve got to identifying an ideology are the ideas that language frames and informs people’s worldviews and that social and economic power plays a significant part in determining who teaches what to whom. The implication is that teachers, schools, school boards, local authorities or government don’t have a right to impose on children the way they construct their knowledge. To me, the whole language position looks more like a theoretical framework than an ideology, even if the theory is debatable.

the Teaching Wars

The Reading Wars appear to be but a series of battles in a much bigger war over what’s often referred to as traditional vs progressive teaching methods. The new traditionalists frequently characterise the Teaching Wars along the same lines as SP proponents characterise the Reading Wars: claiming that traditional methods are supported by scientific evidence, but that ideology is the driving force behind progressive methods. Even a cursory examination of this claim suggests it’s a caricature of the situation rather than an accurate summary.

The progressives’ ideology

Rousseau is often cited as the originator of progressive education and indeed, progressive methods sometimes resemble the approach he advocated. However, many key figures in progressive education such as Herbert Spencer, John Dewey and Jean Piaget derived their methods from what was then state-of-the-art scientific theory and empirical observation, not from 18th century Romanticism.

The traditionalists’ scientific evidence

The evidence cited by the new traditionalists appears to consist of a handful of findings from cognitive psychology and information science. They’re important findings: they should form part of teacher training and they might have transformed the practice of some teachers. But teaching and learning involve more than cognition. Children’s developing brains and bodies, their emotional and social background, the social, economic and political factors shaping the expectations on teachers and students in schools, and the philosophical frameworks of everybody involved suggest that evidence from many other scientific fields should also be informing educational theory, and that it might be risky to apply a few findings out of context.

I can understand the new traditionalists’ frustration. One has to ask why education theory hasn’t kept up to date with research in many fields that are directly relevant to teaching, learning, child development and the structure of the education system itself. However, dissatisfaction with progressive methods appears to originate, not so much with the methods themselves, as with the content of the curriculum and with progressive methods being taken to extremes.

keeping it simple

The limited capacity of working memory is the feature of human cognitive architecture that underpins Kirschner, Sweller and Clark’s argument in favour of direct instruction. One outcome of that limitation is a human tendency to oversimplify information by focusing on the prototypical features of phenomena – a tendency that often leads to inaccurate stereotyping. Kirschner, Sweller and Clark present their hypothesis in terms of a dispute between two ‘sides’: one advocating minimal guidance and the other a full explanation of concepts, procedures and strategies (p.75).

Although it’s appropriate in experimental work to use extreme examples of these approaches in order to test a hypothesis, the authors themselves point out that in a classroom setting most teachers using progressive methods provide students with considerable guidance anyway (p.79). Their conclusion that the most effective way to teach novices is through “direct, strong, instructional guidance” might be valid, but in respect of the oversimplified way they frame the dispute, they appear to have fallen victim to the very limitations of human cognitive architecture to which they draw our attention.

The presentation of the Teaching Wars in this polarised manner goes some way to explaining why direct instruction seems like such a big deal for the new traditionalists. Direct instruction shouldn’t be confused with Direct Instruction (capitalised) – the scripted teaching used in Engelmann & Becker’s DISTAR programme – although a recent BBC Radio 4 programme suggests that might be exactly what’s happening in some quarters.

direct instruction

The Radio 4 programme How do children learn history? is presented by Adam Smith, a senior lecturer in history at University College London, who has blogged about the programme here. He’s carefully non-committal about the methods he describes – it is the BBC after all.

A frequent complaint about the way the current national curriculum approaches history is what’s included, what’s excluded, what’s emphasised and what’s not. At home, we’ve had to do some work on timelines because although both my children have been required to put themselves into the shoes of various characters throughout history (an exercise my son has grown to loathe), neither of them knew how the Ancient Egyptians, Greeks, Romans, Vikings or Victorians related to each other – a pretty basic historical concept. But those are curriculum issues, rather than methods issues. As well as providing a background to the history curriculum debate, the broadcast featured two lessons that used different pedagogical approaches.

During an ‘inquiry’ lesson on Vikings, presented as a good example of current practice, groups of children were asked to gather information about different aspects of Viking life. A ‘direct instruction’ lesson on Greek religious beliefs, by contrast, involved the teacher reading from a textbook whilst the children followed the text in their own books with their finger, then discussed the text and answered comprehension questions on it. The highlight of the lesson appeared to be the inclusion of an exclamation mark in the text.

It’s possible that the way the programme was edited oversimplified the lesson on Greek religious beliefs, or that the children in the Viking lesson were older than those in the Greek lesson and better able to cope with ‘inquiry’, but there are clearly some possible pitfalls awaiting those who learn by relying on the content of a single textbook. The first is that whoever publishes the textbook controls the knowledge – that’s a powerful position to be in. The second is that you don’t need much training to be able to read from a textbook or lead a discussion about what’s in it – that has implications for who is going to be teaching our children. The third is how children will learn to question what they’re told. I’m not trying to undermine discipline in the classroom, just pointing out that textbooks can be, and sometimes are, wrong. The sooner children learn that authority lies in evidence rather than in authority figures, the better. Lastly, as a primary school pupil I would have found following a teacher reading from a textbook tedious in the extreme. As a secondary school pupil it was a teacher reading from a textbook for twenty minutes that clinched my decision to drop history as soon as possible. I don’t think I’d be alone in that.

who are the new traditionalists?

The Greek religions lesson was part of a project funded by the Education Endowment Foundation (EEF), a charity developed by the Sutton Trust and the Impetus Trust in 2011 with a grant from the DfE. The EEF’s remit is to fund research into interventions aimed at improving the attainment of pupils receiving free school meals. The intervention featured in How do children learn history? is being implemented in Future Academies in central London. I think the project might be the one outlined here, although this one is evaluating the use of Hirsch’s Core Knowledge framework in literacy, rather than in history, which might explain the focus on extracting meaning from the text.

My first impression of the traditionalists was that they were a group of teachers disillusioned by the ineffectiveness of the pedagogical methods they were trained to use, who’d stumbled across some principles of cognitive science they’d found invaluable and were understandably keen to publicise them. Several of the teachers are Teach First graduates and work in academies or free schools – not surprising if they want freedom to innovate. They also want to see pedagogical methods rigorously evaluated, and the most effective ones implemented in schools. But those teachers aren’t the only parties involved.

Religious groups have welcomed the opportunities to open faith schools and develop their own curricula – a venture supported by previous and current governments despite past complications resulting from significant numbers of schools in England being run by churches, and despite the current investigation into the alleged ‘Trojan Horse’ operation in Birmingham.

Future, the sponsors of Future Academies and the Curriculum Centre, was founded by John and Caroline Nash, a former private equity specialist and stockbroker respectively. Both are reported to have made significant donations to the Conservative party. John Nash was appointed Parliamentary Under Secretary of State for Schools in January 2013. The Nashes are co-chairs of the board of governors of Pimlico Academy and Caroline Nash is chair of The Curriculum Centre. All four trustees of the Future group are from the finance industry.

Many well-established independent schools, notably residential schools for children with special educational needs and disabilities, are now controlled by finance companies. This isn’t modern philanthropy in action; the profits made from selling on the school chains, the magnitude of the fees charged to local authorities, and the fact that the schools are described as an ‘investment’ all suggest that another motivation is at work.

A number of textbook publishers got some free product placement in a recent speech by Elizabeth Truss, currently Parliamentary Under Secretary of State for Education and Childcare.

Educational reform might have teachers in the vanguard, but there appear to be some powerful bodies with religious, political and financial interests who might want to ensure they benefit from the outcomes, and have a say in what those outcomes are. The new traditionalist teachers might indeed be on to something with their focus on direct instruction, but if direct instruction boils down in practice to teachers using scripted texts or reading from textbooks, they will find plenty of other players willing to jump on the bandwagon and cash in on this simplistic and risky approach to educating the country’s most vulnerable children. Oversimplification can lead to unwanted complications.

direct instruction: the evidence

A discussion on Twitter raised a lot of questions about working memory and the evidence supporting direct instruction cited by Kirschner, Sweller and Clark. I couldn’t answer in 140 characters, so here’s my response. I hope it covers all the questions.

Kirschner, Sweller & Clark’s thesis is:

• working memory capacity is limited
• constructivist, discovery, problem-based, experiential, and inquiry-based teaching (minimal guidance) all overload working memory and
• evidence from studies investigating efficacy of different methods supports the superiority of direct instruction.
Therefore, “In so far as there is any evidence from controlled studies, it almost uniformly supports direct, strong instructional guidance rather than constructivist-based minimal guidance during the instruction of novice to intermediate learners.” (p.83)

Sounds pretty unambiguous – but it isn’t.

1. Working memory (WM) isn’t simple. It includes several ‘dissociable’ sensory buffers and a central executive that monitors, attends to and responds to sensory information, information from the body and information from long term memory (LTM) (Wagner, Bunge & Badre, 2004; Damasio, 2006).

2. Studies comparing minimal guidance with direct instruction are based on ‘pure’ methods. Sweller’s work on cognitive load theory (CLT) (Sweller, 1988) was based on problems involving the use of a single buffer/loop, e.g. mazes and algebra. New items coming into the buffer displace older items, so buffer capacity would be a limiting factor. But real-world problems tend to involve different buffers, so items in the buffers can be easily maintained while they are manipulated by the central executive. For example, I can’t write something complex and listen to Radio 4 at the same time because my phonological loop can’t cope. But I can write and listen to music, or listen to Radio 4 whilst I cook a new recipe, because I’m using different buffers. Discovery, problem-based, experiential and inquiry-based teaching in classrooms tends to resemble real-world situations more closely than the single-buffer problems used by Sweller to demonstrate the concept of cognitive load, so the impact of the buffer limit would be lessened.
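To make the displacement point concrete, here’s a toy sketch in Python (my own illustration, not from Sweller’s work) of a single fixed-capacity buffer in which each new arrival pushes out the oldest item; the capacity of seven follows Miller’s 7±2 estimate:

```python
from collections import deque

# A single short-term buffer with a hard capacity limit: when a new item
# arrives and the buffer is full, the oldest item is silently displaced.
buffer = deque(maxlen=7)

for item in "ABCDEFGHIJ":   # ten incoming items
    buffer.append(item)

print(list(buffer))         # ['D', 'E', 'F', 'G', 'H', 'I', 'J']
# Only the last seven items survive; A, B and C were displaced before they
# could be rehearsed. With several independent buffers (phonological,
# visuospatial), the same ten items could be spread across stores instead.
```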

3. For example, Klahr & Nigam (2004) point out that because there’s no clear definition of discovery learning, in their experiment involving a scientific concept they ‘magnified the difference between the two instructional treatments’ – i.e. they used an ‘extreme type’ of both methods, something that’s unlikely to occur in any classroom. Essentially they disproved the hypothesis that children always learn better by discovering things for themselves; but children are unlikely to ‘discover things for themselves’ in circumstances like those in the Klahr & Nigam study.

It’s worth noting that 8 of the children in their study figured out what to do at the outset, so were excluded from the results. And 23% of the direct instruction children didn’t master the concept well enough to transfer it.

That finding – that some learners failed to learn even when direct instruction was used, and that some learners might benefit from less direct instruction – comes up time and again in the evidence cited by Kirschner, Sweller and Clark, but gets overlooked in their conclusion.

I can quite see why educational methods using ‘minimal instruction’ might fail, and agree that proponents of such methods don’t appear to have taken much notice of such research findings as there are. But the findings are not unambiguous. It might be true that the evidence ‘almost uniformly supports direct, strong instructional guidance rather than constructivist-based minimal guidance during the instruction of novice to intermediate learners’ [my emphasis] but teachers aren’t faced with that forced choice. Also the evidence doesn’t show that direct, strong instructional guidance is always effective for all learners. I’m still not convinced that Kirschner, Sweller & Clark’s conclusion is justified.


References

Damasio, A. (2006). Descartes’ Error. Vintage Books.
Klahr, D. & Nigam, M. (2004). The equivalence of learning paths in early science instruction: Effects of direct instruction and discovery learning. Psychological Science, 15, 661–667.
Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12, 257–285.
Wagner, A.D., Bunge, S.A. & Badre, D. (2004). Cognitive control, semantic memory and priming: Contributions from prefrontal cortex. In M.S. Gazzaniga (Ed.), The Cognitive Neurosciences (3rd edn.). Cambridge, MA: MIT Press.

A tale of two Blobs

The think-tank Civitas has just published a 53-page pamphlet written by Toby Young and entitled ‘Prisoners of The Blob’. ‘The Blob’, for the uninitiated, is the name applied by the UK’s Secretary of State for Education, Michael Gove, to ‘leaders of the teaching unions, local authority officials, academic experts and university education departments’ described by Young as ‘opponents of educational reform’. The name’s not original. Young says it was coined by William J Bennett, a former US Education Secretary; it was also used by Chris Woodhead, a former Chief Inspector of Ofsted, in his book Class War.

It’s difficult to tell whether ‘The Blob’ is actually an amorphous fog-like mass whose members embrace an identical approach to education as Young claims, or whether such a diverse range of people espouse such a diverse range of views that it’s difficult for people who would like life to be nice and straightforward to understand all the differences.

Young says:

“They all believe that skills like ‘problem-solving’ and ‘critical thinking’ are more important than subject knowledge; that education should be ‘child-centred’ rather than ‘didactic’ or ‘teacher-led’; that ‘group work’ and ‘independent learning’ are superior to ‘direct instruction’; that the way to interest children in a subject is to make it ‘relevant’; that ‘rote-learning’ and ‘regurgitating facts’ is bad, along with discipline, hierarchy, routine and anything else that involves treating the teacher as an authority figure. The list goes on.” (p.3)

It’s obvious that this is a literary device rather than a scientific analysis, but that’s what bothers me about it.

Initially, I had some sympathy with the advocates of ‘educational reform’. The national curriculum had a distinctly woolly appearance in places, enforced group-work and being required to imagine how historical figures must have felt drove my children to distraction, and the approach to behaviour management at their school seemed incoherent. So when I started to come across references to educational reform based on evidence, to the importance of knowledge, and to skills being domain-specific, I was relieved. When I found that applying findings from cognitive science to education was being advocated, I got quite excited.

My excitement was short-lived. I had imagined that a community of researchers had been busily applying cognitive science findings to education, that the literatures on learning and expertise were being thoroughly mined and that an evidence-based route-map was beginning to emerge. Instead, I kept finding references to the same small group of people.

Most fields of discourse are dominated by a few individuals. Usually they are researchers responsible for significant findings or major theories. A new or specialist field might be dominated by only two or three people. The difference here is that education straddles many different fields of discourse (biology, psychology, sociology, philosophy and politics, plus a range of subject areas) so I found it a bit odd that the same handful of names kept cropping up. I would have expected a major reform of the education system to have had a wider evidence base.

Evaluating the evidence

And then there was the evidence itself. I might be looking in the wrong place, but so far, although I’ve found a few references, I’ve uncovered no attempts by proponents of educational reform to evaluate the evidence they cite.

A major flaw in human thinking is confirmation bias. To represent a particular set of ideas, we develop a mental schema. Every time we encounter the same set of ideas, the neural network that carries the schema is activated. The more it’s activated, the more readily it’s activated in future. This means that any configuration of ideas that contradicts a pre-existing schema has, almost literally, to swim against the electrochemical tide. It’s going to take a good few reiterations of the new idea set before a strongly embedded pre-existing schema is likely to be overridden by a new one. Consequently we tend to favour evidence that confirms our existing views, and find it difficult to see things in a different way.

The best way we’ve found to counteract confirmation bias in the way we evaluate evidence is through hypothesis testing. Essentially you come up with a hypothesis and then try to disprove it. If you can’t, it doesn’t mean your hypothesis is right, it just means you can’t yet rule it out. Hypothesis testing as such is mainly used in the sciences, but the same principle underlies formal debating, the adversarial approach in courts of law, and having an opposition to government in parliament. The last two examples are often viewed as needlessly combative, when actually their job is to spot flaws in what other people are saying. How well they do that job is another matter.

It’s impossible to tell at first glance whether a small number of researchers have made a breakthrough in education theory, or whether their work is simply being cited to affirm a set of beliefs. My suspicion that it might be the latter was strengthened when I checked out the evidence.

The evidence

John Hattie conducted a meta-analysis of over 800 studies of student achievement. My immediate thought when I came across his work was of the well-documented problems associated with meta-analyses. Hattie does discuss these, but I’m not convinced he disposed of one key issue: the garbage-in-garbage-out problem. A major difficulty with meta-analyses is ensuring that all the studies involved use the same definitions for the constructs they are measuring, and I couldn’t find a discussion of what Hattie (or other researchers) mean by ‘achievement’. I assume that Hattie uses test scores as a proxy measure of achievement. This is fine if you think the job of schools is to ensure that children learn what somebody has decided they should learn. But that assumption poses problems. One is who determines what students should learn. Another is what happens to students who, for whatever reason, can’t learn at the same rate as the majority. And a third is how the achievement measured in Hattie’s study maps on to achievement in later life. What’s noticeable about the biographies of many ‘great thinkers’ – Darwin and Einstein are prominent examples – is how many of them didn’t do very well in school. It doesn’t follow that Hattie is wrong – Darwin and Einstein might have been even greater thinkers if their schools had adopted his recommendations – but it’s an outcome Hattie doesn’t appear to address.

Siegfried Engelmann and Wesley C Becker developed a system called the Direct Instruction System for Teaching Arithmetic and Reading (DISTAR) that was shown to be effective in Project Follow Through – an evaluation of a number of educational approaches in the US education system over a 30-year period starting in the 1960s. There’s little doubt that Direct Instruction is more effective than many other systems at raising academic achievement and self-esteem. The problem is, again, who decides what students learn, what happens to students who don’t benefit as much as others, and what’s meant by ‘achievement’.

ED Hirsch developed the Core Knowledge sequence – essentially an off-the-shelf curriculum that’s been adapted for the UK and is available from Civitas. The US Core Knowledge sequence has a pretty obvious underlying rationale even if some might question its stance on some points. The same can’t be said of the UK version. Compare, for example, the content of US Grade 1 History and Geography with that of the UK version for Year 1. The US version includes Early People and Civilisations and the History of World Religion – all important for understanding how human geography and cultures have developed over time. The UK version focuses on British Pre-history and History (with an emphasis on the importance of literacy) followed by Kings and Queens, Prime Ministers, then Symbols and figures – namely the Union Jack, Buckingham Palace, 10 Downing Street and the Houses of Parliament – despite the fact that few children in Y1 are likely to understand how or why these people or symbols came to be important. Although the strands of world history and British history are broadly chronological, Y4s study Ancient Rome alongside the Stuarts, and Y6s the American Civil War potentially before the Industrial Revolution.

Daniel Willingham is a cognitive psychologist and the author of Why don’t students like school? A cognitive scientist answers questions about how the mind works and what it means for the classroom and When can you trust the experts? How to tell good science from bad in education. He also writes a column in American Educator magazine. I found Willingham informative on cognitive psychology. However, I felt his view of education was a rather narrow one. There’s nothing wrong with applying cognitive psychology to how teachers teach the curriculum in schools – it’s just that learning and education involve considerably more than that.

Kirschner, Sweller and Clark have written several papers about the limitations of working memory and its implications for education. In my view, their analysis has three key weaknesses: they arbitrarily lump together a range of education methods as if they were essentially the same; they base their theory on an outdated and incomplete model of memory; and they conclude that only one teaching approach is effective – explicit, direct instruction – ignoring the fact that knowledge comes in different forms.

Conclusions

I agree with some of the points made by the reformers:
• I agree with the idea of evidence-based education – the more evidence the better, in my view.
• I have no problem with children being taught knowledge. I don’t subscribe to a constructivist view of education – in the sense that we each develop a unique understanding of the world and everybody’s worldview is as valid as everybody else’s – although cognitive science has shown that everybody’s construction of knowledge is unique. We know that some knowledge is more valid and/or more reliable than other knowledge and we’ve developed some quite sophisticated ways of figuring out what’s more certain and what’s less certain.
• The application of findings from cognitive science to education is long overdue.
• I have no problem with direct instruction (as distinct from Direct Instruction) per se.

However, some of what I read gave me cause for concern:
• The evidence-base presented by the reformers is limited and parts of it are weak and flawed. It’s vital to evaluate evidence, not just to cite evidence that at face-value appears to support what you already think. And a body of evidence isn’t a unitary thing; some parts of it can be sound whilst other parts are distinctly dodgy. It’s important to be able to sift through it and weigh up the pros and cons. Ignoring contradictory evidence can be catastrophic.
• Knowledge, likewise, isn’t a unitary thing; it can vary in terms of validity and reliability.
• The evidence from cognitive science also needs to be evaluated. It isn’t OK to assume that just because cognitive scientists say something it must be right; cognitive scientists certainly don’t do that. Being able to evaluate cognitive science might entail learning a fair bit about cognitive science first.
• Direct instruction, like any other educational method, is appropriate for acquiring some types of knowledge. It isn’t appropriate for acquiring all types of knowledge. The problem with approaches such as discovery learning and child-led learning is not that there’s anything inherently wrong with the approaches themselves, but that they’re not suitable for acquiring all types of knowledge.

What has struck me most forcibly about my exploration of the evidence cited by the education reformers is that, although I agree with some of the reformers’ reservations about what’s been termed ‘minimal instruction’ approaches to education, the reformers appear to be ignoring their own advice. They don’t have extensive knowledge of the relevant subject areas, they don’t evaluate the relevant evidence, and the direct instruction framework they are advocating – certainly the one Civitas is advocating – doesn’t appear to have a structure derived from the relevant knowledge domains.

Rather than a rational, evidence-based approach to education, the ‘educational reform’ movement has all the hallmarks of a belief system that’s using evidence selectively to support its cause; and that’s what worries me. This new Blob is beginning to look suspiciously like the old one.

Kirschner, Sweller & Clark: a summary of my critique

It’s important not just to know things, but to understand them, which is why I took three posts to explain my unease about the paper by Kirschner, Sweller & Clark. From the responses I’ve received I appear to have overstated my explanation but understated my key points, so for the benefit of anybody unable or unwilling to read all the words, here’s a summary.

1. I have not said that Kirschner, Sweller & Clark are wrong to claim that working memory has a limited capacity. I’ve never come across any evidence that says otherwise. My concerns are about other things.

2. The complex issue of approaches to learning and teaching is presented as a two-sided argument. Presenting complex issues in an oversimplified way invariably obscures rather than clarifies the debate.

3. The authors appeal to a model of working memory that’s almost half a century old, rather than one revised six years before their paper came out and widely accepted as more accurate. Why would they do that?

4. They give the distinct impression that long-term memory isn’t subject to working memory constraints, when it is very much subject to them.

5. They completely omit any mention of the biological mechanisms involved in processing information. Understanding the mechanisms is key if you want to understand how people learn.

6. They conclude that explicit, direct instruction is the only viable teaching approach based on the existence of a single constraining factor – the capacity of working memory to process yet-to-be learned information (though exactly what they mean by yet-to-be learned isn’t explained). In a process as complex as learning, it’s unlikely that there will be only one constraining factor.

Kirschner, Sweller & Clark appear to have based their conclusion on a model of memory that was current in the 1970s (I know because that’s when I first learned about it), to have ignored subsequent research, and to have oversimplified the picture at every available opportunity.

What also concerns me is that some teachers appear to be taking what Kirschner, Sweller & Clark say at face value, without making any attempt to check the accuracy of their model, to question their presentation of the problem or the validity of their conclusion. There’s been much discussion recently about ‘neuromyths’. Not much point replacing one set of neuromyths with another.

Reference
Kirschner, P.A., Sweller, J. & Clark, R.E. (2006). Why Minimal Guidance During Instruction Does Not Work: An Analysis of the Failure of Constructivist, Discovery, Problem-Based, Experiential, and Inquiry-Based Teaching. Educational Psychologist, 41, 75–86.

cognitive load and learning

In the previous two posts I discussed the model of working memory used by Kirschner, Sweller & Clark and how working memory and long-term memory function. The authors emphasise that their rejection of minimal guidance approaches to teaching is based on the limited capacity of working memory in respect of novel information, and that even if experts might not need much guidance “…nearly everyone else thrives when provided with full, explicit instructional guidance (and should not be asked to discover any essential content or skills)” (Clark, Kirschner & Sweller, p.6). Whether they are right or not depends on what they mean by ‘novel’ information.

So what’s new?

Kirschner, Sweller & Clark define novel information as ‘new, yet to be learned’ information that has not been stored in long-term memory (p.77). But novelty isn’t a simple case of information either being yet-to-be-learned or stored-in-long-term-memory. If I see a Russian sentence written in Cyrillic script, its novelty value to me on a scale of 1-10 would be about 9. I can recognise some Cyrillic letters and know a few Russian words, but my working memory would be overloaded after about the third letter because of the multiple operations involved in decoding, blending and translating. A random string of Arabic numerals would have a novelty value of about 4, however, because I am very familiar with Arabic numerals; the only novelty would be in their order in the string. The sentence ‘the cat sat on the mat’ would have a novelty value close to zero because I’m an expert at chunking the letter patterns in English and I’ve encountered that sentence so many times.

Because novelty isn’t an either/or thing but sits on a sliding scale, and because the information coming into working memory can vary between simple and complex, ‘new, yet to be learned’ information can vary in both complexity and novelty.

You could map it on a 2×2 matrix like this:

[Figure: novelty, complexity & cognitive load – a 2×2 matrix with novelty on one axis and complexity on the other]

A sentence such as ‘the monopsonistic equilibrium at M should now be contrasted with the equilibrium that would obtain under competitive conditions’ is complex (it contains many bits of information) but its novelty content would depend on the prior knowledge of the reader. It would score high on both the novelty and complexity scales for the average 5-year-old. I don’t understand what the sentence means, but I do understand many of the words, so it would be mid-range in both novelty and complexity for me. An economist would probably give it a 3 for complexity but 0 for novelty. Trying to teach a 5-year-old what the sentence meant would completely overload their working memory. But it would be a manageable challenge for mine, and an economist would probably feel bored.
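Pulling those scores together for the monopsony sentence (a rough summary of the examples above, not taken from the original figure):

reader                 novelty     complexity   cognitive load
average 5-year-old     high        high         overload
me                     mid-range   mid-range    manageable challenge
an economist           0           3            minimal – boredom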

Kirschner, Sweller & Clark reject ‘constructivist, discovery, problem-based, experiential and inquiry-based approaches’ on the basis that they overload working memory and the excessive cognitive load means that learners don’t learn as efficiently as they would using explicit direct instruction. If only it were that simple.

‘Constructivist, discovery, problem-based, experiential and inquiry-based approaches’ weren’t adopted initially because teachers preferred them or because philosophers thought they were a good idea, but because by the end of the 19th century explicit, direct instruction – the only game in town for fledgling mass education systems – clearly wasn’t as effective as people had thought it would be. Alternative approaches were derived from three strategies that young children apply when learning ‘naturally’.

How young children learn

Human beings are mammals and young mammals learn by applying three key learning strategies which I’ll call ‘immersion’, trial-and-error and modelling (imitating the behaviour of other members of their species). By ‘strategy’, I mean an approach that they use, not that the baby mammals sit down and figure things out from first principles; all three strategies are outcomes of how mammals’ brains work.

Immersion

Most young children learn to walk, talk, feed and dress themselves and acquire a vast amount of information about their environment with very little explicit, direct instruction. And they acquire those skills pretty quickly and apparently effortlessly. The theory was that if you put school age children in a suitable environment, they would pick up other skills and knowledge equally effortlessly, without the boredom of rote-learning and the grief of repeated testing. Unfortunately, what advocates of discovery, problem-based, experiential and inquiry-based learning overlooked was the sheer amount of repetition involved in young children learning ‘naturally’.

Although babies’ learning is kick-started by some hard-wired processes such as reflexes, babies have to learn to do almost everything. They repeatedly rehearse their gross motor skills, fine motor skills and sensory processing. They practise babbling, crawling, toddling and making associations at every available opportunity. They observe things and detect patterns. A relatively simple skill like face-recognition, grasping an object or rolling over might only take a few attempts. More complex skills like using a spoon, crawling or walking take more. Very complex skills like using language require many thousands of rehearsals; it’s no coincidence that children’s speech and reading ability take several years to mature and their writing ability (an even more complex skill) doesn’t usually mature until adulthood.

The reason why children don’t learn to read, do maths or learn foreign languages as ‘effortlessly’ as they learn to walk or speak in their native tongue is largely because of the number of opportunities they have to rehearse those skills. An hour a day of reading or maths and a couple of French lessons a week bear no resemblance to the ‘immersion’ in motor development and their native language that children are exposed to. Inevitably, it will take them longer to acquire those skills. And if they take an unusually long time, it’s the child, the parent, the teacher or the method that tends to be blamed, not the mechanism by which the skill is acquired.

Trial-and-error

The second strategy is trial-and-error. It plays a key role in the rehearsals involved in immersion, because it provides feedback to the brain about how the skill or knowledge is developing. Some skills, like walking, talking or handwriting, can only be acquired through trial-and-error because of the fine-grained motor feedback that’s required. Learning by trial-and-error can offer very vivid, never-forgotten experiences, regardless of whether the initial outcome is success or failure.

Modelling

The third strategy is modelling – imitating the behaviour of other members of the species (and sometimes other species or inanimate objects). In some cases, modelling is the most effective way of teaching because it’s difficult to explain (or understand) a series of actions in verbal terms.

Cognitive load

This brings us back to the issue of cognitive load. It isn’t the case that immersion, trial-and-error and modelling or discovery, problem-based, experiential and inquiry-based approaches always impose a high cognitive load, and that explicit direct instruction doesn’t. If that were true, young children would have to be actively taught to walk and talk and older ones would never forget anything. The problem with all these educational approaches is that they have all initially been seen as appropriate for teaching all knowledge and skills and have subsequently been rejected as ineffective. That’s not at all surprising, because different types of knowledge and skill require different strategies for effective learning.

Cognitive load is also affected by the complexity of incoming information and how novel it is to the learner. Nor is cognitive load confined to the capacity of working memory. Forty minutes of explicit, direct instruction in novel material, even if presented in well-paced, working-memory-sized chunks, would pose a significant challenge to most brains. The reason, as I pointed out previously, is that the transfer of information from working memory to long-term memory is a biological process that takes time, resources and energy. Research into changes in the motor cortex suggests that the time involved might be as little as hours, but even that has implications for the pace at which students are expected to learn and how much new information they can process. There’s a reason why someone would find acquiring large amounts of new information tiring – their brain uses up a considerable amount of glucose getting that information embedded in the form of neural connections. The inevitable delay between information coming into the brain and being embedded in long-term memory suggests that down-time is as important as learning time – calling into question the assumption that the longer children spend actively ‘learning’, the more they will know.

Final thoughts

If I were forced to choose between constructivist, discovery, problem-based, experiential and inquiry-based approaches to learning and explicit, direct instruction, I’d plump for explicit, direct instruction, because the world we live in works according to discoverable principles and it makes sense to teach kids what those principles are, rather than to expect them to figure them out for themselves. However, it would have to be a forced choice, because we do learn through constructing our knowledge and through discovery, problem-solving, experiencing and inquiring as well as by explicit, direct instruction. The most appropriate learning strategy will depend on the knowledge or skill being learned.

The Kirschner, Sweller & Clark paper left me feeling perplexed and rather uneasy. I couldn’t understand why the authors frame the debate about educational approaches in terms of minimal guidance ‘on one side’ and direct instructional guidance ‘on the other’, when self-evidently the debate is more complex than that. Nor why they refer to Atkinson & Shiffrin’s model of memory when Baddeley & Hitch’s more complex model is so widely accepted as more accurate. Nor why they omit any mention of the biological mechanisms involved in learning; not only are the biological mechanisms responsible for the way working memory and long-term memory operate, they also shed light on why any single educational approach doesn’t work for all knowledge, all skills – or even all students.

I felt it was ironic that the authors place so much emphasis on the way novices think but present a highly complex debate in binary terms – a classic feature of the way novices organise their knowledge. What was also ironic was that despite their emphasis on explicit, direct instruction, they failed to mention several important features of memory that would have helped a lay readership understand how memory works. This is all the more puzzling because some of these omissions (and a more nuanced model of instruction) are referred to in a paper on cognitive load by Paul Kirschner published four years earlier.

In order to fully understand what Kirschner, Sweller & Clark are saying, and to decide whether they were right or not, you’d need to have a fair amount of background knowledge about how brains work. To explain that clearly to a lay readership, and to address possible objections to their thesis, the authors would have had to extend the paper’s length by at least 50%. Their paper is just over 10 000 words long, suggesting that word-count issues might have resulted in them having to omit some points. That said, Educational Psychologist doesn’t currently apply a word limit, so maybe the authors were trying to keep the concepts as simple as possible.

Simplifying complex concepts for the benefit of a lay readership can certainly make things clearer, but over-simplifying them runs the risk of giving the wrong impression, and I think there’s a big risk of that happening here. Although the authors make it clear that explicit direct instruction can take many forms, they do appear to be proposing a one-size-fits-all approach that might not be appropriate for all knowledge, all skills or all students.

References
Clark, R.E., Kirschner, P.A. & Sweller, J. (2012). Putting students on the path to learning: The case for fully guided instruction. American Educator, Spring.
Kirschner, P.A. (2002). Cognitive load theory: Implications of cognitive load theory on the design of learning. Learning and Instruction, 12, 1–10.
Kirschner, P.A., Sweller, J. & Clark, R.E. (2006). Why Minimal Guidance During Instruction Does Not Work: An Analysis of the Failure of Constructivist, Discovery, Problem-Based, Experiential, and Inquiry-Based Teaching. Educational Psychologist, 41, 75–86.

how working memory works

In my previous post I wondered why Kirschner, Sweller & Clark based their objections to minimal guidance in education on Atkinson & Shiffrin’s 1968 model of memory; it’s a model that assumes a mechanism for memory that’s now considerably out of date. A key factor in Kirschner, Sweller & Clark’s advocacy of direct instructional guidance is the limited capacity of working memory, and that’s what I want to look at in this post.

Other models are available

Atkinson & Shiffrin describe working memory as a ‘short-term store’. It has a limited capacity (around 4-9 items of information) that it can retain for only a few seconds. It’s also a ‘buffer’; unless information in the short-term store is actively maintained, by rehearsal for example, it will be displaced by incoming information. Kirschner, Sweller & Clark note that ‘two well-known characteristics’ of working memory are its limited duration and capacity when ‘processing novel information’ (p.77), suggesting that their model of working memory is very similar to Atkinson & Shiffrin’s short-term store.

[Figure: Atkinson & Shiffrin’s model of memory]

In 1974 Alan Baddeley and Graham Hitch proposed a more sophisticated model for working memory that included dedicated auditory and visual information processing components. Their model has been revised in the light of more recent discoveries relating to the function of the prefrontal areas of the brain – the location of ‘working memory’. The Baddeley and Hitch model now looks a bit more complex than Atkinson & Shiffrin’s.

[Figure: the revised Baddeley & Hitch model of working memory]

You could argue that it doesn’t matter how complex working memory is, or how the prefrontal areas of the brain work; neither alters the fact that the capacity of working memory is limited. Kirschner, Sweller & Clark question the effectiveness of educational methods involving minimal guidance because they increase cognitive load beyond the capacity of working memory. But Kirschner, Sweller & Clark’s model of working memory appears to be oversimplified and doesn’t take into account the biological mechanisms involved in learning.

Biological mechanisms involved in learning

Making connections

Learning is about associating one thing with another, and making associations is what the human brain does for a living. Associations are represented in the brain by connections formed between neurons; the ‘information’ is carried in the pattern of connections. A particular stimulus will trigger a series of electrical impulses through a particular network of connected neurons. So, if I spot my cat in the garden, that sight will trigger a series of electrical impulses that activates a particular network of neurons; the connections between the neurons represent all the information I’ve ever acquired about my cat. If I see my neighbour’s cat, much of the same neural pathway will be triggered because both cats are cats, it will then diverge slightly because I have acquired different information about each cat.

Novelty value

Neurons make connections with other neurons via synapses. Our current understanding of the role of synapses in information storage and retrieval suggests that new information triggers the formation of new synapses between neurons. If the same associations are encountered repeatedly, the relevant synapses are used repeatedly and those connections between neurons are strengthened, but if synapses aren’t active for a while, they are ‘pruned’. Toddlers form huge numbers of new synapses, but from the age of three through to adulthood, the number reduces dramatically as pruning takes place. It’s not clear whether synapse formation and pruning are pre-determined developmental phases or whether they happen in response to the kind of information that the brain is processing. Toddlers are exposed to vast amounts of novel information, but novelty rapidly tails off as they get older. Older adults tend to encounter very little novel information, often complaining that they’ve ‘seen it all before’.

The way working memory works

Most of the associations made by the brain occur in the cortex, the outer layer of the brain. Sensory information processed in specialised areas of cortex is ‘chunked’ into coherent wholes – what we call ‘perception’. Perceptual information is further chunked in the frontal areas of the brain to form an integrated picture of what’s going on around and within us. The picture that’s emerging from studies of prefrontal cortex is that this area receives, attends to, evaluates and responds to information from many other areas of the brain. It can do this because patterns of the electrical activity from other brain areas are maintained in prefrontal areas for a short time whilst evaluation takes place. As Antonio Damasio points out in Descartes’ Error, the evaluation isn’t always an active, or even a conscious process; there’s no little homunculus sitting at the front of the brain figuring out what information should take priority. What does happen is that streams of incoming information compete for attention. What gets attention depends on what information is coming in at any one time. If something happens that makes you angry during a maths lesson, you’re more likely to pay attention to that than to solving equations. During an exam, you might be concentrating so hard that you are unaware of anything happening around you.

The information coming into prefrontal cortex varies considerably. There’s a constant inflow from three main sources (caricatured in the sketch below):

• real-time information from the environment via the sense organs;
• information about the physiological state of the body, including emotional responses to incoming information;
• information from the neural pathways formed by previous experience and activated by that sensory and physiological input (Kirschner, Sweller & Clark would call this long-term memory).
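As a deliberately crude caricature (the three categories mirror the list above; the items and salience values are invented for illustration), you could model the competition for attention as whichever incoming item is most salient at a given moment winning out:

```python
# Crude caricature of prefrontal 'competition for attention'.
# The three sources mirror the list above; the items and salience
# values are invented for illustration.
streams = {
    "sensory":       {"teacher's voice": 0.5, "rain on the window": 0.2},
    "physiological": {"hunger": 0.3, "anger at an insult": 0.9},
    "long-term":     {"memory of last night's match": 0.4},
}

# Flatten all incoming items and attend to whatever is most salient right now.
candidates = {item: salience
              for source in streams.values()
              for item, salience in source.items()}
attended = max(candidates, key=candidates.get)
print(attended)  # -> 'anger at an insult': emotion outcompetes the lesson
```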

Working memory and long-term memory

‘Information’ and models of information processing are abstract concepts. You can’t pick them up or weigh them, so it’s tempting to think of information processing in the brain as an abstract process, involving rather abstract forces like electrical impulses. It would be easy to form the impression from Kirschner, Sweller & Clark’s model that well-paced, bite-sized chunks of novel information will flow smoothly from working memory to long-term memory, like water between two tanks. But the human brain is a biological organ, and it retains and accesses information using some very biological processes. Developing new synapses involves physical changes to the structure of neurons, and those changes take time, resources and energy. I’ll return to that point later, but first I want to focus on something that Kirschner, Sweller & Clark say about the relationship between working memory and long-term memory that struck me as a bit odd:

“The limitations of working memory only apply to new, yet to be learned information that has not been stored in long-term memory. New information such as new combinations of numbers or letters can only be stored for brief periods with severe limitations on the amount of such information that can be dealt with. In contrast, when dealing with previously learned information stored in long-term memory, these limitations disappear.” (p77)

This statement is odd because it doesn’t tally with Atkinson & Shiffrin’s concept of the short-term store, and isn’t supported by decades of experimental work showing that capacity limitations apply to all information in working memory, regardless of its source. But Kirschner, Sweller & Clark go on to qualify their claim:

“In the sense that information can be brought back from long-term memory to working memory over indefinite periods of time, the temporal limits of working memory become irrelevant.” (p77)

I think I can see what they’re getting at; because information is stored permanently in long-term memory it doesn’t rapidly fade away and you can access it any time you need to. But you have to access it via working memory, so it’s still subject to working memory constraints. I think the authors are referring implicitly to two ways in which the brain organises information that increase the effective capacity of working memory – chunking and schemata.
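One way to picture the point – my sketch, not Kirschner, Sweller & Clark’s – is a small fixed-size buffer sitting in front of an arbitrarily large store: anything retrieved from long-term memory still has to pass through it.

```python
from collections import deque

# Long-term store: effectively unlimited and persistent.
long_term = {f"fact-{i}": f"detail {i}" for i in range(10_000)}

# Working memory: a small buffer. The maxlen stands in for its capacity
# limit; old items are displaced as new ones arrive.
working_memory = deque(maxlen=4)

for key in ["fact-3", "fact-17", "fact-256", "fact-999", "fact-42"]:
    working_memory.append((key, long_term[key]))  # retrieval goes via the buffer

print(len(long_term))        # 10000 items available...
print(list(working_memory))  # ...but only the last 4 are 'in mind' at once
```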

Chunking

If the brain frequently encounters small items of information that are usually associated with each other, it eventually ‘chunks’ them together and then processes them automatically as single units. George Miller, who in the 1950s did some pioneering research into working memory capacity, noted that people familiar with the binary notation then in widespread use by computer programmers didn’t memorise random lists of 1s and 0s as random lists, but as numbers in the decimal system. So 10 would be remembered as 2, 100 as 4, 101 as 5 and so on. In this way, very long strings of 1s and 0s could be held in working memory in the form of decimal numbers that would automatically be translated back into 1s and 0s when the people taking part in the experiments were asked to recall the list. Morse code experts do the same; they don’t read messages as a series of dots and dashes, but chunk the patterns of dots and dashes into letters and then into words. Exactly the same process occurs in reading, but we don’t call it chunking, we call it learning to read. Chunking effectively increases the capacity of working memory – but not by very much. Curiously, although Kirschner, Sweller & Clark refer to a paper by Egan and Schwartz that’s explicitly about chunking, they don’t mention chunking as such.
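The recoding trick is easy to demonstrate in code. A minimal sketch (mine; I’ve assumed the three-bits-per-digit grouping Miller describes, so each chunk becomes a single digit from 0 to 7):

```python
# Rough sketch of Miller-style recoding: chunk a long bit string into
# 3-bit groups and remember each group as a single digit (0-7).
# Five digits are far easier to hold in working memory than 15 bits.
bits = "101001110010110"

chunks = [bits[i:i + 3] for i in range(0, len(bits), 3)]
digits = [int(chunk, 2) for chunk in chunks]   # encode: 15 bits -> 5 digits
print(digits)                                  # [5, 1, 6, 2, 6]

# Decoding reverses the process at recall time.
recalled = "".join(format(d, "03b") for d in digits)
assert recalled == bits
```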

Schemata

What they do mention is the concept of the schema, particularly the schemata of chess players. In the 1940s Adriaan de Groot discovered that expert chess players memorise a vast number of configurations of chess pieces on a board; he called each particular configuration a schema. I get the impression that Kirschner, Sweller & Clark see schemata and chunking as synonymous, even though a schema usually refers to a meta-level way of organising information, like a life-script or an overview, rather than the automatic processing of several bits of information as a single unit. It’s quite possible that expert chess players do automatically read each configuration of chess pieces as one unit, but de Groot didn’t call it ‘chunking’ because his research was carried out a decade before George Miller coined the term.

Thinking about everything at once

Whether you call them chunks or schemata, what’s clear is that the brain has ways of increasing the amount of information held in working memory. Expert chess players aren’t limited to thinking about the four or five possible moves for one piece, but can think about four or five possible configurations for all pieces. But it doesn’t follow that the limitations of working memory in relation to long-term memory disappear as a result.

I mentioned in my previous post what information is made accessible via my neural networks if I see an apple. If I free-associate, I think of apples – apple trees – should we cover our apple trees if it’s wet and windy after they blossom? – will there be any bees to pollinate them? – bee viruses – viruses in ancient bodies found in melted permafrost – bodies of climbers found in melted glaciers, and so on. Because my neural connections represent multiple associations I can indeed access vast amounts of information stored in my brain. But I don’t access it all simultaneously. That’s just as well, because if I could access all that information at once my attempts to decide what to do with our remaining windfall apples would be thwarted by totally irrelevant thoughts about mountain rescue teams and St Bernard dogs. In short, if information stored in long-term memory weren’t subject to the capacity constraints of working memory, we’d never get anything done.
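A toy spreading-activation sketch (entirely illustrative – the association graph and the capacity figure are invented) shows why a cap on simultaneous retrieval is a feature rather than a bug:

```python
# Toy spreading activation: 'apple' is linked, directly or indirectly,
# to a large web of associations, but retrieval is capped at a
# working-memory-sized handful of the closest ones.
links = {
    "apple": ["apple tree", "windfalls", "orchard"],
    "apple tree": ["blossom", "bees"],
    "bees": ["bee viruses"],
    "bee viruses": ["permafrost bodies"],
    "permafrost bodies": ["glacier climbers", "mountain rescue"],
}

def associate(start, capacity=4):
    """Breadth-first free association, stopping at working-memory capacity."""
    retrieved, frontier = [], [start]
    while frontier and len(retrieved) < capacity:
        node = frontier.pop(0)
        retrieved.append(node)
        frontier.extend(links.get(node, []))
    return retrieved

print(associate("apple"))
# ['apple', 'apple tree', 'windfalls', 'orchard'] -- mountain rescue
# never makes it into the picture, which is exactly the point.
```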

Chess masters (or ornithologists, or brain surgeons) have access to vast amounts of information, but in any given situation they don’t need to access it all at once. In fact, accessing it all at once would be disastrous, because it would take forever to eliminate information they didn’t need. At any point in any chess game, only a few configurations of pieces are possible, and that number is unlikely to exceed the capacity of working memory. Similarly, even if an ornithologist/brain surgeon can recognise thousands of species of birds/types of brain injury, in any given environment most of those species/injuries are likely to be irrelevant, so don’t even need to be considered. There’s a good reason why working memory has a limited capacity, and why all the information we process is subject to that limit.

In the next post, I want to look at how the limits of working memory impact on learning.

References

Atkinson, R & Shiffrin, R (1968). Human memory: A proposed system and its control processes. In K. Spence & J. Spence (Eds.), The psychology of learning and motivation (Vol. 2, pp. 89–195). New York: Academic Press.
Damasio, A (1994). Descartes’ Error. Vintage Books.
Kirschner, PA, Sweller, J & Clark, RE (2006). Why minimal guidance during instruction does not work: An analysis of the failure of constructivist, discovery, problem-based, experiential, and inquiry-based teaching. Educational Psychologist, 41, 75-86.