learning styles: how does Daniel Willingham see them?

In 2005, Daniel Willingham used his “Ask the cognitive scientist” column in American Educator to answer the question “What does cognitive science tell us about the existence of visual, auditory, and kinesthetic learners and the best way to teach them?”

The question refers to the learning styles model used in many schools, which assumes that children learn best using their preferred sensory modality – visual, auditory or kinaesthetic. Fleming’s VARK model, and the more common VAK variant, frame learning styles in terms of preferences for learning in a particular sensory modality. Other learning styles models frame styles in terms of other stable traits in the way individuals learn. Willingham, by contrast, frames the VAK model in terms of abilities.

He summarises the relevant cognitive science research like this: “children do differ in their abilities with different modalities, but teaching the child in his best modality doesn’t affect his educational achievement”, and goes on to discuss what cognitive science has to say about sensory modalities and memory. Willingham’s response is informative about the relevant research, but I think it could be misleading, for two reasons: he doesn’t differentiate between groups and individuals, and he doesn’t adequately explain the role of sensory modalities in memory.

groups and individuals

In the previous post I mentioned the challenge to researchers posed by differences at the population, group and individual levels. Willingham’s summary of the research begins at the population level – “children do differ in their abilities with different modalities” – but then shifts to the individual level – “but teaching the child in his best modality doesn’t affect his educational achievement” [my emphasis].

Even if Willingham’s choice of words is merely a matter of style, it inadvertently conflates findings at the group and individual levels. Group averages tell you what you need to know if you’re interested in broad pedagogical approaches or educational policy; in the case of learning styles, there’s no robust evidence warranting their use as a general approach in teaching. But it doesn’t follow that individual children don’t have a ‘best’ (or more likely ‘worst’) modality, nor that they can’t benefit from learning in a particular modality. For example, the Picture Exchange Communication System (PECS) and sign languages are the only way some children can communicate effectively, and ‘talking books’ give others access to literature that would otherwise be out of their reach. On his learning styles FAQ page, Willingham claims this is a matter of ‘ability’ rather than ‘style’; but ability is likely to have an impact on preference.

memory and modality

Willingham goes on to explain “a few things that cognitive scientists know about modalities”. His first claim is that “memory is usually stored independent of any modality” [Willingham’s emphasis]. “You typically store memories in terms of meaning — not in terms of whether you saw, heard, or physically interacted with the information”.

He supports this assertion with a finding from research into episodic memory – that whilst people are good at remembering the gist of a story, they tend to be hazy when it comes to specific details. His claim appears to be further supported by research into witness testimony. People might accurately remember a car crashing into a lamppost, but misremember the colour of the car; they correctly recall the driver behaving in an aggressive manner, but are wrong about the words she uttered.

Willingham then extends the role of meaning to the facet of memory that deals with facts and knowledge – semantic memory. He says “the vast majority of educational content is stored in terms of meaning and does not rely on visual, auditory, or kinesthetic memory” and “teachers almost always want students to remember what things mean, not what they look like or sound like”. He uses the example ‘a fire requires oxygen to burn’ and says “the initial experience by which you learned this fact may have been visual (watching a flame go out under a glass) or auditory (hearing an explanation), but the resulting representation of that knowledge in your mind is neither visual nor auditory.” Certainly the idea of a fire requiring oxygen to burn might be neither visual nor auditory, but how many students will not visualise flames being extinguished under a glass when they recall this fact?

substitute modalities

Willingham’s second assertion about memory and sensory modalities is that “the different visual, auditory, and meaning-based representations in our minds cannot serve as substitutes for one another”. He cites a set of experiments reported by Dodson and Shimamura (2000). In the experiments a list of words was read to participants by either a man or a woman. Participants then listened to a second list and were asked to judge which of the words had been in the first list. They were also asked whether a man or woman had spoken the word the first time round. People were five times better at remembering who spoke an item if a test word was read by the same voice than if it was read by the alternative voice. But mismatching the voices didn’t make a difference to the number of words that were recognised.

Dodson and Shimamura see the study as demonstrating that memory is highly susceptible to sensory cues. But Willingham’s conclusion is different: “this experiment indicates that subjects do store auditory information, but it only helps them remember the part of the memory that is auditory — the sound of the voice — and not the word itself, which is stored in terms of its meaning.” This is a rather odd conclusion, given that almost all the words in the experiments were spoken, so auditory memory must have been involved in recognising the words as well as identifying the gender of the speaker. I couldn’t see how the study supported Willingham’s assertion about substitute modalities. And substitute modalities are widely and very effectively used; writing, sign language and lip-reading are all visual/kinaesthetic substitutes for speech in the auditory modality.

little difference in the classroom

Willingham’s third assertion is “children probably do differ in how good their visual and auditory memories are, but in most situations, it makes little difference in the classroom”. That’s a fair conclusion given the findings of reviews of learning styles studies. He also points out that studies of mental imagery suggest that paying attention to the modality best suited to the content of what’s being taught, rather than the student’s ‘best’ modality, is more likely to help students understand and remember.

the meaning of meaning

Meaning is one of those rather fuzzy words that people use in different ways. It’s widely used to denote the relationship between a symbol and the entity the symbol represents. You could justify talking about memory in terms of meaning in the sense that memory consists of our representations of entities rather than the entities themselves, but I don’t think that’s what Willingham is getting at. I think when he uses the term meaning he’s referring to schemas.

The sequence of a series of events, the gist of a story and the connections between groups of facts are all schemas. There’s no doubt that in the case of complex memories, most people focus on the schema rather than the detail. And teachers do want students to remember the deep structure schemas linking facts rather than just the surface level details. But our memories of chains of events, the plots of stories and factual information are quite clearly not “independent of any modality”. Witnesses who saw a car career down a road at high speed, collide with a lamppost, and the driver emerge swearing at shocked onlookers, might focus on the meaning of that series of events, but they must have some sensory representation of the car and the driver’s voice in order to recall those meaningful events. And how could we recall the narrative of Hansel and Gretel without a sensory representation of two children in a forest, or think about a fire ceasing to burn in the absence of oxygen without a sensory representation of flames and then no flames?

I found it difficult to get a clear picture of Willingham’s conceptual model of memory. When he says “the mind is capable of storing memories in a number of different formats”, and “some memories are stored visually, some auditorily, and some in terms of meaning”, one could easily get the impression that memory is neatly compartmentalised, with ‘meaning’ as one of the compartments. That impression wouldn’t be accurate.

mechanisms of memory

In the brain, sensory information (our only source of information about the outside world) is carried in networks of neurons – brain cells. The pattern of activation in the neural networks forms the representations of both real-time sensory input and of what we remember. It’s like the way an almost infinite number of images can be displayed on a computer screen using a limited number of pixels. It’s true that sensory information is initially processed in areas of the brain dedicated to specific sensory modalities. But those streams of information begin to be integrated quite near the beginning of their journey through the brain, and are rapidly brought together to form a bigger picture of what’s happening that can be compared to representations we’ve formed previously – what we call memory.

The underlying biological mechanism appears to be essentially the same for all sensory modalities and for all types of memory – whether they are of stories, sequences of events, facts about fire, or, to cite Willingham’s examples, of Christmas trees, peas, or Clinton’s or Bush’s voice. ‘Meaning’, as far as the brain is concerned, is about associations – which neurons are activating which other neurons, and therefore which representations are being activated. Whether we remember the gist of a story, a fact about fire, or what a Christmas tree or frozen pea looks like, we’re activating patterns of neurons that represent information associated with those events, facts or objects.
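To make the association idea concrete, here’s a toy sketch of the pixel analogy in Python: the same small pool of units carries several different representations, and ‘meaning’ falls out of which patterns overlap. The units and patterns are invented for illustration, not drawn from any neuroscience model.

```python
# A toy version of the pixel analogy: one small pool of shared units can
# carry many different representations, and associations emerge from the
# overlap between patterns. Units and patterns are invented examples.

UNITS = {"green", "red", "conical", "round", "edible", "festive"}

patterns = {                    # each memory = a pattern over the same units
    "christmas_tree": {"green", "conical", "festive"},
    "frozen_pea":     {"green", "round", "edible"},
    "holly_berry":    {"red", "round", "festive"},
}
assert all(p <= UNITS for p in patterns.values())  # all patterns share one pool

def association(a, b):
    """Shared active units = what two representations have in common."""
    return patterns[a] & patterns[b]

print(association("christmas_tree", "frozen_pea"))   # {'green'}
print(association("christmas_tree", "holly_berry"))  # {'festive'}
```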

Real life experiences usually involve incoming information in multiple sensory modalities. We very rarely encounter the world via only one sensory domain and never in terms of ‘meaning’ only – how would we construct that meaning without our senses being involved? Having several sensory channels increases the amount of information we get from the outside world, and increases the likelihood of our accessing memories. A whiff of perfume or a fragment of music can remind us vividly of a particular event or can trigger a chain of factual associations. Teachers are indeed focused on the ‘meaning’ of what they teach, but meaning isn’t divorced from sensory modalities. Indeed, what things look like is vitally important in biology, chemistry and art. And what they sound like is crucial for drama, poetry or modern foreign languages.

In his American Educator piece, Willingham agrees that “children do differ in their abilities with different modalities”. But by 2008 he was claiming in a video presentation that Learning Styles Don’t Exist. The video made a big impression on teacher Tom Bennett. He says it “explains the problems with the theory so clearly that even dopey old me can get my head around it”.

Tom’s view of learning styles is the subject of the next post.

References
Dodson, C.S. and Shimamura, A.P. (2000). Differential effects of cue dependency on item and source memory. Journal of Experimental Psychology: Learning, Memory and Cognition, 26, 1023-1044.

Willingham, DT (2005). Ask the cognitive scientist: Do visual, auditory, and kinesthetic learners need visual, auditory, and kinesthetic instruction? American Educator, Summer.


there’s more to working memory than meets the eye

I’ve had several conversations on Twitter with Peter Blenkinsop about learning and the brain. At the ResearchEd conference on Saturday, we continued the conversation and discovered that much of our disagreement arose because we were using different definitions of learning. Peter’s definition is that learning involves being able to actively recall information; mine is that it involves changes to the brain in response to information.

working memory

Memory is obviously essential to learning. One thing that’s emerged clearly from years of research into how memory works is that the brain retains information for a very short time in what’s known as working memory, and indefinitely in what’s called long-term memory – but that’s not all there is to it. I felt that advocates of direct instruction at the conference were relying on a model of working memory that was oversimplified and could be misleading. The diagram they were using looked like this:

simple model of memory

This model is attributed to Daniel Willingham. From what the teachers were saying, the diagram is simpler than most current representations of working memory because its purpose is to illustrate three key points:

• the capacity of working memory is limited and it holds information for a short time
• information in long-term memory is available for recall indefinitely and
• information can be transferred from working memory to long-term memory and vice versa.

So far, so good.
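Those three points are simple enough to capture in a few lines of code. The sketch below mirrors the diagram’s logic only, not the brain: the capacity figure and the displace-the-oldest rule are illustrative assumptions of mine, not empirical values.

```python
# A toy two-store model: a small, short-lived working memory and an
# indefinite long-term memory, with transfer in both directions.
# The capacity of 4 and the displacement rule are illustrative assumptions.

class SimpleMemoryModel:
    WM_CAPACITY = 4                       # working memory holds only a few items

    def __init__(self):
        self.working_memory = []          # limited capacity, short-lived
        self.long_term_memory = set()     # effectively unlimited, indefinite

    def attend(self, item):
        """New information enters working memory; if capacity is exceeded,
        the oldest item is displaced and simply lost."""
        self.working_memory.append(item)
        if len(self.working_memory) > self.WM_CAPACITY:
            self.working_memory.pop(0)

    def study(self, item):
        """Transfer from working memory to long-term memory."""
        if item in self.working_memory:
            self.long_term_memory.add(item)

    def recall(self, item):
        """Transfer back the other way, from long-term into working memory."""
        if item in self.long_term_memory:
            self.attend(item)
            return True
        return False
```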

My reservation about the diagram is that if it’s the only diagram of working memory you’ve ever seen, you might get the impression that it shows the path information follows when it’s processed by the brain. From it you might conclude that:

• information from the environment goes directly into working memory
• if you pay attention to that information, it will be stored permanently in long-term memory
• if you don’t pay attention to it, it will be lost forever, and
• there’s a very low limit to how much information from the environment you can handle at any one time.

But that’s not quite what happens to information coming into the brain. As Peter pointed out during our conversation, simplifying things appropriately is challenging; you want to simplify enough to avoid confusing people, but not so much that they might misunderstand.

In this post, I’m going to try to explain the slightly bigger picture of how brains process information, and where working memory and long-term memory fit in.

sensory information from the external environment

All information from the external environment comes into the brain via the sense organs. The incoming sensory information is on a relatively large scale, particularly if it’s visual or auditory information; you can see an entire classroom at once and hear simultaneously all the noises emanating from it. But individual cells within the retina or the cochlea respond to tiny fragments of that large-scale information; lines at different angles, areas of light and dark and colour, minute changes in air pressure. Information from the fragments is transmitted, via tiny electrical impulses, from the sense organs to the brain. The brain then chunks the fragments together to build larger-scale representations that closely match the information coming in from the environment. As a result, what we perceive is a fairly accurate representation of what’s actually out there. I say ‘fairly accurate’ because perception isn’t 100% accurate, but that’s another story.

chunking

The chunking of sensory information takes place via networks of interconnected neurons (long spindly brain cells). The brain forms physical connections (synapses) between neighbouring neurons in response to novel information. The connections allow electrical activation to pass from one neuron to another. The connections work on a use-it-or-lose-it principle: the more they are used the stronger they get, and if they’re not used much they weaken and disappear. Not surprisingly, toddlers have vast numbers of connections, but that number diminishes considerably during childhood and adolescence. That doesn’t mean we have to keep remembering everything we ever learned or we’ll forget it; pruning is a way of ensuring that the brain can process efficiently the types of information from the environment that it’s most likely to encounter.
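The use-it-or-lose-it principle is easy to caricature in code. In the sketch below, connections strengthen when the current input co-activates their two neurons, decay otherwise, and are pruned once they fall below a floor; the gain, decay and floor values are arbitrary.

```python
# Toy use-it-or-lose-it rule: co-activated connections strengthen, unused
# ones weaken, and those falling below a floor are pruned away.
# gain, decay and floor are arbitrary illustrative values.

def update_connections(weights, active_pairs, gain=0.2, decay=0.05, floor=0.1):
    """weights: dict mapping (neuron_a, neuron_b) -> connection strength.
    active_pairs: the pairs co-activated by the current input."""
    updated = {}
    for pair, strength in weights.items():
        if pair in active_pairs:
            strength = min(strength + gain, 1.0)   # use it: strengthen
        else:
            strength *= 1 - decay                  # neglect it: weaken
        if strength >= floor:                      # below the floor: pruned
            updated[pair] = strength
    return updated

# Repeated exposure strengthens one connection and prunes the other:
w = {("edge", "curve"): 0.5, ("edge", "noise"): 0.15}
for _ in range(20):
    w = update_connections(w, active_pairs={("edge", "curve")})
print(w)   # ("edge", "curve") is near 1.0; ("edge", "noise") has disappeared
```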

working memory

Broadly speaking, incoming sensory information is processed in the brain from the back towards the front. It’s fed forward into areas that Alan Baddeley has called variously a ‘loop’, ‘sketchpad’ and ‘buffer’. Whatever you call them, they are areas where very limited amounts of information can be held for very short periods while we decide what to do with it. Research evidence suggests there are different loops/sketchpads/buffers for different types of sensory information – for example Baddeley’s most recent model of working memory includes temporary stores for auditory, visuospatial and episodic information.

Baddeley’s working memory model

The incoming information held briefly in the loops/sketchpads/buffers is fed forward again to frontal areas of the brain where it’s constantly monitored by what’s called the central executive – an area that deals with attention and decision-making. The central executive and the loops/sketchpads/buffers together make up working memory.
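One way to picture that arrangement is as a handful of small, leaky buffers plus a monitor. The sketch below borrows Baddeley’s component names, but the capacities, lifetimes and the salience function are all assumptions made for illustration.

```python
# A rough sketch of the arrangement described above: separate short-lived
# buffers for different types of information, monitored by a central
# executive. Capacities and lifetimes are illustrative guesses.

import time

class Buffer:
    def __init__(self, capacity, lifetime):
        self.capacity, self.lifetime = capacity, lifetime
        self.items = []                            # (timestamp, item) pairs

    def hold(self, item):
        self.items.append((time.time(), item))
        self.items = self.items[-self.capacity:]   # very limited capacity

    def contents(self):
        now = time.time()                          # unrefreshed items fade away
        self.items = [(t, i) for t, i in self.items if now - t < self.lifetime]
        return [i for _, i in self.items]

class CentralExecutive:
    """Monitors the buffers and decides what gets attention."""
    def __init__(self):
        self.buffers = {
            "phonological_loop":      Buffer(capacity=4, lifetime=2.0),
            "visuospatial_sketchpad": Buffer(capacity=4, lifetime=2.0),
            "episodic_buffer":        Buffer(capacity=4, lifetime=2.0),
        }

    def monitor(self, salience):
        """Return everything currently held, most salient first; `salience`
        stands in for whatever prioritises changes, threats and goals."""
        held = [i for buf in self.buffers.values() for i in buf.contents()]
        return sorted(held, key=salience, reverse=True)
```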

long-term memory

The information coming into working memory activates the more permanent neural networks that carry information relevant to it – what’s called long-term memory. The neural networks that make up long-term memory are distributed throughout the brain. Several different types of long-term memory have been identified but the evidence points increasingly to the differences being due to where neural networks are located, not to differences in the biological mechanisms involved.

Information in the brain is carried in the pattern of connections between neurons. The principle is similar to the way pixels represent information on a computer screen; that information is carried in the patterns of pixels that are activated. This makes computer screens – and brains – very versatile; they can carry a huge range of different types of information in a relatively small space. One important difference between the two processes is that pixels operate independently, whereas brain cells form physical connections if they are often activated at the same time. The connections allow fast, efficient processing of information that’s encountered frequently.

For example, say I’m looking out of my window at a pigeon. The image of the pigeon falling on my retina will activate the neural networks in my brain that carry information about pigeons; what they look like, sound like, feel like, their flight patterns and feeding habits. My thoughts might then wander off on to related issues; other birds in my garden, when to prune the cherry tree, my neighbour repairing her fence. If I glance away from the pigeon and look at my blank computer screen, other neural networks will be activated, those that carry information about computers, technology, screens and rectangles in general. I will no longer be thinking about pigeons, but my pigeon networks will still be active enough for me to recall that I was looking at a pigeon previously and I might glance out of the window to see if it is still there.

Every time my long-term neural networks are activated by incoming sensory information, they are updated. If the same information comes in repeatedly the connections within the network are strengthened. What’s not clear is how much attention needs to be paid to incoming information in order for it to update long-term memory. Large amounts of information about the changing environment are flowing through working memory all the time, and evidence from brain-damaged patients suggests that long-term memory can be changed even if we’re not paying attention to the information that activates it.

the central executive

Incoming sensory information and information from long-term memory are fed forward to the central executive. The function of the central executive is a bit like that of a CCTV control room. According to Antonio Damasio it monitors, evaluates and responds to information from three main sources:

• the external environment (sensory information)
• the internal environment (body states) and
• previous representations of the external and internal environments (carried in the pattern of connections in neural networks).

One difference is that the loops/sketchpads/buffers and the system that monitors them consist of networks of interconnected neurons, not TV screens (obviously). Another is that there isn’t anybody watching the brain’s equivalent of the CCTV screens – it’s an automated process. We become aware of information in the loops/sketchpads/buffers only if we need to be aware of it – so we are usually conscious of what’s happening in the external environment, and of significant changes internally or externally.

The central executive constantly compares the streams of incoming information. It responds to them via networks of neurons that feed information back to other areas of the brain. If the environment has changed significantly, or an interesting or threatening event occurs, or we catch sight of something moving on the periphery of our field of vision, or experience sudden discomfort or pain, the feedback from the central executive ensures that we pay attention to that, rather than anything else. It’s important to note that information from the body includes information about our overall physiological state, including emotions.

So a schematic general diagram of how working memory fits in with information processing in the brain would look something like this:

[diagram: schematic of how working memory fits into the brain’s information-processing pathways]

It’s important to note that we still don’t have a clear map of the information processing pathways. Researchers keep coming across different potential loops/sketchpads/buffers and there’s evidence that the feedback and feed-forward pathways are more complex than this diagram shows.

I began this post by suggesting that an over-simplified model of working memory could be misleading. I’ll explain my reasons in more detail in the next post, but first I want to highlight an important implication of the way incoming sensory information is handled by the brain.

pre-conscious processing

A great deal of sensory information is processed by the brain pre-consciously. Advocates of direct instruction emphasise the importance of chunking information because it increases the capacity of working memory. A popular example is the way expert chess players can hold several different configurations of chess pieces in working memory simultaneously, chunking being seen as something ‘experts’ do. But it’s important to remember that the brain chunks information automatically if we’re exposed to it frequently enough. That’s how we recognise faces, places and things – most three-year-olds are ‘experts’ in their day-to-day surroundings because they have had thousands of exposures to familiar faces, places and things. They don’t have to sit down and study these things in order to chunk the fragments of information that make up faces, places and things – their visual cortex does it automatically.

This means that a large amount of information going through young children’s working memory is already chunked. We don’t know to what extent the central executive has to actively pay attention to that information in order for it to change long-term memory, but pre-conscious chunking does suggest that a good deal of learning happens implicitly. I’ll comment on this in more detail in my next post.

Daisy debunks myths: or does she?

At the beginning of this month, Daisy Christodoulou, star performer on University Challenge, CEO of The Curriculum Centre and a governor of the forthcoming Michaela Community School, published a book entitled Seven Myths about Education. Daisy has summarised the myths on her blog, The Wing to Heaven. There are few things I like better than seeing a myth debunked, but I didn’t rush to buy Daisy’s book. In fact I haven’t read it yet. Here’s why.

Debunking educational ‘myths’ is currently in vogue. But some of the debunkers have replaced the existing myths with new myths of their own; kind of second-order myths. The first myth is at least partly wrong, but the alternative proposed isn’t completely right either, which really doesn’t help. I’ve pointed this out previously in relation to ‘neuromyths’. One of the difficulties involved in debunking educational myths is that they are often not totally wrong, but in order to tease out what’s wrong and what’s right, you need to go into considerable detail, and busy teachers are unlikely to have the time or background knowledge to judge whether or not the criticism is valid.

Human beings have accumulated a vast body of knowledge about ourselves and the world we inhabit, which suggests strongly that the world operates according to knowable principles. It’s obviously necessary to be familiar with the structure and content of any particular knowledge domain in order to have a good understanding of it. And I agree with some of Daisy’s criticisms of current approaches to learning. So why do I feel so uneasy about what she’s proposing to put in its place?

Daisy’s claims

Daisy says she makes two claims in her book and presents evidence to support them. The claims and the evidence are:

Claim one: “that in English education, a certain set of ideas about education are predominant…” Daisy points out that it’s difficult to prove or disprove the first claim, but cites a number of sources to support it.

Claim two: “that these ideas are misguided”. Daisy says “Finding the evidence to prove the second point was relatively straightforward” and lists a number of references relating to working and long-term memory.

Daisy’s reasoning

The sources cited in support of claim one suggest that Daisy is probably right that ‘certain ideas’ are predominant in English education.

She is also broadly right when she says “it is scientifically well-established that working memory is limited and that long-term memory plays a significant role in the human intellect” – although she doesn’t define what she means by ‘intellect’.

She then says “this has clear implications for classroom practice, implications which others have made and which I was happy to recap.”

Her reasoning appears to follow that of Kirschner, Sweller & Clark, who lump together ‘constructivist, discovery, problem-based, experiential, and inquiry-based teaching’ under the heading ‘minimal instruction’ and treat them all as one. The authors then make the assumption that because some aspects of ‘minimal instruction’ might impose a high cognitive load on students, the whole family of approaches should be discarded in favour of ‘direct instruction’ that takes into account the limitations of working memory.

This is the point at which I parted company with Daisy (and Kirschner, Sweller & Clark). Lumping together a set of complex and often loosely defined ideas and approaches to learning is hardly helpful, since it’s possible that some of their components might overload working memory, but others might not. I can see how what we know about working and long-term memory demonstrates that some aspects of the predominant ‘certain set of ideas’ might be ‘misguided’, but not how it demonstrates that they are misguided en masse.

The nature of the evidence

I also had reservations about the evidence Daisy cites in support of claim two.

First on the list is Dan Willingham’s book Why Don’t Students Like School? Willingham is a cognitive psychologist interested in applying scientific findings to education. I haven’t read his book either*, but I’ve yet to come across anything else he’s written that has appeared flawed. Why Don’t Students Like School? appears to be a reliable, accessible book written for a wide readership. So far, so good.

Next, Daisy cites Kirschner, Sweller and Clark’s paper “Why minimal guidance during instruction does not work: an analysis of the failure of constructivist, discovery, problem-based, experiential, and inquiry-based teaching”. This paper is obviously harder going than Willingham’s book, but is published in Educational Psychologist, so would be accessible to many teachers. I have several concerns about this paper and have gone through its arguments in detail.

My main reservations are:
• the simplistic way in which the pedagogical debate is presented
• what’s left out of the discussion
• the reliance on a model of memory that’s half a century out of date.

That last point could apply to the next three items on Daisy’s list: two papers by Herb Simon, a Nobel prizewinner whose ideas have been highly influential in information-processing theory, and one by John Anderson on his Adaptive Character of Thought model. Simon’s papers were published in 1973 and 1980, and Anderson’s in 1996, although his model dates from the 1970s.

Another feature of these papers is that they’re not easy reading – if you can actually get access to them, that is. Daisy’s links were to more links and I couldn’t get the Simon papers to open. And although Anderson’s paper is entitled ‘A simple theory of complex cognition’, what he means by that is that an apparently complex cognitive process can be explained by a simple information processing heuristic, not that his theory is easy to understand. He and Simon both write lucidly, but their material isn’t straightforward.

I completely agree with Daisy that the fundamentals of a knowledge domain don’t date – as she points out elsewhere, Pythagoras and Euripides have both stood the test of time. There’s no question that Simon’s and Anderson’s papers are key ones – for information scientists at least – and that the principles they set out have stood the test of time too. But quite why she should cite them, and not more accessible material that takes into account several further decades of research into brain function, is puzzling.

It could be that there simply aren’t any publications that deal specifically with recent findings about memory and apply them to pedagogy. But even if there aren’t, it’s unlikely that most teachers would find Simon and Anderson the most accessible alternatives; for example Rita Carter’s Mapping the Mind is a beautifully illustrated, very informative description of how the brain works. (It’s worth forking out for the University of California Press edition because of the quality of the illustrations). Stanislas Dehaene’s Reading in the Brain is about reading, but is more recent and explains in more detail how the brain chunks, stores and accesses information.

It looks to me as if someone has given Daisy some key early references about working memory and she’s dutifully cited them, rather than ensuring that she has a thorough grasp of the knowledge domain of which they are part. If that’s true, it’s ironic, because having a thorough grasp of a knowledge domain is something Daisy advocates.

So Daisy’s logic is a bit flaky and her evidence base is a bit out of date. So what? The reason Daisy’s logic and evidence base are important is that they form the foundation for an alternative curriculum being used by a chain of academies and a high-profile free school.

Implications for curriculum design

Daisy’s name doesn’t appear in the ‘who we are’ or ‘our advisors’ sections of The Curriculum Centre’s (supporting Future Academies) website, although their blog refers to her as their CEO. That might indicate the site simply needs updating. But disappointingly for an organisation describing itself as The Curriculum Centre, their ‘complete offer’ – The Future Curriculum™ – is described as ‘information coming soon’, and the page about the three-year KS2 curriculum is high on criticism of other approaches but low on information about itself.

Daisy is also ‘governor for knowledge’ at the Michaela Community School (headteacher Katherine Birbalsingh), a free school that’s already attracted press criticism even though it doesn’t open until September. Their curriculum page is a bit more detailed than that of The Curriculum Centre, but has some emphases that aren’t self-evident and aren’t explained, such as:

“Our emphasis on traditional academic subjects will provide a solid base on which young people can build further skills and future careers, thus enabling them to grow into thinkers, authors, leaders, orators or whatever else they wish.”

One has to wonder why the ‘traditional academic subjects’ don’t appear to be preparing pupils for careers with a more practical bent, such as doctors, economists or engineers.

“Michaela recognises that English and Maths are fundamental to all other learning.”

No, they’re not. They are useful tools in accessing other learning, but non-English speakers who aren’t good at maths can still be extremely knowledgeable.

“Michaela Community School will teach knowledge sequentially so that the entire body of knowledge for a subject will be coherent and meaningful. The History curriculum will follow a chronological sequence of events. The English curriculum will follow a similar chronology of the history of literature, and will also build up knowledge of grammar and the parts of speech.”

The rationale for teaching history chronologically is obvious, but history is more than a sequence of events, and it’s not clear why it’s framed in that way. Nor is there an explanation for why literature should be taught chronologically. Nor why other subjects shouldn’t be. As it happens, I’m strongly in favour of structuring the curriculum chronologically, but I know from experience it’s impossible to teach English, Maths, Science, History, Geography, a modern foreign language (French/Spanish), Music and Art chronologically and in parallel because your chronology will be out of synch across the different subject areas. I’ve used a chronological curriculum with my own children and it gave them an excellent understanding of how everything connects. We started with the Big Bang and worked forward from there. But it meant that for about a year our core focus was on physics, chemistry and geography because for much of the earth’s history nothing else existed. I don’t get the impression Michaela or the Curriculum Centre have actually thought through curriculum development from first principles.

Then there was:

“The Humanities curriculum at Michaela Community School will develop a chronologically secure knowledge and understanding of British, local and world history and introduce students to the origins and evolution of the major world religions and their enduring influence.”

I couldn’t help wondering why ‘British’ came before local and world history. And why highlight religions and ‘their enduring influence’? It could be that the curriculum section doesn’t summarise the curriculum very well, or it could be that there’s an agenda here that isn’t being made explicit.

I’m not convinced that Daisy has properly understood how human memory works, has used what’s been scientifically established about it to debunk any educational myths, or has thoroughly thought through its implications for classroom practice. Sorry, Daisy, but I think you need to have another go.

References
Carter, R (2010). Mapping the Mind. University of California Press.
Dehaene, S (2010). Reading in the Brain. Penguin.
Willingham, DT (2010). Why Don’t Students Like School? Jossey-Bass.

* My bookshelves are groaning under the weight of books I’ve bought solely for the purpose of satisfying people who’ve told me I can’t criticise what someone’s saying until I’ve read their book. Very occasionally I come across a gem. More often than not, one can read between the lines of reviews.

Kirschner, Sweller & Clark: a summary of my critique

It’s important not just to know things, but to understand them, which is why I took three posts to explain my unease about the paper by Kirschner, Sweller & Clark. From the responses I’ve received, I appear to have over-elaborated my explanation but understated my key points, so for the benefit of anybody unable or unwilling to read all the words, here’s a summary.

1. I have not said that Kirschner, Sweller & Clark are wrong to claim that working memory has a limited capacity. I’ve never come across any evidence that says otherwise. My concerns are about other things.

2. The complex issue of approaches to learning and teaching is presented as a two-sided argument. Presenting complex issues in an oversimplified way invariably obscures rather than clarifies the debate.

3. The authors appeal to a model of working memory that’s almost half a century old, rather than one revised six years before their paper came out and widely accepted as more accurate. Why would they do that?

4. They give the distinct impression that long-term memory isn’t subject to working memory constraints, when it is very much subject to them.

5. They completely omit any mention of the biological mechanisms involved in processing information. Understanding the mechanisms is key if you want to understand how people learn.

6. They conclude that explicit, direct instruction is the only viable teaching approach based on the existence of a single constraining factor – the capacity of working memory to process yet-to-be-learned information (though exactly what they mean by yet-to-be-learned isn’t explained). In a process as complex as learning, it’s unlikely that there will be only one constraining factor.

Kirschner, Sweller & Clark appear to have based their conclusion on a model of memory that was current in the 1970s (I know because that’s when I first learned about it), to have ignored subsequent research, and to have oversimplified the picture at every available opportunity.

What also concerns me is that some teachers appear to be taking what Kirschner, Sweller & Clark say at face value, without making any attempt to check the accuracy of their model, to question their presentation of the problem or the validity of their conclusion. There’s been much discussion recently about ‘neuromyths’. Not much point replacing one set of neuromyths with another.

Reference
Kirschner, PA, Sweller, J & Clark, RE (2006). Why Minimal Guidance During Instruction Does Not Work: An Analysis of the Failure of Constructivist, Discovery, Problem-Based, Experiential, and Inquiry-Based Teaching. Educational Psychologist, 41, 75-86.

cognitive load and learning

In the previous two posts I discussed the model of working memory used by Kirschner, Sweller & Clark and how working memory and long-term memory function. The authors emphasise that their rejection of minimal guidance approaches to teaching is based on the limited capacity of working memory in respect of novel information, and that even if experts might not need much guidance, “…nearly everyone else thrives when provided with full, explicit instructional guidance (and should not be asked to discover any essential content or skills)” (Clark, Kirschner & Sweller, p.6). Whether they are right or not depends on what they mean by ‘novel’ information.

So what’s new?

Kirschner, Sweller & Clark define novel information as ‘new, yet to be learned’ information that has not been stored in long-term memory (p.77). But novelty isn’t a simple case of information either being yet-to-be-learned or stored-in-long-term-memory. If I see a Russian sentence written in Cyrillic script, its novelty value to me on a scale of 1-10 would be about 9. I can recognise some Cyrillic letters and know a few Russian words, but my working memory would be overloaded after about the third letter because of the multiple operations involved in decoding, blending and translating. A random string of Arabic numerals would have a novelty value of about 4, however, because I am very familiar with Arabic numerals; the only novelty would be in their order in the string. The sentence ‘the cat sat on the mat’ would have a novelty value close to zero, because I’m an expert at chunking the letter patterns in English and I’ve encountered that sentence so many times.

Because novelty isn’t an either/or thing but sits on a sliding scale, and because the information coming into working memory can vary between simple and complex, ‘new, yet to be learned’ information can vary in both complexity and novelty.

You could map it on a 2×2 matrix like this:

novelty, complexity & cognitive load

A sentence such as ‘the monopsonistic equilibrium at M should now be contrasted with the equilibrium that would obtain under competitive conditions’ is complex (it contains many bits of information), but its novelty content would depend on the prior knowledge of the reader. It would score high on both the novelty and complexity scales for the average 5-year-old. I don’t understand what the sentence means, but I do understand many of the words, so it would be mid-range in both novelty and complexity for me. An economist would probably give it a 3 for complexity but 0 for novelty. Trying to teach a 5-year-old what the sentence meant would completely overload their working memory. But it would be a manageable challenge for mine, and an economist would probably feel bored.
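You could even put rough numbers on the matrix. The sketch below scores a message by how many chunks it contains (complexity) and by what fraction of those chunks the learner hasn’t already chunked (novelty). Treating words as chunks and assuming a working memory capacity of 4 are crude simplifications of mine, not a serious measure of cognitive load.

```python
# Back-of-envelope load estimate from the two axes of the matrix.
# Word-level chunks and a capacity of 4 are crude illustrative assumptions.

def cognitive_load(message_chunks, familiar_chunks, wm_capacity=4):
    """message_chunks: the units a message breaks into for this learner.
    familiar_chunks: units already chunked in the learner's long-term memory."""
    novel = [c for c in message_chunks if c not in familiar_chunks]
    novelty = len(novel) / len(message_chunks)   # 0 (routine) to 1 (all new)
    complexity = len(message_chunks)             # how many bits of information
    load = novelty * complexity / wm_capacity    # near/above 1 suggests overload
    return novelty, complexity, load

# The same sentence loads three readers very differently:
sentence  = ["monopsonistic", "equilibrium", "contrasted", "competitive"]
economist = {"monopsonistic", "equilibrium", "contrasted", "competitive"}
adult     = {"equilibrium", "contrasted", "competitive"}
child     = set()
for reader in (economist, adult, child):
    print(cognitive_load(sentence, reader))
# -> (0.0, 4, 0.0), (0.25, 4, 0.25), (1.0, 4, 1.0)
```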

Kirschner, Sweller & Clark reject ‘constructivist, discovery, problem-based, experiential and inquiry-based approaches’ on the basis that they overload working memory and the excessive cognitive load means that learners don’t learn as efficiently as they would using explicit direct instruction. If only it were that simple.

‘Constructivist, discovery, problem-based, experiential and inquiry-based approaches’ were adopted initially not because teachers preferred them or because philosophers thought they were a good idea, but because by the end of the 19th century explicit, direct instruction – the only game in town for fledgling mass education systems – clearly wasn’t as effective as people had thought it would be. Alternative approaches were derived from three strategies that young children apply when learning ‘naturally’.

How young children learn

Human beings are mammals and young mammals learn by applying three key learning strategies which I’ll call ‘immersion’, trial-and-error and modelling (imitating the behaviour of other members of their species). By ‘strategy’, I mean an approach that they use, not that the baby mammals sit down and figure things out from first principles; all three strategies are outcomes of how mammals’ brains work.

Immersion

Most young children learn to walk, talk, feed and dress themselves and acquire a vast amount of information about their environment with very little explicit, direct instruction. And they acquire those skills pretty quickly and apparently effortlessly. The theory was that if you put school age children in a suitable environment, they would pick up other skills and knowledge equally effortlessly, without the boredom of rote-learning and the grief of repeated testing. Unfortunately, what advocates of discovery, problem-based, experiential and inquiry-based learning overlooked was the sheer amount of repetition involved in young children learning ‘naturally’.

Although babies’ learning is kick-started by some hard-wired processes such as reflexes, babies have to learn to do almost everything. They repeatedly rehearse their gross motor skills, fine motor skills and sensory processing. They practise babbling, crawling, toddling and making associations at every available opportunity. They observe things and detect patterns. A relatively simple skill like face-recognition, grasping an object or rolling over might only take a few attempts. More complex skills like using a spoon, crawling or walking take more. Very complex skills like using language require many thousands of rehearsals; it’s no coincidence that children’s speech and reading ability take several years to mature, and that their writing ability (an even more complex skill) doesn’t usually mature until adulthood.

The reason why children don’t learn to read, do maths or learn foreign languages as ‘effortlessly’ as they learn to walk or speak in their native tongue is largely because of the number of opportunities they have to rehearse those skills. An hour a day of reading or maths and a couple of French lessons a week bear no resemblance to the ‘immersion’ in motor development and their native language that children are exposed to. Inevitably, it will take them longer to acquire those skills. And if they take an unusually long time, it’s the child, the parent, the teacher or the method that tends to be blamed, not the mechanism by which the skill is acquired.

Trial-and-error

The second strategy is trial-and-error. It plays a key role in the rehearsals involved in immersion, because it provides feedback to the brain about how the skill or knowledge is developing. Some skills, like walking, talking or handwriting, can only be acquired through trial-and-error because of the fine-grained motor feedback that’s required. Learning by trial-and-error can offer very vivid, never-forgotten experiences, regardless of whether the initial outcome is success or failure.

Modelling

The third strategy is modelling – imitating the behaviour of other members of the species (and sometimes other species or inanimate objects). In some cases, modelling is the most effective way of teaching because it’s difficult to explain (or understand) a series of actions in verbal terms.

Cognitive load

This brings us back to the issue of cognitive load. It isn’t the case that immersion, trial-and-error and modelling or discovery, problem-based, experiential and inquiry-based approaches always impose a high cognitive load, and that explicit direct instruction doesn’t. If that were true, young children would have to be actively taught to walk and talk and older ones would never forget anything. The problem with all these educational approaches is that they have all initially been seen as appropriate for teaching all knowledge and skills and have subsequently been rejected as ineffective. That’s not at all surprising, because different types of knowledge and skill require different strategies for effective learning.

Cognitive load is also affected by the complexity of incoming information and by how novel it is to the learner. Nor is cognitive load confined to the capacity of working memory. Forty minutes of explicit, direct instruction in novel material, even if presented in well-paced, working-memory-sized chunks, would pose a significant challenge to most brains. The reason, as I pointed out previously, is that the transfer of information from working memory to long-term memory is a biological process that takes time, resources and energy. Research into changes in the motor cortex suggests that the time involved might be as little as hours, but even that has implications for the pace at which students are expected to learn and for how much new information they can process. There’s a reason why someone would find acquiring large amounts of new information tiring – their brain uses up a considerable amount of glucose embedding that information in the form of neural connections. The inevitable delay between information coming into the brain and being embedded in long-term memory suggests that down-time is as important as learning time – calling into question the assumption that the longer children spend actively ‘learning’, the more they will know.

Final thoughts

If I were forced to choose between constructivist, discovery, problem-based, experiential and inquiry-based approaches to learning and explicit, direct instruction, I’d plump for explicit, direct instruction, because the world we live in works according to discoverable principles and it makes sense to teach kids what those principles are, rather than to expect them to figure them out for themselves. However, it would have to be a forced choice, because we do learn through constructing our knowledge and through discovery, problem-solving, experiencing and inquiring, as well as by explicit, direct instruction. The most appropriate learning strategy will depend on the knowledge or skill being learned.

The Kirschner, Sweller & Clark paper left me feeling perplexed and rather uneasy. I couldn’t understand why the authors frame the debate about educational approaches in terms of minimal guidance ‘on one side’ and direct instructional guidance ‘on the other’, when self-evidently the debate is more complex than that. Nor why they refer to Atkinson & Shiffrin’s model of working memory when Baddeley & Hitch’s more complex model is so widely accepted as more accurate. Nor why they omit any mention of the biological mechanisms involved in learning; not only are the biological mechanisms responsible for the way working memory and long-term memory operate, they also shed light on why any single educational approach doesn’t work for all knowledge, all skills – or even all students.

I felt it was ironic that the authors place so much emphasis on the way novices think but present a highly complex debate in binary terms – a classic feature of the way novices organise their knowledge. What was also ironic was that despite their emphasis on explicit, direct instruction, they failed to mention several important features of memory that would have helped a lay readership understand how memory works. This is all the more puzzling because some of these omissions (and a more nuanced model of instruction) are referred to in a paper on cognitive load by Paul Kirschner published four years earlier.

In order to fully understand what Kirschner, Sweller & Clark are saying, and to decide whether they were right or not, you’d need to have a fair amount of background knowledge about how brains work. To explain that clearly to a lay readership, and to address possible objections to their thesis, the authors would have had to extend the paper’s length by at least 50%. Their paper is just over 10 000 words long, suggesting that word-count issues might have resulted in them having to omit some points. That said, Educational Psychologist doesn’t currently apply a word limit, so maybe the authors were trying to keep the concepts as simple as possible.

Simplifying complex concepts for the benefit of a lay readership can certainly make things clearer, but over-simplifying them runs the risk of giving the wrong impression, and I think there’s a big risk of that happening here. Although the authors make it clear that explicit direct instruction can take many forms, they do appear to be proposing a one-size fits all approach that might not be appropriate for all knowledge, all skills or all students.

References

Clark, RE, Kirschner, PA & Sweller, J (2012). Putting students on the path to learning: The case for fully guided instruction. American Educator, Spring.

Kirschner, PA (2002). Cognitive load theory: implications of cognitive load theory on the design of learning. Learning and Instruction, 12, 1–10.

Kirschner, PA, Sweller, J & Clark, RE (2006). Why Minimal Guidance During Instruction Does Not Work: An Analysis of the Failure of Constructivist, Discovery, Problem-Based, Experiential, and Inquiry-Based Teaching. Educational Psychologist, 41, 75-86.

memories are made of this

Education theory appears to be dominated by polarised debates. I’ve just come across another; minimal guidance vs direct instruction. Harry Webb has helpfully brought together what he calls the Kirschner, Sweller & Clark cycle of papers that seem to encapsulate it. The cycle consists of papers by these authors and responses to them, mostly published in Educational Psychologist during 2006-7.

Kirschner, Sweller & Clark are opposed to minimal guidance approaches in education and base their case on the structure of human cognitive architecture. As they rightly observe “Any instructional procedure that ignores the structures that constitute human cognitive architecture is not likely to be effective” (p.76). I agree completely, so let’s have a look at the structures of human cognitive architecture they’re referring to.

Older models

Kirschner, Sweller & Clark claim that “Most modern treatments of human cognitive architecture use the Atkinson and Shiffrin (1968) sensory memory–working memory–long-term memory model as their base” (p.76).

That depends on how you define ‘using a model as a base’. Atkinson and Shiffrin’s model is 45 years old. 45 years is a long time in the fast-developing field of brain research, so claiming that modern treatments use it as their base is a bit like claiming that modern treatments of blood circulation are based on William Harvey’s work (1628) or that modern biological classification is based on Carl Linnaeus’ system (1735). It would be true to say that modern treatments are derived from those models, but our understanding of circulation and biological classification has changed significantly since then, so the early models are almost invariably referred to only in an historical context. A modern treatment of cognitive architecture might mention Atkinson & Shiffrin if describing the history of memory research, but I couldn’t see why anyone would use it as a base for an educational theory – because the reality has turned out to be a lot more complicated than Atkinson and Shiffrin could have known at the time.

Atkinson and Shiffrin’s model was influential because it provided a coherent account of some apparently contradictory research findings about the characteristics of human memory. It was also based on the idea that features of information processing systems could be universally applied; that computers worked according to the same principles as did the nervous systems of sea slugs or the human brain. That idea wasn’t wrong, but the features of information processing systems have turned out to be a bit more complex than was first imagined.

The ups and downs of analogies

Theoretical models are rather like analogies; they are useful in explaining a concept that might otherwise be difficult for people to grasp. Atkinson and Shiffrin’s model essentially made the point that human memory wasn’t a single thing that behaved in puzzlingly different ways in different circumstances, but that it could have three components, each of which behaved consistently but differently.

But there’s a downside to analogies (and theoretical models); sometimes people forget that analogies are for illustrative purposes only, and that models show what hypotheses need to be tested. So they remember the analogy/model and forget what it’s illustrating, or they assume the analogy/model is an exact parallel of the reality, or, as I think has happened in this case, the analogy/model takes on a life of its own.

You can read most of Atkinson & Shiffrin’s chapter about their model here. There’s a diagram on p.113. Atkinson and Shiffrin’s model is depicted as consisting of three boxes. One box is the ‘sensory register’ – sensory memory that persists for a very short time and then fades away. The second box is a short-term store with a very limited capacity (5-9 items of information) that can retain that information for a few seconds. The third box is a long-term store, where information is retained indefinitely. The short-term and long-term stores are connected to each other, and information can be transferred between them in both directions. The model is based on what was known in 1968 about how memory behaved, but Atkinson and Shiffrin are quite explicit that there was a lot that wasn’t known.
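In code, the three boxes might look something like the sketch below: everything registers briefly, attended items move on to a 5-9 item short-term store, and rehearsed items are copied into the long-term store. The attention rule and the copy-everything-rehearsed shortcut are my simplifications, not Atkinson and Shiffrin’s.

```python
# The three boxes as a toy pipeline. The attention rule and the assumption
# that everything in the short-term store gets copied onward are
# simplifications for illustration.

def process(stimuli, attended, short_term, long_term, span=7):
    """One cycle through the model. `stimuli` is everything hitting the
    senses; `attended` is the subset the person attends to."""
    sensory_register = list(stimuli)   # persists only fractions of a second...
    for item in sensory_register:
        if item in attended:           # ...unless attention moves it on
            short_term.append(item)
    del short_term[:-span]             # ~5-9 item capacity; oldest displaced
    long_term |= set(short_term)       # rehearsal copies items to the store
    return short_term, long_term       # transfer also runs the other way
```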

Memories are made of this

Anyone looking at Atkinson & Shiffrin’s model for the first time could be forgiven for thinking that the long-term memory ‘store’ is like a library where memories are kept. That was certainly how many people thought about memory at the time. One of the problems with that way of thinking is that the capacity required to store all the memories people clearly do store would exceed the number of cells in the brain, and accessing memories by systematically searching through them would take a very long time – which, in practice, retrieval usually doesn’t.

This puzzle was solved by the gradual realisation that the brain didn’t store individual memories in one place as if they were photographs in a huge album, but that ‘memories’ were activated via a vast network of interconnected neurons. A particular stimulus would activate a particular part of the neural network and that activation is the ‘memory’.
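The contrast with searching a library can be sketched in code: a cue activates its own representation directly, and activation then spreads outward along weighted connections, fading until it drops below a threshold. The miniature network below, anticipating the apple example that follows, is invented for illustration.

```python
# Retrieval as spreading activation rather than search. Nothing is looked
# up; the cue's own representation activates, then its neighbours, each
# attenuated by connection strength. The network and weights are invented.

network = {
    "apple":   {"taste": 0.9, "orchard": 0.6, "recipe": 0.5, "Apfel": 0.3},
    "orchard": {"tree": 0.8},
    "recipe":  {"pie": 0.7},
}

def recall(cue, activation=1.0, threshold=0.25, activated=None):
    activated = {} if activated is None else activated
    if activation < threshold or activated.get(cue, 0.0) >= activation:
        return activated               # too faint, or already more active
    activated[cue] = activation
    for neighbour, strength in network.get(cue, {}).items():
        recall(neighbour, activation * strength, threshold, activated)
    return activated

print(recall("apple"))
# {'apple': 1.0, 'taste': 0.9, 'orchard': 0.6, 'tree': 0.48,
#  'recipe': 0.5, 'pie': 0.35, 'Apfel': 0.3}
```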

For example, if I see an apple, the pattern of light falling on my retina will trigger a chain of electrical impulses that activates all the neurons that have previously been activated in response to my seeing an apple. Or hearing about or reading about or eating apples. I will recall other apples I’ve seen, how they smell and taste, recipes that use apples, what the word ‘apple’ sounds like, how it’s spelled and written, ‘apple’ in other languages etc. That’s why memories can (usually) be retrieved so quickly. You don’t have to search through all memories to find the one you want. As Antonio Damasio puts it;

“Images are not stored as facsimile pictures of things, or events or words, or sentences…In brief, there seem to be no permanently held pictures of anything, even miniaturized, no microfiches or microfilms, no hard copies… as the British psychologist Frederic Bartlett noted several decades ago, when he first proposed that memory is essentially reconstructive.” (p.100)

But Atkinson and Shiffrin don’t appear to have thought of memory in this way when they developed their model. Their references to ‘store’ and ‘search’ suggest they saw memory as more of a library than a network. That’s also how Kirschner, Sweller & Clark seem to view it. Although they say “our understanding of the role of long-term memory in human cognition has altered dramatically over the last few decades” (p.76), they repeatedly refer to long-term memory as a ‘store’ ‘containing huge amounts of information’. I think that description is misleading. Long-term memory is a property of neural networks – if any information is ‘stored’ it’s stored in the pattern and strength of the connections between neurons.

This is especially noticeable in the article the authors published in American Educator in 2012, from which it’s difficult not to draw the conclusion that long-term memory is a store containing many thousands of schemas, rather than a highly flexible network of connections that can be linked in an almost infinite number of ways.

Where did I put my memory?

In the first paper I mentioned, Kirschner, Sweller & Clark also refer to long-term memory and working memory as ‘structures’. Although they could mean ‘configurations’, the use of ‘structures’ does give the impression that there’s a bit of the brain dedicated to storing information long-term and another where it’s just passing through. Although some parts of the brain do have dedicated functions, those localities should be thought of as localities within a network of neurons. Information isn’t stored in particular locations in the brain; it’s distributed across the brain, although particular connections are located in particular places.

Theories having a life of their own

Atkinson and Shiffrin’s model isn’t exactly wrong; human memory does encompass short-lived sensory traces, short-term buffering and information that’s retained indefinitely. But implicit in their model are some assumptions about the way memory functions that have been superseded by later research.

At first I couldn’t figure out why anyone would base an educational theory on an out-dated conceptual model. Then it occurred to me that that’s exactly what’s happened in respect of theories about child development and autism. In both cases, someone has come up with a theory based on Freud’s ideas about children. Freud’s ideas in turn were based on his understanding of genetics and how the brain worked. Freud died in 1939, over a decade before the structure of DNA was discovered, and two decades before we began to get a detailed understanding of how brains process information. But what happened to the theories of child development and autism based on Freud’s understanding of genetics and brain function, is that they developed an independent existence and carried on regardless, instead of constantly being revised in the light of new understandings of genetics and brain function. Theories dominating autism research are finally being presented with a serious challenge from geneticists, but child development theories still have some way to go. Freud did a superb job with the knowledge available to him, but that doesn’t mean it’s a good idea to base a theory on his ideas as if new understandings of genetics and brain function haven’t happened.

Again I completely agree with Kirschner, Sweller & Clark that “any instructional procedure that ignores the structures that constitute human cognitive architecture is not likely to be effective”, but basing an educational theory on one aspect of human cognitive architecture – memory – and on an outdated concept of memory at that, is likely to be counterproductive.

A Twitter discussion of the Kirschner, Sweller & Clark model centred around the role of working memory, which is what I plan to tackle in my next post.

References

Atkinson, R, & Shiffrin, R (1968). Human memory: A proposed system and its control processes. In K. Spence & J. Spence (Eds.), The psychology of learning and motivation (Vol. 2, pp. 89–195). New York: Academic Press.
Clark, RE, Kirschner, PA & Sweller, J (2012). Putting students on the path to learning: The case for fully guided instruction. American Educator, Spring.
Damasio, A (1994). Descartes’ Error. Vintage Books.
Kirschner, PA, Sweller, J & Clark, RE (2006). Why Minimal Guidance During Instruction Does Not Work: An Analysis of the Failure of Constructivist, Discovery, Problem-Based, Experiential, and Inquiry-Based Teaching. Educational Psychologist, 41, 75-86.