direct instruction: the evidence

A discussion on Twitter raised a lot of questions about working memory and the evidence supporting direct instruction cited by Kirschner, Sweller and Clark. I couldn’t answer in 140 characters, so here’s my response. I hope it covers all the questions.

Kirschner, Sweller & Clark’s thesis is:

• working memory capacity is limited
• constructivist, discovery, problem-based, experiential, and inquiry-based teaching (minimal guidance) all overload working memory and
• evidence from studies investigating efficacy of different methods supports the superiority of direct instruction.
Therefore, “In so far as there is any evidence from controlled studies, it almost uniformly supports direct, strong instructional guidance rather than constructivist-based minimal guidance during the instruction of novice to intermediate learners.” (p.83)

Sounds pretty unambiguous – but it isn’t.

1. Working memory (WM) isn’t simple. It includes several ‘dissociable’ sensory buffers and a central executive that monitors, attends to and responds to sensory information, information from the body and information from long term memory (LTM) (Wagner, Bunge & Badre, 2004; Damasio, 2006).

2. Studies comparing minimal guidance with direct instruction are based on ‘pure’ methods. Sweller’s work on cognitive load theory (CLT) (Sweller, 1988) was based on problems involving the use of a single buffer/loop, e.g. mazes and algebra. New items coming into the buffer displace older items, so buffer capacity would be the limiting factor. But real-world problems tend to involve different buffers, so items in the buffers can be easily maintained while they are manipulated by the central executive. For example, I can’t write something complex and listen to Radio 4 at the same time because my phonological loop can’t cope. But I can write and listen to music, or listen to Radio 4 whilst I cook a new recipe, because I’m using different buffers. Discovery, problem-based, experiential, and inquiry-based teaching in classrooms tends to resemble real-world situations more closely than the single-buffer problems used by Sweller to demonstrate the concept of cognitive load, so the impact of the buffer limit would be lessened.

3. For example, Klahr & Nigam (2004) point out that because there’s no clear definition of discovery learning, in their experiment involving a scientific concept they ‘magnified the difference between the two instructional treatments’ – i.e. they used an ‘extreme type’ of each method, one that’s unlikely to occur in any classroom. Essentially they disproved the hypothesis that children always learn better by discovering things for themselves; but children are unlikely to ‘discover things for themselves’ in circumstances like those in the Klahr & Nigam study.

It’s worth noting that 8 of the children in their study figured out what to do at the outset, so were excluded from the results. And 23% of the direct instruction children didn’t master the concept well enough to transfer it.

That finding – that some learners failed to learn even when direct instruction was used, and that some learners might benefit from less direct instruction – comes up time and again in the evidence cited by Kirschner, Sweller and Clark, but gets overlooked in their conclusion.

I can quite see why educational methods using ‘minimal instruction’ might fail, and agree that proponents of such methods don’t appear to have taken much notice of such research findings as there are. But the findings are not unambiguous. It might be true that the evidence ‘almost uniformly supports direct, strong instructional guidance rather than constructivist-based minimal guidance during the instruction of novice to intermediate learners’ [my emphasis] but teachers aren’t faced with that forced choice. Also the evidence doesn’t show that direct, strong instructional guidance is always effective for all learners. I’m still not convinced that Kirschner, Sweller & Clark’s conclusion is justified.


References

Damasio, A. (2006). Descartes’ Error. Vintage Books.
Klahr, D. & Nigam, M. (2004). The equivalence of learning paths in early science instruction: Effects of direct instruction and discovery learning. Psychological Science, 15, 661–667.
Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12, 257–285.
Wagner, A.D., Bunge, S.A. & Badre, D. (2004). Cognitive control, semantic memory and priming: Contributions from prefrontal cortex. In M. S. Gazzaniga (Ed.) The Cognitive Neurosciences (3rd edn.). Cambridge, MA: MIT Press.

“waiter’s memory”

At the ResearchED conference last Saturday, when I queried the usefulness of the diagram of working memory that was being used, I was asked two questions. Here’s the first:

What’s wrong with Willingham’s model of working memory?

Nothing’s wrong with Willingham’s model. As far as I can tell, the diagram of working memory that was being used by teachers at the ResearchED conference had been simplified to illustrate two key points: that working memory has limited capacity, and that information can be transferred from working memory to long-term memory and vice versa.

My reservation about it is that if it’s the only model of working memory you’ve seen, you won’t know what Willingham has left out, nor how working memory fits into the way the brain processes information. And over-simplified models of things, if unconstrained by reality, tend to take on a life of their own which doesn’t help anyone. The left-brain right-brain mythology is a case in point. An oversimplified understanding of the differences between right and left hemispheres followed by a process of Chinese whispers ended up producing some bizarre educational practices.

The second question was this:

What difference would it make if we knew more about how information is processed in the brain?

It’s a good question. The short answer is that if you rely on Willingham’s diagram for your understanding of working memory, you could conclude, as some people have done, that direct instruction is the only way students should be taught. As I hope I showed in my previous post, the way information is processed is more complex than the diagram suggests. I think there are three key points that are worth bearing in mind.

Long-term memory is constantly being updated by incoming sensory information

Children are learning all the time. They learn implicitly, informally and incidentally from their environment as well as explicitly when being taught. It’s well worth utilising that ability to learn from ‘background’ information. Posters, displays, playground activities, informal conversations, and DVDs and books used primarily for entertainment can all exploit implicit, informal and incidental learning that will support, extend and reinforce explicit learning.

We’re not always aware that we are learning

I only need two or three exposures to an unfamiliar place, or face or song before I can recognise it again, and I don’t need to actively pay attention to, or put any effort into recalling, the place, face or song in order to do so. I would have reliably learned new things, but my learning would be implicit. I wouldn’t be able to give accurate directions, describe the face so that someone else would recognise it, or hum the tune. (Daniel Willingham suggests that implicit memory doesn’t exist, but he’s talking about the classification rather than the phenomenon.)

Peter Blenkinsop and I found that we were using different definitions of learning. My definition was: long-term changes to the brain as a result of incoming information. His was: being able to explicitly recall information from long-term memory. Both definitions are valid, but they are different.

Working memory is complex

George Miller’s paper ‘The magical number seven, plus or minus two’ is well worth reading. What’s become clear since Miller wrote it is that his finding that working memory can handle only 7±2 items of information at once applies to the loops/sketchpads/buffers in working memory. At first, it was assumed there was only one loop/sketchpad/buffer. Since then more have been discovered. In addition, because information is chunked, the amount of information in the loops/sketchpads/buffers can actually be quite large. On top of that, the central executive is simultaneously monitoring information from the environment, the body and long-term memory. That’s quite a lot of information flowing through working memory all the time. We don’t actively pay attention to all of it, but it doesn’t follow that anything we don’t pay attention to disappears forever. In addition to working memory capacity there are several other things the brain does that make it easier, or harder, for people to learn.
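The effect of chunking on capacity can be shown with a few lines of Python. This is only a toy sketch: the `chunk` function, the `SPAN_LIMIT` of 9 (the upper end of Miller’s 7±2) and the digit string are assumptions made for the example, not anything taken from Miller’s paper.

```python
def chunk(items, size):
    """Group a flat sequence into consecutive chunks of at most `size` items."""
    return [tuple(items[i:i + size]) for i in range(0, len(items), size)]


# Assumed span limit: the upper end of Miller's 7±2.
SPAN_LIMIT = 9

digits = list("004420794603")  # 12 arbitrary digits, treated as 12 separate items
chunks = chunk(digits, 4)      # the same digits grouped into chunks of four

# Compare how many 'slots' each representation needs against the limit.
print(len(digits), len(digits) <= SPAN_LIMIT)
print(len(chunks), len(chunks) <= SPAN_LIMIT)
```

The digits themselves don’t change; only the number of units that have to be held at once does.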

Things that make learning easier (and harder)

1. Pre-existing information

People learn by extending their existing mental schemata. This involves extending neural networks – literally. If information is totally novel to us, it won’t mean anything to us and we’re unlikely to remember it. Because each human being has had a unique set of life experiences, each of us has a unique set of neural networks and the way we structure our knowledge is also unique. It doesn’t follow that everybody’s knowledge framework is equally valid. The way the world is structured and the way it functions are pretty reliable and we know quite a lot about both. Students do need to acquire core knowledge about the world and it is possible to teach it. Having said that, there are often fundamental disagreements within knowledge domains about the nature of that core knowledge, so students also need to know how to look at knowledge from different perspectives and how to test its reliability and validity.

Tapping into children’s existing schemata, not just those relating to what they are supposed to be learning in school but what they know about the world in general, can provide hooks on which to hang tricky concepts. Schemata from football, pop culture or Dr Who can be exploited, not in order to make learning ‘fun’, but to make sense of it. That doesn’t mean that teachers have to refer to pop culture, or that they should do so if it’s likely to prove a distraction.

2. Multi-sensory input

Because learning is about the real world and takes place in the real world, it usually involves more than one sensory modality – human beings rely most heavily on the visual, auditory and tactile senses. Neural connections linking information from several sensory modalities make things we’ve learned more secure because they can be accessed via several different sensory routes. It also makes sense to map the way information is presented as accurately as possible onto what it relates to in the real world. Visits, audio-visuals, high quality illustrations and physical activities can convey information that chalk-and-talk and a focus on abstract information can’t. Again, the job of multi-sensory vehicles for learning isn’t to make the learning ‘fun’ (although they might do that) or to distract the learner, but to increase the amount of information available.

3. Trial-and-error

The brain relies on trial-and-error feedback to fine-tune skills and ensure that knowledge is fit for purpose. We call trial-and-error learning in young children ‘play’. Older children and adults also use play to learn – if they get the opportunity. In more formal educational settings, formative assessment that gives feedback to individual students is a form of trial-and-error learning. It’s important to note that human beings tend to attach greater weight to the risk of failure and sanctions than they do to opportunities for success and reward. This means that tasks need to be challenging but not too challenging. Too many failures – or too many successes – can reduce interest and motivation.

4. Rehearsal

Willingham emphasises the importance of rehearsal in learning. The more times neural networks are activated, the stronger the connections become within them, and the more easily information will be recalled. Rehearsal at intervals is more effective than ‘cramming’. That’s because the connections between neurons have to be formed, physically, and there’s no opportunity for that to happen if the network is being constantly activated by incoming information. There’s a reason why human beings need rest and relaxation.

5. Problem-solving

Willingham is often quoted as saying ‘the brain is not designed for thinking’. That’s true in the sense that our brains default to quick-and-dirty solutions to problems rather than using logical, rational thought. What’s also true is what Willingham goes on to say; ‘people like to solve problems, but not to work on unsolveable problems’ (p.3). The point he’s making is that our problem-solving capacity is limited. Nonetheless, human technology bears witness to the fact that human beings are problem-solvers extraordinaire, and the attempts to resolve problems have resulted in a vast body of knowledge about how the world works. It’s futile to expect children to do all their learning by problem-solving, but because problem-solving involves researching, re-iterating, testing and reconfiguring knowledge it can be an effective way of acquiring new information and making it very memorable.

6. Writing things down

Advocates of direct instruction place a lot of emphasis on the importance of long-term memory; the impression one gets is that if factual information is memorised it can be recalled whenever it’s needed. Unfortunately, long-term memory doesn’t work like that. Over time information fades if it’s not used very often and memories can become distorted (assuming they were accurate in the first place). If we’ve acquired a great deal of factual information, we won’t have time to keep rehearsing all of it to keep it all easily accessible. Memorising factual information we currently need makes sense, but what we need long-term is factual information to hand when required, and that’s why we invented writing. And books. And the internet, although that has some of the properties of long-term memory. Recording information enormously increases the capacity and reliability of long-term memory.

[Image: Grover]

In a classic Sesame Street sketch, Mr Johnson the restaurant customer suggests that Grover the waiter write down his order. Grover is affronted: “Sir! I am a trained professional! I do not need to write things down. Instead, I use my ‘waiter’s memory’.” Waiters are faced with an interesting memory challenge; they need to remember a customer’s order for longer than is usually possible in working memory, but don’t need to remember the order long-term. So they tend to use technical support in the form of a written note. It’s worth watching the sketch, because it’s a beautiful illustration of how a great deal of information can be packed into a small timeframe without any obvious working memory overload. (First time round most children would miss some of it, but Sesame Street repeats sketches for that reason.)

Conclusion

It won’t have escaped the attention of some readers that I have offered evidence from cognitive science to support educational methods lumped together as ‘minimal guidance’ and described as ‘failing’ by Kirschner, Sweller and Clark: constructivist, discovery, problem-based, experiential, and inquiry-based teaching. A couple of points are worth noting in relation to these approaches.

The first is that they didn’t appear suddenly out of the blue. Each of them has emerged at different points in time from 150 years of research into how human beings learn. We do learn by experiencing, inquiring, discovering, problem-solving and constructing our knowledge in different ways. There is no doubt about that. There’s also no doubt that we can learn by direct instruction.

The second point is that these approaches have demonstrably failed to ensure that all children have a good knowledge of how the world works because they have been extended beyond what George Kelly called their range of convenience.

In other words, they’ve been applied inappropriately. You can’t just construct your own understanding of the world and expect the world to conform to it. Trying to learn everything by experience, discovery, inquiry or problem-solving is a waste of effort if someone’s already experienced, discovered or inquired about it, or if a problem’s already been solved. Advocates of direct instruction are quite right to point out that you usually need prior knowledge before you can solve a problem, and a good understanding of a knowledge domain before you know what you need to inquire about, and that many failures in education have come about because novices have been expected to mimic the surface features of experts’ behaviour without having the knowledge of experts.

Having said that, relying on an oversimplified model of working memory introduces the risk of exactly the same thing happening with direct instruction. The way the brain processes information is complex, but not so complex it can’t be summarised in a few key principles. Human beings acquire information in multiple ways, but not in so many ways we can’t keep track of them. Figuring out what teaching approaches are best used for what knowledge might take a bit of time, but it’s a worthwhile investment, and should help to avoid the one-size-fits-all approach that has bedevilled the education system for too long.

Acknowledgements

Image of Grover from Muppet Wiki http://muppet.wikia.com/wiki/Grover

there’s more to working memory than meets the eye

I’ve had several conversations on Twitter with Peter Blenkinsop about learning and the brain. At the ResearchED conference on Saturday, we continued the conversation and discovered that much of our disagreement was because we were using different definitions of learning. Peter’s definition is that learning involves being able to actively recall information; mine is that it involves changes to the brain in response to information.

working memory

Memory is obviously essential to learning. One thing that’s emerged clearly from years of research into how memory works is that the brain retains information for a very short time in what’s known as working memory, and indefinitely in what’s called long-term memory – but that’s not all there is to it. I felt that advocates of direct instruction at the conference were relying on a model of working memory that was oversimplified and could be misleading. The diagram they were using looked like this:

[Figure: simple model of memory]

This model is attributed to Daniel Willingham. From what the teachers were saying, the diagram is simpler than most current representations of working memory because its purpose is to illustrate three key points:

• the capacity of working memory is limited and it holds information for a short time
• information in long-term memory is available for recall indefinitely and
• information can be transferred from working memory to long-term memory and vice versa.

So far, so good.

My reservation about the diagram is that if it’s the only diagram of working memory you’ve ever seen, you might get the impression that it shows the path information follows when it’s processed by the brain. From it you might conclude that:

• information from the environment goes directly into working memory
• if you pay attention to that information, it will be stored permanently in long-term memory
• if you don’t pay attention to it, it will be lost forever, and
• there’s a very low limit to how much information from the environment you can handle at any one time.

But that’s not quite what happens to information coming into the brain. As Peter pointed out during our conversation, simplifying things appropriately is challenging; you want to simplify enough to avoid confusing people, but not so much that they might misunderstand.

In this post, I’m going to try to explain the slightly bigger picture of how brains process information, and where working memory and long-term memory fit in.

sensory information from the external environment

All information from the external environment comes into the brain via the sense organs. The incoming sensory information is on a relatively large scale, particularly if it’s visual or auditory information; you can see an entire classroom at once and hear simultaneously all the noises emanating from it. But individual cells within the retina or the cochlea respond to tiny fragments of that large-scale information; lines at different angles, areas of light and dark and colour, minute changes in air pressure. Information from the fragments is transmitted via tiny electrical impulses, from the sense organs to the brain. The brain then chunks the fragments together to build larger-scale representations that closely match the information coming in from the environment. As a result, what we perceive is a fairly accurate representation of what’s actually out there. I say ‘fairly accurate’ because perception isn’t 100% accurate, but that’s another story.

chunking

The chunking of sensory information takes place via networks of interconnected neurons (long spindly brain cells). The brain forms physical connections (synapses) between neighbouring neurons in response to novel information. The connections allow electrical activation to pass from one neuron to another. The connections work on a use-it-or-lose-it principle; the more they are used the stronger they get, and if they’re not used much they weaken and disappear. Not surprisingly, toddlers have vast numbers of connections, but that number diminishes considerably during childhood and adolescence. That doesn’t mean we have to keep remembering everything we ever learned or we’ll forget it; it’s a way of ensuring that the brain can process efficiently the types of information from the environment that it’s most likely to encounter.
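The use-it-or-lose-it principle can be illustrated with a toy calculation. The sketch below is not a neuroscience model: the `update` and `simulate` functions, the `gain` and `decay` rates and the starting strength are all arbitrary assumptions, chosen only to show a frequently used connection strengthening while an unused one fades.

```python
def update(strength, used, gain=0.2, decay=0.05):
    """Return a connection strength in [0, 1] after one time step.

    A used connection moves towards 1; an unused one fades towards 0.
    `gain` and `decay` are arbitrary illustrative rates.
    """
    if used:
        return strength + gain * (1.0 - strength)
    return strength * (1.0 - decay)


def simulate(pattern, start=0.5):
    """Apply `update` once for each True/False entry in `pattern`."""
    strength = start
    for used in pattern:
        strength = update(strength, used)
    return strength


often_used = simulate([True] * 20)    # activated at every step
rarely_used = simulate([False] * 20)  # never activated

print(round(often_used, 3), round(rarely_used, 3))
```

Both connections start at the same strength; only the pattern of use differs.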

working memory

Broadly speaking, incoming sensory information is processed in the brain from the back towards the front. It’s fed forward into areas that Alan Baddeley has called variously a ‘loop’, ‘sketchpad’ and ‘buffer’. Whatever you call them, they are areas where very limited amounts of information can be held for very short periods while we decide what to do with it. Research evidence suggests there are different loops/sketchpads/buffers for different types of sensory information – for example Baddeley’s most recent model of working memory includes temporary stores for auditory, visuospatial and episodic information.

[Figure: Baddeley’s working memory model]

The incoming information held briefly in the loops/sketchpads/buffers is fed forward again to frontal areas of the brain where it’s constantly monitored by what’s called the central executive – an area that deals with attention and decision-making. The central executive and the loops/sketchpads/buffers together make up working memory.
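The idea of several separate, limited-capacity stores can be made concrete with a toy model. The sketch below is an illustration only, not Baddeley’s model: the `Buffer` class, the capacity of 3 and the item names are assumptions made for the example. It shows new items displacing older ones in a single fixed-size buffer, while the same number of items split across two buffers all survive.

```python
from collections import deque


class Buffer:
    """Toy fixed-capacity store: a new item displaces the oldest one."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest item when full.
        self.items = deque(maxlen=capacity)

    def add(self, item):
        self.items.append(item)

    def contents(self):
        return list(self.items)


# Hypothetical capacity, chosen only to keep the example small.
CAPACITY = 3

# Single-buffer task: five verbal items compete for three slots.
single = Buffer(CAPACITY)
for word in ["w1", "w2", "w3", "w4", "w5"]:
    single.add(word)

# Multi-buffer task: the same number of items, split by type.
verbal = Buffer(CAPACITY)
visual = Buffer(CAPACITY)
for word in ["w1", "w2", "w3"]:
    verbal.add(word)
for shape in ["s1", "s2"]:
    visual.add(shape)

print(single.contents())                      # the two oldest words are gone
print(verbal.contents() + visual.contents())  # all five items are retained
```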

long-term memory

The information coming into working memory activates the more permanent neural networks that carry information relevant to it – what’s called long-term memory. The neural networks that make up long-term memory are distributed throughout the brain. Several different types of long-term memory have been identified but the evidence points increasingly to the differences being due to where neural networks are located, not to differences in the biological mechanisms involved.

Information in the brain is carried in the pattern of connections between neurons. The principle is similar to the way pixels represent information on a computer screen; that information is carried in the patterns of pixels that are activated. This makes computer screens – and brains – very versatile; they can carry a huge range of different types of information in a relatively small space. One important difference between the two processes is that pixels operate independently, whereas brain cells form physical connections if they are often activated at the same time. The connections allow fast, efficient processing of information that’s encountered frequently.

For example, say I’m looking out of my window at a pigeon. The image of the pigeon falling on my retina will activate the neural networks in my brain that carry information about pigeons; what they look like, sound like, feel like, their flight patterns and feeding habits. My thoughts might then wander off on to related issues; other birds in my garden, when to prune the cherry tree, my neighbour repairing her fence. If I glance away from the pigeon and look at my blank computer screen, other neural networks will be activated, those that carry information about computers, technology, screens and rectangles in general. I will no longer be thinking about pigeons, but my pigeon networks will still be active enough for me to recall that I was looking at a pigeon previously and I might glance out of the window to see if it is still there.

Every time my long-term neural networks are activated by incoming sensory information, they are updated. If the same information comes in repeatedly the connections within the network are strengthened. What’s not clear is how much attention needs to be paid to incoming information in order for it to update long-term memory. Large amounts of information about the changing environment are flowing through working memory all the time, and evidence from brain-damaged patients suggests that long-term memory can be changed even if we’re not paying attention to the information that activates it.

the central executive

Incoming sensory information and information from long-term memory are both fed forward to the central executive. The function of the central executive is a bit like the function of a CCTV control room. According to Antonio Damasio it monitors, evaluates and responds to information from three main sources:

• the external environment (sensory information)
• the internal environment (body states) and
• previous representations of the external and internal environments (carried in the pattern of connections in neural networks).

One difference is that loops/sketchpads/buffers and the system that monitors them consist of networks of interconnected neurons, not TV screens (obviously). Another is that there isn’t anybody watching the brain’s equivalent of the CCTV screens – it’s an automated process. We become aware of information in the loops/sketchpads/buffers only if we need to be aware of it, so we are usually conscious of what’s happening in the external environment and of any significant internal or external changes.

The central executive constantly compares the streams of incoming information. It responds to them via networks of neurons that feed back information to other areas of the brain. If the environment has changed significantly, or an interesting or threatening event occurs, or we catch sight of something moving on the periphery of our field of vision, or experience sudden discomfort or pain, the feedback from the central executive ensures that we pay attention to that, rather than anything else. It’s important to note that information from the body includes information about our overall physiological state, including emotions.

So a schematic general diagram of how working memory fits in with information processing in the brain would look something like this:

[Figure: schematic diagram of how working memory fits in with information processing in the brain]

It’s important to note that we still don’t have a clear map of the information processing pathways. Researchers keep coming across different potential loops/sketchpads/buffers and there’s evidence that the feedback and feed-forward pathways are more complex than this diagram shows.

I began this post by suggesting that an over-simplified model of working memory could be misleading. I’ll explain my reasons in more detail in the next post, but first I want to highlight an important implication of the way incoming sensory information is handled by the brain.

pre-conscious processing

A great deal of sensory information is processed by the brain pre-consciously. Advocates of direct instruction emphasise the importance of chunking information because it increases the capacity of working memory. A popular example is the way expert chess players can hold simultaneously in working memory several different configurations of chess pieces, chunking being seen as something ‘experts’ do. But it’s important to remember that the brain chunks information automatically if we’re exposed to it frequently enough. That’s how we recognise faces, places and things – most three-year-olds are ‘experts’ in their day-to-day surroundings because they have had thousands of exposures to familiar faces, places and things. They don’t have to sit down and study these things in order to chunk the fragments of information that make up faces, places and things – their visual cortex does it automatically.

This means that a large amount of information going through young children’s working memory is already chunked. We don’t know to what extent the central executive has to actively pay attention to that information in order for it to change long-term memory, but pre-conscious chunking does suggest that a good deal of learning happens implicitly. I’ll comment on this in more detail in my next post.

cognitive load and learning

In the previous two posts I discussed the model of working memory used by Kirschner, Sweller & Clark and how working memory and long-term memory function. The authors emphasise that their rejection of minimal guidance approaches to teaching is based on the limited capacity of working memory in respect of novel information, and that even if experts might not need much guidance “…nearly everyone else thrives when provided with full, explicit instructional guidance (and should not be asked to discover any essential content or skills)” (Clark, Kirschner & Sweller, p.6). Whether they are right or not depends on what they mean by ‘novel’ information.

So what’s new?

Kirschner, Sweller & Clark define novel information as ‘new, yet to be learned’ information that has not been stored in long-term memory (p.77). But novelty isn’t a simple case of information either being yet-to-be-learned or stored-in-long-term-memory. If I see a Russian sentence written in Cyrillic script, its novelty value to me on a scale of 1-10 would be about 9. I can recognise some Cyrillic letters and know a few Russian words, but my working memory would be overloaded after about the third letter because of the multiple operations involved in decoding, blending and translating. A random string of Arabic numerals would have a novelty value of about 4, however, because I am very familiar with Arabic numerals; the only novelty would be in their order in the string. The sentence ‘the cat sat on the mat’ would have a novelty value close to zero because I’m an expert at chunking the letter patterns in English and I’ve encountered that sentence so many times.

Because novelty isn’t an either/or thing but sits on a sliding scale, and because the information coming into working memory can vary between simple and complex, ‘new, yet to be learned’ information can vary in both complexity and novelty.

You could map it on a 2×2 matrix like this:

[Figure: novelty, complexity & cognitive load]

A sentence such as ‘the monopsonistic equilibrium at M should now be contrasted with the equilibrium that would obtain under competitive conditions’ is complex (it contains many bits of information) but its novelty content would depend on the prior knowledge of the reader. It would score high on both the novelty and complexity scales for the average 5-year-old. I don’t understand what the sentence means, but I do understand many of the words, so it would be mid-range in both novelty and complexity for me. An economist would probably give it a 3 for complexity but 0 for novelty. Trying to teach a 5-year-old what the sentence meant would completely overload their working memory. But it would be a manageable challenge for mine, and an economist would probably feel bored.
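The two ratings can be turned into a position on the 2×2 grid with a few lines of Python. This is only a sketch: the `quadrant` function, the `midpoint` cut-off of 5 and the 5-year-old’s ratings of 10 are assumptions made for the example; only the economist’s ratings (3 for complexity, 0 for novelty) come from the text.

```python
def quadrant(novelty, complexity, midpoint=5):
    """Place an item on a 2x2 grid from two 0-10 ratings.

    `midpoint` is an assumed cut-off between 'low' and 'high'.
    """
    n = "high" if novelty >= midpoint else "low"
    c = "high" if complexity >= midpoint else "low"
    return f"{n} novelty, {c} complexity"


# (novelty, complexity) ratings of the same sentence for two readers.
readers = {
    "5-year-old": (10, 10),  # assumed values for 'high on both scales'
    "economist": (0, 3),     # from the text: 0 for novelty, 3 for complexity
}

for reader, (novelty, complexity) in readers.items():
    print(reader, "->", quadrant(novelty, complexity))
```

A reader who rates the sentence mid-range on both scales sits right on the boundary between quadrants, which is a reminder that the grid is a simplification of two sliding scales.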

Kirschner, Sweller & Clark reject ‘constructivist, discovery, problem-based, experiential and inquiry-based approaches’ on the basis that they overload working memory and the excessive cognitive load means that learners don’t learn as efficiently as they would using explicit direct instruction. If only it were that simple.

‘Constructivist, discovery, problem-based, experiential and inquiry-based approaches’ were adopted initially not because teachers preferred them or because philosophers thought they were a good idea, but because by the end of the 19th century explicit, direct instruction – the only game in town for fledgling mass education systems – clearly wasn’t as effective as people had thought it would be. Alternative approaches were derived from three strategies that young children apply when learning ‘naturally’.

How young children learn

Human beings are mammals and young mammals learn by applying three key learning strategies which I’ll call ‘immersion’, trial-and-error and modelling (imitating the behaviour of other members of their species). By ‘strategy’, I mean an approach that they use, not that the baby mammals sit down and figure things out from first principles; all three strategies are outcomes of how mammals’ brains work.

Immersion

Most young children learn to walk, talk, feed and dress themselves and acquire a vast amount of information about their environment with very little explicit, direct instruction. And they acquire those skills pretty quickly and apparently effortlessly. The theory was that if you put school age children in a suitable environment, they would pick up other skills and knowledge equally effortlessly, without the boredom of rote-learning and the grief of repeated testing. Unfortunately, what advocates of discovery, problem-based, experiential and inquiry-based learning overlooked was the sheer amount of repetition involved in young children learning ‘naturally’.

Although babies’ learning is kick-started by some hard-wired processes such as reflexes, babies have to learn to do almost everything. They repeatedly rehearse their gross motor skills, fine motor skills and sensory processing. They practise babbling, crawling, toddling and making associations at every available opportunity. They observe things and detect patterns. Relatively simple skills like face recognition, grasping an object or rolling over might take only a few attempts. More complex skills like using a spoon, crawling or walking take more. Very complex skills like using language require many thousands of rehearsals; it’s no coincidence that children’s speech and reading ability take several years to mature and their writing ability (an even more complex skill) doesn’t usually mature until adulthood.

The reason why children don’t learn to read, do maths or learn foreign languages as ‘effortlessly’ as they learn to walk or speak in their native tongue is largely because of the number of opportunities they have to rehearse those skills. An hour a day of reading or maths and a couple of French lessons a week bear no resemblance to the ‘immersion’ in motor development and their native language that children are exposed to. Inevitably, it will take them longer to acquire those skills. And if they take an unusually long time, it’s the child, the parent, the teacher or the method of teaching that tends to be blamed, not the mechanism by which the skill is acquired.

Trial-and-error

The second strategy is trial-and-error. It plays a key role in the rehearsals involved in immersion, because it provides feedback to the brain about how the skill or knowledge is developing. Some skills, like walking, talking or handwriting, can only be acquired through trial-and-error because of the fine-grained motor feedback that’s required. Learning by trial-and-error can offer very vivid, never-forgotten experiences, regardless of whether the initial outcome is success or failure.

Modelling

The third strategy is modelling – imitating the behaviour of other members of the species (and sometimes other species or inanimate objects). In some cases, modelling is the most effective way of teaching because it’s difficult to explain (or understand) a series of actions in verbal terms.

Cognitive load

This brings us back to the issue of cognitive load. It isn’t the case that immersion, trial-and-error and modelling or discovery, problem-based, experiential and inquiry-based approaches always impose a high cognitive load, and that explicit direct instruction doesn’t. If that were true, young children would have to be actively taught to walk and talk and older ones would never forget anything. The problem with all these educational approaches is that they have all initially been seen as appropriate for teaching all knowledge and skills and have subsequently been rejected as ineffective. That’s not at all surprising, because different types of knowledge and skill require different strategies for effective learning.

Cognitive load is also affected by the complexity of incoming information and how novel it is to the learner. Nor is cognitive load confined to the capacity of working memory. Forty minutes of explicit, direct novel instruction, even if presented in well-paced working-memory-sized chunks, would pose a significant challenge to most brains. The reason, as I pointed out previously, is that the transfer of information from working memory to long-term memory is a biological process that takes time, resources and energy. Research into changes in the motor cortex suggests that the time involved might be as little as hours, but even that has implications for the pace at which students are expected to learn and how much new information they can process. There’s a reason why someone would find acquiring large amounts of new information tiring – their brain uses up a considerable amount of glucose getting that information embedded in the form of neural connections. The inevitable delay between information coming into the brain and being embedded in long-term memory suggests that down-time is as important as learning time – calling into question the assumption that the longer children spend actively ‘learning’ the more they will know.

Final thoughts

If I were forced to choose between constructivist, discovery, problem-based, experiential and inquiry-based approaches to learning and explicit, direct instruction, I’d plump for explicit, direct instruction because the world we live in works according to discoverable principles and it makes sense to teach kids what those principles are, rather than to expect them to figure them out for themselves. However, it would have to be a forced choice, because we do learn through constructing our knowledge and through discovery, problem-solving, experiencing and inquiring as well as by explicit, direct instruction. The most appropriate learning strategy will depend on the knowledge or skill being learned.

The Kirschner, Sweller & Clark paper left me feeling perplexed and rather uneasy. I couldn’t understand why the authors frame the debate about educational approaches in terms of minimal guidance ‘on one side’ and direct instructional guidance ‘on the other’, when self-evidently the debate is more complex than that. Nor why they refer to Atkinson & Shiffrin’s model of working memory when Baddeley & Hitch’s more complex model is so widely accepted as more accurate. Nor why they omit any mention of the biological mechanisms involved in learning; not only are the biological mechanisms responsible for the way working memory and long-term memory operate, they also shed light on why any single educational approach doesn’t work for all knowledge, all skills – or even all students.

I felt it was ironic that the authors place so much emphasis on the way novices think but present a highly complex debate in binary terms – a classic feature of the way novices organise their knowledge. What was also ironic was that despite their emphasis on explicit, direct instruction, they failed to mention several important features of memory that would have helped a lay readership understand how memory works. This is all the more puzzling because some of these omissions (and a more nuanced model of instruction) are referred to in a paper on cognitive load by Paul Kirschner published four years earlier.

In order to fully understand what Kirschner, Sweller & Clark are saying, and to decide whether they are right or not, you’d need to have a fair amount of background knowledge about how brains work. To explain that clearly to a lay readership, and to address possible objections to their thesis, the authors would have had to extend the paper’s length by at least 50%. Their paper is just over 10 000 words long, suggesting that word-count issues might have resulted in them having to omit some points. That said, Educational Psychologist doesn’t currently apply a word limit, so maybe the authors were trying to keep the concepts as simple as possible.

Simplifying complex concepts for the benefit of a lay readership can certainly make things clearer, but over-simplifying them runs the risk of giving the wrong impression, and I think there’s a big risk of that happening here. Although the authors make it clear that explicit direct instruction can take many forms, they do appear to be proposing a one-size-fits-all approach that might not be appropriate for all knowledge, all skills or all students.

References

Clark, RE, Kirschner, PA & Sweller, J (2012). Putting students on the path to learning: The case for fully guided instruction, American Educator, Spring.

Kirschner, PA (2002). Cognitive load theory: implications of cognitive load theory on the design of learning, Learning and Instruction, 12, 1–10.

Kirschner, PA, Sweller, J & Clark, RE (2006). Why Minimal Guidance During Instruction Does Not Work: An Analysis of the Failure of Constructivist, Discovery, Problem-Based, Experiential, and Inquiry-Based Teaching. Educational Psychologist, 41, 75-86.

how working memory works

In my previous post I wondered why Kirschner, Sweller & Clark based their objections to minimal guidance in education on Atkinson & Shiffrin’s 1968 model of memory; it’s a model that assumes a mechanism for memory that’s now considerably out of date. A key factor in Kirschner, Sweller & Clark’s advocacy of direct instructional guidance is the limited capacity of working memory, and that’s what I want to look at in this post.

Other models are available

Atkinson & Shiffrin describe working memory as a ‘short-term store’. It has a limited capacity (around 4-9 bits of information) and can retain that information for only a few seconds. It’s also a ‘buffer’; unless information in the short-term store is actively maintained, by rehearsal for example, it will be displaced by incoming information. Kirschner, Sweller & Clark note that ‘two well-known characteristics’ of working memory are its limited duration and capacity when ‘processing novel information’ (p.77), suggesting that their model of working memory is very similar to Atkinson & Shiffrin’s short-term store.
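
For readers who find code easier to follow than prose, the displacement idea can be sketched in a few lines of Python. This is only a toy illustration, not Atkinson & Shiffrin’s model itself: the function name `short_term_store` and the capacity of 7 are my own assumptions (7 simply sits inside the 4-9 range mentioned above), and rehearsal and decay over time are not modelled at all.

```python
from collections import deque

# Assumed capacity, chosen from within the 4-9 range; not a measured value.
CAPACITY = 7


def short_term_store(items, capacity=CAPACITY):
    """Return what is left in a fixed-size buffer after items arrive in order.

    Each new item is added to the end of the buffer; once the buffer is
    full, the oldest item is displaced to make room.
    """
    store = deque(maxlen=capacity)  # deque drops the oldest entry when full
    for item in items:
        store.append(item)
    return list(store)


if __name__ == "__main__":
    incoming = list("3141592653")  # ten digits arriving one at a time
    print(short_term_store(incoming))  # only the last seven digits remain
```

Ten items arrive but only the most recent seven survive, which is the sense in which buffer capacity becomes the limiting factor when nothing is rehearsed.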

In 1974 Alan Baddeley and Graham Hitch proposed a more sophisticated model for working memory that included dedicated auditory and visual information processing components. Their model has been revised in the light of more recent discoveries relating to the function of the prefrontal areas of the brain – the location of ‘working memory’. The Baddeley and Hitch model now looks a bit more complex than Atkinson & Shiffrin’s.

[Figure: Baddeley & Hitch model]

You could argue that it doesn’t matter how complex working memory is, or how the prefrontal areas of the brain work; neither alters the fact that the capacity of working memory is limited. Kirschner, Sweller & Clark question the effectiveness of educational methods involving minimal guidance because they increase cognitive load beyond the capacity of working memory. But Kirschner, Sweller & Clark’s model of working memory appears to be oversimplified and doesn’t take into account the biological mechanisms involved in learning.

Biological mechanisms involved in learning

Making connections

Learning is about associating one thing with another, and making associations is what the human brain does for a living. Associations are represented in the brain by connections formed between neurons; the ‘information’ is carried in the pattern of connections. A particular stimulus will trigger a series of electrical impulses through a particular network of connected neurons. So, if I spot my cat in the garden, that sight will trigger a series of electrical impulses that activates a particular network of neurons; the connections between the neurons represent all the information I’ve ever acquired about my cat. If I see my neighbour’s cat, much of the same neural pathway will be triggered because both cats are cats; the activity will then diverge slightly because I have acquired different information about each cat.

Novelty value

Neurons make connections with other neurons via synapses. Our current understanding of the role of synapses in information storage and retrieval suggests that new information triggers the formation of new synapses between neurons. If the same associations are encountered repeatedly, the relevant synapses are used repeatedly and those connections between neurons are strengthened, but if synapses aren’t active for a while, they are ‘pruned’. Toddlers form huge numbers of new synapses, but from the age of three through to adulthood, the number reduces dramatically as pruning takes place. It’s not clear whether synapse formation and pruning are pre-determined developmental phases or whether they happen in response to the kind of information that the brain is processing. Toddlers are exposed to vast amounts of novel information, but novelty rapidly tails off as they get older. Older adults tend to encounter very little novel information, often complaining that they’ve ‘seen it all before’.

The way working memory works

Most of the associations made by the brain occur in the cortex, the outer layer of the brain. Sensory information processed in specialised areas of cortex is ‘chunked’ into coherent wholes – what we call ‘perception’. Perceptual information is further chunked in the frontal areas of the brain to form an integrated picture of what’s going on around and within us. The picture that’s emerging from studies of prefrontal cortex is that this area receives, attends to, evaluates and responds to information from many other areas of the brain. It can do this because patterns of the electrical activity from other brain areas are maintained in prefrontal areas for a short time whilst evaluation takes place. As Antonio Damasio points out in Descartes’ Error, the evaluation isn’t always an active, or even a conscious process; there’s no little homunculus sitting at the front of the brain figuring out what information should take priority. What does happen is that streams of incoming information compete for attention. What gets attention depends on what information is coming in at any one time. If something happens that makes you angry during a maths lesson, you’re more likely to pay attention to that than to solving equations. During an exam, you might be concentrating so hard that you are unaware of anything happening around you.

The information coming into prefrontal cortex varies considerably. There’s a constant inflow of information from three main sources:

• real-time information from the environment via the sense organs;
• information about the physiological state of the body, including emotional responses to incoming information;
• information from the neural pathways formed by previous experience and activated by that sensory and physiological input (Kirschner, Sweller & Clark would call this long-term memory).

Working memory and long-term memory

‘Information’ and models of information processing are abstract concepts. You can’t pick them up or weigh them, so it’s tempting to think of information processing in the brain as an abstract process, involving rather abstract forces like electrical impulses. It would be easy to form the impression from Kirschner, Sweller & Clark’s model that well-paced, bite-sized chunks of novel information will flow smoothly from working memory to long-term memory, like water between two tanks. But the human brain is a biological organ, and it retains and accesses information using some very biological processes. Developing new synapses involves physical changes to the structure of neurons, and those changes take time, resources and energy. I’ll return to that point later, but first I want to focus on something that Kirschner, Sweller & Clark say about the relationship between working memory and long-term memory that struck me as a bit odd;

“The limitations of working memory only apply to new, yet to be learned information that has not been stored in long-term memory. New information such as new combinations of numbers or letters can only be stored for brief periods with severe limitations on the amount of such information that can be dealt with. In contrast, when dealing with previously learned information stored in long-term memory, these limitations disappear.” (p.77)

This statement is odd because it doesn’t tally with Atkinson & Shiffrin’s concept of the short-term store, and isn’t supported by decades of experimental work that show that capacity limitations apply to all information in working memory, regardless of its source. But Kirschner, Sweller & Clark go on to qualify their claim;

“In the sense that information can be brought back from long-term memory to working memory over indefinite periods of time, the temporal limits of working memory become irrelevant.” (p.77)

I think I can see what they’re getting at; because information is stored permanently in long-term memory it doesn’t rapidly fade away and you can access it any time you need to. But you have to access it via working memory, so it’s still subject to working memory constraints. I think the authors are referring implicitly to two ways in which the brain organises information and which increase the capacity of working memory – chunking and schemata.

Chunking

If the brain frequently encounters small items of information that are usually associated with each other, it eventually ‘chunks’ them together and then processes them automatically as single units. George Miller, who in the 1950s did some pioneering research into working memory capacity, noted that people familiar with the binary notation then in widespread use by computer programmers didn’t memorise random lists of 1s and 0s as random lists, but as numbers in the decimal system. So 10 would be remembered as 2, 100 as 4, 101 as 5 and so on. In this way, very long strings of 1s and 0s could be held in working memory in the form of decimal numbers that would automatically be translated back into 1s and 0s when the people taking part in the experiments were asked to recall the list. Morse code experts do the same; they don’t read messages as a series of dots and dashes, but chunk up the patterns of dots and dashes into letters and then into words. Exactly the same process occurs in reading, but we don’t call it chunking, we call it learning to read. Chunking effectively increases the capacity of working memory – but it doesn’t increase it by very much. Curiously, although Kirschner, Sweller & Clark refer to a paper by Egan and Schwartz that’s explicitly about chunking, they don’t mention chunking as such.
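
The recoding of 1s and 0s into decimal numbers can be made concrete with a short sketch. The function names `recode` and `decode`, the group size of three digits and the example string are all my own assumptions, chosen for illustration; nothing here is meant to reproduce the procedure used in the original experiments.

```python
def recode(bits, size=3):
    """Chunk a string of 1s and 0s into groups of `size` digits and
    name each group by its decimal value."""
    if len(bits) % size != 0:
        raise ValueError("length of bits must be a multiple of size")
    return [int(bits[i:i + size], 2) for i in range(0, len(bits), size)]


def decode(chunks, size=3):
    """Translate the decimal chunks back into the original 1s and 0s."""
    return "".join(format(chunk, "0{}b".format(size)) for chunk in chunks)


if __name__ == "__main__":
    bits = "101000100111001110"   # 18 separate items to hold as raw digits
    chunks = recode(bits)         # 6 items to hold once recoded
    print(chunks)
    print(decode(chunks) == bits)
```

Eighteen binary digits become six decimal numbers, so the same string fits comfortably into a buffer that the raw digits would overflow. The information hasn’t shrunk; it has just been repackaged into units the reader already knows well.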

Schemata

What they do mention is the concept of the schema, particularly those of chess players. In the 1940s Adriaan de Groot discovered that expert chess players memorise a vast number of configurations of chess pieces on a board; he called each particular configuration a schema. I get the impression that Kirschner, Sweller & Clark see schemata and chunking as synonymous, even though a schema usually refers to a meta-level way of organising information, like a life-script or an overview, rather than an automatic processing of several bits of information as one unit. It’s quite possible that expert chess players do automatically read each configuration of chess pieces as one unit, but de Groot didn’t call it ‘chunking’ because his research was carried out a decade before George Miller coined the term.

Thinking about everything at once

Whether you call them chunks or schemata, what’s clear is that the brain has ways of increasing the amount of information held in working memory. Expert chess players aren’t limited to thinking about the four or five possible moves for one piece, but can think about four or five possible configurations for all pieces. But it doesn’t follow that the limitations of working memory in relation to long-term memory disappear as a result.

I mentioned in my previous post what information is made accessible via my neural networks if I see an apple. If I free-associate, I think of apples – apple trees – should we cover our apple trees if it’s wet and windy after they blossom? – will there be any bees to pollinate them? – bee viruses – viruses in ancient bodies found in melted permafrost – bodies of climbers found in melted glaciers, and so on. Because my neural connections represent multiple associations I can indeed access vast amounts of information stored in my brain. But I don’t access it all simultaneously. That’s just as well, because if I could access all that information at once my attempts to decide what to do with our remaining windfall apples would be thwarted by totally irrelevant thoughts about mountain rescue teams and St Bernard dogs. In short, if information stored in long-term memory weren’t subject to the capacity constraints of working memory, we’d never get anything done.

Chess masters (or ornithologists or brain surgeons) have access to vast amounts of information, but in any given situation they don’t need to access it all at once. In fact, accessing it all at once would be disastrous because it would take forever to eliminate information they didn’t need. At any point in any chess game, only a few configurations of pieces are possible, and that number is unlikely to exceed the capacity of working memory. Similarly, even if an ornithologist/brain surgeon can recognise thousands of species of birds/types of brain injury, in any given environment, most of those species/injuries are likely to be irrelevant, so don’t even need to be considered. There’s a good reason for working memory’s limited capacity and why all the information we process is subject to that limit.

In the next post, I want to look at how the limits of working memory impact on learning.

References

Atkinson, R, & Shiffrin, R (1968). Human memory: A proposed system and its control processes. In K. Spence & J. Spence (Eds.), The psychology of learning and motivation (Vol. 2, pp. 89–195). New York: Academic Press.
Damasio, A (1994). Descartes’ Error, Vintage Books.
Kirschner, PA, Sweller, J & Clark, RE (2006). Why Minimal Guidance During Instruction Does Not Work: An Analysis of the Failure of Constructivist, Discovery, Problem-Based, Experiential, and Inquiry-Based Teaching. Educational Psychologist, 41, 75-86.