cognitive load and learning

In the previous two posts I discussed the model of working memory used by Kirschner, Sweller & Clark and how working memory and long-term memory function. The authors emphasise that their rejection of minimal guidance approaches to teaching is based on the limited capacity of working memory in respect of novel information, and that even if experts might not need much guidance, “…nearly everyone else thrives when provided with full, explicit instructional guidance (and should not be asked to discover any essential content or skills)” (Clark, Kirschner & Sweller, 2012, p.6). Whether they are right or not depends on what they mean by ‘novel’ information.

So what’s new?

Kirschner, Sweller & Clark define novel information as ‘new, yet to be learned’ information that has not been stored in long-term memory (p.77). But novelty isn’t a simple case of information either being yet-to-be-learned or stored-in-long-term-memory. If I see a Russian sentence written in Cyrillic script, its novelty value to me on a scale of 1-10 would be about 9. I can recognise some Cyrillic letters and know a few Russian words, but my working memory would be overloaded after about the third letter because of the multiple operations involved in decoding, blending and translating. A random string of Arabic numerals would have a novelty value of about 4, however, because I am very familiar with Arabic numerals; the only novelty would be in their order in the string. The sentence ‘the cat sat on the mat’ would have a novelty value close to zero because I’m an expert at chunking the letter patterns in English and I’ve encountered that sentence so many times.

Because novelty isn’t an either/or thing but sits on a sliding scale, and because the information coming into working memory can vary between simple and complex, ‘new, yet to be learned’ information can vary in both complexity and novelty.

You could map it on a 2×2 matrix like this:

[Figure: novelty, complexity & cognitive load matrix]

A sentence such as ‘the monopsonistic equilibrium at M should now be contrasted with the equilibrium that would obtain under competitive conditions’ is complex (it contains many bits of information) but its novelty content would depend on the prior knowledge of the reader. It would score high on both the novelty and complexity scales for the average 5-year-old. I don’t understand what the sentence means, but I do understand many of the words, so it would be mid-range in both novelty and complexity for me. An economist would probably give it a 3 for complexity but 0 for novelty. Trying to teach a 5-year-old what the sentence meant would completely overload their working memory. But it would be a manageable challenge for mine, and an economist would probably feel bored.
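The matrix can be turned into a toy model. Everything below (the 0-10 scales, the cut-off and the multiplicative load formula) is my own illustrative assumption, not anything from the cognitive-load literature:

```python
def cognitive_load(novelty, complexity):
    """Toy estimate of load from novelty and complexity, each rated 0-10.
    The multiplicative form is an assumption: complex material is only
    costly to the extent that it is also novel."""
    return novelty * complexity / 10

def quadrant(novelty, complexity, cutoff=5):
    """Place a reader's ratings in the 2x2 matrix."""
    n = "high" if novelty >= cutoff else "low"
    c = "high" if complexity >= cutoff else "low"
    return f"{n} novelty, {c} complexity"

# The 'monopsonistic equilibrium' sentence, rated for three readers
# (the numbers are guesses for the sake of the example):
readers = {
    "five-year-old": (9, 9),  # nearly everything in it is new
    "lay adult":     (5, 5),  # knows the words, not the concepts
    "economist":     (0, 3),  # complex, but entirely familiar
}
for reader, (nov, comp) in readers.items():
    print(f"{reader}: {quadrant(nov, comp)}, load {cognitive_load(nov, comp)}")
```

On these made-up numbers the 5-year-old’s load (8.1) far outstrips the economist’s (0.0), which is all the matrix is meant to capture.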

Kirschner, Sweller & Clark reject ‘constructivist, discovery, problem-based, experiential and inquiry-based approaches’ on the basis that they overload working memory and the excessive cognitive load means that learners don’t learn as efficiently as they would using explicit direct instruction. If only it were that simple.

‘Constructivist, discovery, problem-based, experiential and inquiry-based approaches’ were adopted initially not because teachers preferred them or because philosophers thought they were a good idea, but because by the end of the 19th century explicit, direct instruction – the only game in town for fledgling mass education systems – clearly wasn’t as effective as people had thought it would be. Alternative approaches were derived from three strategies that young children apply when learning ‘naturally’.

How young children learn

Human beings are mammals and young mammals learn by applying three key learning strategies which I’ll call ‘immersion’, trial-and-error and modelling (imitating the behaviour of other members of their species). By ‘strategy’, I mean an approach that they use, not that the baby mammals sit down and figure things out from first principles; all three strategies are outcomes of how mammals’ brains work.


Most young children learn to walk, talk, feed and dress themselves and acquire a vast amount of information about their environment with very little explicit, direct instruction. And they acquire those skills pretty quickly and apparently effortlessly. The theory was that if you put school-age children in a suitable environment, they would pick up other skills and knowledge equally effortlessly, without the boredom of rote-learning and the grief of repeated testing. Unfortunately, what advocates of discovery, problem-based, experiential and inquiry-based learning overlooked was the sheer amount of repetition involved in young children learning ‘naturally’.

Although babies’ learning is kick-started by some hard-wired processes such as reflexes, babies have to learn to do almost everything. They repeatedly rehearse their gross motor skills, fine motor skills and sensory processing. They practise babbling, crawling, toddling and making associations at every available opportunity. They observe things and detect patterns. A relatively simple skill like face-recognition, grasping an object or rolling over might take only a few attempts. More complex skills like using a spoon, crawling or walking take more. Very complex skills like using language require many thousands of rehearsals; it’s no coincidence that children’s speech and reading ability take several years to mature and their writing ability (an even more complex skill) doesn’t usually mature until adulthood.

The reason why children don’t learn to read, do maths or learn foreign languages as ‘effortlessly’ as they learn to walk or speak in their native tongue is largely the number of opportunities they have to rehearse those skills. An hour a day of reading or maths and a couple of French lessons a week bear no resemblance to the ‘immersion’ in motor development and their native language that children are exposed to. Inevitably, it will take them longer to acquire those skills. And if they take an unusually long time, it’s the child, the parent, the teacher or the method of instruction that tends to be blamed, not the mechanism by which the skill is acquired.


The second strategy is trial-and-error. It plays a key role in the rehearsals involved in immersion, because it provides feedback to the brain about how the skill or knowledge is developing. Some skills, like walking, talking or handwriting, can only be acquired through trial-and-error because of the fine-grained motor feedback that’s required. Learning by trial-and-error can offer very vivid, never-forgotten experiences, regardless of whether the initial outcome is success or failure.


The third strategy is modelling – imitating the behaviour of other members of the species (and sometimes other species or inanimate objects). In some cases, modelling is the most effective way of teaching because it’s difficult to explain (or understand) a series of actions in verbal terms.

Cognitive load

This brings us back to the issue of cognitive load. It isn’t the case that immersion, trial-and-error and modelling or discovery, problem-based, experiential and inquiry-based approaches always impose a high cognitive load, and that explicit direct instruction doesn’t. If that were true, young children would have to be actively taught to walk and talk, and older ones would never forget anything. The problem with all these educational approaches is that each has initially been seen as appropriate for teaching all knowledge and skills, and each has subsequently been rejected as ineffective. That’s not at all surprising, because different types of knowledge and skill require different strategies for effective learning.

Cognitive load is also affected by the complexity of incoming information and by how novel it is to the learner. Nor is cognitive load confined to the capacity of working memory. Forty minutes of explicit, direct instruction in novel material, even if presented in well-paced, working-memory-sized chunks, would pose a significant challenge to most brains. The reason, as I pointed out previously, is that the transfer of information from working memory to long-term memory is a biological process that takes time, resources and energy. Research into changes in the motor cortex suggests that the time involved might be as little as hours, but even that has implications for the pace at which students are expected to learn and how much new information they can process. There’s a reason why someone would find acquiring large amounts of new information tiring – their brain uses up a considerable amount of glucose embedding that information in the form of neural connections. The inevitable delay between information coming into the brain and being embedded in long-term memory suggests that down-time is as important as learning time – calling into question the assumption that the longer children spend actively ‘learning’, the more they will know.

Final thoughts

If I were forced to choose between constructivist, discovery, problem-based, experiential and inquiry-based approaches to learning or explicit, direct instruction, I’d plump for explicit, direct instruction because the world we live in works according to discoverable principles and it makes sense to teach kids what those principles are, rather than to expect them to figure them out for themselves. However, it would have to be a forced choice, because we do learn through constructing our knowledge and through discovery, problem-solving, experiencing and inquiring as well as by explicit, direct instruction. The most appropriate learning strategy will depend on the knowledge or skill being learned.

The Kirschner, Sweller & Clark paper left me feeling perplexed and rather uneasy. I couldn’t understand why the authors frame the debate about educational approaches in terms of minimal guidance ‘on one side’ and direct instructional guidance ‘on the other’, when self-evidently the debate is more complex than that. Nor why they refer to Atkinson & Shiffrin’s model of working memory when Baddeley & Hitch’s more complex model is so widely accepted as more accurate. Nor why they omit any mention of the biological mechanisms involved in learning; not only are the biological mechanisms responsible for the way working memory and long-term memory operate, they also shed light on why any single educational approach doesn’t work for all knowledge, all skills – or even all students.

I felt it was ironic that the authors place so much emphasis on the way novices think but present a highly complex debate in binary terms – a classic feature of the way novices organise their knowledge. What was also ironic was that despite their emphasis on explicit, direct instruction, they failed to mention several important features of memory that would have helped a lay readership understand how memory works. This is all the more puzzling because some of these omissions (and a more nuanced model of instruction) are referred to in a paper on cognitive load by Paul Kirschner published four years earlier.

In order to fully understand what Kirschner, Sweller & Clark are saying, and to decide whether they were right or not, you’d need to have a fair amount of background knowledge about how brains work. To explain that clearly to a lay readership, and to address possible objections to their thesis, the authors would have had to extend the paper’s length by at least 50%. Their paper is just over 10 000 words long, suggesting that word-count constraints might have forced them to omit some points. That said, Educational Psychologist doesn’t currently apply a word limit, so maybe the authors were simply trying to keep the concepts as simple as possible.

Simplifying complex concepts for the benefit of a lay readership can certainly make things clearer, but over-simplifying them runs the risk of giving the wrong impression, and I think there’s a big risk of that happening here. Although the authors make it clear that explicit direct instruction can take many forms, they do appear to be proposing a one-size-fits-all approach that might not be appropriate for all knowledge, all skills or all students.


Clark, RE, Kirschner, PA & Sweller, J (2012). Putting students on the path to learning: The case for fully guided instruction. American Educator, Spring.

Kirschner, PA (2002). Cognitive load theory: implications of cognitive load theory on the design of learning. Learning and Instruction, 12, 1-10.

Kirschner, PA, Sweller, J & Clark, RE (2006). Why Minimal Guidance During Instruction Does Not Work: An Analysis of the Failure of Constructivist, Discovery, Problem-Based, Experiential, and Inquiry-Based Teaching. Educational Psychologist, 41, 75-86.

11 thoughts on “cognitive load and learning”

  1. Ok. I am still none the wiser. My issues with discovery learning systems are:

    They provide a very rich environment. This rich environment puts novices in the position of having to choose which parts to pay attention to without the means to know which bits matter. The richness itself provides that problem. It will provide a high cognitive load. I get the bit about reducing cognitive load by having greater familiarity.

    They assume that learning is a natural process and that is just not true for the kinds of learning we expect children to do in school.

    Trial and error is a very inefficient way of learning and can lead to ‘wrong’ ways of doing things being learned. E.g. trial and error with holding a pen can lead to problems with writing later.

    But thanks again for a piece that I have to think about and probably will need to reread.

    • Hi Peter

      My apologies for missing your comment earlier.

      I accept your point about rich environments. But isn’t that a problem with the design of the learning situation, rather than the principle? My concept of discovery learning is that one observes something and then gets more information about it. Maybe the orthodox view is different. Since most children’s day-to-day environments are familiar to them, and what catches their attention is something that’s novel, if you limit the novelty, then they know what to focus on.

      Learning *is* a natural process. The education system in England is only 150 years old, but English people still managed to learn a fair amount in the thousands of years before it arrived. For example, I’ve been using computers regularly for 25 years. During that time, I’ve had the grand total of two half-days of formal training – for software that’s now obsolete. But I, like millions of other adults who attended school when computers were things that filled entire rooms and were programmed using punch cards, still manage to use a PC competently. And some people have even found out for themselves how PCs work.

      Whether the ‘kinds of learning we expect children to do in school’ are natural processes or not is another matter.

      The core of the problem, to me, isn’t the methods used, it’s the methods being used inappropriately. If you teach everything by discovery learning, or everything by trial-and-error, of course they won’t be successful, because they’re not appropriate for learning everything. Any more than teaching everything by direct instruction will be.

  2. I would agree with much of this apart from this:

    “If I were forced to choose between constructivist, discovery, problem-based, experiential and inquiry-based approach to learning or explicit, direct instruction, I’d plump for explicit, direct instruction because the world we live in works according to discoverable principles and it makes sense to teach kids what those principles are, rather than to expect them to figure them out for themselves”

    Here you seem to be engaging with one of Kirschner et al.’s straw men. Nobody ever said that “kids” should be left to discover stuff for themselves. Rather that teachers should be sympathetic to the fact that cognition works by re-constructing information subjectively. Even if that information is based on information constructed socially.

    I would suggest that direct instruction is more of a threat to working memory than discovery based learning for obvious reasons.

  3. Isn’t the point that you can have too much cognitive load but also too little? Some load (more often delivered by activity) is required in order to push that information through into long-term memory? Which supports the position you are arguing – that these are complementary, not antagonistic techniques.

    It also suggests to me the critical importance of sequencing – important in the light of the contribution that can be made in this respect by adaptive ed-tech systems.

    My argument with progressive, discovery learning is when it goes beyond being a technique for laying down long-term memory – i.e. a form of practice – and seeks to challenge the idea of formal learning objectives. If it’s all about discovery, then who are the teachers to tell the students what it is they should be discovering?

  4. One point re CLT concerns the fact that since the introduction of the germane load category, CLT can’t be falsified any more. Let’s say we act under the assumption that CLT is true and do a quasi-experiment. We tweak a variable concerning cognitive load. First a low load. They improve: the load probably wasn’t too high. They don’t improve: the load might still have been too high. Now a high load. They improve: what!? They must have used schemas, because the load ought to have been too high. Phew, CLT is still correct, because they were just keen ‘schemata’ makers. They don’t improve: yes, you see, that’s because the load was too high. Whatever the outcome it’s win-win, and it doesn’t help us any further. Since germane load was introduced we had better focus on these schemata and refrain from too-quick judgments saying ‘too much load’. There are more good points and links, especially to papers from De Jong and Moreno, on

  5. Ok…I need help! I have read, read and re-read so much information on Information Processing and found it so conflicting. I am hoping that someone can help clarify the implications associated with the acceptance of the Atkinson & Shiffrin Information Processing Model versus that of Baddeley and Hitch. I understand the more complex explanation of working memory based on available research but I am confused with respect to the implications of this change on long-term memory storage and learning.

    As far as CLT is concerned, my understanding was that working memory is limited but if related schemas can be accessed, limitations are reduced (not eliminated), as one schema can represent hundreds of pieces of information – thus significantly impacting processing capacity.

    • Hi Lane

      It’s not so much a case of Atkinson & Shiffrin’s model *versus* Baddeley and Hitch’s as A&S *followed by* B&H.

      The B&H model was important for two reasons. First, it showed that WM wasn’t just a simple information processing unit but involved independent component subsystems. Second, the independent nature of the subsystems implied that the capacity of WM was less limited than previously thought. People might only be able to memorise digit strings of 7 ± 2 items, but whilst memorising them they are simultaneously aware of information coming in from other sensory domains.

      Psychology students need to know about the A&S model so they can see how our understanding of WM has developed over time. But I have no idea why Kirschner, Sweller and Clark used it instead of the B&H model to explain WM to teachers. The A&S model is certainly simpler, but it’s also misleading because it gives the impression that WM is part of a neat, linear information processing system involving clearly defined ‘stores’ of information, rather than it being part of a complex network involving multiple feedback and feedforward loops.

      Why is this relevant to teachers?

      1. Teachers need to know that the A&S model has been superseded by a more complex model. Although they might only need a simplified model in order to understand that WM capacity is limited, they shouldn’t assume that the limited capacity of WM is the only cognitive factor they need to take into account.

      2. Teachers need to pay attention to the sensory modes they are using to teach. If all information is delivered via ‘teacher talk’, there’s a risk of overloading students’ auditory processing capacity even if teachers focus on only three or four new items of information. Auditory information can by definition be processed only in sequence (that includes reading), but a great deal of visual information can be taken in at a glance. The limits of WM need to be weighed against students getting either bored or ‘listened out’. Audio-visual presentation, for example, can maximise learning without overloading WM.

      3. Students’ brains are constantly monitoring their external and internal environment and relevant information finds its way into WM. Students’ physical health, their emotional state, previous episodic memories and the conditions in the classroom all impact on WM capacity, but you wouldn’t know that to look at the A&S model. The B&H model isn’t exhaustive – it explicitly focuses on cognitive factors in WM – but it does remind us that students are biological organisms not computers.

      As for schemata… A schema is essentially a framework, an overview of the way items of information are linked together. Although it shapes the way the items are linked and is shaped by them, a schema doesn’t consist of the items of information themselves.

      For example, my schema for chemistry consists of a few interrelated concepts: subatomic entities, atoms, molecules, bonding, emergent properties and periodicity. That gives me a framework that I can retain in WM for understanding any new item of information about chemicals and it helps me retrieve items of information from LTM. It doesn’t guarantee that I’ll remember all the facts about chemistry I’ve ever learned, or that I can hold hundreds of pieces of information about chemistry in WM simultaneously.

      There seems to be a perception doing the rounds that schemata are essentially chunking on a larger scale. They’re not, even though the same underlying biological mechanism appears to be responsible for both.

      Chunking involves bits of information being so tightly coupled that they are always accessed simultaneously and treated as single items by WM. A schema is a framework that links related items but the linkages vary considerably in strength and so does the time taken to retrieve them. A schema can activate representations of items of related information so retrieval is faster, and that reduces cognitive load because WM isn’t clogged up for several seconds at a time with a search task.

      Going back to my chemistry example, if someone says the word ‘chlorine’, I immediately recall some of the properties of chlorine and the names of the other halogens. But I can’t retain all of those items in WM at once – there are too many of them. If I were a research chemist, I might chunk the properties of chlorine or the names of the halogens into one item, but I’ve never seen any evidence that that happens. Chunking appears to apply only to low level, highly consistent bits of information, not to more complex, more variable bits of information like chemical properties.
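      The chunk/schema distinction can be put in code. This is only an illustrative sketch: the capacity figure, the chemistry items and the function names are all assumptions made up for the example, not anything from the memory literature:

```python
WM_CAPACITY = 4  # items held simultaneously; an assumed figure

# Chunking: items so tightly coupled they occupy a single WM slot.
chunk = ("fluorine", "chlorine", "bromine", "iodine")  # 'the halogens' as one unit

# A schema: a framework linking related concepts, not the facts themselves.
chemistry_schema = {
    "periodicity": ["halogens", "noble gases"],
    "bonding": ["ionic", "covalent"],
}

def wm_slots(items):
    """Each element counts as one working-memory item, regardless of how
    much information it bundles internally."""
    return len(items)

def retrieve(schema, cue):
    """A schema speeds retrieval by activating related representations,
    but the retrieved items still compete for WM slots."""
    return schema.get(cue, [])

print(wm_slots([chunk]))                 # the chunk costs one slot, however big
related = retrieve(chemistry_schema, "bonding")
print(wm_slots(related))                 # retrieved items cost a slot each
```

      The point of the sketch: the chunk is one item however much it contains, whereas the schema only tells you where related items are – once retrieved, they still cost a slot each.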

      I hope that makes sense.

      • Wow! Thank you so much for the thorough response! I was hoping for a couple of sentences and perhaps guidance in terms of some reading I should do. It all made so much sense and has enabled me to engage in further reading with greater understanding.
