direct instruction: the evidence

A discussion on Twitter raised a lot of questions about working memory and the evidence supporting direct instruction cited by Kirschner, Sweller and Clark. I couldn’t answer in 140 characters, so here’s my response. I hope it covers all the questions.

Kirschner, Sweller & Clark’s thesis is;

• working memory capacity is limited
• constructivist, discovery, problem-based, experiential, and inquiry-based teaching (minimal guidance) all overload working memory and
• evidence from studies investigating efficacy of different methods supports the superiority of direct instruction.
Therefore, “In so far as there is any evidence from controlled studies, it almost uniformly supports direct, strong instructional guidance rather than constructivist-based minimal guidance during the instruction of novice to intermediate learners.” (p.83)

Sounds pretty unambiguous – but it isn’t.

1. Working memory (WM) isn’t simple. It includes several ‘dissociable’ sensory buffers and a central executive that monitors, attends to and responds to sensory information, information from the body and information from long-term memory (LTM) (Wagner, Bunge & Badre, 2004; Damasio, 2006).

2. Studies comparing minimal guidance with direct instruction are based on ‘pure’ methods. Sweller’s work on cognitive load theory (CLT) (Sweller, 1988) was based on problems involving the use of a single buffer/loop, e.g. mazes and algebra. New items coming into the buffer displace older items, so buffer capacity would be the limiting factor. But real-world problems tend to involve different buffers, so items in the buffers can be easily maintained while they are manipulated by the central executive. For example, I can’t write something complex and listen to Radio 4 at the same time, because my phonological loop can’t cope. But I can write and listen to music, or listen to Radio 4 whilst I cook a new recipe, because I’m using different buffers. Discovery, problem-based, experiential and inquiry-based teaching in classrooms tends to resemble real-world situations more closely than the single-buffer problems Sweller used to demonstrate the concept of cognitive load, so the impact of the buffer limit would be lessened.

3. For example, Klahr & Nigam (2004) point out that because there’s no clear definition of discovery learning, in their experiment involving a scientific concept they ‘magnified the difference between the two instructional treatments’ – ie they used an ‘extreme type’ of each method, unlikely to occur in any classroom. Essentially they disproved the hypothesis that children always learn better by discovering things for themselves; but children are unlikely to ‘discover things for themselves’ in circumstances like those in the Klahr & Nigam study.

It’s worth noting that 8 of the children in their study figured out what to do at the outset, so were excluded from the results. And 23% of the direct instruction children didn’t master the concept well enough to transfer it.

That finding – that some learners failed to learn even when direct instruction was used, and that some learners might benefit from less direct instruction – comes up time and again in the evidence cited by Kirschner, Sweller and Clark, but gets overlooked in their conclusion.

I can quite see why educational methods using ‘minimal instruction’ might fail, and agree that proponents of such methods don’t appear to have taken much notice of such research findings as there are. But the findings are not unambiguous. It might be true that the evidence ‘almost uniformly supports direct, strong instructional guidance rather than constructivist-based minimal guidance during the instruction of novice to intermediate learners’ [my emphasis] but teachers aren’t faced with that forced choice. Also the evidence doesn’t show that direct, strong instructional guidance is always effective for all learners. I’m still not convinced that Kirschner, Sweller & Clark’s conclusion is justified.


References

Damasio, A. (2006). Descartes’ Error. Vintage Books.
Klahr, D., & Nigam, M. (2004). The equivalence of learning paths in early science instruction: Effects of direct instruction and discovery learning. Psychological Science, 15, 661–667.
Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12, 257–285.
Wagner, A.D., Bunge, S.A., & Badre, D. (2004). Cognitive control, semantic memory and priming: Contributions from prefrontal cortex. In M.S. Gazzaniga (Ed.), The Cognitive Neurosciences (3rd edn.). Cambridge, MA: MIT Press.


education reform: evaluating the evidence

Over the last few weeks I’ve found myself moving from being broadly sympathetic to educational ‘reform’ to being quite critical of it. One comment on my blog was “You appear to be doing that thing where you write loads, but it is hard to identify any clear points.” Point taken. I’ll see what I can do in this post.

my search for the evidence

I’ve been perplexed by the ideas underpinning the current English education system since my children started encountering problems with it about a decade ago. After a lot of searching, I came to the conclusion that the entire system was lost in a constructivist wilderness. I joined the TES forum to find out more, and discovered that on the whole, teachers weren’t – lost, that is. I came across references to evidence-based educational research and felt hopeful.

Some names were cited; Engelmann, Hirsch, Hattie, Willingham. I pictured a growing body of rigorous research and searched for the authors’ work. Apart from Hattie’s, I couldn’t find much. Willingham was obviously a cognitive psychologist but I couldn’t find his research either. I was puzzled. Most of the evidence seemed to come from magazine articles and a few large-scale studies – notorious for methodological problems. I then heard about Daisy Christodoulou’s book Seven Myths about Education and thought that might give me some pointers. I searched her blog.

In one post, Daisy cites work from the field of information theory by Kirschner, Sweller & Clark, Herb Simon and John Anderson. I was familiar with the last two researchers, but couldn’t open the Simon papers and Anderson’s seemed a bit technical for a general readership. I hadn’t come across the Kirschner, Sweller and Clark reference so I read it. I could see what they were getting at, but thought their reasoning was flawed.

Then it dawned on me. This was the evidence bit of the evidence-based research. It consisted of some early cognitive science/information theory, some large-scale studies and a meta-analysis, together with a large amount of opinion. To me that didn’t constitute a coherent body of evidence. But I was told that there was more to it, which is why I attended the ResearchED conference last weekend. There was more to it, but the substantial body of research didn’t materialise. So where does that leave me?

I still agree with some points that the educational reformers make;

• English-speaking education systems are dominated by constructivist pedagogical approaches
• the implementation of ‘minimal guidance’ approaches has failed to provide children with a good education
• we have a fairly reliable, valid body of knowledge about the world and children should learn about it
• skills tend to be domain-specific
• cognitive science can tell us a lot about how children learn
• the capacity of working memory is limited
• direct instruction is an effective way of teaching.

But I have several reservations that make me uneasy about the education reform ‘movement’.

1. the evidence

Some of it is cited frequently. Here’s a summary.

If I’ve understood it correctly, Engelmann and Becker’s DISTAR programme (Direct Instruction System for Teaching Arithmetic and Reading) had far better outcomes for basic maths and reading, higher-order cognitive skills (in reading and maths), and responsibility and self-esteem than any other programme in the Project Follow-Through evaluation carried out in 1977.

At around the same time, ED Hirsch had realised that his students’ comprehension of texts was impaired by their poor general knowledge, and in 1983 he published an outline of his concept of what he called ‘cultural literacy’.

A couple of decades later, Daniel Willingham, a cognitive psychologist, started to apply theory from cognitive science to education.

In 2008, John Hattie published Visible Learning: A Synthesis of Over 800 Meta-Analyses Relating to Achievement – the result of 15 years’ work – in which he ranked the effect sizes he found for a wide range of educational factors.

Kirschner, Sweller and Clark’s 2006 paper Why Minimal Guidance During Instruction Does Not Work: An Analysis of the Failure of Constructivist, Discovery, Problem-Based, Experiential, and Inquiry-Based Teaching is also often cited. John Sweller developed the concept of ‘cognitive load’ in the 1980s, based on the limited capacity of working memory.

2. the conclusions that can be drawn from the evidence

The DISTAR programme, often referred to as Direct Instruction (capitalised), is clearly very effective for teaching basic maths and literacy. This is an outcome not to be sniffed at, so it would be worth exploring why DISTAR hasn’t been more widely adopted. Proponents of direct instruction often claim it’s because of entrenched ideological opposition; it might also be to do with the fact that it’s a proprietary programme, that teacher input is highly constrained, and that schools have to teach more than basic maths and literacy.

ED Hirsch’s observation that students need prior knowledge before they can comprehend texts involving that knowledge is a helpful one, but has more to say about curriculum design than pedagogy. There are some major issues around all schools using an identical curriculum, who controls the content and how children’s knowledge of the curriculum is assessed.

Daniel Willingham has written extensively on how findings from cognitive science can be applied to education. Cognitive science is clearly a rich source of useful information. The reason I couldn’t find his research (mainly about procedural memory) appears to be that at some point he changed his middle initial from B to T; I’d assumed the work was by someone else.

Although I have doubts about Kirschner, Sweller and Clark’s paper, again the contribution from cognitive science is potentially valuable.

John Hattie’s meta-analyses provide some very useful insights into the effectiveness of educational influences.

The most substantial bodies of evidence cited are clearly cognitive science and Hattie’s meta-analyses, which provide a valuable starting point for further exploration of the influences he ranks. Those are my conclusions.

But other conclusions are being drawn – often that the evidence cited above supports the view that direct instruction is the most effective way of teaching and that traditional educational methods (however they are defined) are superior to progressive ones (however they are defined). Those conclusions seem to me to be using the evidence to support beliefs about educational methods, rather than deriving beliefs about educational methods from the evidence.

3. who’s evaluating the evidence?

A key point made by proponents of direct instruction is that students need to have knowledge before they can do anything effective with it. Obviously they do. But this principle appears to be overlooked by the very people who are emphasising it.

If you want to understand and apply findings from a meta-analysis you need to be aware of common problems with meta-analyses, how reliable they are, what you need to bear in mind about complex constructs etc. You don’t need to have read everything there is to read about meta-analyses, just to be aware of potential pitfalls. If you want to apply findings from cognitive science, it would help to have at least a broad overview of cognitive science first. That’s because, if you don’t have much prior knowledge, you have no way of knowing how reliable or valid information is. If it’s from a peer-reviewed paper, there’s a good chance it’s reliable because the reviewers would have looked at the theory, the data, the analysis and conclusions. How valid it is (ie how well it maps on to the real world) is another matter. I want to look at some of what ED Hirsch has written to illustrate the point.

Hirsch on psychology and science

Hirsch’s work is often referred to by education reformers. I think he’s right to emphasise the importance of students’ knowledge and I’m impressed by his Core Knowledge framework. There’s now a UK version (slightly less impressive) and his work has influenced the new English National Curriculum. But when I started to check out some of what Hirsch has written I was disconcerted to find that he doesn’t seem to practise what he preaches. In an article in Policy Review he sets out seven ‘reliable general principles’ derived from cognitive science to guide teachers. The principles are sound, even if he has misconstrued ‘chunking’ and views rehearsal as a ‘disagreeable need’.

But Hirsch’s misunderstanding of the history of psychology suggests that not everything he says about psychology might be entirely reliable. He says;

Fifty years ago [the article is dated 2002] psychology was dominated by the guru principle. One declared an allegiance to B.F. Skinner and behaviorism, or to Piaget and stage theory, or to Vygotsky and social theory. Today, by contrast, a new generation of “cognitive scientists,” while duly respectful of these important figures, have leavened their insights with further evidence (not least, thanks to new technology), and have been able to take a less speculative and guru-dominated approach. This is not to suggest that psychology has now reached the maturity and consensus level of solid-state physics. But it is now more reliable than it was, say, in the Thorndike era with its endless debates over “transfer of training.”

This paragraph is riddled with misconceptions. Skinner was indeed an influential psychologist, but behaviourism was controversial – Noam Chomsky was a high-profile critic. Piaget was influential in educational circles – but children’s cognitive development formed one small strand of the wide range of areas being investigated by psychologists. Vygotsky’s work has also been influential in education, but it didn’t become widely known in the West until after the publication in 1978 of Mind in Society – a collection of his writings translated into English – so he couldn’t have had ‘guru’ status in psychology in the 1950s. And to suggest that cognitive scientists are ‘duly respectful’ of Skinner, Piaget and Vygotsky as ‘important figures’ in their field suggests a complete misunderstanding of the roots of cognitive science and of what matters to cognitive scientists. But you wouldn’t be able to question what Hirsch is saying if you had no prior information. And in this article, Hirsch doesn’t support his assertions with references, so you couldn’t check them out.

In a conference address that also forms a chapter in a book entitled The Great Curriculum Debate, Hirsch attributes progressive educational methods to the Romantic movement and in turn to religious beliefs, completely overlooking the origins of ‘progressive’ educational methodologies in psychological research and, significantly, the influence of Freud’s work.

In the grand scheme of things, of course, Hirsch’s view of psychology in the 1950s, or of the origins of progressive education, doesn’t matter that much. What does matter is that Hirsch himself is seen as something of a guru, largely because of his emphasis on students needing sound prior knowledge, yet here he clearly hasn’t checked his own.

What’s more important is Hirsch’s view of science. In the last section of his essay Classroom research and cargo cults, entitled ‘on convergence and consensus’, in which he compares classroom research with that from cognitive psychology, he says “independent convergence has always been the hallmark of dependable science”. That’s true in the sense that if several researchers approaching a problem from different directions all come to the same conclusion, they would be reasonably confident that their conclusion was a valid one.

Hirsch illustrates the role of convergence using the example of germ theory. He says “in the nineteenth century, for example, evidence from many directions converged on the germ theory of disease. Once policymakers accepted that consensus, hospital operating rooms, under penalty of being shut down, had to meet high standards of cleanliness.” What’s interesting is that Hirsch slips, almost imperceptibly, from ‘convergence’ into ‘consensus’. In scientific research, convergence is important, but consensus can be extremely misleading because it can be, and often has been, wrong. Ironically, not long before high standards of cleanliness were imposed on hospitals, the consensus had been that cross-contamination theory was wrong, as Semmelweis discovered to his cost. Reliable findings aren’t the same as valid ones.

Hirsch then goes on to say “What policymakers should demand from the [education] research community is consensus.” No they shouldn’t. Consensus can be wrong. What policymakers need to demand from education research is methodological rigour. We already have the relevant expertise, it just needs to be applied to education. Again, if you have no frame of reference against which you can evaluate what Hirsch is saying, you’d be quite likely to assume that he’s right about convergence and consensus – and you’d be none the wiser about the importance of good research design.

what the teachers say

I’m genuinely enthusiastic about teachers wanting to base their practice on evidence. I recognise that this is a work in progress and that it’s only just begun. I can quite understand why someone whose teaching has been transformed by a finding from cognitive science might want to share that information as widely as possible. But ironically, some of the teachers involved appear to be doing exactly the opposite of what they recommend teachers do with students.

If you’re not familiar with a knowledge domain, but want to use findings from it, it’s worth getting an overview of it first. This doesn’t involve learning loads of concrete facts; it involves getting someone with good domain knowledge to give you an outline of how the domain works, so you can see how the concrete facts fit in. It also involves making sure you know what domain-specific skills are required to handle the concrete facts, and whether or not you have them. It also means not making overstated claims. Applying seven principles from cognitive science means you are applying seven principles from cognitive science. That’s all. It’s important to avoid making claims that aren’t supported by the evidence.

What struck me about the supporters of educational reform was that science teachers were conspicuous by their absence. Most of the complaints about progressive education seem to relate to English, Mathematics and History. These are all fields that deal with highly abstracted information that is especially vulnerable to constructivist worldviews, so they might have been disproportionately influenced by ‘minimal guidance’ methods. It’s a bit more difficult to take an extreme constructivist approach to physics, chemistry, biology or physical geography, because reality tends to intervene quite early on. The irony is that science teachers might be in a better position than teachers of English, Maths or History to evaluate evidence from educational research. And psychology teachers and educational psychologists would have the relevant domain knowledge, which would help avoid reinventing the wheel. I’d recommend getting some of them on board.

“waiter’s memory”

At the ResearchED conference last Saturday, when I queried the usefulness of the diagram of working memory that was being used, I was asked two questions. Here’s the first:

What’s wrong with Willingham’s model of working memory?

Nothing’s wrong with Willingham’s model. As far as I can tell, the diagram of working memory that was being used by teachers at the ResearchED conference had been simplified to illustrate two key points; that working memory has limited capacity and that information can be transferred from working memory to long-term memory and vice-versa.

My reservation about it is that if it’s the only model of working memory you’ve seen, you won’t know what Willingham has left out, nor how working memory fits into the way the brain processes information. And oversimplified models, if unconstrained by reality, tend to take on a life of their own, which doesn’t help anyone. The left-brain right-brain mythology is a case in point. An oversimplified understanding of the differences between the right and left hemispheres, followed by a process of Chinese whispers, ended up producing some bizarre educational practices.

The second question was this:

What difference would it make if we knew more about how information is processed in the brain?

It’s a good question. The short answer is that if you rely on Willingham’s diagram for your understanding of working memory, you could conclude, as some people have done, that direct instruction is the only way students should be taught. As I hope I showed in my previous post, the way information is processed is more complex than the diagram suggests. I think there are three key points that are worth bearing in mind.

Long-term memory is constantly being updated by incoming sensory information

Children are learning all the time. They learn implicitly, informally and incidentally from their environment as well as explicitly when being taught. It’s well worth utilising that ability to learn from ‘background’ information. Posters, displays, playground activities, informal conversations, and DVDs and books used primarily for entertainment can all exploit implicit, informal and incidental learning that will support, extend and reinforce explicit learning.

We’re not always aware that we are learning

I only need two or three exposures to an unfamiliar place, face or song before I can recognise it again, and I don’t need to actively pay attention to, or put any effort into recalling, the place, face or song in order to do so. I would have reliably learned new things, but my learning would be implicit: I wouldn’t be able to give accurate directions, describe the face so that someone else would recognise it, or hum the tune. (Daniel Willingham suggests that implicit memory doesn’t exist, but he’s talking about the classification rather than the phenomenon.)

Peter Blenkinsop and I found that we were using different definitions of learning. My definition was; long-term changes to the brain as a result of incoming information. His was; being able to explicitly recall information from long-term memory. Both definitions are valid, but they are different.

Working memory is complex

George Miller’s paper ‘The magical number seven, plus or minus two’ is well worth reading. What’s become clear since Miller wrote it is that his finding that working memory can handle only 7±2 items at once applies to the loops/sketchpads/buffers in working memory. At first, it was assumed there was only one loop/sketchpad/buffer; since then more have been discovered. In addition, because information is chunked, the amount of information those few items carry can actually be quite large. On top of that, the central executive is simultaneously monitoring information from the environment, the body and long-term memory. That’s quite a lot of information flowing through working memory all the time. We don’t actively pay attention to all of it, but it doesn’t follow that anything we don’t pay attention to disappears forever. In addition to working memory capacity there are several other things the brain does that make it easier, or harder, for people to learn.
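To make that concrete, here’s a toy sketch in Python (mine, not Miller’s or Baddeley’s model) of a capacity-limited buffer and of what chunking buys. The seven-slot limit and the digit examples are illustrative assumptions.

```python
from collections import deque

# Toy capacity-limited buffer: new items displace the oldest, and the
# 7-slot limit loosely follows Miller's 7±2. A deque is a cartoon of a
# buffer, not a model of one.
buffer = deque(maxlen=7)
for digit in "202589031457":              # 12 unchunked digits
    buffer.append(digit)
print(list(buffer))                        # only the last 7 survive

# Chunking packs more information into the same number of slots:
# each slot now holds a familiar group rather than a single digit.
chunked = deque(maxlen=7)
for chunk in ["2025", "8903", "1457"]:     # the same 12 digits as 3 chunks
    chunked.append(chunk)
print(list(chunked))                       # all 12 digits fit in 3 slots
```

The capacity limit is the same in both cases; chunking just changes what counts as one item.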

Things that make learning easier (and harder)

1. Pre-existing information

People learn by extending their existing mental schemata. This involves extending neural networks – literally. If information is totally novel to us, it won’t mean anything to us and we’re unlikely to remember it. Because each human being has had a unique set of life experiences, each of us has a unique set of neural networks and the way we structure our knowledge is also unique. It doesn’t follow that everybody’s knowledge framework is equally valid. The way the world is structured and the way it functions are pretty reliable and we know quite a lot about both. Students do need to acquire core knowledge about the world and it is possible to teach it. Having said that, there are often fundamental disagreements within knowledge domains about the nature of that core knowledge, so students also need to know how to look at knowledge from different perspectives and how to test its reliability and validity.

Tapping into children’s existing schemata, not just those relating to what they are supposed to be learning in school but what they know about the world in general, can provide hooks on which to hang tricky concepts. Schemata from football, pop culture or Dr Who can be exploited, not in order to make learning ‘fun’, but to make sense of it. That doesn’t mean that teachers have to refer to pop culture, or that they should do so if it’s likely to prove a distraction.

2. Multi-sensory input

Because learning is about the real world and takes place in the real world, it usually involves more than one sensory modality – human beings rely most heavily on the visual, auditory and tactile senses. Neural connections linking information from several sensory modalities make things we’ve learned more secure because they can be accessed via several different sensory routes. It also makes sense to map the way information is presented as accurately as possible onto what it relates to in the real world. Visits, audio-visuals, high quality illustrations and physical activities can convey information that chalk-and-talk and a focus on abstract information can’t. Again, the job of multi-sensory vehicles for learning isn’t to make the learning ‘fun’ (although they might do that) or to distract the learner, but to increase the amount of information available.

3. Trial-and-error

The brain relies on trial-and-error feedback to fine-tune skills and ensure that knowledge is fit for purpose. We call trial-and-error learning in young children ‘play’. Older children and adults also use play to learn – if they get the opportunity. In more formal educational settings, formative assessment that gives feedback to individual students is a form of trial-and-error learning. It’s important to note that human beings tend to attach greater weight to the risk of failure and sanctions than they do to opportunities for success and reward. This means that tasks need to be challenging but not too challenging. Too many failures – or too many successes – can reduce interest and motivation.

4. Rehearsal

Willingham emphasises the importance of rehearsal in learning. The more times neural networks are activated, the stronger the connections become within them, and the more easily information will be recalled. Rehearsal at intervals is more effective than ‘cramming’. That’s because the connections between neurons have to be formed, physically, and there’s no opportunity for that to happen if the network is being constantly activated by incoming information. There’s a reason why human beings need rest and relaxation.
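The advantage of spacing can be caricatured with a toy forgetting curve. Everything below is an assumption made for illustration – the exponential decay, the review rule and all the constants – not a validated memory model.

```python
import math

def retention(stability, days):
    # Exponential forgetting: chance of recall after `days` with no review.
    return math.exp(-days / stability)

def review(stability, days_since_last):
    # Toy rule: a rehearsal strengthens the memory more when some
    # forgetting has already happened, so a gap makes it do more work.
    forgotten = 1.0 - retention(stability, days_since_last)
    return stability * (1.0 + 2.0 * forgotten)

def schedule(gaps_in_days):
    stability = 1.0                        # arbitrary starting stability
    for gap in gaps_in_days:
        stability = review(stability, gap)
    return stability

crammed = schedule([0.01, 0.01, 0.01])     # three rehearsals back to back
spaced = schedule([1, 3, 7])               # the same three, spread out
print(retention(crammed, 30))              # recall after a month: tiny
print(retention(spaced, 30))               # recall after a month: ~0.1
```

The crammed rehearsals arrive before any forgetting has happened, so each one adds almost nothing; the spaced ones each do real work.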

5. Problem-solving

Willingham is often quoted as saying ‘the brain is not designed for thinking’. That’s true in the sense that our brains default to quick-and-dirty solutions to problems rather than using logical, rational thought. What’s also true is what Willingham goes on to say; ‘people like to solve problems, but not to work on unsolvable problems’ (p.3). The point he’s making is that our problem-solving capacity is limited. Nonetheless, human technology bears witness to the fact that human beings are problem-solvers extraordinaire, and attempts to resolve problems have resulted in a vast body of knowledge about how the world works. It’s futile to expect children to do all their learning by problem-solving, but because problem-solving involves researching, reiterating, testing and reconfiguring knowledge it can be an effective way of acquiring new information and making it very memorable.

6. Writing things down

Advocates of direct instruction place a lot of emphasis on the importance of long-term memory; the impression one gets is that if factual information is memorised it can be recalled whenever it’s needed. Unfortunately, long-term memory doesn’t work like that. Over time information fades if it’s not used very often and memories can become distorted (assuming they were accurate in the first place). If we’ve acquired a great deal of factual information, we won’t have time to keep rehearsing all of it to keep it all easily accessible. Memorising factual information we currently need makes sense, but what we need long-term is factual information to hand when required, and that’s why we invented writing. And books. And the internet, although that has some of the properties of long-term memory. Recording information enormously increases the capacity and reliability of long-term memory.

grover

In a classic Sesame Street sketch, Mr Johnson the restaurant customer suggests that Grover the waiter write down his order. Grover is affronted: “Sir! I am a trained professional! I do not need to write things down. Instead, I use my ‘waiter’s memory’.” Waiters are faced with an interesting memory challenge; they need to remember a customer’s order for longer than is usually possible in working memory, but don’t need to remember the order long-term. So they tend to use technical support in the form of a written note. Worth watching the sketch, because it’s a beautiful illustration of how a great deal of information can be packed into a small timeframe, without any obvious working memory overload. (First time round most children would miss some of it, but Sesame Street repeats sketches for that reason.)

Conclusion

It won’t have escaped the attention of some readers that I have offered evidence from cognitive science to support educational methods lumped together as ‘minimal guidance’ and described as ‘failing’ by Kirschner, Sweller and Clark; constructivist, discovery, problem-based, experiential, and inquiry-based teaching. A couple of points are worth noting in relation to these approaches.

The first is that they didn’t appear suddenly out of the blue. Each of them has emerged at different points in time from 150 years of research into how human beings learn. We do learn by experiencing, inquiring, discovering, problem-solving and constructing our knowledge in different ways. There is no doubt about that. There’s also no doubt that we can learn by direct instruction.

The second point is that the reason these approaches have demonstrably failed to ensure that all children have a good knowledge of how the world works is that they have been extended beyond what George Kelly called their range of convenience.

In other words they’ve been applied inappropriately. You can’t just construct your own understanding of the world and expect the world to conform to it. Trying to learn everything by experience, discovery, inquiry or problem-solving is a waste of effort if someone’s already experienced, discovered or inquired about it, or if a problem’s already been solved. Advocates of direct instruction are quite right to point out that you usually need prior knowledge before you can solve a problem, and a good understanding of a knowledge domain before you know what you need to inquire about, and that many failures in education have come about because novices have been expected to mimic the surface features of experts’ behavior without having the knowledge of experts.

Having said that, relying on an oversimplified model of working memory introduces the risk of exactly the same thing happening with direct instruction. The way the brain processes information is complex, but not so complex it can’t be summarised in a few key principles. Human beings acquire information in multiple ways, but not in so many ways we can’t keep track of them. Figuring out what teaching approaches are best used for what knowledge might take a bit of time, but it’s a worthwhile investment, and should help to avoid the one-size-fits-all approach that has bedevilled the education system for too long.


there’s more to working memory than meets the eye

I’ve had several conversations on Twitter with Peter Blenkinsop about learning and the brain. At the ResearchED conference on Saturday, we continued the conversation and discovered that much of our disagreement was because we were using different definitions of learning. Peter’s definition is that learning involves being able to actively recall information; mine is that it involves changes to the brain in response to information.

working memory

Memory is obviously essential to learning. One thing that’s emerged clearly from years of research into how memory works is that the brain retains information for a very short time in what’s known as working memory, and indefinitely in what’s called long-term memory – but that’s not all there is to it. I felt that advocates of direct instruction at the conference were relying on a model of working memory that was oversimplified and could be misleading. The diagram they were using looked like this;

[image: simple model of memory]

This model is attributed to Daniel Willingham. From what the teachers were saying, the diagram is simpler than most current representations of working memory because its purpose is to illustrate three key points;

• the capacity of working memory is limited and it holds information for a short time
• information in long-term memory is available for recall indefinitely and
• information can be transferred from working memory to long-term memory and vice versa.

So far, so good.

My reservation about the diagram is that if it’s the only diagram of working memory you’ve ever seen, you might get the impression that it shows the path information follows when it’s processed by the brain. From it you might conclude that;

• information from the environment goes directly into working memory
• if you pay attention to that information, it will be stored permanently in long-term memory
• if you don’t pay attention to it it will be lost forever, and
• there’s a very low limit to how much information from the environment you can handle at any one time.

But that’s not quite what happens to information coming into the brain. As Peter pointed out during our conversation, simplifying things appropriately is challenging; you want to simplify enough to avoid confusing people, but not so much that they might misunderstand.

In this post, I’m going to try to explain the slightly bigger picture of how brains process information, and where working memory and long-term memory fit in.

sensory information from the external environment

All information from the external environment comes into the brain via the sense organs. The incoming sensory information is on a relatively large scale, particularly if it’s visual or auditory information; you can see an entire classroom at once and hear simultaneously all the noises emanating from it. But individual cells within the retina or the cochlea respond to tiny fragments of that large-scale information; lines at different angles, areas of light, dark and colour, minute changes in air pressure. Information from the fragments is transmitted via tiny electrical impulses from the sense organs to the brain. The brain then chunks the fragments together to build larger-scale representations that closely match the information coming in from the environment. As a result, what we perceive is a fairly accurate representation of what’s actually out there. I say ‘fairly accurate’ because perception isn’t 100% accurate, but that’s another story.

chunking

The chunking of sensory information takes place via networks of interconnected neurons (long spindly brain cells). The brain forms physical connections (synapses) between neighbouring neurons in response to novel information. The connections allow electrical activation to pass from one neuron to another. The connections work on a use-it-or-lose-it principle; the more they are used the stronger they get, and if they’re not used much they weaken and disappear. Not surprisingly, toddlers have vast numbers of connections, but that number diminishes considerably during childhood and adolescence. That doesn’t mean we have to keep remembering everything we ever learned or we’ll forget it; it’s a way of ensuring that the brain can process efficiently the types of information from the environment that it’s most likely to encounter.
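The use-it-or-lose-it principle is easy to caricature in code. This is a cartoon, not a neural simulation; the weights, probabilities and pruning threshold are all arbitrary assumptions.

```python
import random

# Ten toy connections, three of which carry frequently used information.
connections = {f"synapse_{i}": 1.0 for i in range(10)}
frequently_used = {"synapse_0", "synapse_1", "synapse_2"}

for day in range(100):
    for name in list(connections):
        if name in frequently_used and random.random() < 0.5:
            connections[name] = min(connections[name] + 0.1, 5.0)  # strengthen
        else:
            connections[name] *= 0.95                              # decay
    # Prune connections that have weakened below a threshold.
    connections = {n: w for n, w in connections.items() if w > 0.05}

print(sorted(connections))   # only the frequently used connections survive
```

Note that the surviving connections aren’t the ones that were ‘memorised’; they’re simply the ones that kept being used.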

working memory

Broadly speaking, incoming sensory information is processed in the brain from the back towards the front. It’s fed forward into areas that Alan Baddeley has called variously a ‘loop’, ‘sketchpad’ and ‘buffer’. Whatever you call them, they are areas where very limited amounts of information can be held for very short periods while we decide what to do with it. Research evidence suggests there are different loops/sketchpads/buffers for different types of sensory information – for example Baddeley’s most recent model of working memory includes temporary stores for auditory, visuospatial and episodic information.

[image: Baddeley’s working memory model]

The incoming information held briefly in the loops/sketchpads/buffers is fed forward again to frontal areas of the brain where it’s constantly monitored by what’s called the central executive – an area that deals with attention and decision-making. The central executive and the loops/sketchpads/buffers together make up working memory.
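A rough structural sketch may help; the class names, capacities and behaviour below are my inventions for illustration – Baddeley’s model is a psychological construct, not a program.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Store:
    """A capacity-limited temporary store: new items displace the oldest."""
    capacity: int
    items: deque = field(default_factory=deque)

    def hold(self, item):
        self.items.append(item)
        while len(self.items) > self.capacity:
            self.items.popleft()

@dataclass
class WorkingMemory:
    # Separate stores for different types of information, loosely
    # following Baddeley's components. Capacities are arbitrary.
    phonological_loop: Store = field(default_factory=lambda: Store(4))
    visuospatial_sketchpad: Store = field(default_factory=lambda: Store(4))
    episodic_buffer: Store = field(default_factory=lambda: Store(4))

    def attend(self):
        # The 'central executive': monitor every store and decide what
        # to act on. Here it just reports the most recent item in each.
        stores = (self.phonological_loop, self.visuospatial_sketchpad,
                  self.episodic_buffer)
        return [s.items[-1] for s in stores if s.items]

wm = WorkingMemory()
wm.phonological_loop.hold("Radio 4 discussion")
wm.visuospatial_sketchpad.hold("layout of the recipe page")
print(wm.attend())   # both held at once - they occupy different stores
```

The separate stores are the point: listening to Radio 4 while following a recipe works because the two streams don’t compete for the same buffer.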

long-term memory

The information coming into working memory activates the more permanent neural networks that carry information relevant to it – what’s called long-term memory. The neural networks that make up long-term memory are distributed throughout the brain. Several different types of long-term memory have been identified but the evidence points increasingly to the differences being due to where neural networks are located, not to differences in the biological mechanisms involved.

Information in the brain is carried in the pattern of connections between neurons. The principle is similar to the way pixels represent information on a computer screen; that information is carried in the patterns of pixels that are activated. This makes computer screens – and brains – very versatile; they can carry a huge range of different types of information in a relatively small space. One important difference between the two processes is that pixels operate independently, whereas brain cells form physical connections if they are often activated at the same time. The connections allow fast, efficient processing of information that’s encountered frequently.
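Both halves of the analogy can be illustrated with a toy tally – the versatility of patterns, and the difference connections make. This is a counting exercise over invented ‘units’, not a model of real neurons.

```python
import itertools

# Like pixels, a small set of units can carry many different patterns:
# 8 binary units give 2**8 = 256 distinct activation patterns.
units = range(8)
patterns = list(itertools.product([0, 1], repeat=8))
print(len(patterns))   # 256

# Unlike pixels, units that are often active at the same time become
# connected. Tally co-activations across some 'experienced' patterns:
experienced = [(1, 1, 0, 0, 1, 0, 0, 0)] * 5 + [(0, 0, 1, 1, 0, 0, 1, 0)] * 2
connections = {}
for pattern in experienced:
    for i, j in itertools.combinations(units, 2):
        if pattern[i] and pattern[j]:
            connections[(i, j)] = connections.get((i, j), 0) + 1

print(connections)     # frequently co-active pairs have the biggest tallies
```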

For example, say I’m looking out of my window at a pigeon. The image of the pigeon falling on my retina will activate the neural networks in my brain that carry information about pigeons; what they look like, sound like, feel like, their flight patterns and feeding habits. My thoughts might then wander off on to related issues; other birds in my garden, when to prune the cherry tree, my neighbour repairing her fence. If I glance away from the pigeon and look at my blank computer screen, other neural networks will be activated, those that carry information about computers, technology, screens and rectangles in general. I will no longer be thinking about pigeons, but my pigeon networks will still be active enough for me to recall that I was looking at a pigeon previously and I might glance out of the window to see if it is still there.

Every time my long-term neural networks are activated by incoming sensory information, they are updated. If the same information comes in repeatedly the connections within the network are strengthened. What’s not clear is how much attention needs to be paid to incoming information in order for it to update long-term memory. Large amounts of information about the changing environment are flowing through working memory all the time, and evidence from brain-damaged patients suggests that long-term memory can be changed even if we’re not paying attention to the information that activates it.

the central executive

Incoming sensory information and information from long-term memory are fed forward to the central executive. The function of the central executive is a bit like that of a CCTV control room. According to Antonio Damasio it monitors, evaluates and responds to information from three main sources;

• the external environment (sensory information)
• the internal environment (body states) and
• previous representations of the external and internal environments (carried in the pattern of connections in neural networks).

One difference is that the loops/sketchpads/buffers and the system that monitors them consist of networks of interconnected neurons, not TV screens (obviously). Another is that there isn’t anybody watching the brain’s equivalent of the CCTV screens – it’s an automated process. We become aware of information in the loops/sketchpads/buffers only if we need to be aware of it, so we are usually conscious of what’s happening in the external environment, and of significant internal or external changes.

The central executive constantly compares the streams of incoming information, and responds via networks of neurons that feed information back to other areas of the brain. If the environment has changed significantly, or an interesting or threatening event occurs, or we catch sight of something moving on the periphery of our field of vision, or experience sudden discomfort or pain, the feedback from the central executive ensures that we pay attention to that rather than anything else. It’s important to note that information from the body includes information about our overall physiological state, including emotions.

So a schematic general diagram of how working memory fits in with information processing in the brain would look something like this:

[diagram: how working memory fits into information processing in the brain]

It’s important to note that we still don’t have a clear map of the information processing pathways. Researchers keep coming across different potential loops/sketchpads/buffers and there’s evidence that the feedback and feed-forward pathways are more complex than this diagram shows.

I began this post by suggesting that an over-simplified model of working memory could be misleading. I’ll explain my reasons in more detail in the next post, but first I want to highlight an important implication of the way incoming sensory information is handled by the brain.

pre-conscious processing

A great deal of sensory information is processed by the brain pre-consciously. Advocates of direct instruction emphasise the importance of chunking information because it increases the capacity of working memory. A popular example is the way expert chess players can hold several different configurations of chess pieces in working memory simultaneously, chunking being seen as something ‘experts’ do. But it’s important to remember that the brain chunks information automatically if we’re exposed to it frequently enough. That’s how we recognise faces, places and things – most three-year-olds are ‘experts’ in their day-to-day surroundings because they have had thousands of exposures to familiar faces, places and things. They don’t have to sit down and study these things in order to chunk the fragments of information that make up faces, places and things – their visual cortex does it automatically.
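The automatic fusing of frequently co-occurring fragments can be caricatured with a merge rule loosely analogous to byte-pair encoding. This is a deliberate cartoon – the brain doesn’t literally run this algorithm – but it shows how chunks can emerge from exposure frequency alone, with no studying involved.

```python
from collections import Counter

def merge_most_frequent_pair(sequence):
    """Fuse the most frequent adjacent pair of fragments into one chunk."""
    pairs = Counter(zip(sequence, sequence[1:]))
    if not pairs:
        return sequence
    (a, b), _ = pairs.most_common(1)[0]
    merged, i = [], 0
    while i < len(sequence):
        if i + 1 < len(sequence) and (sequence[i], sequence[i + 1]) == (a, b):
            merged.append(a + b)   # the pair becomes a single chunk
            i += 2
        else:
            merged.append(sequence[i])
            i += 1
    return merged

# Repeated exposure to a stream of fragments (here, characters):
stream = list("the cat sat on the mat the cat ran")
for _ in range(6):
    stream = merge_most_frequent_pair(stream)
print(stream)   # frequent fragments such as 'at' and 'the' have fused
```

Repeated exposure alone is enough to fuse the frequent fragments into larger units.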

This means that a large amount of information going through young children’s working memory is already chunked. We don’t know to what extent the central executive has to actively pay attention to that information in order for it to change long-term memory, but pre-conscious chunking does suggest that a good deal of learning happens implicitly. I’ll comment on this in more detail in my next post.