seven myths about education: a knowledge framework

In Seven Myths about Education, Daisy Christodoulou refers to Bloom’s taxonomy of educational objectives as a metaphor that leads to two false conclusions: that skills are separate from knowledge, and that knowledge is ‘somehow less worthy and important’ (p.21). Bloom’s taxonomy was developed in the 1950s as a way of systematising what students need to do with their knowledge. At the time, quite a lot was known about what people did with knowledge, because they usually process it actively and explicitly. Much less was known about how people acquire knowledge, because much of that process is implicit; students usually ‘just learned’ – or they didn’t. Daisy’s book focuses on how students acquire knowledge, but her framework is an implicit one; she doesn’t link the various stages of acquiring knowledge in an explicit formal model like Bloom’s. Although I think Daisy makes some valid points about the educational orthodoxy, some features of her model lead to conclusions that are open to question. In this post, I compare the model of cognition that Daisy describes with an established framework for analysing knowledge that originated outside the education sector.

a framework for knowledge

Researchers from a variety of disciplines have proposed frameworks involving levels of abstraction in relation to how knowledge is acquired and organised. The frameworks are remarkably similar. Although there are differences of opinion about terminology and how knowledge is organised at higher levels, there’s general agreement that knowledge is processed along the lines of the catchily named DIKW pyramid – DIKW stands for data, information, knowledge and wisdom. The Wikipedia entry gives you a feel for the areas of agreement and disagreement involved. In the pyramid, each level except the data level involves the extraction of information from the level below. I’ll start at the bottom.


As far as the brain is concerned, data don’t actually tell us anything except whether something is there or not. For computers, data are a series of 0s and 1s; for the brain, data are largely in the form of sensory input – light, dark and colour, sounds, tactile sensations, etc.

It’s only when we spot patterns within data that the data can tell us anything. Information consists of patterns that enable us to identify changes, identify connections and make predictions. For computers, information involves detecting patterns in all the 0s and 1s. For the brain it involves detecting patterns in sensory input.
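To make the data/information distinction concrete, here is a toy Python sketch. It is an illustration under stated assumptions, not a model of the brain, and the function names are hypothetical. A raw sequence of 0s and 1s tells us nothing by itself, but scanning it for changes, and for regularities in what tends to follow what, lets us locate changes and make a simple prediction.

```python
from collections import Counter, defaultdict

def changes(data):
    """Return the positions where a value differs from the one before it."""
    return [i for i in range(1, len(data)) if data[i] != data[i - 1]]

def predict_next(data):
    """Predict the next value as the most frequent successor of the last value seen."""
    successors = defaultdict(Counter)
    for current, following in zip(data, data[1:]):
        successors[current][following] += 1
    seen_after_last = successors.get(data[-1])
    if not seen_after_last:
        return None  # no pattern detected yet: the data alone tell us nothing
    return seen_after_last.most_common(1)[0][0]

data = [0, 1, 0, 1, 0, 1, 0]
print(changes(data))       # [1, 2, 3, 4, 5, 6]
print(predict_next(data))  # 1
```

The data are identical either way; what the two functions add is information about the data, which is exactly the sense in which the DIKW framework uses the term.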

Knowledge has proved more difficult to define, but involves the organisation of information.

Although several researchers have suggested that knowledge is also organised at a meta-level, this hasn’t been extensively explored.

The processes involved in the lower levels of the hierarchy – data and information – are well established thanks to both computer modelling and brain research. We know a fair bit about the knowledge level, largely due to work on how experts and novices think, but how people organise knowledge at a meta-level is less clear.

The key concept in this framework is information. Used in this context, ‘information’ tells you whether something has changed or not, whether two things are the same or not, and identifies patterns. The DIKW hierarchy is sometimes summarised as: information is information about data, knowledge is information about information, and wisdom is information about knowledge.

a simple theory of complex cognition

Daisy begins her exploration of cognitive psychology with a quote by John Anderson, from his paper ACT: A simple theory of complex cognition (p.20). Anderson’s paper tackles the mystique often attached to human intelligence when compared to that of other species. He demonstrates that it isn’t as sophisticated or as complex as it appears, but is derived from a simple underlying principle. He goes on to explain how people extract information from data, deduce production rules and make predictions about commonly occurring patterns, which suggests that the more examples of particular data the brain perceives, the more quickly and accurately it learns. He demonstrates the principle using examples from visual recognition, mathematical problem solving and prediction of word endings.

natural learning

What Anderson describes is how human beings learn naturally: the way brains automatically process any information that happens to come their way unless something interferes with that process. It’s the principle we use to recognise and categorise faces, places and things. It’s the one we use when we learn to talk, solve problems and associate cause with effect. Scattergrams provide a good example of how we extract information from data in this way.

Scatterplot of longitudinal measurements of total brain volume for males (N=475 scans, shown in dark blue) and females (N=354 scans, shown in red). From Lenroot et al (2007).

Although the image consists of a mass of dots and lines in two colours, we can see at a glance that the different coloured dots and lines form two clusters.

Note that I’m not making the same distinction that Daisy makes between ‘natural’ and ‘not natural’ learning (p.36). Anderson is describing the way the brain learns, by default, when it encounters data. Daisy, in contrast, claims that we learn things like spoken language without visible effort because language is ‘natural’, whereas inventions like the alphabet and numbers need to be taught ‘formally and explicitly’. That distinction, although frequently made, isn’t necessarily a valid one. It’s based on the assumption that the brain has evolved mechanisms to process some types of data, e.g. to recognise faces and understand speech, but can’t have had time to evolve mechanisms to process recent inventions like writing and mathematics. This assumption about brain hardwiring is contentious, and the evidence about how brains learn (including the work that has developed from Anderson’s theory) makes it look increasingly likely that it’s wrong. If formal and explicit instruction were necessary in order to learn man-made skills like writing and mathematics, that would raise the question of how these skills were invented in the first place, and Anderson would not have been able to use mathematical problem-solving and word prediction as examples of the underlying mechanism of human learning. The theory that the brain is hardwired to process some types of information but not others, and the theory that the same mechanism processes all information, both explain how people appear to learn some things automatically and ‘naturally’. Which theory is right (or whether both are) is still the subject of intense debate. I’ll return to the second theory later when I discuss schemata.

data, information and chunking

Chunking is a core concept in Daisy’s model of cognition. Chunking occurs when the brain links together several bits of data it encounters frequently and treats them as a single item – groups of letters that frequently co-occur are chunked into words. Anderson’s paper is about the information processing involved in chunking. One of his examples is how the brain chunks the three lines that make up an upper case H. Although Anderson doesn’t make an explicit distinction between data and information, in his examples the three lines would be categorised as data in the DIKW framework, as would be the curves and lines that make up numerals. When the brain figures out the production rule for the configuration of the lines in the letter H, it’s extracting information from the data – spotting a pattern. Because the pattern is highly consistent – H is almost always written using this configuration of lines – the brain can chunk the configuration of lines into the single unit we call the letter H. The letters A and Z also consist of three lines, but have different production rules for their configurations. Anderson shows that chunking can also occur at a slightly higher level; letters (already chunked) can be chunked again into words that are processed as single units, and numerals (already chunked) can be chunked into numbers to which production rules can be applied to solve problems. Again, chunking can take place because the patterns of letters in the words, and the patterns of numerals in Anderson’s mathematical problems are highly consistent. Anderson calls these chunked units and production rules ‘units of knowledge’. He doesn’t use the same nomenclature as the DIKW model, but it’s clear from his model that initial chunking occurs at the data level and further chunking can occur at the information level.
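Anderson’s ACT model is considerably richer than this, but the frequency principle behind chunking can be illustrated with a toy sketch. The Python below is not ACT itself; it borrows a technique from text compression (byte-pair encoding) that repeatedly merges the most frequent adjacent pair of symbols into a single unit, stopping once no pair occurs at least `min_count` times. The function names and the `min_count` and `max_merges` parameters are hypothetical choices for illustration.

```python
from collections import Counter

def most_frequent_pair(units):
    """Return the most frequent adjacent pair of units and how often it occurs."""
    pairs = Counter(zip(units, units[1:]))
    if not pairs:
        return None, 0
    return pairs.most_common(1)[0]

def merge_pair(units, pair):
    """Replace every non-overlapping occurrence of `pair` with one combined unit."""
    merged, i = [], 0
    while i < len(units):
        if i + 1 < len(units) and (units[i], units[i + 1]) == pair:
            merged.append(units[i] + units[i + 1])
            i += 2
        else:
            merged.append(units[i])
            i += 1
    return merged

def chunk(sequence, min_count=2, max_merges=10):
    """Repeatedly chunk the most frequent adjacent pair until no pair is frequent enough."""
    units = list(sequence)
    for _ in range(max_merges):
        pair, count = most_frequent_pair(units)
        if pair is None or count < min_count:
            break
        units = merge_pair(units, pair)
    return units

print(chunk("theathebthec"))  # ['the', 'a', 'the', 'b', 'the', 'c']
```

Letters that consistently turn up together end up as a single unit (‘the’), built in two stages (t+h, then th+e), while letters that follow no consistent pattern stay as separate items. This is a crude analogue of the point that chunking depends on highly consistent patterns and can build on units that are already chunked.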

The brain chunks data and low-level units of information automatically; evidence for this comes from research showing that babies begin to identify and categorise objects using visual features, and categorise speech sounds using auditory features, by about the age of 9 months (Younger, 2003). Chunking also occurs pre-consciously (e.g. Lamme, 2003); we know that people are often aware of changes to a chunked unit like a face, a landscape or a piece of music, but don’t know what has changed – someone has shaved off their moustache, a tree has been felled, the song is a cover version with different instrumentation. In addition, research into visual and auditory processing shows that sensory information initially feeds forward in the brain; a lot of processing occurs before the information reaches the location of working memory in the frontal lobes. So at this level, what we are talking about is an automatic, usually pre-conscious process that we use by default.

knowledge – the organisation of information

Anderson’s paper was written in 1995 – twenty years ago – at about the time the DIKW framework was first proposed, which explains why he doesn’t use the same terminology. He calls the chunked units and production rules ‘units of knowledge’ rather than ‘units of information’ because they are the fundamental low-level units from which higher-level knowledge is formed.

Although Anderson’s model of information processing for low-level units still holds true, what has puzzled researchers in the intervening couple of decades is why that process doesn’t scale up. The way people process low-level ‘units of knowledge’ is logical and rational enough to be accurately modelled using computer software, but when handling large amounts of information, such as the concepts involved in day-to-day life, or trying to comprehend, apply, analyse, synthesise or evaluate it, the human brain goes a bit haywire. People (including experts) exhibit a number of errors and biases in their thinking. These aren’t just occasional idiosyncrasies; everybody shows the same errors and biases to varying extents. Since complex information isn’t inherently different to simple information – there’s just more of it – researchers suspected that the errors and biases were due to the wiring of the brain. Work on judgement and decision-making, and on the biological mechanisms involved in processing information at higher levels, has demonstrated that brains are indeed wired up differently to computers. The reason is that what has shaped the evolution of the human brain isn’t the need to produce logical, rational solutions to problems, but the need to survive, and overall, quick-and-dirty information processing tends to result in higher survival rates than slow, precise processing.

What this means is that Anderson’s information processing principle can be applied directly to low-level units of information, but might not apply directly to the way people process information at a higher level, such as the way they process facts. Facts are the subject of the next post.

Anderson, J (1996). ACT: A simple theory of complex cognition. American Psychologist, 51, 355–365.
Lamme, VAF (2003). Why visual attention and awareness are different. TRENDS in Cognitive Sciences, 7, 12–18.
Lenroot, RK, Gogtay, N, Greenstein, DK, Molloy, E, Wallace, GL, Clasen, LS, Blumenthal, JD, Lerch, J, Zijdenbos, AP, Evans, AC, Thompson, PM & Giedd, JN (2007). Sexual dimorphism of brain developmental trajectories during childhood and adolescence. NeuroImage, 36, 1065–1073.
Younger, B (2003). Parsing objects into categories: Infants’ perception and use of correlated attributes. In Rakison & Oakes (eds.), Early Category and Concept Development: Making sense of the blooming, buzzing confusion. Oxford University Press.

“waiter’s memory”

At the ResearchED conference last Saturday, when I queried the usefulness of the diagram of working memory that was being used, I was asked two questions. Here’s the first:

What’s wrong with Willingham’s model of working memory?

Nothing’s wrong with Willingham’s model. As far as I can tell, the diagram of working memory that was being used by teachers at the ResearchED conference had been simplified to illustrate two key points: that working memory has limited capacity, and that information can be transferred from working memory to long-term memory and vice versa.

My reservation about it is that if it’s the only model of working memory you’ve seen, you won’t know what Willingham has left out, nor how working memory fits into the way the brain processes information. And oversimplified models of things, if unconstrained by reality, tend to take on a life of their own, which doesn’t help anyone. The left-brain/right-brain mythology is a case in point. An oversimplified understanding of the differences between the right and left hemispheres, followed by a process of Chinese whispers, ended up producing some bizarre educational practices.

The second question was this:

What difference would it make if we knew more about how information is processed in the brain?

It’s a good question. The short answer is that if you rely on Willingham’s diagram for your understanding of working memory, you could conclude, as some people have done, that direct instruction is the only way students should be taught. As I hope I showed in my previous post, the way information is processed is more complex than the diagram suggests. I think there are three key points that are worth bearing in mind.

Long-term memory is constantly being updated by incoming sensory information

Children are learning all the time. They learn implicitly, informally and incidentally from their environment as well as explicitly when being taught. It’s well worth utilising that ability to learn from ‘background’ information. Posters, displays, playground activities, informal conversations, and DVDs and books used primarily for entertainment can all exploit implicit, informal and incidental learning that will support, extend and reinforce explicit learning.

We’re not always aware that we are learning

I need only two or three exposures to an unfamiliar place, face or song before I can recognise it again, and I don’t need to actively pay attention to it, or put any effort into recalling it, in order to do so. I will have reliably learned something new, but my learning will be implicit. I won’t be able to give accurate directions, describe the face so that someone else would recognise it, or hum the tune. (Daniel Willingham suggests that implicit memory doesn’t exist, but he’s talking about the classification rather than the phenomenon.)

Peter Blenkinsop and I found that we were using different definitions of learning. My definition was: long-term changes to the brain as a result of incoming information. His was: being able to explicitly recall information from long-term memory. Both definitions are valid, but they are different.

Working memory is complex

George Miller’s paper ‘The magical number seven, plus or minus two’ is well worth reading. What’s become clear since Miller wrote it is that his finding that working memory can handle only 7±2 items of information at once applies to the loops/sketchpads/buffers in working memory. At first, it was assumed there was only one loop/sketchpad/buffer; since then, more have been discovered. In addition, because information is chunked, the amount of information in the loops/sketchpads/buffers can actually be quite large. On top of that, the central executive is simultaneously monitoring information from the environment, the body and long-term memory. That’s quite a lot of information flowing through working memory all the time. We don’t actively pay attention to all of it, but it doesn’t follow that anything we don’t pay attention to disappears forever. Apart from working memory capacity, there are several other things the brain does that make it easier, or harder, for people to learn.
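Why chunking matters for capacity can be shown with a toy sketch. It assumes, as a deliberate simplification, a single buffer with exactly seven slots and a hypothetical set of chunks already stored in long-term memory; it ignores the multiple loops/sketchpads/buffers and the central executive. The names `SLOTS`, `segment` and `fits_in_working_memory` are illustrative only. Fifteen letters exceed seven slots when held as separate items, but fit easily once they are recognised as five familiar acronyms.

```python
SLOTS = 7  # assumption: Miller's "seven, plus or minus two", fixed at 7 for illustration

def segment(letters, known_chunks):
    """Split letters into the longest chunks already in long-term memory; any other letter is its own unit."""
    longest = max([1] + [len(c) for c in known_chunks])
    units, i = [], 0
    while i < len(letters):
        # Try the longest possible piece first, falling back to a single letter.
        for size in range(min(longest, len(letters) - i), 0, -1):
            piece = letters[i:i + size]
            if size == 1 or piece in known_chunks:
                units.append(piece)
                i += size
                break
    return units

def fits_in_working_memory(letters, known_chunks, slots=SLOTS):
    """True if the letters, once chunked, need no more units than there are slots."""
    return len(segment(letters, known_chunks)) <= slots

letters = "FBICIABBCUSANHS"
acronyms = {"FBI", "CIA", "BBC", "USA", "NHS"}
print(len(segment(letters, set())), fits_in_working_memory(letters, set()))  # 15 False
print(segment(letters, acronyms), fits_in_working_memory(letters, acronyms))
# ['FBI', 'CIA', 'BBC', 'USA', 'NHS'] True
```

The number of slots never changes; what changes is how much each slot can hold, which depends on what is already in long-term memory.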

Things that make learning easier (and harder)

1. Pre-existing information

People learn by extending their existing mental schemata. This involves extending neural networks – literally. If information is totally novel to us, it won’t mean anything to us and we’re unlikely to remember it. Because each human being has had a unique set of life experiences, each of us has a unique set of neural networks and the way we structure our knowledge is also unique. It doesn’t follow that everybody’s knowledge framework is equally valid. The way the world is structured and the way it functions are pretty reliable and we know quite a lot about both. Students do need to acquire core knowledge about the world and it is possible to teach it. Having said that, there are often fundamental disagreements within knowledge domains about the nature of that core knowledge, so students also need to know how to look at knowledge from different perspectives and how to test its reliability and validity.

Tapping into children’s existing schemata, not just those relating to what they are supposed to be learning in school but what they know about the world in general, can provide hooks on which to hang tricky concepts. Schemata from football, pop culture or Dr Who can be exploited, not in order to make learning ‘fun’, but to make sense of it. That doesn’t mean that teachers have to refer to pop culture, or that they should do so if it’s likely to prove a distraction.

2. Multi-sensory input

Because learning is about the real world and takes place in the real world, it usually involves more than one sensory modality – human beings rely most heavily on the visual, auditory and tactile senses. Neural connections linking information from several sensory modalities make things we’ve learned more secure because they can be accessed via several different sensory routes. It also makes sense to map the way information is presented as accurately as possible onto what it relates to in the real world. Visits, audio-visuals, high quality illustrations and physical activities can convey information that chalk-and-talk and a focus on abstract information can’t. Again, the job of multi-sensory vehicles for learning isn’t to make the learning ‘fun’ (although they might do that) or to distract the learner, but to increase the amount of information available.

3. Trial-and-error

The brain relies on trial-and-error feedback to fine-tune skills and ensure that knowledge is fit for purpose. We call trial-and-error learning in young children ‘play’. Older children and adults also use play to learn – if they get the opportunity. In more formal educational settings, formative assessment that gives feedback to individual students is a form of trial-and-error learning. It’s important to note that human beings tend to attach greater weight to the risk of failure and sanctions than they do to opportunities for success and reward. This means that tasks need to be challenging but not too challenging. Too many failures – or too many successes – can reduce interest and motivation.

4. Rehearsal

Willingham emphasises the importance of rehearsal in learning. The more times neural networks are activated, the stronger the connections become within them, and the more easily information will be recalled. Rehearsal at intervals is more effective than ‘cramming’. That’s because the connections between neurons have to be formed, physically, and there’s no opportunity for that to happen if the network is being constantly activated by incoming information. There’s a reason why human beings need rest and relaxation.

5. Problem-solving

Willingham is often quoted as saying ‘the brain is not designed for thinking’. That’s true in the sense that our brains default to quick-and-dirty solutions to problems rather than using logical, rational thought. What’s also true is what Willingham goes on to say: ‘people like to solve problems, but not to work on unsolvable problems’ (p.3). The point he’s making is that our problem-solving capacity is limited. Nonetheless, human technology bears witness to the fact that human beings are problem-solvers extraordinaire, and attempts to resolve problems have resulted in a vast body of knowledge about how the world works. It’s futile to expect children to do all their learning by problem-solving, but because problem-solving involves researching, reiterating, testing and reconfiguring knowledge, it can be an effective way of acquiring new information and making it very memorable.

6. Writing things down

Advocates of direct instruction place a lot of emphasis on the importance of long-term memory; the impression one gets is that if factual information is memorised, it can be recalled whenever it’s needed. Unfortunately, long-term memory doesn’t work like that. Over time, information fades if it’s not used very often, and memories can become distorted (assuming they were accurate in the first place). If we’ve acquired a great deal of factual information, we won’t have time to rehearse all of it often enough to keep it easily accessible. Memorising the factual information we currently need makes sense, but what we need in the long term is factual information to hand when required, and that’s why we invented writing. And books. And the internet, although that has some of the properties of long-term memory. Recording information enormously increases the capacity and reliability of long-term memory.


In a classic Sesame Street sketch, Mr Johnson, the restaurant customer, suggests that Grover the waiter write down his order. Grover is affronted: “Sir! I am a trained professional! I do not need to write things down. Instead, I use my ‘waiter’s memory’.” Waiters face an interesting memory challenge: they need to remember a customer’s order for longer than is usually possible in working memory, but don’t need to remember it long-term. So they tend to use technical support in the form of a written note. The sketch is worth watching because it’s a beautiful illustration of how a great deal of information can be packed into a short timeframe without any obvious working memory overload. (First time round, most children would miss some of it, but Sesame Street repeats sketches for that reason.)


It won’t have escaped the attention of some readers that I have offered evidence from cognitive science to support educational methods lumped together as ‘minimal guidance’ and described as ‘failing’ by Kirschner, Sweller and Clark: constructivist, discovery, problem-based, experiential and inquiry-based teaching. A couple of points are worth noting in relation to these approaches.

The first is that they didn’t appear suddenly out of the blue. Each of them has emerged at different points in time from 150 years of research into how human beings learn. We do learn by experiencing, inquiring, discovering, problem-solving and constructing our knowledge in different ways. There is no doubt about that. There’s also no doubt that we can learn by direct instruction.

The second point is that these approaches have demonstrably failed to ensure that all children have a good knowledge of how the world works because they have been extended beyond what George Kelly called their range of convenience.

In other words, they’ve been applied inappropriately. You can’t just construct your own understanding of the world and expect the world to conform to it. Trying to learn everything by experience, discovery, inquiry or problem-solving is a waste of effort if someone’s already experienced, discovered or inquired about it, or if a problem’s already been solved. Advocates of direct instruction are quite right to point out that you usually need prior knowledge before you can solve a problem, and a good understanding of a knowledge domain before you know what you need to inquire about, and that many failures in education have come about because novices have been expected to mimic the surface features of experts’ behaviour without having the knowledge of experts.

Having said that, relying on an oversimplified model of working memory introduces the risk of exactly the same thing happening with direct instruction. The way the brain processes information is complex, but not so complex it can’t be summarised in a few key principles. Human beings acquire information in multiple ways, but not in so many ways we can’t keep track of them. Figuring out what teaching approaches are best used for what knowledge might take a bit of time, but it’s a worthwhile investment, and should help to avoid the one-size-fits-all approach that has bedevilled the education system for too long.


Image of Grover from Muppet Wiki