seven myths about education: facts and schemata

Knowledge occupies the bottom level of Bloom’s taxonomy of educational objectives. In the 1950s, Bloom and his colleagues would have known a good deal about the strategies teachers use to help students to acquire knowledge. What they couldn’t have known is how students formed their knowledge; how they extracted information from data and knowledge from information. At the time cognitive psychologists knew a fair amount about learning but had only a hazy idea about how it all fitted together. The DIKW pyramid I referred to in the previous post explains how the bottom layer of Bloom’s taxonomy works – how students extract information and knowledge during learning. Anderson’s simple theory of cognition explains how people extract low-level information. More recent research at the knowledge and wisdom levels is beginning to shed light on Bloom’s higher-level skills, why people organise the same body of knowledge in different ways and why they misunderstand and make mistakes.

Seven Myths about Education addresses the knowledge level of Bloom’s taxonomy. Daisy Christodoulou presents a model of cognition that she feels puts the higher-level skills in Bloom’s taxonomy firmly into context. Her model also forms the basis for a pedagogical approach and a structure for a curriculum, which I’ll discuss in another post. Facts are a core feature of Daisy’s model. I’ve mentioned previously that many disciplines find facts problematic because facts, by definition, have to be valid (true), and it’s often difficult to determine their validity. In this post I want to focus instead on the information processing entailed in learning facts.

a simple theory of cognition

Having explained the concept of chunking and the relationship between working and long-term memory, Daisy introduces Anderson’s paper:

“So when we commit facts to long-term memory, they actually become part of our thinking apparatus and have the ability to expand one of the biggest limitations of human cognition. Anderson puts it thus:

‘All that there is to intelligence is the simple accrual and tuning of many small units of knowledge that in total produce complex cognition. The whole is no more than the sum of its parts, but it has a lot of parts.’”

She then says “a lot is no exaggeration. Long-term memory is capable of storing thousands of facts, and when we have memorised thousands of facts on a specific topic, these facts together form what is known as a schema” (p. 20).


This was one of the points where I began to lose track of Daisy’s argument. I think she’s saying this:

Anderson shows that low-level data can be chunked into a ‘unit of knowledge’ that is then treated as one item by WM – in effect increasing the capacity of WM. In the same way, thousands of memorised facts can be chunked into a more complex unit (a schema) that is then treated as one item by WM – this essentially bypasses the limitations of WM.

I think Daisy assumes that the principle Anderson found pertaining to low-level ‘units of knowledge’ applies to all units of knowledge at whatever level of abstraction. It doesn’t. Before considering why it doesn’t, it’s worth noting a problem with the use of the word ‘facts’ when describing data. Some researchers have equated data with ‘raw facts’. The difficulty with defining data as ‘facts’ is that by definition a fact has to be valid (true) and not all data is valid, as the GIGO (garbage-in-garbage-out) principle that bedevils computer data processing and the human brain’s often flaky perception of sensory input demonstrate. In addition, ‘facts’ are more complex than raw (unprocessed) data or raw (unprocessed) sensory input.

It’s clear from Daisy’s examples of facts that she isn’t referring to raw data or raw sensory input. Her examples include the date of the battle of Waterloo, key facts about numerous historical events and ‘all of the twelve times tables’. She makes it clear in the rest of the book that in order to understand such facts, students need prior knowledge. In terms of the DIKW hierarchy, Daisy’s ‘facts’ are at a higher level than Anderson’s ‘units of knowledge’ and are unlikely to be processed automatically and pre-consciously in the same way as Anderson’s units. To understand why, we need to take another look at Anderson’s units of knowledge and why chunking happens.

chunking revisited

Data that can be chunked easily have two key characteristics: they involve small amounts of information and the patterns within them are highly consistent. As I mentioned in the previous post, one of Anderson’s examples of chunking is the visual features of upper case H. As far as the brain is concerned, the two parallel vertical lines and linking horizontal line that make up the letter H don’t involve much information. Also, although fonts and handwriting vary, the core features of all the Hs the brain perceives are highly consistent. So the brain soon starts perceiving all Hs as the same thing and chunks up the core features into a single unit – the letter H. If H could also be written Ĥ and Ħ in English, it would take a bit longer for the brain to chunk the three different configurations of lines and to learn the association between them, but not much longer, since the three variants involve little information and are still highly consistent.

understanding facts

But the letter H isn’t a fact, it’s a symbol. So are + and the numerals 1 and 2. ‘1+2’ isn’t a fact in the sense that Daisy uses the term, it’s a series of symbols. ‘1+2=3’ could be considered a fact because it consists of symbols representing two entities and the relationship between them. If you know what the symbols refer to, you can understand it. It could probably be chunked because it contains a small amount of information and has consistent visual features. Each multiplication fact in multiplication tables could probably be chunked, too, since they meet the same criteria. But that’s not true for all the facts that Daisy refers to, because they are more complex and less consistent.

‘The cat is on the mat’ is a fact, but in order to understand it, you need some prior knowledge about cats, mats and what ‘on’ means. These would be treated by working memory as different items. Most English-speaking 5-year-olds would understand the ‘cat is on the mat’ fact, but because there are different sorts of cats, different sorts of mats and different ways in which the cat could be on the mat, each child could have a different mental image of the cat on the mat. A particular child might conjure up a different mental image each time he or she encountered the fact. Different sensory data would then be involved each time, the mental representations of the fact would be low in consistency, and the fact’s component parts couldn’t be chunked into a single unit in the same way as lower-level, more consistent representations. Consequently the fact is less likely to be treated as one item in working memory.

Similarly, in order to understand a fact like ‘the battle of Waterloo was in 1815’ you’d need to know what a battle is, where Waterloo is (or at least that it’s a place), what 1815 means and how ‘of’ links a battle and a place name. If you’re learning about the Napoleonic wars, your perception of the battle is likely to keep changing, so the components of the fact would have low consistency, meaning that it couldn’t be chunked in the way Anderson describes.

The same problem involving inconsistency would prevent two or more facts being chunked into a single unit. But clearly people do mentally link facts and the components of facts. They do it using a schema, but not quite in the way Daisy describes.


Before discussing how people use schemata (schemas), a comment on the biological structures that enable us to form them. I mentioned in an earlier post that the neurons in the brain form complex networks a bit like the veins in a leaf. Physical connections are formed between neighbouring neurons when the neurons are activated simultaneously by incoming data. If the same or very similar data are encountered repeatedly, the same neurons are activated repeatedly, connections between them are strengthened and eventually networks of neurons are formed that can carry a vast amount of information in their patterns of connections. The patterns of connections between the neurons represent the individual’s perception of the patterns in the data.
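The strengthening process described above can be sketched as a toy model. This is only an illustration, not a model of real neurons: the function name `strengthen`, the learning rate of 0.1 and the feature names are all assumptions made up for the sketch.

```python
from collections import defaultdict
from itertools import combinations


def strengthen(weights, active, rate=0.1):
    """Strengthen the link between every pair of units that are active together."""
    for a, b in combinations(sorted(active), 2):
        weights[(a, b)] += rate
    return weights


# Hypothetical 'neurons', named after the features they respond to.
weights = defaultdict(float)

# The same pattern is encountered ten times, a different one only once.
for _ in range(10):
    strengthen(weights, {"whiskers", "fur", "tail"})
strengthen(weights, {"fur", "rug"})

strong = weights[("fur", "whiskers")]  # repeatedly co-activated pair
weak = weights[("fur", "rug")]         # pair co-activated once
```

In this sketch the repeatedly encountered pattern ends up with much stronger links than the pattern seen once, which is the sense in which the pattern of connections reflects the patterns in the data.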

So if I see a cat on a mat, or read a sentence about a cat on a mat, or imagine a cat on a mat, my networks of neurons carrying information about cats and mats will be activated. Facts and concepts about cats, mats and things related to them will readily spring to mind. But I won’t have access to all of those facts and concepts at once. That would completely overload my working memory. Instead, what I recall is a stream of facts and concepts about cats and mats that takes time to access. It’s only a short time, but it doesn’t happen all at once. Also, some facts and concepts will be activated immediately and strongly and others will take longer and might be a bit hazy. In essence, a schema is a network of related facts and concepts, not a chunked ‘unit of knowledge’.
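The difference between a network and a chunked unit can be sketched in code. The example below is a toy, with invented concepts and link strengths, and `recall_stream` is a hypothetical helper rather than an established model of memory. It assumes that the strength of an indirect association is the product of the link strengths along the path, and it hands back concepts one at a time, strongest first, rather than all at once.

```python
import heapq


def recall_stream(network, cue):
    """Yield (concept, strength) pairs linked to `cue`, strongest first."""
    best = {cue: 1.0}        # strongest path found so far to each concept
    queue = [(-1.0, cue)]    # max-heap via negated strengths
    seen = set()
    while queue:
        neg_strength, concept = heapq.heappop(queue)
        if concept in seen:
            continue
        seen.add(concept)
        if concept != cue:
            yield concept, -neg_strength
        for neighbour, link in network.get(concept, {}).items():
            strength = -neg_strength * link
            if strength > best.get(neighbour, 0.0):
                best[neighbour] = strength
                heapq.heappush(queue, (-strength, neighbour))


# Invented link strengths between 0 and 1, purely for illustration.
network = {
    "cat": {"fur": 0.9, "purring": 0.8, "mat": 0.3},
    "mat": {"floor": 0.7, "cat": 0.3},
    "fur": {"moulting": 0.5},
}

recalled = list(recall_stream(network, "cat"))
order = [concept for concept, _ in recalled]
```

Here the cue ‘cat’ brings ‘fur’ and ‘purring’ to mind first, with weaker and more distant associations such as ‘floor’ arriving later. Nothing in the structure behaves like a single item that is retrieved whole.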

Daisy says “when we have memorised thousands of facts on a specific topic, these facts together form what is known as a schema” (p. 20). It doesn’t work quite like that, for several reasons.

the structure of a schema

A schema is what it sounds like – a schematic plan or framework. It doesn’t consist of facts or concepts, but it’s a representation of how someone mentally arranges facts or concepts. In the same way the floor-plan of a building doesn’t consist of actual walls, doors and windows, but it does show you where those things are in the building in relation to each other. The importance of this apparently pedantic point will become clear when I discuss deep structure.

implicit and explicit schemata

Schemata can be implicit – the brain organises facts and concepts in a particular way but we’re not aware of what it is – or explicit – we actively organise facts and concepts in a particular way and we are aware of how they are organised.

the size of a schema

Schemata can vary in size and complexity. The configuration of the three lines that make up the letter H is a schema; so is the way a doctor organises his or her knowledge about the human circulatory system. A schema doesn’t have to represent all the facts or concepts it links together. If it did, a schema involving thousands of facts would be so complex it wouldn’t be much help in showing how the facts were related. And in order to encompass all the different relationships between thousands of facts, a single schema for them would need to be very simple.

For example, a simple schema for chemistry would be that different chemicals are formed from different configurations of the sub-atomic ‘particles’ that make up atoms and configurations of atoms that form molecules. Thousands of facts can be fitted into that schema. In order to have a good understanding of chemistry, students would need to know about schemata other than just that simple one, and would need to know thousands of facts about chemistry before they would qualify as experts, but the simple schema plus a few examples would give them a basic understanding of what chemistry was about.

experts’ schemata

Research into expertise (e.g. Chi et al., 1981) shows that experts don’t usually have one single schema for all the facts they know, but instead use different schemata for different aspects of their body of knowledge. Sometimes those schemata are explicitly linked, but sometimes they’re not. Sometimes they can’t be linked because no one knows how the linkage works yet.

chess experts

Daisy refers to research showing that expert chess players memorise thousands of different configurations of chess pieces (p.78). This is classic chunking; although in different chess sets specific pieces vary in appearance, their core visual features and the moves they can make are highly consistent, so frequently-encountered configurations of pieces are eventually treated by the brain as single units – the brain chunks the positions of the chess pieces in essentially the same way as it chunks letters into words.

De Groot’s work showed that chess experts initially identified the configurations of pieces that were possible as a next move, and then went through a process of eliminating the possibilities. The particular configuration of pieces on the board would activate several associated schemata involving possible next and subsequent moves.

So each configuration of chess pieces that is encountered frequently enough to be chunked has an underlying (simple) schema. Expert chess players then access more complex schemata for next and subsequent possible moves. Even if they have an underlying schema for chess as a whole, it doesn’t follow that they treat chess as a single unit or that they recall all possible configurations at once. Most people can reliably recognise thousands of faces and thousands of words and have schemata for organising them, but when thinking about faces or words, they don’t recall all faces or all words simultaneously. That would rapidly overload working memory.

Compared to most knowledge domains, chess is pretty simple. Chess expertise consists of memorising a large but limited number of configurations and having schemata that predict the likely outcomes from a selection of them. Because of the rules of chess, although lots of moves are possible, the possibilities are clearly defined and limited. Expertise in medicine, say, or history, is considerably more complex and less certain. A doctor might have many schemata for human biology: one for each of the skeletal, nervous, circulatory, respiratory and digestive systems, and others for cell metabolism, biochemistry, genetics and so on. Not only is human biology more complex than chess, there’s also more uncertainty involved. Some of those schemata we’re pretty sure about, some we’re not so sure about and some we know very little about. There’s even more uncertainty involved in history. Evaluating evidence about how the human body works might be difficult, but the evidence itself is readily available in the form of human bodies. Historical evidence is often absent and likely to stay that way, which makes establishing facts and developing schemata more challenging.

To illustrate her point about schemata Daisy claims that learning a couple of key facts about 150 historical events from 3000BC to the present will form “the fundamental chronological schema that is the basis of all historical understanding” (p.20). Chronological sequencing could certainly form a simple schema for history, but you don’t need to know about many events in order to grasp that principle – two or three would suffice. Again, although this simple schema would give students a basic understanding of what history was about, in order to have a good understanding of history, students would need not only to know thousands of facts, but also to develop many schemata about how those facts were linked before they would qualify as experts. This brings us on to the deep structure of knowledge, the subject of the next post.

Chi, MTH, Feltovich, PJ & Glaser, R (1981). Categorisation and Representation of Physics Problems by Experts and Novices, Cognitive Science, 5, 121-152
de Groot, AD (1978). Thought and Choice in Chess. Mouton.

Edited for clarity 8/1/17.

