In Seven Myths about Education Daisy Christodoulou refers to Bloom’s taxonomy of educational objectives as a metaphor that leads to two false conclusions; that skills are separate from knowledge and that knowledge is ‘somehow less worthy and important’ (p.21). Bloom’s taxonomy was developed in the 1950s as a way of systematising what students need to do with their knowledge. At the time, quite a lot was known about what people did with knowledge because they usually process it actively and explicitly. Quite a lot less was known about how people acquire knowledge, because much of that process is implicit; students usually ‘just learned’ – or they didn’t. Daisy’s book focuses on how students acquire knowledge, but her framework is an implicit one; she doesn’t link up the various stages of acquiring knowledge in an explicit formal model like Bloom’s. Although I think Daisy makes some valid points about the educational orthodoxy, some features of her model lead to conclusions that are open to question. In this post, I compare the model of cognition that Daisy describes with an established framework for analysing knowledge with origins outside the education sector.
a framework for knowledge
Researchers from a variety of disciplines have proposed frameworks involving levels of abstraction in relation to how knowledge is acquired and organised. The frameworks are remarkably similar. Although there are differences of opinion about terminology and how knowledge is organised at higher levels, there’s general agreement that knowledge is processed along the lines of the catchily named DIKW pyramid – DIKW stands for data, information, knowledge and wisdom. The Wikipedia entry gives you a feel for the areas of agreement and disagreement involved. In the pyramid, each level except the data level involves the extraction of information from the level below. I’ll start at the bottom.
As far as the brain is concerned, data don’t actually tell us anything except whether something is there or not. For computers, data are a series of 0s and 1s; for the brain data is largely in the form of sensory input – light, dark and colour, sounds, tactile sensations, etc.
It’s only when we spot patterns within data that the data can tell us anything. Information consists of patterns that enable us to identify changes, identify connections and make predictions. For computers, information involves detecting patterns in all the 0s and 1s. For the brain it involves detecting patterns in sensory input.
Knowledge has proved more difficult to define, but involves the organisation of information.
Although several researchers have suggested that knowledge is also organised at a meta-level, this hasn’t been extensively explored.
The processes involved in the lower levels of the hierarchy – data and information – are well-established thanks to both computer modelling and brain research. We know a fair bit about the knowledge level largely due to work on how experts and novices think, but how people organise knowledge at a meta-level isn’t so clear.
The key concept in this framework is information. Used in this context, ‘information’ tells you whether something has changed or not, whether two things are the same or not, and identifies patterns. The DIKW hierarchy is sometimes summarised as; information is information about data, knowledge is information about information, and wisdom is information about knowledge.
a simple theory of complex cognition
Daisy begins her exploration of cognitive psychology with a quote by John Anderson, from his paper ACT: A simple theory of complex cognition (p.20). Anderson’s paper tackles the mystique often attached to human intelligence when compared to that of other species. He demonstrates that it isn’t as sophisticated or as complex as it appears, but is derived from a simple underlying principle. He goes on to explain how people extract information from data, deduce production rules and make predictions about commonly occurring patterns, which suggests that the more examples of particular data the brain perceives, the more quickly and accurately it learns. He demonstrates the principle using examples from visual recognition, mathematical problem solving and prediction of word endings.
What Anderson describes is how human beings learn naturally; the way brains automatically process any information that happens to come their way unless something interferes with that process. It’s the principle we use to recognise and categorise faces, places and things. It’s the one we use when we learn to talk, solve problems and associate cause with effect. Scattergrams provide a good example of how we extract information from data in this way.
Although the image consists of a mass of dots and lines in two colours, we can see at a glance that the different coloured dots and lines form two clusters.
Note that I’m not making the same distinction that Daisy makes between ‘natural’ and ‘not natural’ learning (p.36). Anderson is describing the way the brain learns, by default, when it encounters data. Daisy, in contrast, claims that we learn things like spoken language without visible effort because language is ‘natural’ whereas we need to be taught ‘formally and explicitly’, inventions like the alphabet and numbers. That distinction, although frequently made, isn’t necessarily a valid one. It’s based on an assumption that the brain has evolved mechanisms to process some types of data e.g. to recognise faces and understand speech, but can’t have had time to evolve mechanisms to process recent inventions like writing and mathematics. This assumption about brain hardwiring is a contentious one, and the evidence about how brains learn (including the work that’s developed from Anderson’s theory) makes it look increasingly likely that it’s wrong. If formal and explicit instruction are necessary in order to learn man-made skills like writing and mathematics, it begs the question of how these skills were invented in the first place, and Anderson would not have been able to use mathematical problem-solving and word prediction as his examples of the underlying mechanism of human learning. The theory that the brain is hardwired to process some types of information but not others, and the theory that the same mechanism processes all information, both explain how people appear to learn some things automatically and ‘naturally’. Which theory is right (or whether both are right) is still the subject of intense debate. I’ll return to the second theory later when I discuss schemata.
data, information and chunking
Chunking is a core concept in Daisy’s model of cognition. Chunking occurs when the brain links together several bits of data it encounters frequently and treats them as a single item – groups of letters that frequently co-occur are chunked into words. Anderson’s paper is about the information processing involved in chunking. One of his examples is how the brain chunks the three lines that make up an upper case H. Although Anderson doesn’t make an explicit distinction between data and information, in his examples the three lines would be categorised as data in the DIKW framework, as would be the curves and lines that make up numerals. When the brain figures out the production rule for the configuration of the lines in the letter H, it’s extracting information from the data – spotting a pattern. Because the pattern is highly consistent – H is almost always written using this configuration of lines – the brain can chunk the configuration of lines into the single unit we call the letter H. The letters A and Z also consist of three lines, but have different production rules for their configurations. Anderson shows that chunking can also occur at a slightly higher level; letters (already chunked) can be chunked again into words that are processed as single units, and numerals (already chunked) can be chunked into numbers to which production rules can be applied to solve problems. Again, chunking can take place because the patterns of letters in the words, and the patterns of numerals in Anderson’s mathematical problems are highly consistent. Anderson calls these chunked units and production rules ‘units of knowledge’. He doesn’t use the same nomenclature as the DIKW model, but it’s clear from his model that initial chunking occurs at the data level and further chunking can occur at the information level.
The brain chunks data and low-level units of information automatically; evidence for this comes from research showing that babies begin to identify and categorise objects using visual features and categorise speech sounds using auditory features by about the age of 9 months (Younger, 2003). Chunking also occurs pre-consciously (e.g. Lamme 2003); we know that people are often aware of changes to a chunked unit like a face, a landscape or a piece of music, but don’t know what has changed – someone has shaved off their moustache, a tree has been felled, the song is a cover version with different instrumentation. In addition, research into visual and auditory processing shows that sensory information initially feeds forward in the brain; a lot of processing occurs before the information reaches the location of working memory in the frontal lobes. So at this level, what we are talking about is an automatic, usually pre-conscious process that we use by default.
knowledge – the organisation of information
Anderson’s paper was written in 1995 – twenty years ago – at about the time the DIKW framework was first proposed, which explains why he doesn’t used the same terminology. He calls the chunked units and production rules ‘units of knowledge’ rather than ‘units of information’ because they are the fundamental low-level units from which higher-level knowledge is formed.
Although Anderson’s model of information processing for low-level units still holds true, what has puzzled researchers in the intervening couple of decades is why that process doesn’t scale up. The way people process low-level ‘units of knowledge’ is logical and rational enough to be accurately modelled using computer software, but when handling large amounts of information, such as the concepts involved in day-to-day life, or trying to comprehend, apply, analyse, synthesise or evaluate it, the human brain goes a bit haywire. People (including experts) exhibit a number of errors and biases in their thinking. These aren’t just occasional idiosyncrasies – everybody shows the same errors and biases to varying extents. Since complex information isn’t inherently different to simple information – there’s just more of it – researchers suspected that the errors and biases were due to the wiring of the brain. Work on judgement and decision-making and on the biological mechanisms involved in processing information at higher levels has demonstrated that brains are indeed wired up differently to computers. The reason is that what has shaped the evolution of the human brain isn’t the need to produce logical, rational solutions to problems, but the need to survive, and overall quick-and-dirty information processing tends to result in higher survival rates than slow, precise processing.
What this means is that Anderson’s information processing principle can be applied directly to low-level units of information, but might not be directly applicable to the way people process information at a higher-level, the way they process facts, for example. Facts are the subject of the next post.
Anderson, J (1996) ACT: A simple theory of complex cognition, American Psychologist, 51, 355-365.
Lamme, VAF (2003) Why visual attention and awareness are different, TRENDS in Cognitive Sciences, 7, 12-18.
Lenroot,RK, Gogtay, N, Greenstein, DK, Molloy, E, Wallace, GL, Clasen, LS, Blumenthal JD, Lerch,J, Zijdenbos, AP, Evans, AC, Thompson, PM & Giedd, JN (2007). Sexual dimorphism of brain developmental trajectories during childhood and adolescence. NeuroImage, 36, 1065–1073.
Younger, B (2003). Parsing objects into categories: Infants’ perception and use of correlated attributes. In Rakison & Oakes (eds.) Early Category and Concept development: Making sense of the blooming, buzzing confusion, Oxford University Press.