seven myths about education – what’s missing?

Old Andrew has raised a number of objections to my critique of Seven Myths about Education. In his most recent comment on my previous (and, I had hoped, last) post about it, he says I should be able to easily identify evidence that shows ‘what in the cognitive psychology Daisy references won’t scale up’.

One response would be to provide a list of references showing step-by-step the problems that artificial intelligence researchers ran into. That would take me hours, if not days, because I would have to trawl through references I haven’t looked at for over 20 years. Most of them are not online anyway because of their age, which means Old Andrew would be unlikely to be able to access them.

What is more readily accessible is information about concepts that have emerged from those problems, for example: personal construct theory, schema theory, heuristics and biases, bounded rationality and indexing, connectionist models of cognition, and neuroconstructivism. Unfortunately, none of the researchers says “incidentally, this means that students might not develop the right schemata when they commit facts to long-term memory” or “the implications for a curriculum derived from cultural references are obvious”, because they are researching cognition not education, and probably wouldn’t have anticipated anyone suggesting either of these ideas. Whether Old Andrew sees the relevance of these emergent issues or not is secondary, in my view, to how Daisy handles evidence in her book.

concepts and evidence

In the last section of her chapter on Myth 1, Daisy takes us through the concepts of the limited capacity of working memory and chunking. These are well-established, well-tested hypotheses and she cites evidence to support them.

concepts but no evidence

Daisy also appears to introduce two hypotheses of her own. The first is that “we can summon up the information from long-term memory to working memory without imposing a cognitive load” (p.19). The second is that the characteristics of chunking can be extrapolated to all facts, regardless of how complex or inconsistent they might be; “So, when we commit facts to long-term memory they actually become part of our thinking apparatus and have the ability to expand one of the biggest limitations of human cognition” (p.20). The evidence she cites to support this extrapolation is Anderson’s paper – the one about simple, consistent information. I couldn’t find any other evidence cited to support either idea.

evidence but no concepts

Daisy does cite Frantz’s paper about Simon’s work on intuition. Two important concepts of Simon’s that Frantz mentions but Daisy doesn’t are bounded rationality and the idea of indexing.

Bounded rationality refers to the fact that people can only make sense of the information they have. This supports Daisy’s premise that knowledge is necessary for understanding. But it also supports Freire’s complaint about which facts were being presented to Brazilian schoolchildren. Bounded rationality is also relevant to the idea of the breadth of a curriculum being determined by the frequency of cultural references. Simon used it to challenge economic and political theory.

Simon also pointed out that not only do experts have access to more information than novices do, they can access it more quickly because of their mental cross-indexing, ie the schemata that link relevant information. Rapid speed of access reduces cognitive load, but it doesn’t eliminate it. Chess experts can determine the best next move within seconds, but for most other experts, their knowledge is considerably more complex and less well-defined. A surgeon or an engineer is likely to take days rather than seconds to decide on the best procedure or design to resolve a difficult problem. That implies that quite a heavy cognitive load is involved.

Daisy does mention schemata but doesn’t go into detail about how they are formed or how they influence thinking and understanding. She refers to deep learning in passing but doesn’t tackle the issue Willingham raises about students’ problems with deep structure.

burden of proof

Old Andrew appears to be suggesting that I should assume that Daisy’s assertions are valid unless I can produce evidence to refute them. The burden of proof for a theory usually rests with the person making the claims, for obvious reasons. Daisy cites evidence to support some of her claims, but not all of them. She doesn’t evaluate that evidence by considering its reliability or validity or by taking into account contradictory evidence.

If Daisy had written a book about her musings on cognitive psychology and education, or about how findings from cognitive psychology had helped her teaching, I wouldn’t be writing this. But that’s not what she’s done. She’s used theory from one knowledge domain to challenge theory in another. That can be a very fruitful strategy; the application of game theory and ecological systems theory has transformed several fields. But it’s not helpful simply to take a few concepts out of context from one domain and apply them out of context to another domain.

The reason is that theoretical concepts aren’t free-standing; they are embedded in a conceptual framework. If you’re challenging theory with theory, you need to take a long hard look at both knowledge domains first to get an idea of where particular concepts fit in. You can’t just say “I’m going to apply the concepts of chunking and the limited capacity of working memory to education, but I shan’t bother with schema theory or bounded rationality or heuristics and biases because I don’t think they’re relevant.” Well, you can say that, but it’s not a helpful way to approach problems with learning, because all of these concepts are integral to human cognition. Students don’t leave some of them in the cloakroom when they come into class.

On top of that, the model for pedagogy and the curriculum that Daisy supports is currently influencing international educational policy. If the DfE considers the way evidence has been presented by Hirsch, Willingham and, presumably, Daisy to be ‘rigorous’, as Michael Gove clearly did, then we’re in trouble.

For Old Andrew’s benefit, I’ve listed some references. Most of them are about things that Daisy doesn’t mention. That’s the point.

references

Axelrod, R (1973). Schema Theory: An Information Processing Model of Perception and Cognition, The American Political Science Review, 67, 1248-1266.
Elman, J et al (1998). Rethinking Innateness: A Connectionist Perspective on Development. MIT Press.
Frantz, R (2003). Herbert Simon. Artificial intelligence as a framework for understanding intuition, Journal of Economic Psychology, 24, 265–277.
Kahneman, D., Slovic, P & Tversky A (1982). Judgement under Uncertainty: Heuristics and Biases. Cambridge University Press.
Karmiloff-Smith, A (2009). Nativism Versus Neuroconstructivism: Rethinking the Study of Developmental Disorders. Developmental Psychology, 45, 56–63.
Kelly, GA (1955). The Psychology of Personal Constructs. New York: Norton.

seven myths about education: finally…

When I first heard about Daisy Christodoulou’s myth-busting book in which she adopts an evidence-based approach to education theory, I assumed that she and I would see things pretty much the same way. It was only when I read reviews (including Daisy’s own summary) that I realised we’d come to rather different conclusions from what looked like the same starting point in cognitive psychology. I’ve been asked several times why, if I have reservations about the current educational orthodoxy, think knowledge is important, don’t have a problem with teachers explaining things and support the use of systematic synthetic phonics, I’m critical of those calling for educational reform rather than those responsible for a system that needs reforming. The reason involves the deep structure of the models, rather than their surface features.

concepts from cognitive psychology

Central to Daisy’s argument is the concept of the limited capacity of working memory. It’s certainly a core concept in cognitive psychology. It explains not only why we can think about only a few things at once, but also why we oversimplify and misunderstand, are irrational, are subject to errors and biases and use quick-and-dirty rules of thumb in our thinking. And it explains why an emphasis on understanding at the expense of factual information is likely to result in students not knowing much and, ironically, not understanding much either.

But what students are supposed to learn is only one of the streams of information that working memory deals with; it simultaneously processes information about students’ internal and external environment. And the limited capacity of working memory is only one of many things that impact on learning; a complex array of environmental factors is also involved. So although you can conceptually isolate the material students are supposed to learn and the limited capacity of working memory, in the classroom neither of them can be isolated from all the other factors involved. And you have to take those other factors into account in order to build a coherent, workable theory of learning.

But Daisy doesn’t introduce only the concept of working memory. She also talks about chunking, schemata and expertise. Daisy implies (although she doesn’t say so explicitly) that schemata are to facts what chunking is to low-level data. That just as students automatically chunk low-level data they encounter repeatedly, so they will automatically form schemata for facts they memorise, and the schemata will reduce cognitive load in the same way that chunking does (p.20). That’s a possibility, because the brain appears to use the same underlying mechanism to represent associations between all types of information – but it’s unlikely. We know that schemata vary considerably between individuals, whereas people chunk information in very similar ways. That’s not surprising if the information being chunked is simple and highly consistent, whereas schemata often involve complex, inconsistent information.

Experimental work involving priming suggests that schemata increase the speed and reliability of access to associated ideas and that would reduce cognitive load, but students would need to have the schemata that experts use explained to them in order to avoid forming schemata of their own that were insufficient or misleading. Daisy doesn’t go into detail about deep structure or schemata, which I think is an oversight, because the schemata students use to organise facts are crucial to their understanding of how the facts relate to each other.

migrating models

Daisy and teachers taking a similar perspective frequently refer approvingly to ‘traditional’ approaches to education. It’s been difficult to figure out exactly what they mean. Daisy focuses on direct instruction and memorising facts, Old Andrew’s definition is a bit broader and Robert Peal’s appears to include cultural artefacts like smart uniforms and school songs. What they appear to have in common is a concept of education derived from the behaviourist model of learning that dominated psychology in the inter-war years. In education it focused on what was being learned; there was little consideration of the broader context involving the purpose of education, power structures, socioeconomic factors, the causes of learning difficulties etc.

Daisy and other would-be reformers appear to be trying to update the behaviourist model of education with concepts that, ironically, emerged from cognitive psychology not long after it switched focus from a behaviourist model of learning to a computational one; the point at which the field was first described as ‘cognitive’. The concepts the educational reformers focus on fit the behaviourist model well because they are strongly mechanistic and largely context-free. The examples that crop up frequently in the psychology research Daisy cites usually involve maths, physics and chess problems. These types of problems were chosen deliberately by artificial intelligence researchers because they were relatively simple and clearly bounded; the idea was that once the basic mechanism of learning had been figured out, the principles could then be extended to more complex, less well-defined problems.

Researchers later learned a good deal about complex, less well-defined problems, but Daisy doesn’t refer to that research. Nor do any of the other proponents of educational reform. What more recent research has shown is that complex, less well-defined knowledge is organised by the brain in a different way to simple, consistent information. So in cognitive psychology the computational model of cognition has been complemented by a constructivist one, but it’s a different constructivist model to the social constructivism that underpins current education theory. The computational model never quite made it across to education, but early constructivist ideas did – in the form of Piaget’s work. At that point, education theory appears to have grown legs and wandered off in a different direction to cognitive psychology. I agree with Daisy that education theorists need to pay attention to findings from cognitive psychology, but they need to pay attention to what’s been discovered in the last half century not just to the computational research that superseded behaviourism.

why criticise the reformers?

So why am I critical of the reformers, but not of the educational orthodoxy? When my children started school, they, and I, were sometimes perplexed by the approaches to learning they encountered. Conversations with teachers painted a picture of educational theory that consisted of a hotch-potch of valid concepts, recent tradition, consequences of policy decisions and ideas that appeared to have come from nowhere like Brain Gym and Learning Styles. The only unifying feature I could find was a social constructivist approach and even on that opinions seemed to vary. It was difficult to tell what the educational orthodoxy was, or even if there was one at all. It’s difficult to critique a model that might not be a model. So I perked up when I heard about teachers challenging the orthodoxy using the findings from scientific research and calling for an evidence-based approach to education.

My optimism was short-lived. Although the teachers talked about evidence from cognitive psychology and randomised controlled trials, the model of learning they were proposing appeared as patchy, incomplete and incoherent as the model they were criticising – it was just different. So here are my main reservations about the educational reformers’ ideas:

1. If mainstream education theorists aren’t aware of working memory, chunking, schemata and expertise, that suggests there’s a bigger problem than just their ignorance of these particular concepts. It suggests that they might not be paying enough attention to developments in some or all of the knowledge domains their own theory relies on. Knowing about working memory, chunking, schemata and expertise isn’t going to resolve that problem.

2. If teachers don’t know about working memory, chunking, schemata and expertise, that suggests there’s a bigger problem than just their ignorance of these particular concepts. It suggests that teacher training isn’t providing teachers with the knowledge they need. To some extent this would be an outcome of weaknesses in educational theory, but I get the impression that trainee teachers aren’t expected or encouraged to challenge what they’re taught. Several teachers who’ve recently discovered cognitive psychology have appeared rather miffed that they hadn’t been told about it. They were all Teach First graduates; I don’t know if that’s significant.

3. A handful of concepts from cognitive psychology doesn’t constitute a robust enough foundation for developing a pedagogical approach or designing a curriculum. Daisy essentially reiterates what Daniel Willingham has to say about the breadth and depth of the curriculum in Why Don’t Students Like School? He’s a cognitive psychologist and well-placed to show how models of cognition could inform education theory. But his book isn’t about the deep structure of theory, it’s about applying some principles from cognitive psychology in the classroom in response to specific questions from teachers. He explores ideas about pedagogy and the curriculum, but that’s as far as it goes. Trying to develop a model of pedagogy and design a curriculum based on a handful of principles presented in a format like this is like trying to devise courses of treatment and design a health service based on the information gleaned from a GP’s problem page in a popular magazine. But I might be being too charitable; Willingham is a trustee of the Core Knowledge Foundation, after all.

4. Limited knowledge. Rightly, the reforming teachers expect students to acquire extensive factual knowledge and emphasise the differences between experts and novices. But Daisy’s knowledge of cognitive psychology appears to be limited to a handful of principles discovered over thirty years ago. She, Robert Peal and Toby Young all quote Daniel Willingham on research in cognitive psychology during the last thirty years, but none of them, Willingham included, tell us what it is. If they did, it would show that the principles they refer to don’t scale up when it comes to complex knowledge. Nor do most of the teachers writing about educational reform appear to have much teaching experience. That doesn’t mean they are wrong, but it does call into question the extent of their expertise relating to education.

Some of those supporting Daisy’s view have told me they are aware that they don’t know much about cognitive psychology, but have argued that they have to start somewhere and it’s important that teachers are made aware of concepts like the limits of working memory. That’s fine if that’s all they are doing, but it’s not. Redesigning pedagogy and the curriculum on the basis of a handful of facts makes sense if you think that what’s important is facts and that the brain will automatically organise those facts into a coherent schema. The problem is of course that that rarely happens in the absence of an overview of all the relevant facts and how they fit together. Cognitive psychology, like all other knowledge domains, has incomplete knowledge but it’s not incomplete in the same way as the reforming teachers’ knowledge. This is classic Sorcerer’s Apprentice territory; a little knowledge, misapplied, can do a lot of damage.

5. Evaluating evidence. Then there’s the way evidence is handled. Evidence-based knowledge domains have different ways of evaluating evidence, but they all evaluate it. That means weighing up the pros and cons, comparing evidence for and against competing hypotheses and so on. Evaluating evidence does not mean presenting only the evidence that supports whatever view you want to get across. That might be a way of making your case more persuasive, but is of no use to anyone who wants to know about the reliability of your hypothesis or your evidence. There might be a lot of evidence telling you your hypothesis is right – but a lot more telling you it’s wrong. But Daisy, Robert Peal and Toby Young all present supporting evidence only. They make no attempt to test the hypotheses they’re proposing or the evidence cited, and much of the evidence is from secondary sources – with all due respect to Daniel Willingham, just because he says something doesn’t mean that’s all there is to say on the matter.

cargo-cult science

I suggested to a couple of the teachers who supported Daisy’s model that ironically it resembled Feynman’s famous cargo-cult analogy (p. 97). They pointed out that the islanders were using replicas of equipment, whereas the concepts from cognitive psychology were the real deal. I suggest that even if the Americans had left their equipment on the airfield and the islanders knew how to use it, that wouldn’t have resulted in planes bringing in cargo – because there were other factors involved.

My initial response to reading Seven Myths about Education was one of frustration that despite making some good points about the educational orthodoxy and cognitive psychology, Daisy appeared to have got hold of the wrong ends of several sticks. This rapidly changed to concern that a handful of misunderstood concepts is being used as ‘evidence’ to support changes in national education policy.

In Michael Gove’s recent speech at the Education Reform Summit, he refers to the “solidly grounded research into how children actually learn of leading academics such as ED Hirsch or Daniel T Willingham”. Daniel Willingham has published peer-reviewed work, mainly on procedural learning, but I could find none by ED Hirsch. It would be interesting to know what the previous Secretary of State for Education’s criteria for ‘solidly grounded research’ and ‘leading academic’ were. To me the educational reform movement doesn’t look like an evidence-based discipline but bears all the hallmarks of an ideological system looking for evidence that affirms its core beliefs. This is no way to develop public policy. Government should know better.

seven myths about education: facts and schemata

Knowledge occupies the bottom level of Bloom’s taxonomy of educational objectives. In the 1950s, Bloom and his colleagues would have known a good deal about the strategies teachers use to help students to acquire knowledge. What they couldn’t have known is how students formed their knowledge; how they extracted information from data and knowledge from information. At the time cognitive psychologists knew a fair amount about learning but had only a hazy idea about how it all fitted together. The DIKW pyramid I referred to in the previous post explains how the bottom layer of Bloom’s taxonomy works – how students extract information and knowledge during learning. Anderson’s simple theory of cognition explains how people extract low-level information. More recent research at the knowledge and wisdom levels is beginning to shed light on Bloom’s higher-level skills, why people organise the same body of knowledge in different ways and why they misunderstand and make mistakes.

Seven Myths about Education addresses the knowledge level of Bloom’s taxonomy. Daisy Christodoulou presents a model of cognition that she feels puts the higher-level skills in Bloom’s taxonomy firmly into context. Her model also forms the basis for a pedagogical approach and a structure for a curriculum, which I’ll discuss in another post. Facts are a core feature of Daisy’s model. I’ve mentioned previously that many disciplines find facts problematic because facts, by definition, have to be valid (true), and it’s often difficult to determine their validity. In this post I want to focus instead on the information processing entailed in learning facts.

a simple theory of cognition

Having explained the concept of chunking and the relationship between working and long-term memory, Daisy introduces Anderson’s paper:

“So when we commit facts to long-term memory, they actually become part of our thinking apparatus and have the ability to expand one of the biggest limitations of human cognition. Anderson puts it thus:

‘All that there is to intelligence is the simple accrual and tuning of many small units of knowledge that in total produce complex cognition. The whole is no more than the sum of its parts, but it has a lot of parts.’”

She then says “a lot is no exaggeration. Long-term memory is capable of storing thousands of facts, and when we have memorised thousands of facts on a specific topic, these facts together form what is known as a schema” (p. 20).

facts

This was one of the points where I began to lose track of Daisy’s argument. I think she’s saying this:

Anderson shows that low-level data can be chunked into a ‘unit of knowledge’ that is then treated as one item by WM – in effect increasing the capacity of WM. In the same way, thousands of memorised facts can be chunked into a more complex unit (a schema) that is then treated as one item by WM – this essentially bypasses the limitations of WM.

I think Daisy assumes that the principle Anderson found pertaining to low-level ‘units of knowledge’ applies to all units of knowledge at whatever level of abstraction. It doesn’t. Before considering why it doesn’t, it’s worth noting a problem with the use of the word ‘facts’ when describing data. Some researchers have equated data with ‘raw facts’. The difficulty with defining data as ‘facts’ is that by definition a fact has to be valid (true) and not all data is valid, as the GIGO (garbage-in-garbage-out) principle that bedevils computer data processing and the human brain’s often flaky perception of sensory input demonstrate. In addition, ‘facts’ are more complex than raw (unprocessed) data or raw (unprocessed) sensory input.

It’s clear from Daisy’s examples of facts that she isn’t referring to raw data or raw sensory input. Her examples include the date of the battle of Waterloo, key facts about numerous historical events and ‘all of the twelve times tables’. She makes it clear in the rest of the book that in order to understand such facts, students need prior knowledge. In terms of the DIKW hierarchy, Daisy’s ‘facts’ are at a higher level than Anderson’s ‘units of knowledge’ and are unlikely to be processed automatically and pre-consciously in the same way as Anderson’s units. To understand why, we need to take another look at Anderson’s units of knowledge and why chunking happens.

chunking revisited

Data that can be chunked easily have two key characteristics; they involve small amounts of information and the patterns within them are highly consistent. As I mentioned in the previous post, one of Anderson’s examples of chunking is the visual features of upper case H. As far as the brain is concerned, the two parallel vertical lines and linking horizontal line that make up the letter H don’t involve much information. Also, although fonts and handwriting vary, the core features of all the Hs the brain perceives are highly consistent. So the brain soon starts perceiving all Hs as the same thing and chunks up the core features into a single unit – the letter H. If H could also be written Ĥ and Ħ in English, it would take a bit longer for the brain to chunk the three different configurations of lines and to learn the association between them, but not much longer, since the three variants involve little information and are still highly consistent.
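A toy sketch may help make the 'one item' idea concrete. It uses the same principle one level up (letters chunked into words rather than lines into a letter) because that is easier to show in text. It is only an illustration, not Anderson's model: the function name `count_items` and the example strings are made up for this purpose, and the greedy matching rule is an assumption.

```python
def count_items(sequence, known_chunks):
    """Split `sequence` into the items working memory would have to hold.

    Each already-learned chunk counts as one item; every symbol that is
    not part of a known chunk counts as an item on its own.
    (Hypothetical toy rule: greedy, left to right, longest chunk first.)
    """
    # Ignore empty strings so the loop always advances.
    chunks = sorted((c for c in known_chunks if c), key=len, reverse=True)
    items = []
    i = 0
    while i < len(sequence):
        for chunk in chunks:
            if sequence.startswith(chunk, i):
                items.append(chunk)
                i += len(chunk)
                break
        else:
            # No learned chunk starts here: the raw symbol is its own item.
            items.append(sequence[i])
            i += 1
    return items


# Someone with no learned chunks has to hold every letter separately.
novice = count_items("THECATSAT", set())
# Someone who has chunked the three words holds only three items.
reader = count_items("THECATSAT", {"THE", "CAT", "SAT"})
```

The sketch only works because each chunk is an exact, unvarying pattern. That is the point of the two characteristics above: small amounts of information and high consistency.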

understanding facts

But the letter H isn’t a fact, it’s a symbol. So are + and the numerals 1 and 2. ‘1+2’ isn’t a fact in the sense that Daisy uses the term, it’s a series of symbols. ‘1+2=3’ could be considered a fact because it consists of symbols representing two entities and the relationship between them. If you know what the symbols refer to, you can understand it. It could probably be chunked because it contains a small amount of information and has consistent visual features. Each multiplication fact in multiplication tables could probably be chunked, too, since they meet the same criteria. But that’s not true for all the facts that Daisy refers to, because they are more complex and less consistent.

‘The cat is on the mat’ is a fact, but in order to understand it, you need some prior knowledge about cats, mats and what ‘on’ means. These would be treated by working memory as different items. Most English-speaking 5 year-olds would understand the ‘cat is on the mat’ fact, but because there are different sorts of cats, different sorts of mats and different ways in which the cat could be on the mat, each child could have a different mental image of the cat on the mat. A particular child might conjure up a different mental image each time he or she encountered the fact, meaning that different sensory data were involved each time, the mental representations of the fact would be low in consistency, and the fact’s component parts couldn’t be chunked into a single unit in the same way as lower-level more consistent representations. Consequently the fact is less likely to be treated as one item in working memory.

Similarly, in order to understand a fact like ‘the battle of Waterloo was in 1815’ you’d need to know what a battle is, where Waterloo is (or at least that it’s a place), what 1815 means and how ‘of’ links a battle and a place name. If you’re learning about the Napoleonic wars, your perception of the battle is likely to keep changing and the components of the fact would have low consistency, meaning that it couldn’t be chunked in the way Anderson describes.

The same problem involving inconsistency would prevent two or more facts being chunked into a single unit. But clearly people do mentally link facts and the components of facts. They do it using a schema, but not quite in the way Daisy describes.

schemata

Before discussing how people use schemata (schemas), a comment on the biological structures that enable us to form them. I mentioned in an earlier post that the neurons in the brain form complex networks a bit like the veins in a leaf. Physical connections are formed between neighbouring neurons when the neurons are activated simultaneously by incoming data. If the same or very similar data are encountered repeatedly, the same neurons are activated repeatedly, connections between them are strengthened and eventually networks of neurons are formed that can carry a vast amount of information in their patterns of connections. The patterns of connections between the neurons represent the individual’s perception of the patterns in the data.

So if I see a cat on a mat, or read a sentence about a cat on a mat, or imagine a cat on a mat, my networks of neurons carrying information about cats and mats will be activated. Facts and concepts about cats, mats and things related to them will readily spring to mind. But I won’t have access to all of those facts and concepts at once. That would completely overload my working memory. Instead, what I recall is a stream of facts and concepts about cats and mats that takes time to access. It’s only a short time, but it doesn’t happen all at once. Also, some facts and concepts will be activated immediately and strongly and others will take longer and might be a bit hazy. In essence, a schema is a network of related facts and concepts, not a chunked ‘unit of knowledge’.
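The difference between a network and a chunked unit can be sketched in a few lines. This is a toy illustration only, not a model taken from the research literature: the function `recall_stream`, the concepts and the association strengths are all invented, and the rule that strength weakens multiplicatively along each link is an assumption.

```python
import heapq


def recall_stream(network, cue, limit=None):
    """Yield (concept, strength) pairs related to `cue`, strongest first.

    `network` maps each concept to {neighbour: association strength in (0, 1]}.
    Strength is assumed to weaken multiplicatively along each link, so
    distant concepts arrive later and more faintly.
    """
    seen = {cue}
    # heapq is a min-heap, so strengths are stored negated.
    frontier = [(-w, name) for name, w in network.get(cue, {}).items()]
    heapq.heapify(frontier)
    count = 0
    while frontier and (limit is None or count < limit):
        neg_strength, concept = heapq.heappop(frontier)
        if concept in seen:
            continue
        seen.add(concept)
        count += 1
        yield concept, -neg_strength
        for nxt, w in network.get(concept, {}).items():
            if nxt not in seen:
                heapq.heappush(frontier, (neg_strength * w, nxt))


# Hypothetical associations for one person; another person's would differ.
network = {
    "cat": {"fur": 0.9, "purr": 0.8, "mat": 0.5},
    "mat": {"floor": 0.7, "door": 0.4},
    "fur": {"soft": 0.6},
}

order = [concept for concept, _ in recall_stream(network, "cat")]
first_three = [concept for concept, _ in recall_stream(network, "cat", limit=3)]
```

Here the cue 'cat' produces a stream of concepts one at a time, in order of strength, and a cap such as `limit=3` stands in for the limited capacity of working memory. Nothing in the structure lets the whole network be handed over as a single item.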

Daisy says “when we have memorised thousands of facts on a specific topic, these facts together form what is known as a schema” (p. 20). It doesn’t work quite like that, for several reasons.

the structure of a schema: A schema is what it sounds like – a schematic plan or framework. It doesn’t consist of facts or concepts, but it’s a representation of how someone mentally arranges facts or concepts. In the same way the floor-plan of a building doesn’t consist of actual walls, doors and windows, but it does show you where those things are in the building in relation to each other. The importance of this apparently pedantic point will become clear when I discuss deep structure.

implicit and explicit schemata: Schemata can be implicit – the brain organises facts and concepts in a particular way but we’re not aware of what it is – or explicit – we actively organise facts and concepts in a particular way and we are aware of how they are organised.

the size of a schema: Schemata can vary in size and complexity. The configuration of the three lines that make up the letter H is a schema, so is the way a doctor organises his or her knowledge about the human circulatory system. A schema doesn’t have to represent all the facts or concepts it links together. If it did, a schema involving thousands of facts would be so complex it wouldn’t be much help in showing how the facts were related. And in order to encompass all the different relationships between thousands of facts, a single schema for them would need to be very simple.

For example, a simple schema for chemistry would be that different chemicals are formed from different configurations of the sub-atomic ‘particles’ that make up atoms and configurations of atoms that form molecules. Thousands of facts can be fitted into that schema. In order to have a good understanding of chemistry, students would need to know about schemata other than just that simple one, and would need to know thousands of facts about chemistry before they would qualify as experts, but the simple schema plus a few examples would give them a basic understanding of what chemistry was about.

experts’ schemata

Research into expertise (e.g. Chi et al, 1981) shows that experts don’t usually have one single schema for all the facts they know, but instead use different schemata for different aspects of their body of knowledge. Sometimes those schemata are explicitly linked, but sometimes they’re not. Sometimes they can’t be linked because no one knows how the linkage works yet.

chess experts

Daisy refers to research showing that expert chess players memorise thousands of different configurations of chess pieces (p.78). This is classic chunking; although in different chess sets specific pieces vary in appearance, their core visual features and the moves they can make are highly consistent, so frequently-encountered configurations of pieces are eventually treated by the brain as single units – the brain chunks the positions of the chess pieces in essentially the same way as it chunks letters into words.

De Groot’s work showed that chess experts initially identified the configurations of pieces that were possible as a next move, and then went through a process of eliminating the possibilities. The particular configuration of pieces on the board would activate several associated schemata involving possible next and subsequent moves.

So each configuration of chess pieces that is encountered frequently enough to be chunked has an underlying (simple) schema. Expert chess players then access more complex schemata for next and subsequent possible moves. Even if they have an underlying schema for chess as a whole, it doesn’t follow that they treat chess as a single unit or that they recall all possible configurations at once. Most people can reliably recognise thousands of faces and thousands of words and have schemata for organising them, but when thinking about faces or words, they don’t recall all faces or all words simultaneously. That would rapidly overload working memory.

Compared to most knowledge domains, chess is pretty simple. Chess expertise consists of memorising a large but limited number of configurations and having schemata that predict the likely outcomes from a selection of them. Because of the rules of chess, although lots of moves are possible, the possibilities are clearly defined and limited. Expertise in medicine, say, or history, is considerably more complex and less certain. A doctor might have many schemata for human biology; one for each of the skeletal, nervous, circulatory, respiratory and digestive systems, for cell metabolism, biochemistry and genetics etc. Not only is human biology more complex than chess, there’s also more uncertainty involved. Some of those schemata we’re pretty sure about, some we’re not so sure about and some we know very little about. There’s even more uncertainty involved in history. Evaluating evidence about how the human body works might be difficult, but the evidence itself is readily available in the form of human bodies. Historical evidence is often absent and likely to stay that way, which makes establishing facts and developing schemata more challenging.

To illustrate her point about schemata, Daisy claims that learning a couple of key facts about 150 historical events from 3000BC to the present will form “the fundamental chronological schema that is the basis of all historical understanding” (p.20). Chronological sequencing could certainly form a simple schema for history, but you don’t need to know about many events in order to grasp that principle – two or three would suffice. Again, although this simple schema would give students a basic understanding of what history was about, in order to have a good understanding of history, students would need not only to know thousands of facts, but also to develop many schemata about how those facts were linked before they would qualify as experts. This brings us on to the deep structure of knowledge, the subject of the next post.

references
Chi, MTH, Feltovich, PJ & Glaser, R (1981). Categorisation and Representation of Physics Problems by Experts and Novices, Cognitive Science, 5, 121-152
de Groot, AD (1978). Thought and Choice in Chess. Mouton.

Edited for clarity 8/1/17.

seven myths about education: a knowledge framework

In Seven Myths about Education Daisy Christodoulou refers to Bloom’s taxonomy of educational objectives as a metaphor that leads to two false conclusions; that skills are separate from knowledge and that knowledge is ‘somehow less worthy and important’ (p.21). Bloom’s taxonomy was developed in the 1950s as a way of systematising what students need to do with their knowledge. At the time, quite a lot was known about what people did with knowledge because they usually process it actively and explicitly. Quite a lot less was known about how people acquire knowledge, because much of that process is implicit; students usually ‘just learned’ – or they didn’t. Daisy’s book focuses on how students acquire knowledge, but her framework is an implicit one; she doesn’t link up the various stages of acquiring knowledge in an explicit formal model like Bloom’s. Although I think Daisy makes some valid points about the educational orthodoxy, some features of her model lead to conclusions that are open to question. In this post, I compare the model of cognition that Daisy describes with an established framework for analysing knowledge with origins outside the education sector.

a framework for knowledge

Researchers from a variety of disciplines have proposed frameworks involving levels of abstraction in relation to how knowledge is acquired and organised. The frameworks are remarkably similar. Although there are differences of opinion about terminology and how knowledge is organised at higher levels, there’s general agreement that knowledge is processed along the lines of the catchily named DIKW pyramid – DIKW stands for data, information, knowledge and wisdom. The Wikipedia entry gives you a feel for the areas of agreement and disagreement involved. In the pyramid, each level except the data level involves the extraction of information from the level below. I’ll start at the bottom.



Data

As far as the brain is concerned, data don’t actually tell us anything except whether something is there or not. For computers, data are a series of 0s and 1s; for the brain, data are largely in the form of sensory input – light, dark and colour, sounds, tactile sensations, etc.

Information
It’s only when we spot patterns within data that the data can tell us anything. Information consists of patterns that enable us to identify changes, identify connections and make predictions. For computers, information involves detecting patterns in all the 0s and 1s. For the brain it involves detecting patterns in sensory input.

Knowledge
Knowledge has proved more difficult to define, but involves the organisation of information.

Wisdom
Although several researchers have suggested that knowledge is also organised at a meta-level, this hasn’t been extensively explored.

The processes involved in the lower levels of the hierarchy – data and information – are well-established thanks to both computer modelling and brain research. We know a fair bit about the knowledge level largely due to work on how experts and novices think, but how people organise knowledge at a meta-level isn’t so clear.

The key concept in this framework is information. Used in this context, ‘information’ tells you whether something has changed or not, whether two things are the same or not, and identifies patterns. The DIKW hierarchy is sometimes summarised as; information is information about data, knowledge is information about information, and wisdom is information about knowledge.
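A minimal sketch of the lower levels of the hierarchy, using a computer-style stream of 0s and 1s. The function names and the choice of ‘a regular interval between changes’ as the higher-level pattern are assumptions made purely for illustration; they are not part of the DIKW framework itself.

```python
def changes(data):
    """Information about data: the positions at which the value changes."""
    return [i for i in range(1, len(data)) if data[i] != data[i - 1]]

def regular_interval(change_points):
    """Information about information: a constant gap between changes, if any."""
    gaps = {b - a for a, b in zip(change_points, change_points[1:])}
    return gaps.pop() if len(gaps) == 1 else None

data = [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1]  # raw data: says nothing by itself
info = changes(data)                          # where something changed
rule = regular_interval(info)                 # a pattern in those changes

print(info, rule)
```

Each level is extracted from the one below it: the change points come from the raw data, and the regular interval comes from the change points, not from the 0s and 1s directly.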

a simple theory of complex cognition

Daisy begins her exploration of cognitive psychology with a quote by John Anderson, from his paper ACT: A simple theory of complex cognition (p.20). Anderson’s paper tackles the mystique often attached to human intelligence when compared to that of other species. He demonstrates that it isn’t as sophisticated or as complex as it appears, but is derived from a simple underlying principle. He goes on to explain how people extract information from data, deduce production rules and make predictions about commonly occurring patterns, which suggests that the more examples of particular data the brain perceives, the more quickly and accurately it learns. He demonstrates the principle using examples from visual recognition, mathematical problem solving and prediction of word endings.

natural learning

What Anderson describes is how human beings learn naturally; the way brains automatically process any information that happens to come their way unless something interferes with that process. It’s the principle we use to recognise and categorise faces, places and things. It’s the one we use when we learn to talk, solve problems and associate cause with effect. Scattergrams provide a good example of how we extract information from data in this way.

Scatterplot of longitudinal measurements of total brain volume for males (N=475 scans, shown in dark blue) and females (N=354 scans, shown in red).  From Lenroot et al (2007).


Although the image consists of a mass of dots and lines in two colours, we can see at a glance that the different coloured dots and lines form two clusters.

Note that I’m not making the same distinction that Daisy makes between ‘natural’ and ‘not natural’ learning (p.36). Anderson is describing the way the brain learns, by default, when it encounters data. Daisy, in contrast, claims that we learn things like spoken language without visible effort because language is ‘natural’, whereas inventions like the alphabet and numbers need to be taught ‘formally and explicitly’. That distinction, although frequently made, isn’t necessarily a valid one. It’s based on an assumption that the brain has evolved mechanisms to process some types of data, e.g. to recognise faces and understand speech, but can’t have had time to evolve mechanisms to process recent inventions like writing and mathematics. This assumption about brain hardwiring is a contentious one, and the evidence about how brains learn (including the work that’s developed from Anderson’s theory) makes it look increasingly likely that it’s wrong. If formal and explicit instruction were necessary in order to learn man-made skills like writing and mathematics, that would raise the question of how these skills were invented in the first place, and Anderson would not have been able to use mathematical problem-solving and word prediction as his examples of the underlying mechanism of human learning. The theory that the brain is hardwired to process some types of information but not others, and the theory that the same mechanism processes all information, both explain how people appear to learn some things automatically and ‘naturally’. Which theory is right (or whether both are right) is still the subject of intense debate. I’ll return to the second theory later when I discuss schemata.

data, information and chunking

Chunking is a core concept in Daisy’s model of cognition. Chunking occurs when the brain links together several bits of data it encounters frequently and treats them as a single item – groups of letters that frequently co-occur are chunked into words. Anderson’s paper is about the information processing involved in chunking. One of his examples is how the brain chunks the three lines that make up an upper case H. Although Anderson doesn’t make an explicit distinction between data and information, in his examples the three lines would be categorised as data in the DIKW framework, as would be the curves and lines that make up numerals. When the brain figures out the production rule for the configuration of the lines in the letter H, it’s extracting information from the data – spotting a pattern. Because the pattern is highly consistent – H is almost always written using this configuration of lines – the brain can chunk the configuration of lines into the single unit we call the letter H. The letters A and Z also consist of three lines, but have different production rules for their configurations. Anderson shows that chunking can also occur at a slightly higher level; letters (already chunked) can be chunked again into words that are processed as single units, and numerals (already chunked) can be chunked into numbers to which production rules can be applied to solve problems. Again, chunking can take place because the patterns of letters in the words, and the patterns of numerals in Anderson’s mathematical problems are highly consistent. Anderson calls these chunked units and production rules ‘units of knowledge’. He doesn’t use the same nomenclature as the DIKW model, but it’s clear from his model that initial chunking occurs at the data level and further chunking can occur at the information level.
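The idea that frequently co-occurring items end up being treated as a single unit can be sketched in a few lines. This is not Anderson’s ACT model; it borrows the pair-merging step used in byte-pair encoding as a stand-in, and the function names are invented for the example.

```python
from collections import Counter

def most_frequent_pair(units):
    """Find the adjacent pair of units that co-occurs most often."""
    pairs = Counter(zip(units, units[1:]))
    return pairs.most_common(1)[0][0] if pairs else None

def chunk(units, pair):
    """Replace each occurrence of the pair with one combined unit."""
    merged, i = [], 0
    while i < len(units):
        if i + 1 < len(units) and (units[i], units[i + 1]) == pair:
            merged.append(units[i] + units[i + 1])
            i += 2
        else:
            merged.append(units[i])
            i += 1
    return merged

units = list("abababc")           # seven separate items
pair = most_frequent_pair(units)  # the most consistent pairing
chunked = chunk(units, pair)      # fewer, larger items

print(len(units), len(chunked), chunked)
```

The merge only pays off because the pairing is highly consistent. If the same items co-occurred in many different arrangements, no single pair would dominate and there would be little to chunk.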

The brain chunks data and low-level units of information automatically; evidence for this comes from research showing that babies begin to identify and categorise objects using visual features and categorise speech sounds using auditory features by about the age of 9 months (Younger, 2003). Chunking also occurs pre-consciously (e.g. Lamme 2003); we know that people are often aware of changes to a chunked unit like a face, a landscape or a piece of music, but don’t know what has changed – someone has shaved off their moustache, a tree has been felled, the song is a cover version with different instrumentation. In addition, research into visual and auditory processing shows that sensory information initially feeds forward in the brain; a lot of processing occurs before the information reaches the location of working memory in the frontal lobes. So at this level, what we are talking about is an automatic, usually pre-conscious process that we use by default.

knowledge – the organisation of information

Anderson’s paper was written in 1995 – twenty years ago – at about the time the DIKW framework was first proposed, which explains why he doesn’t use the same terminology. He calls the chunked units and production rules ‘units of knowledge’ rather than ‘units of information’ because they are the fundamental low-level units from which higher-level knowledge is formed.

Although Anderson’s model of information processing for low-level units still holds true, what has puzzled researchers in the intervening couple of decades is why that process doesn’t scale up. The way people process low-level ‘units of knowledge’ is logical and rational enough to be accurately modelled using computer software, but when handling large amounts of information, such as the concepts involved in day-to-day life, or trying to comprehend, apply, analyse, synthesise or evaluate it, the human brain goes a bit haywire. People (including experts) exhibit a number of errors and biases in their thinking. These aren’t just occasional idiosyncrasies – everybody shows the same errors and biases to varying extents. Since complex information isn’t inherently different to simple information – there’s just more of it – researchers suspected that the errors and biases were due to the wiring of the brain. Work on judgement and decision-making and on the biological mechanisms involved in processing information at higher levels has demonstrated that brains are indeed wired up differently to computers. The reason is that what has shaped the evolution of the human brain isn’t the need to produce logical, rational solutions to problems, but the need to survive, and overall quick-and-dirty information processing tends to result in higher survival rates than slow, precise processing.

What this means is that Anderson’s information processing principle can be applied directly to low-level units of information, but might not be directly applicable to the way people process information at a higher level – the way they process facts, for example. Facts are the subject of the next post.

References
Anderson, J (1996) ACT: A simple theory of complex cognition, American Psychologist, 51, 355-365.
Lamme, VAF (2003) Why visual attention and awareness are different, TRENDS in Cognitive Sciences, 7, 12-18.
Lenroot, RK, Gogtay, N, Greenstein, DK, Molloy, E, Wallace, GL, Clasen, LS, Blumenthal, JD, Lerch, J, Zijdenbos, AP, Evans, AC, Thompson, PM & Giedd, JN (2007). Sexual dimorphism of brain developmental trajectories during childhood and adolescence. NeuroImage, 36, 1065–1073.
Younger, B (2003). Parsing objects into categories: Infants’ perception and use of correlated attributes. In Rakison & Oakes (eds.) Early Category and Concept development: Making sense of the blooming, buzzing confusion, Oxford University Press.

there’s more to working memory than meets the eye

I’ve had several conversations on Twitter with Peter Blenkinsop about learning and the brain. At the ResearchEd conference on Saturday, we continued the conversation and discovered that much of our disagreement was because we were using different definitions of learning. Peter’s definition is that learning involves being able to actively recall information; mine is that it involves changes to the brain in response to information.

working memory

Memory is obviously essential to learning. One thing that’s emerged clearly from years of research into how memory works is that the brain retains information for a very short time in what’s known as working memory, and indefinitely in what’s called long-term memory – but that’s not all there is to it. I felt that advocates of direct instruction at the conference were relying on a model of working memory that was oversimplified and could be misleading. The diagram they were using looked like this;

simple model of memory


This model is attributed to Daniel Willingham. From what the teachers were saying, the diagram is simpler than most current representations of working memory because its purpose is to illustrate three key points;

• the capacity of working memory is limited and it holds information for a short time
• information in long-term memory is available for recall indefinitely and
• information can be transferred from working memory to long-term memory and vice versa.

So far, so good.

My reservation about the diagram is that if it’s the only diagram of working memory you’ve ever seen, you might get the impression that it shows the path information follows when it’s processed by the brain. From it you might conclude that;

• information from the environment goes directly into working memory
• if you pay attention to that information, it will be stored permanently in long-term memory
• if you don’t pay attention to it, it will be lost forever, and
• there’s a very low limit to how much information from the environment you can handle at any one time.

But that’s not quite what happens to information coming into the brain. As Peter pointed out during our conversation, simplifying things appropriately is challenging; you want to simplify enough to avoid confusing people, but not so much that they might misunderstand.

In this post, I’m going to try to explain the slightly bigger picture of how brains process information, and where working memory and long-term memory fit in.

sensory information from the external environment

All information from the external environment comes into the brain via the sense organs. The incoming sensory information is on a relatively large scale, particularly if it’s visual or auditory information; you can see an entire classroom at once and hear simultaneously all the noises emanating from it. But individual cells within the retina or the cochlea respond to tiny fragments of that large-scale information; lines at different angles, areas of light and dark and colour, minute changes in air pressure. Information from the fragments is transmitted via tiny electrical impulses, from the sense organs to the brain. The brain then chunks the fragments together to build larger-scale representations that closely match the information coming in from the environment. As a result, what we perceive is a fairly accurate representation of what’s actually out there. I say ‘fairly accurate’ because perception isn’t 100% accurate, but that’s another story.

chunking

The chunking of sensory information takes place via networks of interconnected neurons (long spindly brain cells). The brain forms physical connections (synapses) between neighbouring neurons in response to novel information. The connections allow electrical activation to pass from one neuron to another. The connections work on a use-it-or-lose-it principle; the more they are used the stronger they get, and if they’re not used much they weaken and disappear. Not surprisingly, toddlers have vast numbers of connections, but that number diminishes considerably during childhood and adolescence. That doesn’t mean we have to keep remembering everything we ever learned or we’ll forget it; it’s a way of ensuring that the brain can efficiently process the types of information from the environment that it’s most likely to encounter.
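The use-it-or-lose-it principle can be illustrated with a very rough sketch. The numbers here (`gain`, `decay`, `prune_below`) and the connection names are arbitrary assumptions chosen to make the behaviour visible; they are not measurements of anything in a real brain.

```python
def update(strengths, used, gain=0.2, decay=0.1, prune_below=0.05):
    """Strengthen connections that were used, weaken and prune the rest."""
    updated = {}
    for connection, strength in strengths.items():
        if connection in used:
            strength = min(1.0, strength + gain)  # used: gets stronger
        else:
            strength = strength - decay           # unused: gets weaker
        if strength >= prune_below:               # too weak: disappears
            updated[connection] = strength
    return updated

# Two hypothetical connections that start out equally strong.
strengths = {"a-b": 0.3, "a-c": 0.3}
for _ in range(3):
    strengths = update(strengths, used={"a-b"})  # only a-b is ever used

print(strengths)
```

After a few rounds only the connection that keeps being used survives, which is the sense in which the network ends up tuned to the information it encounters most often.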

working memory

Broadly speaking, incoming sensory information is processed in the brain from the back towards the front. It’s fed forward into areas that Alan Baddeley has called variously a ‘loop’, ‘sketchpad’ and ‘buffer’. Whatever you call them, they are areas where very limited amounts of information can be held for very short periods while we decide what to do with it. Research evidence suggests there are different loops/sketchpads/buffers for different types of sensory information – for example Baddeley’s most recent model of working memory includes temporary stores for auditory, visuospatial and episodic information.

Baddeley's working memory model


The incoming information held briefly in the loops/sketchpads/buffers is fed forward again to frontal areas of the brain where it’s constantly monitored by what’s called the central executive – an area that deals with attention and decision-making. The central executive and the loops/sketchpads/buffers together make up working memory.

long-term memory

The information coming into working memory activates the more permanent neural networks that carry information relevant to it – what’s called long-term memory. The neural networks that make up long-term memory are distributed throughout the brain. Several different types of long-term memory have been identified but the evidence points increasingly to the differences being due to where neural networks are located, not to differences in the biological mechanisms involved.

Information in the brain is carried in the pattern of connections between neurons. The principle is similar to the way pixels represent information on a computer screen; that information is carried in the patterns of pixels that are activated. This makes computer screens – and brains – very versatile; they can carry a huge range of different types of information in a relatively small space. One important difference between the two processes is that pixels operate independently, whereas brain cells form physical connections if they are often activated at the same time. The connections allow fast, efficient processing of information that’s encountered frequently.

For example, say I’m looking out of my window at a pigeon. The image of the pigeon falling on my retina will activate the neural networks in my brain that carry information about pigeons; what they look like, sound like, feel like, their flight patterns and feeding habits. My thoughts might then wander off on to related issues; other birds in my garden, when to prune the cherry tree, my neighbour repairing her fence. If I glance away from the pigeon and look at my blank computer screen, other neural networks will be activated, those that carry information about computers, technology, screens and rectangles in general. I will no longer be thinking about pigeons, but my pigeon networks will still be active enough for me to recall that I was looking at a pigeon previously and I might glance out of the window to see if it is still there.

Every time my long-term neural networks are activated by incoming sensory information, they are updated. If the same information comes in repeatedly the connections within the network are strengthened. What’s not clear is how much attention needs to be paid to incoming information in order for it to update long-term memory. Large amounts of information about the changing environment are flowing through working memory all the time, and evidence from brain-damaged patients suggests that long-term memory can be changed even if we’re not paying attention to the information that activates it.

the central executive

Incoming sensory information and information from long-term memory are fed forward to the central executive. The function of the central executive is a bit like the function of a CCTV control room. According to Antonio Damasio it monitors, evaluates and responds to information from three main sources;

• the external environment (sensory information)
• the internal environment (body states) and
• previous representations of the external and internal environments (carried in the pattern of connections in neural networks).

One difference from a CCTV control room is that the loops/sketchpads/buffers and the system that monitors them consist of networks of interconnected neurons, not TV screens (obviously). Another is that there isn’t anybody watching the brain’s equivalent of the CCTV screens – it’s an automated process. We become aware of information in the loops/sketchpads/buffers only if we need to be aware of it – so we are usually conscious of what’s happening in the external environment or if there are significant changes internally or externally.

The central executive constantly compares the streams of incoming information. It responds to them via networks of neurons that feed back information to other areas of the brain. If the environment has changed significantly, or an interesting or threatening event occurs, or we catch sight of something moving on the periphery of our field of vision, or experience sudden discomfort or pain, the feedback from the central executive ensures that we pay attention to that, rather than anything else. It’s important to note that information from the body includes information about our overall physiological state, including emotions.

So a schematic general diagram of how working memory fits in with information processing in the brain would look something like this:

[Diagram: how working memory fits in with information processing in the brain]

It’s important to note that we still don’t have a clear map of the information processing pathways. Researchers keep coming across different potential loops/sketchpads/buffers and there’s evidence that the feedback and feed-forward pathways are more complex than this diagram shows.

I began this post by suggesting that an over-simplified model of working memory could be misleading. I’ll explain my reasons in more detail in the next post, but first I want to highlight an important implication of the way incoming sensory information is handled by the brain.

pre-conscious processing

A great deal of sensory information is processed by the brain pre-consciously. Advocates of direct instruction emphasise the importance of chunking information because it increases the capacity of working memory. A popular example is the way expert chess players can hold simultaneously in working memory several different configurations of chess pieces, chunking being seen as something ‘experts’ do. But it’s important to remember that the brain chunks information automatically if we’re exposed to it frequently enough. That’s how we recognise faces, places and things – most three-year-olds are ‘experts’ in their day-to-day surroundings because they have had thousands of exposures to familiar faces, places and things. They don’t have to sit down and study these things in order to chunk the fragments of information that make up faces, places and things – their visual cortex does it automatically.

This means that a large amount of information going through young children’s working memory is already chunked. We don’t know to what extent the central executive has to actively pay attention to that information in order for it to change long-term memory, but pre-conscious chunking does suggest that a good deal of learning happens implicitly. I’ll comment on this in more detail in my next post.