seven myths about education – what’s missing?

Old Andrew has raised a number of objections to my critique of Seven Myths about Education. In his most recent comment on my previous (and, I had hoped, last) post about it, he says I should be able to easily identify evidence that shows ‘what in the cognitive psychology Daisy references won’t scale up’.

One response would be to provide a list of references showing step-by-step the problems that AI researchers ran into. That would take me hours, if not days, because I would have to trawl through references I haven’t looked at for over 20 years. Most of them are not online anyway because of their age, which means Old Andrew would be unlikely to be able to access them.

What is more readily accessible is information about concepts that have emerged from those problems, for example: personal construct theory, schema theory, heuristics and biases, bounded rationality and indexing, connectionist models of cognition and neuroconstructivism. Unfortunately, none of the researchers says “incidentally, this means that students are not necessarily going to develop the right schemata when they commit facts to long-term memory” or “the implications for a curriculum derived from cultural references are obvious”, because they are researching cognition, not education, and probably wouldn’t have anticipated anyone suggesting either of these ideas. Whether Old Andrew sees the relevance of these emergent issues or not is secondary, in my view, to how Daisy handles evidence in her book.

concepts and evidence

In the last section of her chapter on Myth 1, Daisy takes us through the concepts of the limited capacity of working memory and chunking. These are well-established, well-tested hypotheses and she cites evidence to support them.

concepts but no evidence

Daisy also appears to introduce two hypotheses of her own. The first is that “we can summon up the information from long-term memory to working memory without imposing a cognitive load” (p.19). The second is that the characteristics of chunking can be extrapolated to all facts, regardless of how complex or inconsistent they might be; “So, when we commit facts to long-term memory they actually become part of our thinking apparatus and have the ability to expand one of the biggest limitations of human cognition” (p.20). The evidence she cites to support this extrapolation is Anderson’s paper – the one about simple, consistent information. I couldn’t find any other evidence cited to support either idea.

evidence but no concepts

Daisy does cite Frantz’s paper about Simon’s work on intuition. Two important concepts of Simon’s that Daisy doesn’t mention but Frantz does, are bounded rationality and the idea of indexing.

Bounded rationality refers to the fact that people can only make sense of the information they have. This supports Daisy’s premise that knowledge is necessary for understanding. But it also supports Freire’s complaint about which facts were being presented to Brazilian schoolchildren. Bounded rationality is also relevant to the idea of the breadth of a curriculum being determined by the frequency of cultural references. Simon used it to challenge economic and political theory.

Simon also pointed out that not only do experts have access to more information than novices do, they can access it more quickly because of their mental cross-indexing, ie the schemata that link relevant information. Rapid speed of access reduces cognitive load, but it doesn’t eliminate it. Chess experts can determine the best next move within seconds, but for most other experts, their knowledge is considerably more complex and less well-defined. A surgeon or an engineer is likely to take days rather than seconds to decide on the best procedure or design to resolve a difficult problem. That implies that quite a heavy cognitive load is involved.

Daisy does mention schemata but doesn’t go into detail about how they are formed or how they influence thinking and understanding. She refers to deep learning in passing but doesn’t tackle the issue Willingham raises about students’ problems with deep structure.

burden of proof

Old Andrew appears to be suggesting that I should assume that Daisy’s assertions are valid unless I can produce evidence to refute them. The burden of proof for a theory usually rests with the person making the claims, for obvious reasons. Daisy cites evidence to support some of her claims, but not all of them. She doesn’t evaluate that evidence by considering its reliability or validity or by taking into account contradictory evidence.

If Daisy had written a book about her musings on cognitive psychology and education, or about how findings from cognitive psychology had helped her teaching, I wouldn’t be writing this. But that’s not what she’s done. She’s used theory from one knowledge domain to challenge theory in another. That can be a very fruitful strategy; the application of game theory and ecological systems theory has transformed several fields. But it’s not helpful simply to take a few concepts out of context from one domain and apply them out of context to another domain.

The reason is that theoretical concepts aren’t free-standing; they are embedded in a conceptual framework. If you’re challenging theory with theory, you need to take a long hard look at both knowledge domains first to get an idea of where particular concepts fit in. You can’t just say “I’m going to apply the concepts of chunking and the limited capacity of working memory to education, but I shan’t bother with schema theory or bounded rationality or heuristics and biases because I don’t think they’re relevant.” Well, you can say that, but it’s not a helpful way to approach problems with learning, because all of these concepts are integral to human cognition. Students don’t leave some of them in the cloakroom when they come into class.

On top of that, the model for pedagogy and the curriculum that Daisy supports is currently influencing international educational policy. If the DfE considers the way evidence has been presented by Hirsch, Willingham and presumably Daisy, as ‘rigorous’, as Michael Gove clearly did, then we’re in trouble.

For Old Andrew’s benefit, I’ve listed some references. Most of them are about things that Daisy doesn’t mention. That’s the point.

references

Axelrod, R (1973). Schema Theory: An Information Processing Model of Perception and Cognition, The American Political Science Review, 67, 1248-1266.
Elman, J et al (1998). Rethinking Innateness: A Connectionist Perspective on Development. MIT Press.
Frantz, R (2003). Herbert Simon. Artificial intelligence as a framework for understanding intuition, Journal of Economic Psychology, 24, 265–277.
Kahneman, D., Slovic, P & Tversky A (1982). Judgement under Uncertainty: Heuristics and Biases. Cambridge University Press.
Karmiloff-Smith, A (2009). Nativism Versus Neuroconstructivism: Rethinking the Study of Developmental Disorders. Developmental Psychology, 45, 56–63.
Kelly, GA (1955). The Psychology of Personal Constructs. New York: Norton.

seven myths about education: finally…

When I first heard about Daisy Christodoulou’s myth-busting book in which she adopts an evidence-based approach to education theory, I assumed that she and I would see things pretty much the same way. It was only when I read reviews (including Daisy’s own summary) that I realised we’d come to rather different conclusions from what looked like the same starting point in cognitive psychology. I’ve been asked several times why, if I have reservations about the current educational orthodoxy, think knowledge is important, don’t have a problem with teachers explaining things and support the use of systematic synthetic phonics, I’m critical of those calling for educational reform, rather than those responsible for a system that needs reforming. The reason involves the deep structure of the models, rather than their surface features.

concepts from cognitive psychology

Central to Daisy’s argument is the concept of the limited capacity of working memory. It’s certainly a core concept in cognitive psychology. It explains not only why we can think about only a few things at once, but also why we oversimplify and misunderstand, are irrational, subject to errors and biases and use quick-and-dirty rules of thumb in our thinking. And it explains why an emphasis on understanding at the expense of factual information is likely to result in students not knowing much and, ironically, not understanding much either.

But what students are supposed to learn is only one of the streams of information that working memory deals with; it simultaneously processes information about students’ internal and external environment. And the limited capacity of working memory is only one of many things that impact on learning; a complex array of environmental factors is also involved. So although you can conceptually isolate the material students are supposed to learn and the limited capacity of working memory, in the classroom neither of them can be isolated from all the other factors involved. And you have to take those other factors into account in order to build a coherent, workable theory of learning.

But Daisy doesn’t introduce only the concept of working memory. She also talks about chunking, schemata and expertise. Daisy implies (although she doesn’t say so explicitly) that schemata are to facts what chunking is to low-level data: that just as students automatically chunk low-level data they encounter repeatedly, so they will automatically form schemata for facts they memorise, and the schemata will reduce cognitive load in the same way that chunking does (p.20). That’s a possibility, because the brain appears to use the same underlying mechanism to represent associations between all types of information – but it’s unlikely. We know that schemata vary considerably between individuals, whereas people chunk information in very similar ways. That’s not surprising if the information being chunked is simple and highly consistent, whereas schemata often involve complex, inconsistent information.

Experimental work involving priming suggests that schemata increase the speed and reliability of access to associated ideas and that would reduce cognitive load, but students would need to have the schemata that experts use explained to them in order to avoid forming schemata of their own that were insufficient or misleading. Daisy doesn’t go into detail about deep structure or schemata, which I think is an oversight, because the schemata students use to organise facts are crucial to their understanding of how the facts relate to each other.

migrating models

Daisy and teachers taking a similar perspective frequently refer approvingly to ‘traditional’ approaches to education. It’s been difficult to figure out exactly what they mean. Daisy focuses on direct instruction and memorising facts, Old Andrew’s definition is a bit broader and Robert Peal’s appears to include cultural artefacts like smart uniforms and school songs. What they appear to have in common is a concept of education derived from the behaviourist model of learning that dominated psychology in the inter-war years. In education it focused on what was being learned; there was little consideration of the broader context involving the purpose of education, power structures, socioeconomic factors, the causes of learning difficulties etc.

Daisy and other would-be reformers appear to be trying to update the behaviourist model of education with concepts that, ironically, emerged from cognitive psychology not long after it switched focus from a behaviourist model of learning to a computational one; the point at which the field was first described as ‘cognitive’. The concepts the educational reformers focus on fit the behaviourist model well because they are strongly mechanistic and largely context-free. The examples that crop up frequently in the psychology research Daisy cites usually involve maths, physics and chess problems. These types of problems were chosen deliberately by artificial intelligence researchers because they were relatively simple and clearly bounded; the idea was that once the basic mechanism of learning had been figured out, the principles could then be extended to more complex, less well-defined problems.

Researchers later learned a good deal about complex, less well-defined problems, but Daisy doesn’t refer to that research. Nor do any of the other proponents of educational reform. What more recent research has shown is that complex, less well-defined knowledge is organised by the brain in a different way to simple, consistent information. So in cognitive psychology the computational model of cognition has been complemented by a constructivist one, but it’s a different constructivist model to the social constructivism that underpins current education theory. The computational model never quite made it across to education, but early constructivist ideas did – in the form of Piaget’s work. At that point, education theory appears to have grown legs and wandered off in a different direction to cognitive psychology. I agree with Daisy that education theorists need to pay attention to findings from cognitive psychology, but they need to pay attention to what’s been discovered in the last half century not just to the computational research that superseded behaviourism.

why criticise the reformers?

So why am I critical of the reformers, but not of the educational orthodoxy? When my children started school, they, and I, were sometimes perplexed by the approaches to learning they encountered. Conversations with teachers painted a picture of educational theory that consisted of a hotch-potch of valid concepts, recent tradition, consequences of policy decisions and ideas, like Brain Gym and Learning Styles, that appeared to have come from nowhere. The only unifying feature I could find was a social constructivist approach and even on that opinions seemed to vary. It was difficult to tell what the educational orthodoxy was, or even if there was one at all. It’s difficult to critique a model that might not be a model. So I perked up when I heard about teachers challenging the orthodoxy using the findings from scientific research and calling for an evidence-based approach to education.

My optimism was short-lived. Although the teachers talked about evidence from cognitive psychology and randomised controlled trials, the model of learning they were proposing appeared as patchy, incomplete and incoherent as the model they were criticising – it was just different. So here are my main reservations about the educational reformers’ ideas:

1. If mainstream education theorists aren’t aware of working memory, chunking, schemata and expertise that suggests there’s a bigger problem than just their ignorance of these particular concepts. It suggests that they might not be paying enough attention to developments in some or all of the knowledge domains their own theory relies on. Knowing about working memory, chunking, schemata and expertise isn’t going to resolve that problem.

2. If teachers don’t know about working memory, chunking, schemata and expertise that suggests there’s a bigger problem than just their ignorance of these particular concepts. It suggests that teacher training isn’t providing teachers with the knowledge they need. To some extent this would be an outcome of weaknesses in educational theory, but I get the impression that trainee teachers aren’t expected or encouraged to challenge what they’re taught. Several teachers who’ve recently discovered cognitive psychology have appeared rather miffed that they hadn’t been told about it. They were all Teach First graduates; I don’t know if that’s significant.

3. A handful of concepts from cognitive psychology doesn’t constitute a robust enough foundation for developing a pedagogical approach or designing a curriculum. Daisy essentially reiterates what Daniel Willingham has to say about the breadth and depth of the curriculum in Why Don’t Students Like School? He’s a cognitive psychologist and well-placed to show how models of cognition could inform education theory. But his book isn’t about the deep structure of theory, it’s about applying some principles from cognitive psychology in the classroom in response to specific questions from teachers. He explores ideas about pedagogy and the curriculum, but that’s as far as it goes. Trying to develop a model of pedagogy and design a curriculum based on a handful of principles presented in a format like this is like trying to devise courses of treatment and design a health service based on the information gleaned from a GP’s problem page in a popular magazine. But I might be being too charitable; Willingham is a trustee of the Core Knowledge Foundation, after all.

4. Limited knowledge. Rightly, the reforming teachers expect students to acquire extensive factual knowledge and emphasise the differences between experts and novices. But Daisy’s knowledge of cognitive psychology appears to be limited to a handful of principles discovered over thirty years ago. She, Robert Peal and Toby Young all quote Daniel Willingham on research in cognitive psychology during the last thirty years, but none of them, Willingham included, tell us what it is. If they did, it would show that the principles they refer to don’t scale up when it comes to complex knowledge. Nor do most of the teachers writing about educational reform appear to have much teaching experience. That doesn’t mean they are wrong, but it does call into question the extent of their expertise relating to education.

Some of those supporting Daisy’s view have told me they are aware that they don’t know much about cognitive psychology, but have argued that they have to start somewhere and it’s important that teachers are made aware of concepts like the limits of working memory. That’s fine if that’s all they are doing, but it’s not. Redesigning pedagogy and the curriculum on the basis of a handful of facts makes sense if you think that what’s important is facts and that the brain will automatically organise those facts into a coherent schema. The problem is, of course, that that rarely happens in the absence of an overview of all the relevant facts and how they fit together. Cognitive psychology, like all other knowledge domains, has incomplete knowledge but it’s not incomplete in the same way as the reforming teachers’ knowledge. This is classic Sorcerer’s Apprentice territory; a little knowledge, misapplied, can do a lot of damage.

5. Evaluating evidence. Then there’s the way evidence is handled. Evidence-based knowledge domains have different ways of evaluating evidence, but they all evaluate it. That means weighing up the pros and cons, comparing evidence for and against competing hypotheses and so on. Evaluating evidence does not mean presenting only the evidence that supports whatever view you want to get across. That might be a way of making your case more persuasive, but is of no use to anyone who wants to know about the reliability of your hypothesis or your evidence. There might be a lot of evidence telling you your hypothesis is right – but a lot more telling you it’s wrong. But Daisy, Robert Peal and Toby Young all present supporting evidence only. They make no attempt to test the hypotheses they’re proposing or the evidence cited, and much of the evidence is from secondary sources – with all due respect to Daniel Willingham, just because he says something doesn’t mean that’s all there is to say on the matter.

cargo-cult science

I suggested to a couple of the teachers who supported Daisy’s model that ironically it resembled Feynman’s famous cargo-cult analogy (p. 97). They pointed out that the islanders were using replicas of equipment, whereas the concepts from cognitive psychology were the real deal. I’d suggest that even if the Americans had left their equipment on the airfield and the islanders knew how to use it, that wouldn’t have resulted in planes bringing in cargo – because there were other factors involved.

My initial response to reading Seven Myths about Education was one of frustration that despite making some good points about the educational orthodoxy and cognitive psychology, Daisy appeared to have got hold of the wrong ends of several sticks. This rapidly changed to concern that a handful of misunderstood concepts is being used as ‘evidence’ to support changes in national education policy.

In Michael Gove’s recent speech at the Education Reform Summit, he refers to the “solidly grounded research into how children actually learn of leading academics such as ED Hirsch or Daniel T Willingham”. Daniel Willingham has published peer-reviewed work, mainly on procedural learning, but I could find none by ED Hirsch. It would be interesting to know what the previous Secretary of State for Education’s criteria for ‘solidly grounded research’ and ‘leading academic’ were. To me the educational reform movement doesn’t look like an evidence-based discipline but bears all the hallmarks of an ideological system looking for evidence that affirms its core beliefs. This is no way to develop public policy. Government should know better.

the MUSEC briefings and Direct Instruction

Yesterday, I got involved in a discussion on Twitter about Direct Instruction (DI). The discussion was largely about what I had or hadn’t said about DI. Twitter isn’t the best medium for discussing anything remotely complex, but there’s something about DI that brings out the pedant in people, me included.

The discussion, if you can call it that, was triggered by a tweet about the most recent MUSEC briefing. The briefings, from Macquarie University Special Education Centre, are a great idea. A one-page round-up of the evidence relating to a particular mode of teaching or treatment used in special education is exactly the sort of resource I’d use often. So why the discussion about this one?

the MUSEC briefings

I’ve bumped into the briefings before. I read one a couple of years ago on the recommendation of a synthetic phonics advocate. It was briefing no.18, Explicit instruction for students with special learning needs. At the time, I wasn’t aware that ‘explicit instruction’ had any particular significance in education – other than denoting instruction that was explicit. And that could involve anything from a teacher walking round the room checking that students understood what they were doing, to ‘talk and chalk’, reading a book or computer-aided learning. The briefing left me feeling bemused. It was packed with implicit assumptions and the references, presented online presumably for reasons of space, included one self-citation, a report that reached a different conclusion to the briefing, a 400-page book by John Hattie that doesn’t appear to reach the same conclusion either, and a paper by Kirschner, Sweller and Clark that doesn’t mention children with special educational needs. The references form a useful reading list for teachers, but hardly constitute robust evidence to support the briefing’s conclusions.

My curiosity piqued, I took a look at another briefing, no.33 on behavioural optometry. I chose it because the SP advocates I’d encountered tended to be sceptical about visual impairments being a causal factor in reading difficulties, and I wondered what evidence they were relying on. I knew a bit about visual problems because of my son’s experiences. The briefing repeatedly lumped together things that should have been kept distinct and came to different conclusions to the evidence it cites. I think I was probably unlucky with these first two because some of the other briefings look fine. So what about the one on Direct Instruction, briefing no.39?

Direct Instruction and Project Follow Through

Direct Instruction (capitalized) is a now commercially available scripted learning programme developed by Siegfried Engelmann and Wesley Becker in the US in the 1960s that performed outstandingly well in Project Follow Through (PFT).

The DI programme involved the scripted teaching of reading, arithmetic, and language to children between kindergarten and third grade. The PFT evaluation of DI showed significant gains in basic skills (word knowledge, spelling, language and math computation); in cognitive-conceptual skills (reading comprehension, math concepts, math problem solving) and in affect measures (co-operation, self-esteem, intellectual achievement, responsibility). A high school follow-up study by the sponsors of the DI programme showed that it was associated with positive long-term outcomes.

The Twitter discussion revolved around what I meant by ‘basic’ and ‘skills’. To clarify, as I understand it the DI programme itself involved teaching basic skills (reading, arithmetic, language) to quite young children (K-3). The evaluation assessed basic skills, cognitive-conceptual skills and affect measures. There is no indication in the evidence I’ve been able to access of how sophisticated the cognitive-conceptual skills or affect measures were. One would expect them to be typical of children in the K-3 age range. And we don’t know how long those outcomes persisted. The only evidence for long-term positive outcomes is from a study by the programme sponsors – not to be discounted, but not reliable enough to form the basis for a pedagogical method.

In other words, the PFT evaluation tells us that there were several robust positive outcomes from the DI programme. What it doesn’t tell us is whether the DI approach has the same robust outcomes if applied to other areas of the curriculum and/or with older children. Because the results of the evaluation are aggregated, it doesn’t tell us whether the DI programme benefitted all children or only some, or if it had any negative effects, or what the outcomes were for children with specific special educational needs or learning difficulties – the focus of MUSEC. Nor does it tell us anything about the use of direct instruction in general – what the briefing describes as a “generic overarching concept, with DI as a more specific exemplar”.

the evidence

The briefing refers to “a large body of research evidence stretching back over four decades testifying to the efficacy of explicit/direct instruction methods including the specific DI programs.” So what is the evidence?

The briefing itself refers only to the PFT evaluation of the DI programme. The references, available online, consist of:

• a summary of findings written by the authors of the DI programme, Becker & Engelmann,
• a book about DI – the first two authors were Engelmann’s students and worked on the original DI programme,
• an excerpt from the same book on a commercial site called education.com,
• an editorial from a journal called Effective School Practices, previously known as Direct Instruction News and published by the National Institute for Direct Instruction (Chairman S Engelmann),
• a paper about the different ways in which direct instruction is understood, published by the Center on Innovation and Improvement which is administered by the Academic Development Institute, one of whose partners is Little Planet Learning,
• the 400-page book referenced by briefing 18,
• the peer-reviewed paper also referenced by briefing 18.

The references, which I think most people would construe as evidence, include only one peer-reviewed paper. It cites research findings supporting the use of direct instruction in relation to particular types of material, but doesn’t mention children with special needs or learning difficulties. Another reference is a synthesis of peer-reviewed studies. All the other references involve organisations with a commercial interest in educational methods – not the sort of evidence I’d expect to see in a briefing published by a university.

My recommendation for the MUSEC briefings? Approach with caution.

seven myths about education: traditional subjects

In Seven Myths about Education, Daisy Christodoulou refers to the importance of ‘subjects’ and clearly doesn’t think much of cross-curricular projects. In the chapter on myth 5 ‘we should teach transferable skills’ she cites Daniel Willingham pointing out that the human brain isn’t like a calculator that can perform the same operations on any data. Willingham must be referring to higher-level information-processing because Anderson’s model of cognition makes it clear that at lower levels the brain is like a calculator and does perform essentially the same operations on any data; that’s Anderson’s point. Willingham’s point is that skills and knowledge are interdependent; you can’t acquire skills in the absence of knowledge and skills are often subject-specific and depend on the type of knowledge involved.

Daisy dislikes cross-curricular projects because students are unlikely to have the requisite prior knowledge from across several knowledge domains, are often expected to behave like experts when they are novices and get distracted by peripheral tasks. I would suggest those problems are indicators of poor project design rather than problems with cross-curricular work per se. Instead, Daisy would prefer teachers to stick to traditional subject areas.

traditional subjects

Daisy refers several times to traditional subjects, traditional bodies of knowledge and traditional education. The clearest explanation of what she means is on pp.117-119, when discussing the breadth and depth of the curriculum;

“For many of the theorists we looked at, subject disciplines were themselves artificial inventions designed to enforce Victorian middle-class values … They may well be human inventions, but they are very useful … because they provide a practical way of teaching … important concepts …. The sentence in English, the place value in mathematics, energy in physics; in each case subjects provide a useful framework for teaching the concept.”

It’s worth considering how the subject disciplines the theorists complained about came into being. At the end of the 18th century, a well-educated, well-read person could have just about kept abreast of most advances in human knowledge. By the end of the 19th century that would have been impossible. The exponential growth of knowledge made increasing specialisation necessary; the names of many specialist occupations, including the term ‘scientist’, were coined in the 19th century. By the end of the 20th century, knowledge domains/subjects existed that hadn’t even been thought of 200 years earlier.

It makes sense for academic researchers to specialise and for secondary schools to employ teachers who are subject specialists because it’s essential to have good knowledge of a subject if you’re researching it or teaching it. The subject areas taught in secondary schools have been determined largely by the prior knowledge universities require from undergraduates. That determines A level content, which in turn determines GCSE content, which in turn determines what’s taught at earlier stages in school. That model also makes sense; if universities don’t know what’s essential in a knowledge domain, no one does.

The problem for schools is that they can’t teach everything, so someone has to decide on the subjects and subject content that are included in the curriculum. The critics Daisy cites question traditional subject areas on the grounds that they reflect the interests of a small group of people with high social prestige (pp.110-111).

criteria for the curriculum

Daisy doesn’t buy the idea that subject areas represent the interests of a social elite, but she does suggest an alternative criterion for curriculum content. Essentially, this is frequency of citation. In relation to the breadth of the curriculum, she adopts the principle espoused by ED Hirsch (and Daniel Willingham, Robert Peal and Toby Young), of what writers of “broadsheet newspapers and intelligent books” (p.116) assume their readers will know. The writers in question are exemplified by those contributing to the “Washington Post, Chicago Tribune and so on” (Willingham p.47). Toby Young suggests a UK equivalent – “Times leader writers and heavyweight political commentators” (Young p.34). Although this criterion for the curriculum is better than nothing, its limitations are obvious. The curriculum would be determined by what authors, editors and publishers knew about or thought was important. If there were subject areas crucial to human life that they didn’t know about, ignored or deliberately avoided, the next generation would be sunk.

When it comes to the depth of the curriculum, Daisy quotes Willingham; “cognitive science leads to the rather obvious conclusion that students must learn the concepts that come up again and again – the unifying ideas of each discipline” (Willingham p.48). My guess is that Willingham describes the ‘unifying ideas of each discipline’ as ‘concepts that come up again and again’ to avoid going into unnecessary detail about the deep structure of knowledge domains; he makes a clear distinction between the criteria for the breadth and depth of the curriculum in his book. But his choice of wording, if taken out of context, could give the impression that the unifying ideas of each discipline are the concepts that come up again and again in “broadsheet newspapers and intelligent books”.

One problem with the unifying ideas of each discipline is that they don’t always come up again and again. They certainly encompass “the sentence in English, place value in mathematics, energy in physics”, but sometimes the unifying ideas involve deep structure and schemata taken for granted by experts but not often made explicit, particularly to school students.

Daisy points out, rightly, that neither ‘powerful knowledge’ nor ‘high culture’ is owned by a particular social class or culture (p.118). But she apparently fails to see that using cultural references as a criterion for what’s taught in schools could still result in the content of the curriculum being determined by a small, powerful social group; exactly what the traditional subject critics and Daisy herself complain about, though they are referring to different groups.

dead white males

This drawback is illustrated by Willingham’s observation that using the cultural references criterion means “we may still be distressed that much of what writers assume their readers know seems to be touchstones of the culture of dead white males” (p.116). Toby Young turns them into ‘dead white, European males’ (Young p.34, my emphasis).

What advocates of the cultural references model for the curriculum appear to have overlooked is that the dead white males’ domination of cultural references is a direct result of the long period during which European nations colonised the rest of the world. This colonisation (or ‘trade’ depending on your perspective) resulted in Europe becoming wealthy enough to fund many white males (and some females) engaged in the pursuit of knowledge or in creating works of art. What also tends to be forgotten is that the foundation for their knowledge originated with males (and females) who were non-whites and non-Europeans living long before the Renaissance. The dead white guys would have had an even better foundation for their work if people of various ethnic origins hadn’t managed to destroy the library at Alexandria (and a renowned female scholar). The cognitive bias that edits out non-European and non-male contributions to knowledge is also evident in the US and UK versions of the Core Knowledge sequence.

Core Knowledge sequence

Determining the content of the curriculum by the use of cultural references has some coherence, but cultural references don’t necessarily reflect the deep structure of knowledge. Daisy comments favourably on ED Hirsch’s Core Knowledge sequence (p.121). She observes that “The history curriculum is designed to be coherent and cumulative… pupils start in first grade studying the first American peoples, they progress up to the present day, which they reach in the eighth grade. World history runs alongside this, beginning with the Ancient Greeks and progressing to industrialism, the French revolution and Latin American independence movements.”

Hirsch’s Core Knowledge sequence might encompass considerably more factual knowledge than the English national curriculum, but the example Daisy cites clearly leaves some questions unanswered. How did the first American peoples get to America and why did they go there? Who lived in Europe (and other continents) before the Ancient Greeks and why are the Ancient Greeks important? Obviously the further back we go, the less reliable evidence there is, but we know enough about early history and pre-history to be able to develop a reasonably reliable overview of what happened. It’s an overview that clearly demonstrates that the natural environment often had a more significant role than human culture in shaping history. And one that shows that ‘dead white males’ are considerably less important than they appear if the curriculum is derived from cultural references originating in the English-speaking world. Similar caveats apply to the UK equivalent of the Core Knowledge sequence published by Civitas, the one that recommends children in year 1 being taught about the Glorious Revolution and the significance of Robert Walpole.

It’s worth noting that few of the advocates of curriculum content derived from cultural references are scientists; Willingham is, but his background is in human cognition, not chemistry, biology, geology or geography. I think there’s a real risk of overlooking the role that geographical features, climate, minerals, plants and animals have played in human history, and of developing a curriculum that’s so Anglo-centric and culturally focused it’s not going to equip students to tackle the very concrete problems the world is currently facing. Ironically, Daisy and others are recommending that students acquire a strongly socially-constructed body of knowledge, rather than a body of knowledge determined by what’s out there in the real world.

knowledge itself

Michael Young, quoted by Daisy, aptly sums up the difference:

“Although we cannot deny the sociality of all forms of knowledge, certain forms of knowledge which I find useful to refer to as powerful knowledge and are often equated with ‘knowledge itself’, have properties that are emergent from and not wholly dependent on their social and historical origins.” (p.118)

Most knowledge domains are pretty firmly grounded in the real world, which means that the knowledge itself has a coherent structure reflecting the real world and therefore, as Michael Young points out, it has emergent properties of its own, regardless of how we perceive or construct it.

So what criteria should we use for the curriculum? Generally, academics and specialist teachers have a good grasp of the unifying principles of their field – the ‘knowledge itself’. So their input would be essential. But other groups have an interest in the curriculum; notably the communities who fund and benefit from the education system and those involved on a day-to-day basis – teachers, parents and students. 100% consensus on a criterion is unlikely, but the outcome might not be any worse than the constant tinkering with the curriculum by government over the past three decades.

why subjects?

‘Subjects’ are certainly a convenient way of arranging our knowledge and they do enable a focus on the deep structure of a specific knowledge domain. But the real world, from which we get our knowledge, isn’t divided neatly into subject areas, it’s an interconnected whole. ‘Subjects’ are facets of knowledge about a world that in reality is highly integrated and interconnected. The problem with teaching along traditional subject area lines is that students are very likely to end up with a fragmented view of how the real world functions, and to miss important connections. Any given subject area might be internally coherent, but there’s often no apparent connection between subject areas, so the curriculum as a whole just doesn’t make sense to students. How does history relate to chemistry or RE to geography? It’s difficult to tell while you are being educated along ‘subject’ lines.

Elsewhere I’ve suggested that what might make sense would be a chronological narrative spine for the curriculum. Learning about the Big Bang, the formation of galaxies, elements, minerals, the atmosphere and supercontinents through the origins of life to early human groups, hunter-gatherer migration, agricultural settlement, the development of cities and so on, makes sense of knowledge that would otherwise be fragmented. And it provides a unifying, overarching framework for any knowledge acquired in the future.

Adopting a chronological curriculum would mean an initial focus on sciences and physical geography; the humanities and the arts wouldn’t be relevant until later for obvious reasons. It wouldn’t preclude simultaneously studying languages, mathematics, music or PE of course – I’m not suggesting a chronological curriculum ‘first and only’ – but a chronological framework would make sense of the curriculum as a whole.

It could also bridge the gap between so-called ‘academic’ and ‘vocational’ subjects. In a consumer society, it’s easy to lose sight of the importance of knowledge about food, water, fuel and infrastructure. But someone has to have that knowledge and our survival and quality of life are dependent on how good their knowledge is and how well they apply it. An awareness of how the need for food, water and fuel has driven human history and how technological solutions have been developed to deal with problems might serve to narrow the academic/vocational divide in a way that results in communities having a better collective understanding of how the real world works.

the curriculum in context

I can understand why Daisy is unimpressed by the idea that skills can be learned in the absence of knowledge or that skills are generic and completely transferable across knowledge domains. You can’t get to the skills at the top of Bloom’s taxonomy by bypassing the foundation level – knowledge. Having said that, I think Daisy’s criteria for the curriculum overlook some important points.

First, although I agree that subjects provide a useful framework for teaching concepts, the real world isn’t neatly divided up into subject areas. Teaching as if it is means it’s not only students who are likely to get a fragmented view of the world, but newspaper columnists, authors and policy-makers might too – with potentially disastrous consequences for all of us. It doesn’t follow that students need to be taught skills that allegedly transfer across all subjects, but they do need to know how subject areas fit together.

Second, although we can never eliminate subjectivity from knowledge, we can minimise it. Most knowledge domains reflect the real world accurately enough for us to be able to put them to good, practical use on a day-to-day basis. It doesn’t follow that all knowledge consists of verified facts or that students will grasp the unifying principles of a knowledge domain by learning thousands of facts. Students need to learn about the deep structure of knowledge domains and how the evidence for the facts they encompass has been evaluated.

Lastly, cultural references are an inadequate criterion for determining the breadth of the curriculum. Cultural references form exactly the sort of socially constructed framework that critics of traditional subject areas complain about. Most knowledge domains are firmly grounded in the real world and the knowledge itself, despite its inherent subjectivity, provides a much more valid and reliable criterion for deciding what students should know than what people are writing about. Knowledge about cultural references might enable students to participate in what Michael Oakeshott called the ‘conversation of mankind’, but life doesn’t consist only of a conversation – at whatever level you understand the term. For most people, even in the developed world, life is just as much about survival and quality of life, and in order to optimise our chances of both, we need to know as much as possible about how the world functions, not just what a small group of people are saying about it.

In my next post, hopefully the final one about Seven Myths, I plan to summarise why I think it’s so important to understand what Daisy and those who support her model of educational reform are saying.

References

Peal, R (2014). Progressively Worse: The Burden of Bad Ideas in British Schools. Civitas.
Willingham, D (2009). Why don’t students like school? Jossey-Bass.
Young, T (2014). Prisoners of the Blob. Civitas.

seven myths about education: deep structure

deep structure and understanding

Extracting information from data is crucially important for learning; if we can’t spot patterns that enable us to identify changes and make connections and predictions, no amount of data will enable us to learn anything. Similarly, spotting patterns within and between facts, patterns that enable us to identify changes and connections and make predictions, will help us understand how the world works. Understanding is a concept that crops up a lot in information theory and education. Several of the proposed hierarchies of knowledge have included the concept of understanding – almost invariably at or above the knowledge level of the DIKW pyramid. Understanding is often equated with what’s referred to as the deep structure of knowledge. In this post I want to look at deep structure in two contexts; when it involves a small number of facts, and when it involves a very large number, as in an entire knowledge domain.

When I discussed the DIKW pyramid, I referred to information being extracted from a ‘lower’ level of abstraction to form a ‘higher’ one. Now I’m talking about ‘deep’ structure. What’s the difference, if any? The concept of deep structure comes from the field of linguistics. The idea is that you can say the same thing in different ways; the surface features of what you say might be different, but the deep structure of the statements could still be the same. So the sentences ‘the cat is on the mat’ and ‘the mat is under the cat’ have different surface features but the same deep structure. Similarly, ‘the dog is on the box’ and ‘the box is under the dog’ share the same deep structure. From an information-processing perspective the sentences about the dog and the cat share the same underlying schema.

In the DIKW knowledge hierarchy, extracted information is at a ‘higher’ level, not a ‘deeper’ one. The two different terminologies are used because the concepts of ‘higher’ level extraction of information and ‘deep’ structure have different origins, but essentially they are the same thing. All you need to remember is that in terms of information-processing ‘high’ and ‘deep’ both refer to the same vertical dimension – which term you use depends on your perspective. Higher-level abstractions, deep structure and schemata refer broadly to the same thing.

deep structure and small numbers of facts

Daniel Willingham devotes an entire chapter of his book Why don’t students like school? to the deep structure of knowledge when addressing students’ difficulty in understanding abstract ideas. Willingham describes mathematical problems presented in verbal form that have different surface features but the same deep structure – in his opening example they involve the calculation of the area of a table top and of a soccer pitch (Willingham, p.87). What he is referring to is clearly the concept of a schema, though he doesn’t call it that.

Willingham recognises that students often struggle with deep structure concepts and recommends providing them with many examples and using analogies they’re familiar with. These strategies would certainly help, but as we’ve seen previously, because the surface features of facts aren’t consistent in terms of sensory data, students’ brains are not going to spot patterns automatically and pre-consciously in the way they do with consistent low-level data and information. To the human brain, a cat on a mat is not the same as a dog on a box. And a couple trying to figure out whether a dining table would be big enough involves very different sensory data to that involved in a groundsman working out how much turf will be needed for a new football pitch.

Willingham’s problems involve several levels of abstraction. Note that the levels of abstraction only provide an overall framework, they’re not set in stone; I’ve had to split the information level into two to illustrate how information needs to be extracted at several successive levels before students can even begin to calculate the area of the table or the football pitch. The levels of abstraction are;

• data – the squiggles that make up letters and the sounds that make up speech
• first-order information – letters and words (chunked)
• second-order information – what the couple is trying to do and what the groundsman is trying to do (not chunked)
• knowledge – the deep structure/schema underlying each problem.

To anyone familiar with calculating area, the problems are simple ones; to anyone unfamiliar with the schema involved, they impose a high cognitive load because the brain is trying to juggle information about couples, tables, groundsmen and football pitches and can’t see the forest for the trees. Most brains would require quite a few examples before they had enough information to be able to spot the two patterns, so it’s not surprising that students who haven’t had much practical experience of buying tables, fitting carpets, painting walls or laying turf take a while to cotton on.

visual vs verbal representations

What might help students further is making explicit the deep structure of groups of facts with the help of visual representations. Visual representations have one huge advantage over verbal representations. Verbal representations, by definition, are processed sequentially – you can only say, hear or read one word at a time. Most people can process verbal information at the same rate at which they hear it or read it, so most students will be able to follow what a teacher is saying or what they are reading, even if it takes a while to figure out what the teacher or the book are getting at. However, if you can’t process verbal information quickly enough, can’t recall earlier sentences whilst processing the current one, miss a word, or don’t understand a crucial word or concept, it will be impossible to make sense of the whole thing. In visual representations, you can see all the key units of information at a glance, most of the information can be processed in parallel and the underlying schema is more obvious.

The concept of calculating area lends itself very well to visual representation; it is a geometry problem after all. Getting the students to draw a diagram of each problem would not only focus their attention on the deep structure rather than its surface features, it would also demonstrate clearly that problems with different surface features can have the same underlying deep structure.

It might not be so easy to make visual representations of the deep structure of other groups of facts, but it’s an approach worth trying because it makes explicit the deep structure of the relationship between the facts. In Seven Myths about Education, one of Daisy’s examples of a fact is the date of the battle of Waterloo. Battles are an excellent example of deep structure/schemata in action. There is a large but limited number of ways two opposing forces can position themselves in battle, whoever they are and whenever and wherever they are fighting, which is why ancient battles are studied by modern military strategists. The configurations of forces and what subsequent configurations are available to them are very similar to the configurations of pieces and next possible moves in chess. Of course chess began as a game of military strategy – as a visual representation of the deep structure of battles.

Deep structure/underlying schemata are a key factor in other domains too. Different atoms and different molecules can share the same deep structure in their bonding and reactions and chemists have developed formal notations for representing that visually; the deep structure of anatomy and physiology can be the same for many different animals – biologists rely heavily on diagrams to convey deep structure information. Historical events and the plots of plays can follow similar patterns even if the events occurred or the plays were written thousands of years apart. I don’t know how often history or English teachers use visual representations to illustrate the deep structure of concepts or groups of facts, but it might help students’ understanding.

deep structure of knowledge domains

It’s not just single facts or small groups of facts that have a deep structure or underlying schema. Entire knowledge domains have a deep structure too, although not necessarily in the form of a single schema; many connected schemata might be involved. How they are connected will depend on how experts arrange their knowledge or how much is known about a particular field.

Making students aware of the overall structure of a knowledge domain – especially if that’s via a visual representation so they can see the whole thing at once – could go a long way to improving their understanding of whatever they happen to be studying at any given time. It’s like the difference between Google Street View and Google Maps. Google Street View is invaluable if you’re going somewhere you’ve never been before and you want to see what it looks like. But Google Maps tells you where you are in relation to where you want to be – essential if you want to know how to get there. Having a mental map of an entire knowledge domain shows you how a particular fact or group of facts fits in to the big picture, and also tells you how much or how little you know.

Daisy’s model of cognition

Daisy doesn’t go into detail about deep structure or schemata. She touches on these concepts only a few times; once in reference to forming a chronological schema of historical events, then when referring to Joe Kirby’s double-helix metaphor for knowledge and skills and again when discussing curriculum design.

I don’t know if Daisy emphasises facts but downplays deep structure and schemata to highlight the point that the educational orthodoxy does essentially the opposite, or whether she doesn’t appreciate the importance of deep structure and schemata compared to surface features. I suspect it’s the latter. Daisy doesn’t provide any evidence to support her suggestion that simply memorising facts reduces cognitive load when she says;

“So when we commit facts to long-term memory, they actually become part of our thinking apparatus and have the ability to expand one of the biggest limitations of human cognition” (p.20).

The examples she refers to immediately prior to this assertion are multiplication facts that meet the criteria for chunking – they are simple and highly consistent and if they are chunked they’d be treated as one item by working memory. Whether facts like the dates of historical events meet the criteria for chunking or whether they occupy less space in working memory when memorised is debatable.

What’s more likely is that if more complex and less consistent facts are committed to memory, they are accessed more quickly and reliably than those that haven’t been memorised. Research evidence suggests that neural connections that are activated frequently become stronger and are accessed faster. Because information is carried in networks of neural connections, the more frequently we access facts or groups of facts, the faster and more reliably we will be able to access them. That’s a good thing. It doesn’t follow that those facts will occupy less space in working memory.

It certainly isn’t the case that simply committing to memory hundreds or thousands of facts will enable students to form a schema, or if they do, that it will be the schema their teacher would like them to form. Teachers might need to be explicit about the schemata that link facts. Since hundreds or thousands of facts tend to be linked by several different schemata – you can arrange the same facts in different ways – being explicit about the different ways they can be linked might be crucial to students’ understanding.

Essentially, deep structure schemata play an important role in three ways;

Students’ pre-existing schemata will affect their understanding of new information – they will interpret it in the light of the way they currently organise their knowledge. Teachers need to know about common misunderstandings as well as what they want students to understand.

Secondly, being able to identify the schema underlying one fact or small group of facts is the starting point for spotting similarities and differences between several groups of facts.

Thirdly, having a bird’s-eye view of the schemata involved in an entire knowledge domain increases students’ chances of understanding where a particular fact fits in to the grand scheme of things – and their awareness of what they don’t know.

Having a bird’s-eye view of the curriculum can help too, because it can show how different subject areas are linked. Subject areas and the curriculum are the subjects of the next post.

seven myths about education: facts and schemata

Knowledge occupies the bottom level of Bloom’s taxonomy of educational objectives. In the 1950s, Bloom and his colleagues would have known a good deal about the strategies teachers use to help students to acquire knowledge. What they couldn’t have known is how students formed their knowledge; how they extracted information from data and knowledge from information. At the time cognitive psychologists knew a fair amount about learning but had only a hazy idea about how it all fitted together. The DIKW pyramid I referred to in the previous post explains how the bottom layer of Bloom’s taxonomy works – how students extract information and knowledge during learning. Anderson’s simple theory of cognition explains how people extract low-level information. More recent research at the knowledge and wisdom levels is beginning to shed light on Bloom’s higher-level skills, why people organise the same body of knowledge in different ways and why they misunderstand and make mistakes.

Seven Myths about Education addresses the knowledge level of Bloom’s taxonomy. Daisy Christodoulou presents a model of cognition that she feels puts the higher-level skills in Bloom’s taxonomy firmly into context. Her model also forms the basis for a pedagogical approach and a structure for a curriculum, which I’ll discuss in another post. Facts are a core feature of Daisy’s model. I’ve mentioned previously that many disciplines find facts problematic because facts, by definition, have to be valid (true), and it’s often difficult to determine their validity. In this post I want to focus instead on the information processing entailed in learning facts.

a simple theory of cognition

Having explained the concept of chunking and the relationship between working and long-term memory, Daisy introduces Anderson’s paper;

“So when we commit facts to long-term memory, they actually become part of our thinking apparatus and have the ability to expand one of the biggest limitations of human cognition. Anderson puts it thus:

‘All that there is to intelligence is the simple accrual and tuning of many small units of knowledge that in total produce complex cognition. The whole is no more than the sum of its parts, but it has a lot of parts.’”

She then says “a lot is no exaggeration. Long-term memory is capable of storing thousands of facts, and when we have memorised thousands of facts on a specific topic, these facts together form what is known as a schema” (p. 20).

facts

This was one of the points where I began to lose track of Daisy’s argument. I think she’s saying this:

Anderson shows that low-level data can be chunked into a ‘unit of knowledge’ that is then treated as one item by WM – in effect increasing the capacity of WM. In the same way, thousands of memorised facts can be chunked into a more complex unit (a schema) that is then treated as one item by WM – this essentially bypasses the limitations of WM.

I think Daisy assumes that the principle Anderson found pertaining to low-level ‘units of knowledge’ applies to all units of knowledge at whatever level of abstraction. It doesn’t. Before considering why it doesn’t, it’s worth noting a problem with the use of the word ‘facts’ when describing data. Some researchers have equated data with ‘raw facts’. The difficulty with defining data as ‘facts’ is that by definition a fact has to be valid (true) and not all data is valid, as the GIGO (garbage-in-garbage-out) principle that bedevils computer data processing and the human brain’s often flaky perception of sensory input demonstrate. In addition, ‘facts’ are more complex than raw (unprocessed) data or raw (unprocessed) sensory input.

It’s clear from Daisy’s examples of facts that she isn’t referring to raw data or raw sensory input. Her examples include the date of the battle of Waterloo, key facts about numerous historical events and ‘all of the twelve times tables’. She makes it clear in the rest of the book that in order to understand such facts, students need prior knowledge. In terms of the DIKW hierarchy, Daisy’s facts are at a higher level than Anderson’s ‘units of knowledge’ and are unlikely to be processed automatically and pre-consciously in the same way as Anderson’s units. To understand why, we need to take another look at Anderson’s units of knowledge and why chunking happens.

chunking revisited

Data that can be chunked easily have two key characteristics; they involve small amounts of information and the patterns within them are highly consistent. As I mentioned in the previous post, one of Anderson’s examples of chunking is the visual features of upper case H. As far as the brain is concerned, the two parallel vertical lines and linking horizontal line that make up the letter H don’t involve much information. Also, although fonts and handwriting vary, the core features of all the Hs the brain perceives are highly consistent. So the brain soon starts perceiving all Hs as the same thing and chunks up the core features into a single unit – the letter H. If H could also be written Ĥ and Ħ in English, it would take a bit longer for the brain to chunk the three different configurations of lines and to learn the association between them, but not much longer, since the three variants involve little information and are still highly consistent.

understanding facts

But the letter H isn’t a fact, it’s a symbol. So are + and the numerals 1 and 2. ‘1+2’ isn’t a fact in the sense that Daisy uses the term, it’s a series of symbols. ‘1+2=3’ could be considered a fact because it consists of symbols representing two entities and the relationship between them. If you know what the symbols refer to, you can understand it. It could probably be chunked because it contains a small amount of information and has consistent visual features. Each multiplication fact in multiplication tables could probably be chunked, too, since they meet the same criteria. But that’s not true for all the facts that Daisy refers to, because they are more complex and less consistent.

‘The cat is on the mat’ is a fact, but in order to understand it, you need some prior knowledge about cats, mats and what ‘on’ means. These would be treated by working memory as different items. Most English-speaking 5-year-olds would understand this fact, but because there are different sorts of cats, different sorts of mats and different ways in which the cat could be on the mat, each child could have a different mental image of the cat on the mat. A particular child might conjure up a different mental image each time he or she encountered the fact, meaning that different sensory data were involved each time, the mental representations of the fact would be low in consistency, and the fact’s component parts couldn’t be chunked into a single unit in the same way as lower-level, more consistent representations. Consequently the fact wouldn’t be treated as one item in working memory.

Similarly, in order to understand a fact like ‘the battle of Waterloo was in 1815’ you’d need to know what a battle is, where Waterloo is (or at least that it’s a place), what 1815 means and how ‘of’ links a battle and a place name. If you’re learning about the Napoleonic wars, your perception of the battle is likely to keep changing and the components of the fact would have low consistency, meaning that it couldn’t be chunked in the way Anderson describes.

The same problem involving inconsistency would prevent two or more facts being chunked into a single unit. But clearly people do mentally link facts and the components of facts. They do it using a schema, but not quite in the way Daisy describes.

schemata

Before discussing how people use schemata (schemas), a comment on the biological structures that enable us to form them. I mentioned in an earlier post that the neurons in the brain form complex networks resembling the veins in a leaf. Physical connections are formed between neighbouring neurons when the neurons are activated simultaneously by incoming data. If the same or very similar data are encountered repeatedly, the same neurons are activated repeatedly, connections between them are strengthened and eventually networks of neurons are formed that can carry a vast amount of information in their patterns of connections. The patterns of connections between the neurons represent the individual’s perception of the patterns in the data.
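The strengthening of connections through repeated co-activation can be illustrated with a toy sketch. This is not a model of real neurons; the unit names, the example ‘experiences’ and the idea of counting co-activations as connection strength are all assumptions made purely for illustration.

```python
from collections import defaultdict
from itertools import combinations


def strengthen(weights, active):
    """Add 1 to the connection between every pair of co-active units."""
    for a, b in combinations(sorted(active), 2):
        weights[(a, b)] += 1
    return weights


# Hypothetical experiences: each set lists units activated at the same time.
experiences = [
    {"cat", "mat"},
    {"cat", "mat"},
    {"cat", "fur"},
    {"cat", "mat", "fur"},
]

weights = defaultdict(int)
for active in experiences:
    strengthen(weights, active)

# Pairs that were activated together most often end up most strongly linked.
for pair, strength in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(pair, strength)
```

In this sketch the pattern of connection strengths simply mirrors how often things co-occurred in the input, which is the sense in which the connections ‘represent’ patterns in the data.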

So if I see a cat on a mat, or read a sentence about a cat on a mat, or imagine a cat on a mat, my networks of neurons carrying information about cats and mats will be activated. Facts and concepts about cats, mats and things related to them will readily spring to mind. But I won’t have access to all of those facts and concepts at once. That would completely overload my working memory. Instead, what I recall is a stream of facts and concepts about cats and mats that takes time to access. It’s only a short time, but it doesn’t happen all at once. Also, some facts and concepts will be activated immediately and strongly and others will take longer and might be a bit hazy. In essence, a schema is a network of related facts and concepts, not a chunked ‘unit of knowledge’.

Daisy says “when we have memorised thousands of facts on a specific topic, these facts together form what is known as a schema” (p. 20). It doesn’t work quite like that, for several reasons.

the structure of a schema A schema is what it sounds like – a schematic plan or framework. It doesn’t consist of facts or concepts, but it’s a representation of how someone mentally arranges facts or concepts. In the same way the floor-plan of a building doesn’t consist of actual walls, doors and windows, but it does show you where those things are in the building in relation to each other. The importance of this apparently pedantic point will become clear when I discuss deep structure.

implicit and explicit schemata Schemata can be implicit – the brain organises facts and concepts in a particular way but we’re not aware of what it is – or explicit – we actively organise facts and concepts in a particular way and we are aware of how they are organised.

the size of a schema Schemata can vary in size and complexity. The configuration of the three lines that make up the letter H is a schema; so is the way a doctor organises his or her knowledge about the human circulatory system. A schema doesn’t have to represent all the facts or concepts it links together. If it did, a schema involving thousands of facts would be so complex it wouldn’t be much help in showing how the facts were related. And in order to encompass all the different relationships between thousands of facts, a single schema for them would need to be very simple.

For example, a simple schema for chemistry would be that different chemicals are formed from different configurations of minute ‘particles’ that make up atoms and configurations of atoms that form molecules. Thousands of facts can be fitted into that schema. In order to have a good understanding of chemistry, students would need to know about schemata other than just that simple one, and would need to know thousands of facts about chemistry before they would qualify as experts, but the simple schema plus a few examples would give them a basic understanding of what chemistry was about.

experts’ schemata Research into expertise (e.g. Chi et al, 1981) shows that experts don’t usually have one single schema for all the facts they know, but instead use different schemata for different aspects of their body of knowledge. Sometimes those schemata are explicitly linked, but sometimes they’re not. Sometimes they can’t be because no one knows how they are linked.

chess experts

Daisy refers to research showing that expert chess players memorise thousands of different configurations of chess pieces (p.78). This is classic chunking; although in different chess sets specific pieces vary in appearance, their core visual features and the moves they can make are highly consistent, so frequently-encountered configurations of pieces are eventually treated by the brain as single units – the brain chunks the positions of the chess pieces in essentially the same way as it chunks letters into words.

De Groot’s work showed that chess experts initially identified the configurations of pieces that were possible as a next move and then went through a process of eliminating the possibilities. The particular configuration of pieces on the board would activate several associated schemata involving possible next and subsequent moves.

So, each of the different configurations of chess pieces that are encountered so frequently they are chunked has an underlying (simple) schema. Expert chess players then access more complex schemata for next and subsequent possible moves. Even if they have an underlying schema for chess as a whole, it doesn’t follow that they treat chess as a single unit or that they recall all possible configurations at once. Most people can reliably recognise thousands of faces and thousands of words and have schemata for organising them, but when thinking about faces or words, they don’t recall all faces or all words simultaneously. That would rapidly overload working memory.

Compared to most knowledge domains, chess is pretty simple. Chess expertise consists of memorising a large but limited number of configurations and having schemata that predict the likely outcomes from a selection of them. Because of the rules of chess, although lots of moves are possible, the possibilities are clearly defined and limited. Expertise in medicine, say, or history, is considerably more complex and less certain. A doctor might have many schemata for human biology; one for each of the skeletal, nervous, circulatory, respiratory and digestive systems, for cell metabolism, biochemistry and genetics etc. Not only is human biology more complex than chess, there’s also more uncertainty involved. Some of those schemata we’re pretty sure about, some we’re not so sure about and some we know comparatively little about. There’s even more uncertainty involved in history. Evaluating evidence about how the human body works might be difficult, but the evidence itself is readily available in the form of human bodies. Historical evidence is often absent and likely to stay that way, which makes establishing facts and developing schemata a bit more challenging.

To illustrate her point about schemata Daisy claims that learning a couple of key facts about 150 historical events from 3000BC to the present will form “the fundamental chronological schema that is the basis of all historical understanding” (p. 20). Chronological sequencing could certainly form a simple schema for history, but you don’t need to know about many events in order to grasp that principle – two or three would suffice. Again, although this simple schema would give students a basic understanding of what history was about, in order to have a good understanding of history, students would need not only to know thousands of facts, but also to develop many schemata about how those facts were linked before they would qualify as experts. This brings us on to the deep structure of knowledge, the subject of the next post.

references
Chi, MTH, Feltovich, PJ & Glaser, R (1981). Categorisation and Representation of Physics Problems by Experts and Novices, Cognitive Science, 5, 121-152
de Groot, AD (1978). Thought and Choice in Chess. Mouton.

seven myths about education: a knowledge framework

In Seven Myths about Education Daisy Christodoulou refers to Bloom’s taxonomy of educational objectives as a metaphor that leads to two false conclusions; that skills are separate from knowledge and that knowledge is ‘somehow less worthy and important’ (p.21). Bloom’s taxonomy was developed in the 1950s as a way of systematising what students need to do with their knowledge. At the time, quite a lot was known about what people did with knowledge because they usually process it actively and explicitly. Quite a lot less was known about how people acquire knowledge, because much of that process is implicit; students usually ‘just learned’ – or they didn’t. Daisy’s book focuses on how students acquire knowledge, but her framework is an implicit one; she doesn’t link up the various stages of acquiring knowledge in an explicit formal model like Bloom’s. Although I think Daisy makes some valid points about the educational orthodoxy, some features of her model lead to conclusions that are open to question. In this post, I compare the model of cognition that Daisy describes with an established framework for analysing knowledge with origins outside the education sector.

a framework for knowledge

Researchers from a variety of disciplines have proposed frameworks involving levels of abstraction in relation to how knowledge is acquired and organised. The frameworks are remarkably similar. Although there are differences of opinion about terminology and how knowledge is organised at higher levels, there’s general agreement that knowledge is processed along the lines of the catchily named DIKW pyramid – DIKW stands for data, information, knowledge and wisdom. The Wikipedia entry gives you a feel for the areas of agreement and disagreement involved. In the pyramid, each level except the data level involves the extraction of information from the level below. I’ll start at the bottom.



Data

As far as the brain is concerned, data don’t actually tell us anything except whether something is there or not. For computers, data are a series of 0s and 1s; for the brain, data are largely in the form of sensory input – light, dark and colour, sounds, tactile sensations, etc.

Information
It’s only when we spot patterns within data that the data can tell us anything. Information consists of patterns that enable us to identify changes, identify connections and make predictions. For computers, information involves detecting patterns in all the 0s and 1s. For the brain it involves detecting patterns in sensory input.

Knowledge
Knowledge has proved more difficult to define, but involves the organisation of information.

Wisdom
Although several researchers have suggested that knowledge is also organised at a meta-level, this hasn’t been extensively explored.

The processes involved in the lower levels of the hierarchy – data and information – are well-established thanks to both computer modelling and brain research. We know a fair bit about the knowledge level largely due to work on how experts and novices think, but how people organise knowledge at a meta-level isn’t so clear.

The key concept in this framework is information. Used in this context, ‘information’ tells you whether something has changed or not, whether two things are the same or not, and identifies patterns. The DIKW hierarchy is sometimes summarised as; information is information about data, knowledge is information about information, and wisdom is information about knowledge.

a simple theory of complex cognition

Daisy begins her exploration of cognitive psychology with a quote by John Anderson, from his paper ACT: A simple theory of complex cognition (p.20). Anderson’s paper tackles the mystique often attached to human intelligence when compared to that of other species. He demonstrates that it isn’t as sophisticated or as complex as it appears, but is derived from a simple underlying principle. He goes on to explain how people extract information from data, deduce production rules and make predictions about commonly occurring patterns, which suggests that the more examples of particular data the brain perceives, the more quickly and accurately it learns. He demonstrates the principle using examples from visual recognition, mathematical problem solving and prediction of word endings.

natural learning

What Anderson describes is how human beings learn naturally; the way brains automatically process any information that happens to come their way unless something interferes with that process. It’s the principle we use to recognise and categorise faces, places and things. It’s the one we use when we learn to talk, solve problems and associate cause with effect. Scattergrams provide a good example of how we extract information from data in this way.

Scatterplot of longitudinal measurements of total brain volume for males (N=475 scans, shown in dark blue) and females (N=354 scans, shown in red).  From Lenroot et al (2007).

Although the image consists of a mass of dots and lines in two colours, we can see at a glance that the different coloured dots and lines form two clusters.

Note that I’m not making the same distinction that Daisy makes between ‘natural’ and ‘not natural’ learning (p.36). Anderson is describing the way the brain learns, by default, when it encounters data. Daisy, in contrast, claims that we learn things like spoken language without visible effort because language is ‘natural’, whereas inventions like the alphabet and numbers need to be taught ‘formally and explicitly’. That distinction, although frequently made, isn’t necessarily a valid one. It’s based on an assumption that the brain has evolved mechanisms to process some types of data, e.g. to recognise faces and understand speech, but can’t have had time to evolve mechanisms to process recent inventions like writing and mathematics. This assumption about brain hardwiring is a contentious one, and the evidence about how brains learn (including the work that’s developed from Anderson’s theory) makes it look increasingly likely that it’s wrong. If formal and explicit instruction were necessary in order to learn man-made skills like writing and mathematics, that would raise the question of how these skills were invented in the first place, and Anderson would not have been able to use mathematical problem-solving and word prediction as his examples of the underlying mechanism of human learning. The theory that the brain is hardwired to process some types of information but not others, and the theory that the same mechanism processes all information, both explain how people appear to learn some things automatically and ‘naturally’. Which theory is right (or whether both are right) is still the subject of intense debate. I’ll return to the second theory later when I discuss schemata.

data, information and chunking

Chunking is a core concept in Daisy’s model of cognition. Chunking occurs when the brain links together several bits of data it encounters frequently and treats them as a single item – groups of letters that frequently co-occur are chunked into words. Anderson’s paper is about the information processing involved in chunking. One of his examples is how the brain chunks the three lines that make up an upper case H. Although Anderson doesn’t make an explicit distinction between data and information, in his examples the three lines would be categorised as data in the DIKW framework, as would be the curves and lines that make up numerals. When the brain figures out the production rule for the configuration of the lines in the letter H, it’s extracting information from the data – spotting a pattern. Because the pattern is highly consistent – H is almost always written using this configuration of lines – the brain can chunk the configuration of lines into the single unit we call the letter H. The letters A and Z also consist of three lines, but have different production rules for their configurations. Anderson shows that chunking can also occur at a slightly higher level; letters (already chunked) can be chunked again into words that are processed as single units, and numerals (already chunked) can be chunked into numbers to which production rules can be applied to solve problems. Again, chunking can take place because the patterns of letters in the words, and the patterns of numerals in Anderson’s mathematical problems are highly consistent. Anderson calls these chunked units and production rules ‘units of knowledge’. He doesn’t use the same nomenclature as the DIKW model, but it’s clear from his model that initial chunking occurs at the data level and further chunking can occur at the information level.
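The effect of chunking on the number of items that have to be held at once can be shown with a rough sketch. It is not Anderson’s model; the function name, the greedy longest-match strategy and the example string are assumptions chosen only to illustrate the idea that familiar, consistent patterns are handled as single units.

```python
def count_items(symbols, known_chunks):
    """Group symbols into known chunks where possible and return the units."""
    units = []
    i = 0
    while i < len(symbols):
        # Try the longest candidate starting at position i first.
        for length in range(len(symbols) - i, 1, -1):
            candidate = symbols[i:i + length]
            if candidate in known_chunks:
                units.append(candidate)
                i += length
                break
        else:
            # No known chunk starts here, so the symbol stays a unit on its own.
            units.append(symbols[i])
            i += 1
    return units


letters = "thecatsat"
novice = count_items(letters, set())                  # no words learned yet
reader = count_items(letters, {"the", "cat", "sat"})  # three familiar words

print(len(novice), novice)
print(len(reader), reader)
```

With no learned chunks the nine letters are nine separate items; once the three words are familiar, the same input is handled as three.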

The brain chunks data and low-level units of information automatically; evidence for this comes from research showing that babies begin to identify and categorise objects using visual features and categorise speech sounds using auditory features by about the age of 9 months (Younger, 2003). Chunking also occurs pre-consciously (e.g. Lamme 2003); we know that people are often aware of changes to a chunked unit like a face, a landscape or a piece of music, but don’t know what has changed – someone has shaved off their moustache, a tree has been felled, the song is a cover version with different instrumentation. In addition, research into visual and auditory processing shows that sensory information initially feeds forward in the brain; a lot of processing occurs before the information reaches the location of working memory in the frontal lobes. So at this level, what we are talking about is an automatic, usually pre-conscious process that we use by default.

knowledge – the organisation of information

Anderson’s paper was written in 1995 – twenty years ago – at about the time the DIKW framework was first proposed, which explains why he doesn’t use the same terminology. He calls the chunked units and production rules ‘units of knowledge’ rather than ‘units of information’ because they are the fundamental low-level units from which higher-level knowledge is formed.

Although Anderson’s model of information processing for low-level units still holds true, what has puzzled researchers in the intervening couple of decades is why that process doesn’t scale up. The way people process low-level ‘units of knowledge’ is logical and rational enough to be accurately modelled using computer software, but when handling large amounts of information, such as the concepts involved in day-to-day life, or trying to comprehend, apply, analyse, synthesise or evaluate it, the human brain goes a bit haywire. People (including experts) exhibit a number of errors and biases in their thinking. These aren’t just occasional idiosyncrasies – everybody shows the same errors and biases to varying extents. Since complex information isn’t inherently different to simple information – there’s just more of it – researchers suspected that the errors and biases were due to the wiring of the brain. Work on judgement and decision-making and on the biological mechanisms involved in processing information at higher levels has demonstrated that brains are indeed wired up differently to computers. The reason is that what has shaped the evolution of the human brain isn’t the need to produce logical, rational solutions to problems, but the need to survive, and overall quick-and-dirty information processing tends to result in higher survival rates than slow, precise processing.

What this means is that Anderson’s information processing principle can be applied directly to low-level units of information, but might not be directly applicable to the way people process information at a higher level; the way they process facts, for example. Facts are the subject of the next post.

References
Anderson, J (1996) ACT: A simple theory of complex cognition, American Psychologist, 51, 355-365.
Lamme, VAF (2003) Why visual attention and awareness are different, TRENDS in Cognitive Sciences, 7, 12-18.
Lenroot, RK, Gogtay, N, Greenstein, DK, Molloy, E, Wallace, GL, Clasen, LS, Blumenthal, JD, Lerch, J, Zijdenbos, AP, Evans, AC, Thompson, PM & Giedd, JN (2007). Sexual dimorphism of brain developmental trajectories during childhood and adolescence. NeuroImage, 36, 1065–1073.
Younger, B (2003). Parsing objects into categories: Infants’ perception and use of correlated attributes. In Rakison & Oakes (eds.) Early Category and Concept development: Making sense of the blooming, buzzing confusion, Oxford University Press.