biologically primary and secondary knowledge?

David Geary is an evolutionary psychologist who developed the concept of biologically primary and biologically secondary knowledge, popular with some teachers. I’ve previously critiqued Geary’s ideas as he set them out in a chapter entitled Educating the Evolved Mind. One teacher responded by suggesting I read Geary’s The Origin of Mind because it explained his ideas in more detail. So I did.

Geary’s theory

If I’ve understood correctly, Geary’s argument goes like this:

The human body and brain have evolved over time in response to environmental pressures ranging from climate and diet through to social interaction. For Geary, social interaction is a key driver of evolved brain structures because social interactions can increase the resources available to individuals.

Environmental pressures have resulted in the evolution of brain ‘modules’ specialising in processing certain types of information, such as language or facial features. Information is processed by the modules rapidly, automatically and implicitly, resulting in heuristics (rules of thumb) characteristic of the ‘folk’ psychology, biology and physics that form the default patterns for the way we think. But we are also capable of flexible thought that overrides those default patterns. The flexibility is due to the highly plastic frontal areas of our brain responsible for intelligence. Geary refers to thinking that uses the evolved modules as biologically primary, and thinking that involves the plastic frontal areas as biologically secondary.

Chapters 2 & 3 of The Origin of Mind offer a clear, coherent account of Darwinian and hominid evolution respectively. They’d make a great resource for teachers. But when Geary moves on to cognition his model begins to get a little shaky – because it rests on several assumptions.

Theories about evolution of the brain are inevitably speculative because brain tissue decomposes and the fossil record is incomplete. Theories about brain function also involve speculation because our knowledge about how brains work is incomplete. There’s broad agreement on the general principles, but some hypotheses have generated what Geary calls ‘hot debate’. Despite acknowledging the debates, Geary’s model is built on assumptions about which side of the debate is correct. The assumptions involve the modularity of the brain, folk systems, intelligence, and motivation-to-control.

modularity

The general principle of modularity – that there are specific areas of the brain dedicated to processing specific types of information – is not in question. What is less clear is how specialised the modules are. For example, the fusiform face area (FFA) specialises in processing information about faces. But not just faces. It has also been shown to process information about cars, birds, butterflies, chess pieces, Digimon, and novel items called greebles. This raises the question of whether the FFA evolved to process information about faces as such (the Face Specific Hypothesis), or to process information about objects requiring fine-grained discrimination (the Expertise Hypothesis). Geary comes down on the Faces side of the debate on the grounds that the FFA does not “generally respond to other types of objects … that do not have facelike features, except in individuals with inherent sociocognitive deficits, such as autism” (p.141). Geary is entitled to his view, but that’s not the only hotly debated interpretation of the evidence.

folk systems

The general principle of folk systems – evolved forms of thought that result from information being processed rapidly, automatically and implicitly – is also not in question. Geary admits it’s unclear whether the research is “best understood in terms of inherent modular constraints, or as the result of general learning mechanisms” but comes down on the side of children’s thinking being the result of “inherent modular systems”.  I couldn’t find a reference to Eleanor Rosch’s prototype theory developed in the 1970s, which explains folk categories in terms of general learning mechanisms. And it’s regrettable that Rakison & Oakes’ 2008 review of research into how children form categories (that also lends weight to the general learning mechanisms hypothesis) wasn’t published until three years after The Origin of Mind. I don’t know whether either would have prompted Geary to amend his theory.

intelligence

In 1904 Charles Spearman published a review of attempts to measure intellectual ability. He concluded that the correlations between various specific abilities indicated “that there really exists a something that we may provisionally term ‘General Sensory Discrimination’ and similarly a ‘General Intelligence’” (Spearman, p.272).

It’s worth looking at what those specific abilities were. Spearman ranks them (p.276) in order of their correlation with ‘General Intelligence’: performance in Classics, Common Sense, Pitch Discrimination, French, Cleverness, English, Mathematics, Pitch Discrimination among the uncultured, Music, Light Discrimination and Weight Discrimination.

So, measures of school performance turned out to be good predictors of… school performance. The measures of school performance correlated strongly with ‘General Intelligence’ – a construct derived from… the measures of school performance. This tautology wasn’t lost on other psychologists and Spearman’s conclusions received considerable criticism. As Edwin Boring pointed out in 1923, ‘intelligence’ is defined by the content of ‘intelligence’ tests. The correlations between specific abilities and the predictive power of intelligence tests are well-established. What’s contentious is whether they indicate the existence of an underlying ‘general mental ability’.
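To see why the circularity matters, here’s a minimal sketch in Python. It is my own illustration, not Spearman’s or Geary’s analysis: the data are simulated, the subject labels are borrowed from Spearman’s list purely for flavour, and the factor extraction is a generic principal-component calculation rather than Spearman’s method. It shows that a single factor extracted from a set of correlated ‘school performance’ scores will, by construction, correlate strongly with the very measures it was derived from.

```python
# Toy demonstration of the circularity: correlated scores -> extract one common
# factor -> the factor correlates with the scores it came from.
import numpy as np

rng = np.random.default_rng(0)
n_pupils = 500
subjects = ["Classics", "French", "English", "Mathematics", "Pitch"]

# Each simulated score is a shared component plus subject-specific noise.
shared = rng.normal(size=n_pupils)
scores = np.column_stack([shared + rng.normal(size=n_pupils) for _ in subjects])

# 'g' here is simply the first principal component of the score matrix.
centred = scores - scores.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
g = centred @ vt[0]

# Each measure correlates strongly with a factor built from those same measures.
for name, column in zip(subjects, centred.T):
    r = np.corrcoef(g, column)[0, 1]
    print(f"{name:12s} correlation with derived 'g': {abs(r):.2f}")
```

The sketch doesn’t show that g is meaningless; it shows that strong correlations between a set of measures and a factor derived from those same measures can’t, by themselves, establish that the factor is ‘a something’ that ‘really exists’.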

Geary says the idea that children’s intellectual functioning can be improved is ‘hotly debated’ (p.295). But he appears to look right past the even hotter debate that’s raged since Spearman’s work was published, about whether the construct general intellectual ability (g) actually represents ‘a something’ that ‘really exists’. Geary assumes it does, and also accepts Cattell’s later constructs crystallised and fluid intelligence without question.

Clearly some people are more ‘intelligent’ than others, so the idea of g initially appears valid. But ‘intelligence’ is, ironically, a folk construct. It’s a label we apply to a set of loosely defined characteristics – a useful shorthand descriptive term. It doesn’t follow that ‘intelligence’ is a biologically determined ‘something’ that ‘really exists’.

motivation-to-control

The motivation to control relationships, events and resources is a key part of Geary’s theory. He argues that motivation-to-control is an evolved disposition (inherent in the way people think) that manifests itself most clearly in the behaviour of despots – who seek to maximise their control of resources. Curiously, in referring to despots, Geary cites a paper by Herb Simon (Simon, 1990) on altruism (a notoriously knotty problem for evolution researchers). Geary describes an equally successful alternative strategy to despotism, not as altruism but as “adherence to [social] laws and mores”, even though the evidence suggests altruism is an evolved disposition, not merely a behaviour.

Altruism calls into question the control part of the motivation-to-control hypothesis. Many people have a tendency to behave in ways that increase their individual control of resources, but many tend to collaborate and co-operate instead – strategies that increase individual access to resources, despite reducing individual control over them. The altruism debate is another that’s been going on for decades, but you wouldn’t know that to read Geary.

Then there’s the motivation part. Like ‘intelligence’, ‘motivation’ is a label for a loosely defined bunch of factors that provide incentives for behaviour. ‘Motivation’ is a useful label. But again it doesn’t follow that ‘motivation’ is ‘a something’ that ‘really exists’. The biological mechanisms involved in the motivation to eat or drink are unlikely to be the same as those involved in wanting to marry the boss’s daughter or improve on our personal best for the half-marathon. The first two examples are likely to increase our access to resources; whether they increase our control over them will depend on the circumstances. Geary doesn’t explain the biological mechanism involved.

biologically primary and secondary knowledge

In The Origin of Mind, Geary touches on the idea of biologically primary and secondary competencies and abilities but doesn’t go into detail about their implications for education. Instead, he illustrates the principle by referring to the controlled problem solving used by Charles Darwin and Alfred Wallace in tackling the problem of how different species had arisen.

Geary says that problem solving of the type used by Darwin and Wallace requires the inhibition of ‘heuristic-based folk systems’ (p.197), and repeatedly proposes (pp.188, 311, 331, 332) that the prior knowledge of scientific pioneers such as Linnaeus, Darwin and Wallace “arose from evolved folk biological systems…as elaborated by associated academic learning” (p.188). He cites as evidence the assumptions resulting from religious belief made by anatomist and palaeontologist Richard Owen (p.187), and Wallace’s reference to an ‘Overruling Intelligence’ being behind natural selection (p.83). But this proposal is problematic, for three reasons:

The first problem is that some ‘evolved’ folk knowledge is explicit, not implicit. Belief in a deity is undoubtedly folk knowledge; societies all over the world have come up with variations on the concept. But the folk knowledge about religious beliefs is usually culturally transmitted to children, rather than generated by them spontaneously.

Another difficulty is that thinkers such as Darwin, Linnaeus, Owen and Wallace had a tendency to be born into scholarly families, so their starting point, even as young children, would not have been merely ‘folk biological systems’. Each of them also had the advantage of previous researchers having already reduced their problem space.

A third challenge is that heuristics aren’t exclusively ‘biologically primary’; they can be learned, as Geary points out, via ‘biologically secondary knowledge’ (p.185).

So if biologically primary knowledge sometimes involves explicit instruction, and biologically secondary knowledge can result in the development of fast, automatic, implicit heuristics, how can we tell which type of knowledge is which?

use of evidence

Geary accepts contentious constructs such as motivation, intelligence and personality (p.319) without question. And he appears to have a rather idiosyncratic take on concepts such as bounded rationality (p.172), satisficing (p.173) and schemata (p.186).

In addition, although Geary’s evidence is not always contentious, sometimes his conclusions are tenuous. For example, he predicts that if social competition were a driving force during evolution, “a burning desire to master algebra or Newtonian physics will not be universal or even common. Surveys of the attitudes and preferences of American schoolchildren support this prediction and indicate that they value achievement in sports … much more than achievement in any academic area” (pp.334-5), citing a 1993 paper by Eccles et al. The surveys were two studies, the American schoolchildren were 865 elementary school students, the attitudes and preferences were competence beliefs and task values, and the academic areas were math, reading and music. Responses showed some statistically significant differences. Geary appears to generalise the results, to overegg the evidential pudding somewhat, and to look right past the possibility that there might be culturally transmitted factors involved.

conclusion

I find Geary’s model perplexing. Most of the key links in it – brain evolution, brain modularity, the heuristics and biases that result in ‘folk’ thinking, motivation and intelligence – involve highly contentious hypotheses.  Geary mentions the ‘hot debates’ but doesn’t go into detail. He simply comes down on one side of the debate and builds his model on the assumption that that side is correct.

He appears to have developed an overarching model of cognition and learning and squeezed the evidence into it, rather than building the model according to the evidence. The problem with the second approach, of course, is that if the evidence is inconclusive, you can’t develop an overarching model of cognition and learning without it being highly speculative.

What also perplexes me about Geary’s model is its purpose. Teachers have been aware of the difference between implicit and explicit learning (even if they didn’t call it that) for centuries. It’s useful for them to know about brain evolution and modularity and the heuristics and biases that result in ‘folk’ thinking etc. But teachers can usually spot whether children are learning something apparently effortlessly (implicitly) or whether they need step-by-step (explicit) instruction. That’s essentially why teachers exist. Why do they need yet another speculative educational model?

references

Eccles, J., Wigfield, A., Harold, R.D., & Blumenfeld, P. (1993). Age and gender differences in children’s self- and task perceptions during elementary school. Child Development, 64, 830-847.

Gauthier, I., Tarr, M.J., Anderson, A.W., Skudlarski, P. & Gore, J.C.  (1999). Activation of the middle fusiform ‘face area’ increases with expertise in recognizing novel objects, Nature Neuroscience, 2, 568-573.

Rakison, D.H.  & Oakes L.M. (eds) (2008). Early Category and Concept Development.  Oxford University Press.

Simon, H.A. (1990). A mechanism for social selection and successful altruism. Science, 250, 1665-1668.

Spearman, C.  (1904).  ‘General Intelligence’ objectively determined and measured.  The American Journal of Psychology, 15, 201-292.


educating the evolved mind: education

The previous two posts have been about David Geary’s concepts of primary and secondary knowledge and abilities; evolved minds and intelligence.  This post is about how Geary applies his model to education in Educating the Evolved Mind.

There’s something of a mismatch between the cognitive and educational components of Geary’s model.  The cognitive component is a range of biologically determined functions that have evolved over several millennia.  The educational component is a culturally determined education system cobbled together in a somewhat piecemeal and haphazard fashion over the past century or so.

The education system Geary refers to is typical of the schooling systems in developed industrialised nations and, according to his model, focuses on providing students with biologically secondary knowledge and abilities. Geary points out that many students prefer to focus on biologically primary knowledge and abilities such as sports and hanging out with their mates (p.52). He recognises they might not see the point of what they are expected to learn and might need its importance explained to them in terms of social value (p.56). He suggests ‘low achieving’ students especially might need explicit, teacher-driven instruction (p.43).

You’d think, if cognitive functions have been biologically determined through thousands of years of evolution, that it would make sense to adapt the education system to the cognitive functions, rather than the other way round. But Geary doesn’t appear to question the structure of the current US education system at all; he accepts it as a given. I suggest that, given how human cognition works, it might be worth taking a step back and re-thinking the education system itself in the light of the following principles:

1. communities need access to expertise

Human beings have been ‘successful’, in evolutionary terms, mainly due to our use of language. Language means it isn’t necessary for each of us to learn everything for ourselves from scratch; we can pass on information to each other verbally. Reading and writing allow knowledge to be transmitted across time and space. The more knowledge we have as individuals and communities, the better our chances of survival and a decent quality of life.

But, although it’s desirable for everyone to be a proficient reader and writer and to have an excellent grasp of collective human knowledge, that’s not necessary in order for each of us to have a decent quality of life. What each community needs is a critical mass of people with good knowledge and skills.

Also, human knowledge is now so vast that no one can be an expert on everything; what’s important is that everyone has access to the expertise they need, when and where they need it.  For centuries, communities have facilitated access to expertise by educating and training experts (from carpenters and builders to doctors and lawyers) who can then share their expertise with their communities.

2. education and training is not just for school

Prior to the development of mass education systems, most children’s and young people’s education and training would have been integrated into the communities in which they lived. They would understand where their new knowledge and skills fitted into the grand scheme of things and how it would benefit them, their families and others. But schools in mass education systems aren’t integrated into communities. The education system has become its own specialism. Children and young people are withdrawn from their community for many hours to be taught whatever knowledge and skills the education system thinks fit. The idea that good exam results will lead to good jobs is expected to provide sufficient motivation for students to work hard at mastering the school curriculum.  Geary recognises that it doesn’t.

For most of the millennia during which cognitive functions have been developing, children and young people have been actively involved in producing food or making goods, and their education and training was directly related to those tasks. Now it isn’t.  I’m not advocating a return to child labour; what I am advocating is ensuring that what children and young people learn in school is directly and explicitly related to life outside school.

Here’s an example: A highlight of the Chemistry O level course I took many years ago was a visit to the nearby Avon (make-up) factory. Not only did we each get a bag of free samples, but in the course of an afternoon the relevance of all that rote learning of industrial applications, all that dry information about emulsions, fat-soluble dyes, anti-fungal additives etc. suddenly came into sharp focus. In addition, the factory was a major local employer and the Avon distribution network was very familiar to us, so the whole end-to-end process made sense.

What’s commonly referred to as ‘academic’ education – fundamental knowledge about how the world works – is vital for our survival and wellbeing as a species. But knowledge about how the world works is also immensely practical. We need to get children and young people out, into the community, to see how their communities apply knowledge about how the world works, and why it’s important. The increasing emphasis in education in the developed world on paper-and-pencil tests, examination results and college attendance is moving the education system in the opposite direction, away from the practical importance of extensive, robust knowledge to our everyday lives.  And Geary appears to go along with that.

3. (not) evaluating the evidence

Broadly speaking, Geary’s model has obvious uses for teachers. There’s considerable supporting evidence for a two-phase model of cognition, ranging from Fodor’s distinction between specialised, stable processes and general, unstable ones, to the System 1/System 2 model Daniel Kahneman describes in Thinking, Fast and Slow. Whether the difference between Geary’s biologically primary and secondary knowledge and abilities is as clear-cut as he claims is a different matter.

It’s also well established that in order to successfully acquire the knowledge usually taught in schools, children need the specific abilities that are measured by intelligence tests; that’s why the tests were invented in the first place. And there’s considerable supporting evidence for the reliability and predictive validity of intelligence tests. They clearly have useful applications in schools. But it doesn’t follow that what we call intelligence or g (never mind gF or gC) is anything other than a construct created by the intelligence test.

In addition, the fact that there is evidence that supports Geary’s claims doesn’t mean all his claims are true. There might also be considerable contradictory evidence; in the case of Geary’s two-phase model the evidence suggests the divide isn’t as clear-cut as he suggests, and the reification of intelligence has been widely critiqued. Geary mentions the existence of ‘vigorous debate’ but doesn’t go into details and doesn’t evaluate the evidence by actually weighing up the pros and cons.

Geary’s unquestioning acceptance of the concepts of modularity, intelligence and education systems in the developed world increases the likelihood that teachers will follow suit and simply accept Geary’s model as a given. I’ve seen the concepts of biologically primary and secondary knowledge and abilities, crystallised intelligence (gC) and fluid intelligence (gF), and the idea that students with low gF who struggle with biologically secondary knowledge just need explicit direct instruction, all asserted as if they must be true – presumably because an academic has claimed they are and cited evidence in support.

This absence of evaluation of the evidence is especially disconcerting in anyone who emphasises the importance of teachers becoming research-savvy and developing evidence-based practice, or who posits models like Geary’s in opposition to the status quo. The absence of evaluation is also at odds with the oft-cited requirement for students to acquire robust, extensive knowledge about a subject before they can understand, apply, analyse, evaluate or use it creatively. That requirement applies only to school children, it seems.

references

Fodor, J (1983).  The modularity of mind.  MIT Press.

Geary, D (2007).  Educating the evolved mind: Conceptual foundations for an evolutionary educational psychology, in Educating the evolved mind: Conceptual foundations for an evolutionary educational psychology, JS Carlson & JR Levin (Eds). Information Age Publishing.

Kahneman, D (2012).  Thinking, fast and slow.   Penguin.

evolved minds and education: evolved minds

At the recent Australian College of Educators conference in Melbourne, John Sweller summarised his talk as follows:  “Biologically primary, generic-cognitive skills do not need explicit instruction.  Biologically secondary, domain-specific skills do need explicit instruction.”

[image: sweller.png – Biologically primary and biologically secondary cognitive skills]

This distinction was proposed by David Geary, a cognitive developmental and evolutionary psychologist at the University of Missouri. In a recent blogpost, Greg Ashman refers to a chapter by Geary that sets out his theory in detail.

If I’ve understood it correctly, here’s the idea at the heart of Geary’s model:

*****

The cognitive processes we use by default have evolved over millennia to deal with information (e.g. about predators, food sources) that has remained stable for much of that time. Geary calls these biologically primary knowledge and abilities. The processes involved are fast, frugal, simple and implicit.

But we also have to deal with novel information, including knowledge we’ve learned from previous generations, so we’ve evolved flexible mechanisms for processing what Geary terms biologically secondary knowledge and abilities. The flexible mechanisms are slow, effortful, complex and explicit/conscious.

Biologically secondary processes are influenced by an underlying factor we call general intelligence, or g, related to the accuracy and speed of processing novel information. We use biologically primary processes by default, so they tend to hinder the acquisition of the biologically secondary knowledge taught in schools. Geary concludes the best way for students to acquire the latter is through direct, explicit instruction.

*****

On the face of it, Geary’s model is a convincing one.   The errors and biases associated with the cognitive processes we use by default do make it difficult for us to think logically and rationally. Children are not going to automatically absorb the body of human knowledge accumulated over the centuries, and will need to be taught it actively. Geary’s model is also coherent; its components make sense when put together. And the evidence he marshals in support is formidable; there are 21 pages of references.

However, on closer inspection the distinction between biologically primary and secondary knowledge and abilities begins to look a little blurred. It rests on some assumptions that are the subject of what Geary terms ‘vigorous debate’. Geary does note the debate, but because he plumps for one view, doesn’t evaluate the supporting evidence, and doesn’t go into detail about competing theories, teachers unfamiliar with the domains in question could easily remain unaware of possible flaws in his model. In addition, Geary adopts a particular cultural frame of reference; essentially that of a developed, industrialised society that places high value on intellectual and academic skills. There are good reasons for adopting that perspective; and equally good reasons for not doing so. In a series of three posts, I plan to examine two concepts that have prompted vigorous debate – modularity and intelligence – and to look at Geary’s cultural frame of reference.

Modularity

The concept of modularity – that particular parts of the brain are dedicated to particular functions – is fundamental to Geary’s model. Physicians have known for centuries that some parts of the brain specialise in processing specific information. Some stroke patients, for example, have been reported as being able to write but no longer able to read (alexia without agraphia), to be able to read symbols but not words (pure alexia), or to be unable to recall some types of words (anomia). Language isn’t the only ability involving specialised modules; different areas of the brain are dedicated to processing the visual features of, for example, faces, places and tools.

One question that has long perplexed researchers is how modular the brain actually is. Some functions clearly occur in particular locations and in those locations only; others appear to be more distributed. In the early 1980s, Jerry Fodor tackled this conundrum head-on in his book The modularity of mind. What he concluded is that at the perceptual and linguistic level functions are largely modular, i.e. specialised and stable, but at the higher levels of association and ‘thought’ they are distributed and unstable.  This makes sense; you’d want stability in what you perceive, but flexibility in what you do with those perceptions.

Geary refers to the ‘vigorous debate’ (p.12) between those who lean towards specialised brain functions being evolved and modular, and those who see specialised brain functions as emerging from interactions between lower-level stable mechanisms. Although he acknowledges the importance of interaction and emergence during development (pp. 14, 18), you wouldn’t know that from Fig 1.2, showing his ‘evolved cognitive modules’.

At first glance, Geary’s distinction between stable biologically primary functions and flexible biologically secondary functions appears to be the same as Fodor’s stable/unstable distinction. But it isn’t.  Fodor’s modules are low-level perceptual ones; some of Geary’s modules in Fig. 1.2 (e.g. theory of mind, language, non-verbal behaviour) engage frontal brain areas used for the flexible processing of higher-level information.

Novices and experts; novelty and automation

Later in his chapter, Geary refers to research involving these frontal brain areas. Two findings are particularly relevant to his modular theory. The first is that frontal areas of the brain are initially engaged whilst people are learning a complex task, but as the task becomes increasingly automated, frontal area involvement decreases (p.59). Second, research comparing experts’ and novices’ perceptions of physical phenomena (p.69) showed that if there is a conflict between what people see and their current schemas, frontal areas of their brains are engaged to resolve the conflict. So, when physics novices are shown a scientifically accurate explanation, or when physics experts are shown a ‘folk’ explanation, both groups experience conflict.

In other words, what’s processed quickly, automatically and pre-consciously is familiar, overlearned information. If that familiar and overlearned information consists of incomplete and partially understood bits and pieces that people have picked up as they’ve gone along, errors in their ‘folk’ psychology, biology and physics concepts (p.13) are unsurprising. But it doesn’t follow that there must be dedicated modules in the brain that have evolved to produce those concepts.

If the familiar overlearned information is, in contrast, extensive and scientifically accurate, the ‘folk’ concepts get overridden and the scientific concepts become the ones that are accessed quickly, automatically and pre-consciously. In other words, the line between biologically primary and secondary knowledge and abilities might not be as clear as Geary’s model implies.  Here’s an example; the ability to draw what you see.

The eye of the beholder

Most of us are able to recognise, immediately and without error, the face of an old friend, the front of our own house, or the family car. However, if asked to draw an accurate representation of those items, even if they were in front of us at the time, most of us would struggle. That’s because the processes involved in visual recognition are fast, frugal, simple and implicit; they appear to be evolved, modular systems. But there are people who can draw accurately what they see in front of them; some can do so ‘naturally’, others train themselves to do so, and still others are taught to do so via direct instruction. It looks as if the ability to draw accurately straddles Geary’s biologically primary and secondary divide. The extent to which modules are actually modular is further called into question by recent research involving the fusiform face area (FFA).

Fusiform face area

The FFA is one of the visual processing areas of the brain. It specialises in processing information about faces. What wasn’t initially clear to researchers was whether it processed information about faces only, or whether faces were simply a special case of the type of information it processes. There was considerable debate about this until a series of experiments found that various experts used their FFA for differentiating subtle visual differences within classes of items as diverse as birds, cars, chess configurations, x-ray images, Pokémon, and objects named ‘greebles’ invented by researchers.

What these experiments tell us is that an area of the brain apparently dedicated to processing information about faces, is also used to process information about modern artifacts with features that require fine-grained differentiation in order to tell them apart. They also tell us that modules in the brain don’t seem to draw a clear line between biologically primary information such as faces (no explicit instruction required), and biologically secondary information such as x-ray images or fictitious creatures (where initial explicit instruction is required).

What the experiments don’t tell us is whether the FFA evolved to process information about faces and is being co-opted to process other visually similar information, or whether it evolved to process fine-grained visual distinctions, of which faces happen to be the most frequent example most people encounter.

We know that brain mechanisms have evolved and that has resulted in some modular processing. What isn’t yet clear is exactly how modular the modules are, or whether there is actually a clear divide between biologically primary and biologically secondary abilities. Another component of Geary’s model about which there has been considerable debate is intelligence – the subject of the next post.

Incidentally, it would be interesting to know how Sweller developed his summary because it doesn’t quite map on to a concept of modularity in which the cognitive skills are anything but generic.

References

Fodor, J (1983).  The modularity of mind.  MIT Press.

Geary, D (2007).  Educating the evolved mind: Conceptual foundations for an evolutionary educational psychology, in Educating the evolved mind: Conceptual foundations for an evolutionary educational psychology, JS Carlson & JR Levin (Eds). Information Age Publishing.

Acknowledgements

I thought the image was from @greg_ashman’s Twitter timeline but can’t now find it.  Happy to acknowledge correctly if notified.

the debating society

One of my concerns about the model of knowledge promoted by the Tiger Teachers is that it hasn’t been subjected to sufficient scrutiny. A couple of days ago on Twitter I said as much. Jonathan Porter, a teacher at the Michaela Community School, thought my criticism unfair because the school has invited critique by publishing a book and hosting two debating days. Another teacher recommended watching the debate between Guy Claxton and Daisy Christodoulou, ‘Sir Ken is right: traditional education kills creativity’. She said it may not address my concerns about theory. She was right, it didn’t. But it did suggest a constructive way to extend the Tiger Teachers’ model of knowledge.

the debate

Guy, speaking for the motion and defending Sir Ken Robinson’s views, highlights the importance of schools developing students’ creativity, and answers the question ‘what is creativity?’ by referring to the findings of an OECD study: that creativity emerges from six factors – curiosity, determination, imagination, discipline, craftsmanship and collaboration. Daisy, opposing the motion, says that although she and Guy agree on the importance of creativity and its definition, they differ over the methods used in schools to develop it.

Daisy says Guy’s model involves students learning to be creative by practising being creative, which doesn’t make sense. It’s a valid point. Guy says knowledge is a necessary but not sufficient condition for developing creativity; other factors are involved. Another valid point. Both Daisy and Guy debate the motion but they approach it from very different perspectives, so they don’t actually rigorously test each other’s arguments.

Daisy’s model of creativity is a bottom-up one. Her starting point is how people form their knowledge and how that develops into creativity. Guy’s model, in contrast, is a top-down one; he points out that creativity isn’t a single thing, but emerges from several factors. In this post, I propose that Daisy and Guy are using the same model of creativity, but because Daisy’s focus is on one part and Guy’s on another, their arguments shoot straight past each other, and that in isolation, both perspectives are problematic.

Creativity is a complex construct, as Guy points out. A problem with his perspective is that the factors he found to be associated with creativity are themselves complex constructs. How does ‘curiosity’ manifest itself? Is it the same in everyone or does it vary from person to person? Are there multiple component factors associated with curiosity too? Can we ask the same questions about ‘imagination’? Daisy, in contrast, claims a central role for knowledge and deliberate practice. A problem with Daisy’s perspective is, as I’ve pointed out elsewhere, that her model of knowledge peters out when it comes to the complex cognition Guy refers to. With a bit more information, Daisy and Guy could have done some joined-up thinking. To me, the two models look like the representation below, the grey words and arrows indicating concepts and connections referred to but not explained in detail.

[image: slide1 – Daisy’s and Guy’s models of creativity]

cognition and expertise

If I’ve understood it correctly, Daisy’s model of creativity is essentially this: If knowledge is firmly embedded in long-term memory (LTM) via lots of deliberate practice and organised into schemas, it results in expertise. Experts can retrieve their knowledge from LTM instantly and can apply it flexibly. In short, creativity is a feature of expertise.

Daisy makes frequent references to research; what scientists think, half a century of research, what all the research has shown. She names names; Herb Simon, Anders Ericsson, Robert Bjork. She reports research showing that expert chess players, football players or musicians don’t practise whole games or entire musical works – they practise short sequences repeatedly until they’ve overlearned them. That’s what enables experts to be creative.

Daisy’s model of expertise is firmly rooted in an understanding of cognition that emerged from artificial intelligence (AI) research in the 1950s and 1960s. At the time, researchers were aware that human cognition was highly complex and often seemed illogical.  Computer science offered an opportunity to find out more; by manipulating the data and rules fed into a computer, researchers could test different models of cognition that might explain how experts thought.

It was no good researchers starting with the most complex illogical thinking – because it was complex and illogical. It made more sense to begin with some simpler examples, which is why the AI researchers chose chess, sport and music as domains to explore. Expertise in these domains looks pretty complex, but the complexity has obvious limits because chess, sport and music have clear, explicit rules. There are thousands of ways you can configure chess pieces or football players and a ball during a game, but you can’t configure them any-old-how because chess and football have rules. Similarly, a musician can play a piece of music in many different ways, but they can’t play it any-old-how because then it wouldn’t be the same piece of music.

In chess, sport and music, experts have almost complete knowledge, clear explicit rules, and comparatively low levels of uncertainty.   Expert geneticists, doctors, sociologists, politicians and historians, in contrast, often work with incomplete knowledge, many of the domain ‘rules’ are unknown, and uncertainty can be very high. In those circumstances, expertise  involves more than simply overlearning a great many facts and applying them flexibly.

Daisy is right that expertise and creativity emerge from deliberate practice of short sequences – for those who play chess, sport or music. Chess, soccer and Beethoven’s piano concerto No. 5 haven’t changed much since the current rules were agreed and are unlikely to change much in future. But domains like medicine, economics and history still periodically undergo seismic shifts in the way whole areas of the domains are structured, as new knowledge comes to light.

This is the point at which Daisy’s and Guy’s models of creativity could be joined up.  I’m not suggesting some woolly compromise between the two. What I am suggesting is that research that followed the early AI work offers the missing link.

I think the missing link is the schema.   Daisy mentions schemata (or schemas if you prefer) but only in terms of arranging historical events chronologically. Joe Kirby in Battle Hymn of the Tiger Teachers also recognises that there can be an underlying schema in the way students are taught.  But the Tiger Teachers don’t explore the idea of the schema in any detail.

schemas, schemata

A schema is the way people mentally organise their knowledge. Some schemata are standardised and widely used – such as the periodic table or multiplication tables. Others are shared by many people, but are a bit variable – such as the Linnaean taxonomy of living organisms or the right/left political divide. But because schemata are constructed from the knowledge and experience of the individual, some are quite idiosyncratic. Many teachers will be familiar with students all taught the same material in the same way, but developing rather different understandings of it.
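To make the idea concrete, here’s a minimal sketch (my own illustration, not from Daisy, Guy or the research discussed below; the animal groupings and labels are invented) showing the same small set of facts organised under two different schemata. The items are identical; only the structure imposed on them differs, which is what makes idiosyncratic schemata possible.

```python
# Two hypothetical schemata for the same four animals (illustrative labels only).
# The knowledge (the items) is identical; the organisation differs.
taxonomic_schema = {
    "mammals": ["dolphin", "bat"],
    "birds": ["penguin", "eagle"],
}

functional_schema = {
    "swims": ["dolphin", "penguin"],
    "flies": ["bat", "eagle"],
}

# Both schemata cover exactly the same items, organised along different lines.
items_by_taxonomy = sorted(a for group in taxonomic_schema.values() for a in group)
items_by_function = sorted(a for group in functional_schema.values() for a in group)
assert items_by_taxonomy == items_by_function
print("Same knowledge, two schemata:", items_by_taxonomy)
```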

There’s been a fair amount of research into schemata. The schema was first proposed as a psychological concept by Jean Piaget*. Frederic Bartlett carried out a series of experiments in the 1930s demonstrating that people use schemata, and in the heyday of AI the concept was explored further by, for example, David Rumelhart, Marvin Minsky and Robert Axelrod. It was later extended into script theory (Roger Schank and Robert Abelson), and into how people form prototypes and categories (e.g. Eleanor Rosch, George Lakoff). The schema might be the missing link between Daisy’s and Guy’s models of creativity, but both models stop before they get there. Here’s how the cognitive science research allows them to be joined up.

Last week I finally got round to reading Jerry Fodor’s book The Modularity of Mind, published in 1983. By that time, cognitive scientists had built up a substantial body of evidence related to cognitive architecture. Although the evidence itself was generally robust, what it was saying about the architecture was ambiguous. It appeared to indicate that cognitive processes were modular, with specific modules processing specific types of information, e.g. visual or linguistic. It also indicated that some cognitive processes operated across the board, e.g. problem-solving or intelligence. The debate had tended to be rather polarised. What Fodor proposed was that cognition isn’t a case of either-or, but of both-and; that perceptual and linguistic processing is modular, but higher-level, more complex cognition that draws on modular information is global. His prediction turned out to be pretty accurate, which is why Daisy’s and Guy’s models can be joined up.

Fodor was familiar enough with the evidence to know that he was very likely to be on the right track, but his model of cognition is a complex one, and he knew he could have been wrong about some bits of it. So he deliberately exposed his model to the criticism of cognitive scientists, philosophers and anyone else who cared to comment, because that’s how the scientific method works. A hypothesis is tested. People try to falsify it. If they can’t, then the hypothesis signposts a route worth exploring further. If they can, then researchers don’t need to waste any more time exploring a dead end.

joined-up thinking

Daisy’s model of creativity has emerged from a small sub-field of cognitive science – what AI researchers discovered about expertise in domains with clear, explicit rules. She doesn’t appear to see the need to explore schemata in detail because the schemata used in chess, sport and music are by definition highly codified and widely shared.  That’s why the AI researchers chose them.  The situation is different in the sciences, humanities and arts where schemata are of utmost importance, and differences between them can be the cause of significant conflict.  Guy’s model originates in a very different sub-field of cognitive science – the application of high-level cognitive processes to education. Schemata are a crucial component; although Guy doesn’t explore them in this debate, his previous work indicates he’s very familiar with the concept.

Since the 1950s, cognitive science has exploded into a vast research field, encompassing everything from the dyes used to stain brain tissue, through the statistical analysis of brain scans, to the errors and biases that affect judgement and decision-making by experts. Obviously it isn’t necessary to know everything about cognitive science before you can apply it to teaching, but if you’re proposing a particular model of cognition, having an overview of the field and inviting critique of the model would help avoid unnecessary errors and disagreements.  In this debate, I suggest schemata are noticeable by their absence.

*First use of schema as a psychological concept is widely attributed to Piaget, but I haven’t yet been able to find a reference.

is systematic synthetic phonics generating neuromyths?

A recent Twitter discussion about systematic synthetic phonics (SSP) was sparked by a note to parents of children in a reception class, advising them what to do if their children got stuck on a word when reading. The first suggestion was “encourage them to sound out unfamiliar words in units of sound (e.g. ch/sh/ai/ea) and to try to blend them”. If that failed “can they use the pictures for any clues?” Two other strategies followed. The ensuing discussion began by questioning the wisdom of using pictures for clues and then went off at many tangents – not uncommon in conversations about SSP.
[image: richard adams reading clues]

SSP proponents are, rightly, keen on evidence. The body of evidence supporting SSP is convincing but it’s not the easiest to locate; much of the research predates the internet by decades or is behind a paywall. References are often to books, magazine articles or anecdote; not to be discounted, but not what usually passes for research. As a consequence it’s quite a challenge to build up an overview of the evidence for SSP that’s free of speculation, misunderstandings and theory that’s been superseded. The tangents that came up in this particular discussion are, I suggest, the result of assuming that if something is true for SSP in particular it must also be true for reading, perception, development or biology in general. Here are some of the inferences that came up in the discussion.

You can’t guess a word from a picture
Children’s books are renowned for their illustrations. Good illustrations can support or extend the information in the text, showing readers what a chalet, a mountain stream or a pine tree looks like, for example. Author and artist usually have detailed discussions about illustrations to ensure that the book forms an integrated whole and is not just a text with embellishments.

If the child is learning to read, pictures can serve to focus attention (which could be wandering anywhere) on the content of the text and can have a weak priming effect, increasing the likelihood of the child accessing relevant words. If the picture shows someone climbing a mountain path in the snow, the text is unlikely to contain words about sun, sand and ice-creams.

I understand why SSP proponents object to the child being instructed to guess a particular word by looking at a picture; the guess is likely to be wrong and the child distracted from decoding the word. But some teachers don’t seem to be keen on illustrations per se. As one teacher put it “often superficial time consuming detract from learning”.

Cues are clues are guesswork
The note to parents referred to ‘clues’ in the pictures. One contributor cited a blogpost that claimed “with ‘mixed methods’ eyes jump around looking for cues to guess from”. Clues and cues are often used interchangeably in discussions about phonics on social media. That’s understandable; the words have similar meanings and a slip on the keyboard can transform one into the other. But in a discussion about reading methods, the distinction between guessing, clues and cues is an important one.

Guessing involves drawing conclusions in the absence of enough information to give you a good chance of being right; it’s haphazard, speculative. A clue is a piece of information that points you in a particular direction. A cue has a more specific meaning depending on context; e.g. theatrical cues, social cues, sensory cues. In reading research, a cue is a piece of information about something the observer is attending to, or a property of a thing to be attended to. It could be the beginning sound or end letter of a word, or an image representing the word. Cues are directly related to the matter in hand, clues are more indirectly related, guessing is a stab in the dark.

The distinction is important because if teachers are using the terms cue and clue interchangeably and assuming they both involve guessing there’s a risk they’ll mistakenly dismiss references to ‘cues’ in reading research as guessing or clues, which they are not.

Reading isn’t natural
Another distinction that came up in the discussion was the idea of natural vs. non-natural behaviours. One argument for children needing to be actively taught to read rather than picking it up as they go along is that reading, unlike walking and talking, isn’t a ‘natural’ skill. The argument goes that reading is a relatively recent technological development so we couldn’t possibly have evolved mechanisms for reading in the same way as we have evolved mechanisms for walking and talking. One proponent of this idea is Diane McGuinness, an influential figure in the world of synthetic phonics.

The argument rests on three assumptions. The first is that we have evolved specific mechanisms for walking and talking but not for reading. The ideas that evolution has an aim or purpose and that if everybody does something we must have evolved a dedicated mechanism to do it, are strongly contested by those who argue instead that we can do what our anatomy and physiology enable us to do (see arguments over Chomsky’s linguistic theory). But you wouldn’t know about that long-standing controversy from reading McGuinness’s books or comments from SSP proponents.

The second assumption is that children learn to walk and talk without much effort or input from others. One teacher called the natural/non-natural distinction “pretty damn obvious”. But sometimes the pretty damn obvious isn’t quite so obvious when you look at what’s actually going on. By the time they start school, the average child will have rehearsed walking and talking for thousands of hours. And most toddlers experience a considerable input from others when developing their walking and talking skills even if they don’t have what one contributor referred to as a “WEIRDo Western mother”. Children who’ve experienced extreme neglect (such as those raised in the notorious Romanian orphanages) tend to show significant developmental delays.

The third assumption is that learning to use technological developments requires direct instruction. Whether it does or not depends on the complexity of the task. Pointy sticks and heavy stones are technologies used in foraging and hunting, but most small children can figure out for themselves how to use them – as do chimps and crows. Is the use of sticks and stones by crows, chimps or hunter-gatherers natural or non-natural? A bicycle is a man-made technology more complex than sticks and stones, but most people are able to figure out how to ride a bike simply by watching others do it, even if a bit of practice is needed before they can do it themselves. Is learning to ride a bike with a bit of support from your mum or dad natural or non-natural?

Reading English is a more complex task than riding a bike because of the number of letter-sound correspondences. You’d need a fair amount of watching and listening to written language being read aloud to be able to read for yourself. And you’d need considerable instruction and practice before being able to fly a fighter jet because the technology is massively more complex than that involved in bicycles and alphabetic scripts.

One teacher asked “are you really going to go for the continuum fallacy here?” No idea why he considers a continuum a fallacy. In the natural/non-natural distinction used by SSP proponents there are three continua involved:

• the complexity of the task
• the length of rehearsal time required to master the task, and
• the extent of input from others that’s required.

Some children learn to read simply by being read to, reading for themselves and asking for help with words they don’t recognise. But because reading is a complex task, for most children learning to read by immersion like that would take thousands of hours of rehearsal. It makes far more sense to cut to the chase and use explicit instruction. In principle, learning to fly a fighter jet would be possible through trial-and-error, but it would be a stupidly costly approach to training pilots.

Technology is non-biological
I was told by several teachers that reading, riding a bike and flying an aircraft weren’t biological functions. I fail to see how they can’t be, since all involve human beings using their brain and body. It then occurred to me that the teachers are equating ‘biological’ with ‘natural’ or with the human body alone. In other words, if you acquire a skill that involves only body parts (e.g. walking or talking) it’s biological. If it involves anything other than a body part it’s not biological. Not sure where that leaves hunting with wooden spears, making baskets or weaving woollen fabric using a wooden loom and shuttle.

Teaching and learning are interchangeable
Another tangent was whether or not learning is involved in sleeping, eating and drinking. I contended that it is; newborns do not sleep, eat or drink in the same way as most of them will be sleeping, eating or drinking nine months later. One teacher kept telling me they don’t need to be taught to do those things. I can see why teachers often conflate teaching and learning, but they are not two sides of the same coin. You can teach children things but they might fail to learn them. And children can learn things that nobody has taught them. It’s debatable whether or not parents shaping a baby’s sleeping routine, spoon feeding them or giving them a sippy cup instead of a bottle count as teaching, but it’s pretty clear there’s a lot of learning going on.

What’s true for most is true for all
I was also told by one teacher that all babies crawl (an assertion he later modified) and by a school governor that they can all suckle (an assertion that wasn’t modified). Sweeping generalisations like this coming from people working in education are worrying. Children vary. They vary a lot. Even if only 0.1% of children do or don’t do something, that would involve 8,000 children in English schools. Some and most are not all or none, and teachers of all people should be aware of that.

A core factor in children learning to read is the complexity of the task. If the task is a complex one, like reading, most children are likely to learn more quickly and effectively if you teach them explicitly. You can’t infer from that that all children are the same, that they all learn in the same way, or that teaching and learning are two sides of the same coin. Nor can you infer from a tenuous argument used to justify the use of SSP that distinctions between natural and non-natural or biological and technological are clear, obvious, valid or helpful. The evidence that supports SSP is the evidence that supports SSP. It doesn’t provide a general theory for language, education or human development.

synthetic phonics, dyslexia and natural learning

Too intense a focus on the virtues of synthetic phonics (SP) can, it seems, result in related issues getting a bit blurred. I discovered that some whole language supporters do appear to have been ideologically motivated but that the whole language approach didn’t originate in ideology. And as far as I can tell we don’t know if SP can reduce adult functional illiteracy rates. But I wouldn’t have known either of those things from the way SP is framed by its supporters. SP proponents also make claims about how the brain is involved in reading. In this post I’ll look at two of them; dyslexia and natural learning.

Dyslexia

Dyslexia started life as a descriptive label for the reading difficulties adults can develop due to brain damage caused by a stroke or head injury. Some children were observed to have similar reading difficulties despite otherwise normal development. The adults’ dyslexia was acquired (they’d previously been able to read) but the children’s dyslexia was developmental (they’d never learned to read). The most obvious conclusion was that the children also had brain damage – but in the early 20th century when the research started in earnest there was no easy way to determine that.

Medically, developmental dyslexia is still only a descriptive label meaning ‘reading difficulties’ (causes unknown, might/might not be biological, might vary from child to child). However, dyslexia is now also used to denote a supposed medical condition that causes reading difficulties. This new usage is something that Diane McGuinness complains about in Why Children Can’t Read.

I completely agree with McGuinness that this use isn’t justified and has led to confusion and unintended and unwanted outcomes. But I think she muddies the water further by peppering her discussion of dyslexia (pp. 132-140) with debatable assertions such as:

“We call complex human traits ‘talents’”.

“Normal variation is on a continuum but people working from a medical or clinical model tend to think in dichotomies…”.

“Reading is definitely not a property of the human brain”.

“If reading is a biological property of the brain, transmitted genetically, then this must have occurred by Lamarckian evolution.”

Why debatable? Because complex human traits are not necessarily ‘talents’; clinicians tend to be more aware of normal variation than most people; reading must be a ‘property of the brain’ if we need a brain to read; and the research McGuinness refers to didn’t claim that ‘reading’ was transmitted genetically.

I can understand why McGuinness might be trying to move away from the idea that reading difficulties are caused by a biological impairment that we can’t fix. After all, the research suggests SP can improve the poor phonological awareness that’s strongly associated with reading difficulties. I get the distinct impression, however, that she’s uneasy with the whole idea of reading difficulties having biological causes. She concedes that phonological processing might be inherited (p.140) but then denies that a weakness in discriminating phonemes could be due to organic brain damage. She’s right that brain scans had revealed no structural brain differences between dyslexics and good readers. And in scans that show functional variations, the ability to read might be a cause, rather than an effect.

But as McGuinness herself points out, reading is a complex skill involving many brain areas, and biological mechanisms tend to vary between individuals. In a complex biological process there’s a lot of scope for variation. Poor phonological awareness might be a significant factor, but it might not be the only factor. A child with poor phonological awareness plus visual processing impairments plus limited working memory capacity plus slow processing speed – all factors known to be associated with reading difficulties – would be unlikely to find those difficulties eliminated by SP alone. The risk in conceding that reading difficulties might have biological origins is that using teaching methods to remediate them might then be called into question – just what McGuinness doesn’t want to happen, and for good reason.

Natural and unnatural abilities

McGuinness’s view of the role of biology in reading seems to be derived from her ideas about the origin of skills. She says:

“It is the natural abilities of people that are transmitted genetically, not unnatural abilities that depend upon instruction and involve the integration of many subskills”. (p.140, emphasis McGuinness)

This is a distinction often made by SP proponents. I’ve been told that children don’t need to be taught to walk or talk because these abilities are natural and so develop instinctively and effortlessly. Written language, in contrast, is a recent man-made invention; there hasn’t been time to evolve a natural mechanism for reading, so we need to be taught how to do it and have to work hard to master it. Steven Pinker, who wrote the foreword to Why Children Can’t Read, seems to agree. He says “More than a century ago, Charles Darwin got it right: language is a human instinct, but written language is not” (p.ix).

Although that’s a plausible model, what Pinker and McGuinness fail to mention is that it’s also a controversial one. The part played by nature and nurture in the development of language (and other abilities) has been the subject of heated debate for decades. The reason for the debate is that the relevant research findings can be interpreted in different ways. McGuinness is entitled to her interpretation but it’s disingenuous in a book aimed at a general readership not to tell readers that other researchers would disagree.

Research evidence suggests that the natural/unnatural skills model has got it wrong. The same natural/unnatural distinction was made recently in the case of a part of the brain called the fusiform gyrus. In the fusiform gyrus, visual information about objects is categorised. Different types of objects, such as faces, places and small items like tools, have their own dedicated locations. Because those types of objects are naturally occurring, researchers initially thought their dedicated locations might be hard-wired.

But there’s also a word recognition area. And in experts, the faces area is also used for cars, chess positions, and specially invented items called greebles. To become an expert in any of those things you require some instruction – you’d need to learn the rules of chess or the names of cars or greebles. But your visual system can still learn to accurately recognise, discriminate between and categorise many thousands of items like faces, places, tools, cars, chess positions and greebles simply through hours and hours of visual exposure.

Practice makes perfect

What advocates of ‘natural’ skills also tend to overlook is how much rehearsal goes into them. Most parents don’t actively teach children to talk, but babies hear and rehearse speech for many months before they can say recognisable words. Most parents don’t teach toddlers to walk, but it takes young children years to become fully stable on their feet despite hours of daily practice.

There’s no evidence that, as far as the brain is concerned, there’s any difference between ‘natural’ and ‘unnatural’ knowledge and skills. How much instruction and practice knowledge or skills require will depend on their transparency and complexity. Walking and bike-riding are pretty transparent; you can see what’s involved by watching other people. But they take a while to learn because of the complexity of the motor co-ordination and balance involved. Speech and reading are less transparent and more complex than walking and bike-riding, so take much longer to master. But some children require intensive instruction in order to learn to speak, and many children learn to read with minimal input from adults. The natural/unnatural distinction is a false one, and it’s as unhelpful as assuming that reading difficulties are caused by ‘dyslexia’.

Multiple causes

What underpins SP proponents’ reluctance to admit biological factors as causes for reading difficulties is, I suspect, an error often made when assessing cause and effect. It’s an easy one to make, but one that people advocating changes to public policy need to be aware of.

Let’s say for the sake of argument that we know, for sure, that reading difficulties have three major causes, A, B and C. The one that occurs most often is A. We can confidently predict that children showing A will have reading difficulties. What we can’t say, without further investigation, is whether a particular child’s reading difficulties are due to A. Or if A is involved, that it’s the only cause.

We know that poor phonological awareness is frequently associated with reading difficulties. Because SP trains children to be aware of phonological features in speech, and because that training improves word reading and spelling, it’s a safe bet that poor phonological awareness is also a cause of reading difficulties. But because reading is a complex skill, there are many possible causes for reading difficulties. We can’t assume that poor phonological awareness is the only cause, or that it’s a cause in all cases.
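To make that reasoning concrete, here’s a minimal sketch in Python. The three causes and their prevalence figures are invented purely for illustration; they’re not taken from the reading research.

```python
import random

random.seed(1)

# Hypothetical population: each child with reading difficulties has one or
# more of three invented causes, A, B and C. A is the most common cause,
# but it is neither universal nor always the only one.
children = []
for _ in range(10_000):
    causes = set()
    if random.random() < 0.60:
        causes.add("A")   # e.g. poor phonological awareness
    if random.random() < 0.25:
        causes.add("B")   # e.g. a visual processing impairment
    if random.random() < 0.20:
        causes.add("C")   # e.g. limited working memory capacity
    if causes:            # keep only children who have difficulties at all
        children.append(causes)

with_a = sum(1 for c in children if "A" in c)
only_a = sum(1 for c in children if c == {"A"})

print(f"children with difficulties: {len(children)}")
print(f"  A involved:               {with_a / len(children):.0%}")
print(f"  A the only cause:         {only_a / len(children):.0%}")
# A predicts difficulties well at the population level, but for any given
# child we still can't say whether A is involved, or whether it's the only
# factor, without looking at that child.
```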

The evidence that SP improves children’s decoding ability is persuasive. However, the evidence also suggests that 12–15% of children will still struggle to learn to decode using SP, and that around 15% of children will struggle with reading comprehension. Having a method of reading instruction that works for most children is great, but education should benefit all children, and since the minority of children who struggle are the ones people keep complaining about, we need to pay attention to what causes reading difficulties for those children – as individuals. In education, one size might fit most, but it doesn’t fit all.

Reference

McGuinness, D. (1998). Why Children Can’t Read and What We Can Do About It. Penguin.

seven myths about education: finally…

When I first heard about Daisy Christodoulou’s myth-busting book in which she adopts an evidence-based approach to education theory, I assumed that she and I would see things pretty much the same way. It was only when I read reviews (including Daisy’s own summary) that I realised we’d come to rather different conclusions from what looked like the same starting point in cognitive psychology. I’ve been asked several times why, if I have reservations about the current educational orthodoxy, think knowledge is important, don’t have a problem with teachers explaining things and support the use of systematic synthetic phonics, I’m critical of those calling for educational reform rather than those responsible for a system that needs reforming. The reason involves the deep structure of the models, rather than their surface features.

concepts from cognitive psychology

Central to Daisy’s argument is the concept of the limited capacity of working memory. It’s certainly a core concept in cognitive psychology. It explains not only why we can think about only a few things at once, but also why we oversimplify and misunderstand, are irrational, are subject to errors and biases and use quick-and-dirty rules of thumb in our thinking. And it explains why an emphasis on understanding at the expense of factual information is likely to result in students not knowing much and, ironically, not understanding much either.

But what students are supposed to learn is only one of the streams of information that working memory deals with; it simultaneously processes information about students’ internal and external environment. And the limited capacity of working memory is only one of many things that impact on learning; a complex array of environmental factors is also involved. So although you can conceptually isolate the material students are supposed to learn and the limited capacity of working memory, in the classroom neither of them can be isolated from all the other factors involved. And you have to take those other factors into account in order to build a coherent, workable theory of learning.

But Daisy doesn’t introduce only the concept of working memory. She also talks about chunking, schemata and expertise. Daisy implies (although she doesn’t say so explicitly) that schemata are to facts what chunking is to low-level data. That just as students automatically chunk low-level data they encounter repeatedly, so they will automatically form schemata for facts they memorise, and the schemata will reduce cognitive load in the same way that chunking does (p.20). That’s a possibility, because the brain appears to use the same underlying mechanism to represent associations between all types of information – but it’s unlikely. We know that schemata vary considerably between individuals, whereas people chunk information in very similar ways. That’s not surprising if the information being chunked is simple and highly consistent, whereas schemata often involve complex, inconsistent information.

Experimental work involving priming suggests that schemata increase the speed and reliability of access to associated ideas and that would reduce cognitive load, but students would need to have the schemata that experts use explained to them in order to avoid forming schemata of their own that were insufficient or misleading. Daisy doesn’t go into detail about deep structure or schemata, which I think is an oversight, because the schemata students use to organise facts are crucial to their understanding of how the facts relate to each other.

migrating models

Daisy and teachers taking a similar perspective frequently refer approvingly to ‘traditional’ approaches to education. It’s been difficult to figure out exactly what they mean. Daisy focuses on direct instruction and memorising facts, Old Andrew’s definition is a bit broader and Robert Peal’s appears to include cultural artefacts like smart uniforms and school songs. What they appear to have in common is a concept of education derived from the behaviourist model of learning that dominated psychology in the inter-war years. In education it focused on what was being learned; there was little consideration of the broader context involving the purpose of education, power structures, socioeconomic factors, the causes of learning difficulties etc.

Daisy and other would-be reformers appear to be trying to update the behaviourist model of education with concepts that, ironically, emerged from cognitive psychology not long after it switched focus from a behaviourist model of learning to a computational one; the point at which the field was first described as ‘cognitive’. The concepts the educational reformers focus on fit the behaviourist model well because they are strongly mechanistic and largely context-free. The examples that crop up frequently in the psychology research Daisy cites usually involve maths, physics and chess problems. These types of problems were chosen deliberately by artificial intelligence researchers because they were relatively simple and clearly bounded; the idea was that once the basic mechanism of learning had been figured out, the principles could then be extended to more complex, less well-defined problems.

Researchers later learned a good deal about complex, less well-defined problems, but Daisy doesn’t refer to that research. Nor do any of the other proponents of educational reform. What more recent research has shown is that complex, less well-defined knowledge is organised by the brain in a different way to simple, consistent information. So in cognitive psychology the computational model of cognition has been complemented by a constructivist one, but it’s a different constructivist model to the social constructivism that underpins current education theory. The computational model never quite made it across to education, but early constructivist ideas did – in the form of Piaget’s work. At that point, education theory appears to have grown legs and wandered off in a different direction to cognitive psychology. I agree with Daisy that education theorists need to pay attention to findings from cognitive psychology, but they need to pay attention to what’s been discovered in the last half century not just to the computational research that superseded behaviourism.

why criticise the reformers?

So why am I critical of the reformers, but not of the educational orthodoxy? When my children started school, they, and I, were sometimes perplexed by the approaches to learning they encountered. Conversations with teachers painted a picture of educational theory that consisted of a hotch-potch of valid concepts, recent tradition, consequences of policy decisions and ideas that appeared to have come from nowhere like Brain Gym and Learning Styles. The only unifying feature I could find was a social constructivist approach and even on that opinions seemed to vary. It was difficult to tell what the educational orthodoxy was, or even if there was one at all. It’s difficult to critique a model that might not be a model. So I perked up when I heard about teachers challenging the orthodoxy using the findings from scientific research and calling for an evidence-based approach to education.

My optimism was short-lived. Although the teachers talked about evidence from cognitive psychology and randomised controlled trials, the model of learning they were proposing appeared as patchy, incomplete and incoherent as the model they were criticising – it was just different. So here are my main reservations about the educational reformers’ ideas:

1. If mainstream education theorists aren’t aware of working memory, chunking, schemata and expertise, that suggests there’s a bigger problem than just their ignorance of these particular concepts. It suggests that they might not be paying enough attention to developments in some or all of the knowledge domains their own theory relies on. Knowing about working memory, chunking, schemata and expertise isn’t going to resolve that problem.

2. If teachers don’t know about working memory, chunking, schemata and expertise, that suggests there’s a bigger problem than just their ignorance of these particular concepts. It suggests that teacher training isn’t providing teachers with the knowledge they need. To some extent this would be an outcome of weaknesses in educational theory, but I get the impression that trainee teachers aren’t expected or encouraged to challenge what they’re taught. Several teachers who’ve recently discovered cognitive psychology have appeared rather miffed that they hadn’t been told about it. They were all Teach First graduates; I don’t know if that’s significant.

3. A handful of concepts from cognitive psychology doesn’t constitute a robust enough foundation for developing a pedagogical approach or designing a curriculum. Daisy essentially reiterates what Daniel Willingham has to say about the breadth and depth of the curriculum in Why Don’t Students Like School?. He’s a cognitive psychologist and well-placed to show how models of cognition could inform education theory. But his book isn’t about the deep structure of theory, it’s about applying some principles from cognitive psychology in the classroom in response to specific questions from teachers. He explores ideas about pedagogy and the curriculum, but that’s as far as it goes. Trying to develop a model of pedagogy and design a curriculum based on a handful of principles presented in a format like this is like trying to devise courses of treatment and design a health service based on the information gleaned from a GP’s problem page in a popular magazine. But I might be being too charitable; Willingham is a trustee of the Core Knowledge Foundation, after all.

4. Limited knowledge Rightly, the reforming teachers expect students to acquire extensive factual knowledge and emphasise the differences between experts and novices. But Daisy’s knowledge of cognitive psychology appears to be limited to a handful of principles discovered over thirty years ago. She, Robert Peal and Toby Young all quote Daniel Willingham on research in cognitive psychology during the last thirty years, but none of them, Willingham included, tell us what it is. If they did, it would show that the principles they refer to don’t scale up when it comes to complex knowledge. Nor do most of the teachers writing about educational reform appear to have much teaching experience. That doesn’t mean they are wrong, but it does call into question the extent of their expertise relating to education.

Some of those supporting Daisy’s view have told me they are aware that they don’t know much about cognitive psychology, but have argued that they have to start somewhere and it’s important that teachers are made aware of concepts like the limits of working memory. That’s fine if that’s all they are doing, but it’s not. Redesigning pedagogy and the curriculum on the basis of a handful of facts makes sense if you think that what’s important is facts and that the brain will automatically organise those facts into a coherent schema. The problem is of course that that rarely happens in the absence of an overview of all the relevant facts and how they fit together. Cognitive psychology, like all other knowledge domains, has incomplete knowledge but it’s not incomplete in the same way as the reforming teachers’ knowledge. This is classic Sorcerer’s Apprentice territory; a little knowledge, misapplied, can do a lot of damage.

5. Evaluating evidence Then there’s the way evidence is handled. Evidence-based knowledge domains have different ways of evaluating evidence, but they all evaluate it. That means weighing up the pros and cons, comparing evidence for and against competing hypotheses and so on. Evaluating evidence does not mean presenting only the evidence that supports whatever view you want to get across. That might be a way of making your case more persuasive, but is of no use to anyone who wants to know about the reliability of your hypothesis or your evidence. There might be a lot of evidence telling you your hypothesis is right – but a lot more telling you it’s wrong. But Daisy, Robert Peal and Toby Young all present supporting evidence only. They make no attempt to test the hypotheses they’re proposing or the evidence cited, and much of the evidence is from secondary sources – with all due respect to Daniel Willingham, just because he says something doesn’t mean that’s all there is to say on the matter.

cargo-cult science

I suggested to a couple of the teachers who supported Daisy’s model that ironically it resembled Feynman’s famous cargo-cult analogy (p. 97). They pointed out that the islanders were using replicas of equipment, whereas the concepts from cognitive psychology were the real deal. I suggest that even if the Americans had left their equipment on the airfield and the islanders had known how to use it, that wouldn’t have resulted in planes bringing in cargo – because there were other factors involved.

My initial response to reading Seven Myths about Education was one of frustration that despite making some good points about the educational orthodoxy and cognitive psychology, Daisy appeared to have got hold of the wrong ends of several sticks. This rapidly changed to concern that a handful of misunderstood concepts is being used as ‘evidence’ to support changes in national education policy.

In Michael Gove’s recent speech at the Education Reform Summit, he refers to the “solidly grounded research into how children actually learn of leading academics such as ED Hirsch or Daniel T Willingham”. Daniel Willingham has published peer-reviewed work, mainly on procedural learning, but I could find none by ED Hirsch. It would be interesting to know what the previous Secretary of State for Education’s criteria for ‘solidly grounded research’ and ‘leading academic’ were. To me the educational reform movement doesn’t look like an evidence-based discipline but bears all the hallmarks of an ideological system looking for evidence that affirms its core beliefs. This is no way to develop public policy. Government should know better.

seven myths about education: traditional subjects

In Seven Myths about Education, Daisy Christodoulou refers to the importance of ‘subjects’ and clearly doesn’t think much of cross-curricular projects. In the chapter on myth 5 ‘we should teach transferable skills’ she cites Daniel Willingham pointing out that the human brain isn’t like a calculator that can perform the same operations on any data. Willingham must be referring to higher-level information-processing because Anderson’s model of cognition makes it clear that at lower levels the brain is like a calculator and does perform essentially the same operations on any data; that’s Anderson’s point. Willingham’s point is that skills and knowledge are interdependent; you can’t acquire skills in the absence of knowledge and skills are often subject-specific and depend on the type of knowledge involved.

Daisy dislikes cross-curricular projects because students are unlikely to have the requisite prior knowledge from across several knowledge domains, are often expected to behave like experts when they are novices and get distracted by peripheral tasks. I would suggest those problems are indicators of poor project design rather than problems with cross-curricular work per se. Instead, Daisy would prefer teachers to stick to traditional subject areas.

traditional subjects

Daisy refers several times to traditional subjects, traditional bodies of knowledge and traditional education. The clearest explanation of what she means is on pp.117-119, when discussing the breadth and depth of the curriculum;

“For many of the theorists we looked at, subject disciplines were themselves artificial inventions designed to enforce Victorian middle-class values … They may well be human inventions, but they are very useful … because they provide a practical way of teaching … important concepts …. The sentence in English, the place value in mathematics, energy in physics; in each case subjects provide a useful framework for teaching the concept.”

It’s worth considering how the subject disciplines the theorists complained about came into being. At the end of the 18th century, a well-educated, well-read person could have just about kept abreast of most advances in human knowledge. By the end of the 19th century that would have been impossible. The exponential growth of knowledge made increasing specialisation necessary; the names of many specialist occupations, including the term ‘scientist’, were coined in the 19th century. By the end of the 20th century, knowledge domains/subjects existed that hadn’t even been thought of 200 years earlier.

It makes sense for academic researchers to specialise and for secondary schools to employ teachers who are subject specialists because it’s essential to have good knowledge of a subject if you’re researching it or teaching it. The subject areas taught in secondary schools have been determined largely by the prior knowledge universities require from undergraduates. That determines A level content, which in turn determines GCSE content, which in turn determines what’s taught at earlier stages in school. That model also makes sense; if universities don’t know what’s essential in a knowledge domain, no one does.

The problem for schools is that they can’t teach everything, so someone has to decide on the subjects and subject content that’s included in the curriculum. The critics Daisy cites question traditional subject areas on the grounds that they reflect the interests of a small group of people with high social prestige (p.110-111).

criteria for the curriculum

Daisy doesn’t buy the idea that subject areas represent the interests of a social elite, but she does suggest an alternative criterion for curriculum content. Essentially, this is frequency of citation. In relation to the breadth of the curriculum, she adopts the principle espoused by ED Hirsch (and Daniel Willingham, Robert Peal and Toby Young), of what writers of “broadsheet newspapers and intelligent books” (p.116) assume their readers will know. The writers in question are exemplified by those contributing to the “Washington Post, Chicago Tribune and so on” (Willingham p.47). Toby Young suggests a UK equivalent – “Times leader writers and heavyweight political commentators” (Young p.34). Although this criterion for the curriculum is better than nothing, its limitations are obvious. The curriculum would be determined by what authors, editors and publishers knew about or thought was important. If there were subject areas crucial to human life that they didn’t know about, ignored or deliberately avoided, the next generation would be sunk.

When it comes to the depth of the curriculum, Daisy quotes Willingham; “cognitive science leads to the rather obvious conclusion that students must learn the concepts that come up again and again – the unifying ideas of each discipline” (Willingham p.48). My guess is that Willingham describes the ‘unifying ideas of each discipline’ as ‘concepts that come up again and again’ to avoid going into unnecessary detail about the deep structure of knowledge domains; he makes a clear distinction between the criteria for the breadth and depth of the curriculum in his book. But his choice of wording, if taken out of context, could give the impression that the unifying ideas of each discipline are the concepts that come up again and again in “broadsheet newspapers and intelligent books”.

One problem with the unifying ideas of each discipline is that they don’t always come up again and again. They certainly encompass “the sentence in English, place value in mathematics, energy in physics”, but sometimes the unifying ideas involve deep structure and schemata taken for granted by experts but not often made explicit, particularly to school students.

Daisy points out, rightly, that neither ‘powerful knowledge’ nor ‘high culture’ are owned by a particular social class or culture (p.118). But she apparently fails to see that using cultural references as a criterion for what’s taught in schools could still result in the content of the curriculum being determined by a small, powerful social group; exactly what the traditional subject critics and Daisy herself complain about, though they are referring to different groups.

dead white males

This drawback is illustrated by Willingham’s observation that using the cultural references criterion means “we may still be distressed that much of what writers assume their readers know seems to be touchstones of the culture of dead white males” (p.116). Toby Young turns them into ‘dead white, European males’ (Young p.34, my emphasis).

What advocates of the cultural references model for the curriculum appear to have overlooked is that the dead white males’ domination of cultural references is a direct result of the long period during which European nations colonised the rest of the world. This colonisation (or ‘trade’ depending on your perspective) resulted in Europe becoming wealthy enough to fund many white males (and some females) engaged in the pursuit of knowledge or in creating works of art. What also tends to be forgotten is that the foundation for their knowledge originated with males (and females) who were non-whites and non-Europeans living long before the Renaissance. The dead white guys would have had an even better foundation for their work if people of various ethnic origins hadn’t managed to destroy the library at Alexandria (and a renowned female scholar). The cognitive bias that edits out non-European and non-male contributions to knowledge is also evident in the US and UK versions of the Core Knowledge sequence.

Core Knowledge sequence

Determining the content of the curriculum by the use of cultural references has some coherence, but cultural references don’t necessarily reflect the deep structure of knowledge. Daisy comments favourably on ED Hirsch’s Core Knowledge sequence (p.121). She observes that “The history curriculum is designed to be coherent and cumulative… pupils start in first grade studying the first American peoples, they progress up to the present day, which they reach in the eighth grade. World history runs alongside this, beginning with the Ancient Greeks and progressing to industrialism, the French revolution and Latin American independence movements.”

Hirsch’s Core Knowledge sequence might encompass considerably more factual knowledge than the English national curriculum, but the example Daisy cites clearly leaves some questions unanswered. How did the first American peoples get to America and why did they go there? Who lived in Europe (and other continents) before the Ancient Greeks, and why are the Ancient Greeks important? Obviously the further back we go, the less reliable evidence there is, but we know enough about early history and pre-history to be able to develop a reasonably reliable overview of what happened. It’s an overview that clearly demonstrates that the natural environment often had a more significant role than human culture in shaping history. And one that shows that ‘dead white males’ are considerably less important than they appear if the curriculum is derived from cultural references originating in the English-speaking world. Similar caveats apply to the UK equivalent of the Core Knowledge sequence published by Civitas, the one that recommends that children in year 1 be taught about the Glorious Revolution and the significance of Robert Walpole.

It’s worth noting that few of the advocates of curriculum content derived from cultural references are scientists; Willingham is, but his background is in human cognition, not chemistry, biology, geology or geography. I think there’s a real risk of overlooking the role that geographical features, climate, minerals, plants and animals have played in human history, and of developing a curriculum that’s so Anglo-centric and culturally focused it’s not going to equip students to tackle the very concrete problems the world is currently facing. Ironically, Daisy and others are recommending that students acquire a strongly socially-constructed body of knowledge, rather than a body of knowledge determined by what’s out there in the real world.

knowledge itself

Michael Young, quoted by Daisy, aptly sums up the difference:

“Although we cannot deny the sociality of all forms of knowledge, certain forms of knowledge which I find useful to refer to as powerful knowledge and are often equated with ‘knowledge itself’, have properties that are emergent from and not wholly dependent on their social and historical origins.” (p.118)

Most knowledge domains are pretty firmly grounded in the real world, which means that the knowledge itself has a coherent structure reflecting the real world and therefore, as Michael Young points out, it has emergent properties of its own, regardless of how we perceive or construct it.

So what criteria should we use for the curriculum? Generally, academics and specialist teachers have a good grasp of the unifying principles of their field – the ‘knowledge itself’. So their input would be essential. But other groups have an interest in the curriculum; notably the communities who fund and benefit from the education system and those involved on a day-to-day basis – teachers, parents and students. 100% consensus on a criterion is unlikely, but the outcome might not be any worse than the constant tinkering with the curriculum by government over the past three decades.

why subjects?

‘Subjects’ are certainly a convenient way of arranging our knowledge and they do enable a focus on the deep structure of a specific knowledge domain. But the real world, from which we get our knowledge, isn’t divided neatly into subject areas, it’s an interconnected whole. ‘Subjects’ are facets of knowledge about a world that in reality is highly integrated and interconnected. The problem with teaching along traditional subject area lines is that students are very likely to end up with a fragmented view of how the real world functions, and to miss important connections. Any given subject area might be internally coherent, but there’s often no apparent connection between subject areas, so the curriculum as a whole just doesn’t make sense to students. How does history relate to chemistry or RE to geography? It’s difficult to tell while you are being educated along ‘subject’ lines.

Elsewhere I’ve suggested that what might make sense would be a chronological narrative spine for the curriculum. Learning about the Big Bang, the formation of galaxies, elements, minerals, the atmosphere and supercontinents through the origins of life to early human groups, hunter-gatherer migration, agricultural settlement, the development of cities and so on, makes sense of knowledge that would otherwise be fragmented. And it provides a unifying, overarching framework for any knowledge acquired in the future.

Adopting a chronological curriculum would mean an initial focus on sciences and physical geography; the humanities and the arts wouldn’t be relevant until later for obvious reasons. It wouldn’t preclude simultaneously studying languages, mathematics, music or PE of course – I’m not suggesting a chronological curriculum ‘first and only’ – but a chronological framework would make sense of the curriculum as a whole.

It could also bridge the gap between so-called ‘academic’ and ‘vocational’ subjects. In a consumer society, it’s easy to lose sight of the importance of knowledge about food, water, fuel and infrastructure. But someone has to have that knowledge and our survival and quality of life are dependent on how good their knowledge is and how well they apply it. An awareness of how the need for food, water and fuel has driven human history and how technological solutions have been developed to deal with problems might serve to narrow the academic/vocational divide in a way that results in communities having a better collective understanding of how the real world works.

the curriculum in context

I can understand why Daisy is unimpressed by the idea that skills can be learned in the absence of knowledge or that skills are generic and completely transferable across knowledge domains. You can’t get to the skills at the top of Bloom’s taxonomy by bypassing the foundation level – knowledge. Having said that, I think Daisy’s criteria for the curriculum overlook some important points.

First, although I agree that subjects provide a useful framework for teaching concepts, the real world isn’t neatly divided up into subject areas. Teaching as if it is means it’s not only students who are likely to get a fragmented view of the world, but newspaper columnists, authors and policy-makers might too – with potentially disastrous consequences for all of us. It doesn’t follow that students need to be taught skills that allegedly transfer across all subjects, but they do need to know how subject areas fit together.

Second, although we can never eliminate subjectivity from knowledge, we can minimise it. Most knowledge domains reflect the real world accurately enough for us to be able to put them to good, practical use on a day-to-day basis. It doesn’t follow that all knowledge consists of verified facts or that students will grasp the unifying principles of a knowledge domain by learning thousands of facts. Students need to learn about the deep structure of knowledge domains and how the evidence for the facts they encompass has been evaluated.

Lastly, cultural references are an inadequate criterion for determining the breadth of the curriculum. Cultural references form exactly the sort of socially constructed framework that critics of traditional subject areas complain about. Most knowledge domains are firmly grounded in the real world, and the knowledge itself, despite its inherent subjectivity, provides a much more valid and reliable criterion for deciding what students should know than what people are writing about. Knowledge about cultural references might enable students to participate in what Michael Oakeshott called the ‘conversation of mankind’, but life doesn’t consist only of a conversation – at whatever level you understand the term. For most people, even in the developed world, life is just as much about survival and quality of life, and in order to optimise our chances of both, we need to know as much as possible about how the world functions, not just what a small group of people are saying about it.

In my next post, hopefully the final one about Seven Myths, I plan to summarise why I think it’s so important to understand what Daisy and those who support her model of educational reform are saying.

References

Peal, R (2014). Progressively Worse: The Burden of Bad Ideas in British Schools. Civitas.
Willingham, D (2009). Why Don’t Students Like School? Jossey-Bass.
Young, T (2014). Prisoners of the Blob. Civitas.

seven myths about education: facts and schemata

Knowledge occupies the bottom level of Bloom’s taxonomy of educational objectives. In the 1950s, Bloom and his colleagues would have known a good deal about the strategies teachers use to help students to acquire knowledge. What they couldn’t have known is how students formed their knowledge; how they extracted information from data and knowledge from information. At the time cognitive psychologists knew a fair amount about learning, but had only a hazy idea about how it all fitted together. The DIKW pyramid I referred to in the previous post explains how the bottom layer of Bloom’s taxonomy works – how students extract information and knowledge during learning. Anderson’s simple theory of cognition explains how people extract low-level information. More recent research at the knowledge and wisdom levels is beginning to shed light on Bloom’s higher-level skills, why people organise the same body of knowledge in different ways, and why they misunderstand and make mistakes.

Seven Myths about Education addresses the knowledge level of Bloom’s taxonomy. Daisy Christodoulou presents a model of cognition that she feels puts the higher-level skills in Bloom’s taxonomy firmly into context. Her model also forms the basis for a pedagogical approach and a structure for a curriculum, which I’ll discuss in another post. Facts are a core feature of Daisy’s model. I’ve mentioned previously that many disciplines find facts problematic because facts, by definition, have to be valid (true), and it’s often difficult to determine their validity. In this post I want to focus instead on the information processing entailed in learning facts.

a simple theory of cognition

Having explained the concept of chunking and the relationship between working and long-term memory, Daisy introduces Anderson’s paper:

So when we commit facts to long-term memory, they actually become part of our thinking apparatus and have the ability to expand one of the biggest limitations of human cognition. Anderson puts it thus:

‘All that there is to intelligence is the simple accrual and tuning of many small units of knowledge that in total produce complex cognition. The whole is no more than the sum of its parts, but it has a lot of parts.’”

She then says “a lot is no exaggeration. Long-term memory is capable of storing thousands of facts, and when we have memorised thousands of facts on a specific topic, these facts together form what is known as a schema” (p. 20).

facts

This was one of the points where I began to lose track of Daisy’s argument. I think she’s saying this:

Anderson shows that low-level data can be chunked into a ‘unit of knowledge’ that is then treated as one item by WM – in effect increasing the capacity of WM. In the same way, thousands of memorised facts can be chunked into a more complex unit (a schema) that is then treated as one item by WM – this essentially bypasses the limitations of WM.

I think Daisy assumes that the principle Anderson found pertaining to low-level ‘units of knowledge’ applies to all units of knowledge at whatever level of abstraction. It doesn’t. Before considering why it doesn’t, it’s worth noting a problem with the use of the word ‘facts’ when describing data. Some researchers have equated data with ‘raw facts’. The difficulty with defining data as ‘facts’ is that by definition a fact has to be valid (true) and not all data is valid, as demonstrated by the GIGO (garbage-in-garbage-out) principle that bedevils computer data processing, and the human brain’s often flaky perception of sensory input. In addition, ‘facts’ are more complex than raw (unprocessed) data or raw (unprocessed) sensory input.

It’s clear from Daisy’s examples of facts that she isn’t referring to raw data or raw sensory input. Her examples include the date of the battle of Waterloo, key facts about numerous historical events and ‘all of the twelve times tables’. She makes it clear in the rest of the book that in order to understand such facts, students need prior knowledge. In terms of the DIKW hierarchy, Daisy’s ‘facts’ are at a higher level to Anderson’s ‘units of knowledge’ and are unlikely to be processed automatically and pre-consciously in the same way as Anderson’s units. To understand why, we need to take another look at Anderson’s units of knowledge and why chunking happens.

chunking revisited

Data that can be chunked easily have two key characteristics; they involve small amounts of information and the patterns within them are highly consistent. As I mentioned in the previous post, one of Anderson’s examples of chunking is the visual features of upper case H. As far as the brain is concerned, the two parallel vertical lines and linking horizontal line that make up the letter H don’t involve much information. Also, although fonts and handwriting vary, the core features of all the Hs the brain perceives are highly consistent. So the brain soon starts perceiving all Hs as the same thing, and chunks up the core features into a single unit – the letter H. If H could also be written Ĥ and Ħ in English, it would take a bit longer for the brain to chunk the three different configurations of lines and to learn the association between them, but not much longer, since the three variants involve little information and are still highly consistent.
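As a concrete illustration of ‘small amount of information, highly consistent’, here’s a minimal sketch of frequency-based chunking in Python. The merge-the-most-frequent-pair procedure and the toy feature stream are my own devices (loosely in the spirit of byte-pair-style merging), not Anderson’s model; the point is only that consistently co-occurring features end up being treated as one unit.

```python
from collections import Counter

def chunk(stream, passes=2):
    """On each pass, merge the most frequent adjacent pair of units into a
    single unit - a crude analogue of consistently co-occurring features
    being chunked into one thing."""
    for _ in range(passes):
        pairs = Counter(zip(stream, stream[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:      # nothing recurs consistently any more
            break
        merged, i = [], 0
        while i < len(stream):
            if i + 1 < len(stream) and (stream[i], stream[i + 1]) == (a, b):
                merged.append(a + b)   # the pair is now one unit
                i += 2
            else:
                merged.append(stream[i])
                i += 1
        stream = merged
    return stream

# Two verticals and a horizontal recur consistently, so they end up as a
# single '||-' unit; the one-off features 'x' and 'y' stay separate.
features = ['|', '|', '-'] * 5 + ['x', 'y']
print(chunk(features))   # ['||-', '||-', '||-', '||-', '||-', 'x', 'y']
```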

understanding facts

But the letter H isn’t a fact, it’s a symbol. So are + and the numerals 1 and 2. ‘1+2’ isn’t a fact in the sense that Daisy uses the term, it’s a series of symbols. ‘1+2=3’ could be considered a fact because it consists of symbols representing two entities and the relationship between them. If you know what the symbols refer to, you can understand it. It could probably be chunked because it contains a small amount of information and has consistent visual features. Each multiplication fact in multiplication tables could probably be chunked, too, since they meet the same criteria. But that’s not true for all the facts that Daisy refers to, because they are more complex and less consistent.

‘The cat is on the mat’ is a fact, but in order to understand it, you need some prior knowledge about cats, mats and what ‘on’ means. These would be treated by working memory as different items. Most English-speaking 5 year-olds would understand the ‘cat is on the mat’ fact, but because there are different sorts of cats, different sorts of mats and different ways in which the cat could be on the mat, each child could have a different mental image of the cat on the mat. A particular child might conjure up a different mental image each time he or she encountered the fact, meaning that different sensory data were involved each time, the mental representations of the fact would be low in consistency, and the fact’s component parts couldn’t be chunked into a single unit in the same way as lower-level more consistent representations. Consequently the fact is less likely to be treated as one item in working memory.

Similarly, in order to understand a fact like ‘the battle of Waterloo was in 1815’ you’d need to know what a battle is, where Waterloo is (or at least that it’s a place), what 1815 means and how ‘of’ links a battle and a place name. If you’re learning about the Napoleonic wars, your perception of the battle is likely to keep changing and the components of the facts would have low consistency meaning that it couldn’t be chunked in the way Anderson describes.

The same problem involving inconsistency would prevent two or more facts being chunked into a single unit. But clearly people do mentally link facts and the components of facts. They do it using a schema, but not quite in the way Daisy describes.

schemata

Before discussing how people use schemata (schemas), a comment on the biological structures that enable us to form them. I mentioned in an earlier post that the neurons in the brain form complex networks a bit like the veins in a leaf. Physical connections are formed between neighbouring neurons when the neurons are activated simultaneously by incoming data. If the same or very similar data are encountered repeatedly, the same neurons are activated repeatedly, connections between them are strengthened, and eventually networks of neurons are formed that can carry a vast amount of information in their patterns of connections. The patterns of connections between the neurons represent the individual’s perception of the patterns in the data.

So if I see a cat on a mat, or read a sentence about a cat on a mat, or imagine a cat on a mat, my networks of neurons carrying information about cats and mats will be activated. Facts and concepts about cats, mats and things related to them will readily spring to mind. But I won’t have access to all of those facts and concepts at once. That would completely overload my working memory. Instead, what I recall is a stream of facts and concepts about cats and mats that takes time to access. It’s only a short time, but it doesn’t happen all at once. Also, some facts and concepts will be activated immediately and strongly and others will take longer and might be a bit hazy. In essence, a schema is a network of related facts and concepts, not a chunked ‘unit of knowledge’.
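Here’s a minimal sketch of that ‘network of related facts and concepts’ picture, using spreading activation. The concepts, link weights and decay factor are all invented for the example; a real association network would be vastly larger and messier.

```python
# A schema treated as a weighted network of associations rather than a
# single chunk: activating one concept spreads activation to its
# neighbours, more strongly along stronger links, fading with distance,
# so related ideas come to mind as a stream rather than all at once.
associations = {
    "cat":   {"mat": 0.8, "fur": 0.9, "dog": 0.5},
    "mat":   {"cat": 0.8, "floor": 0.7},
    "fur":   {"cat": 0.9, "soft": 0.6},
    "dog":   {"cat": 0.5, "bark": 0.9},
    "floor": {}, "soft": {}, "bark": {},
}

def spread(start, steps=2, decay=0.5):
    activation = {start: 1.0}
    frontier = {start: 1.0}
    for _ in range(steps):
        next_frontier = {}
        for node, act in frontier.items():
            for neighbour, weight in associations[node].items():
                boost = act * weight * decay
                if boost > activation.get(neighbour, 0.0):
                    activation[neighbour] = boost
                    next_frontier[neighbour] = boost
        frontier = next_frontier
    return sorted(activation.items(), key=lambda item: -item[1])

# 'cat' comes up first and strongest; 'floor', 'soft' and 'bark' arrive
# later and more weakly - nothing like recalling everything at once.
for concept, strength in spread("cat"):
    print(f"{concept:6s} {strength:.2f}")
```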

Daisy says “when we have memorised thousands of facts on a specific topic, these facts together form what is known as a schema” (p. 20). It doesn’t work quite like that, for several reasons.

the structure of a schema A schema is what it sounds like – a schematic plan or framework. It doesn’t consist of facts or concepts, but it’s a representation of how someone mentally arranges facts or concepts. In the same way the floor-plan of a building doesn’t consist of actual walls, doors and windows, but it does show you where those things are in the building in relation to each other. The importance of this apparently pedantic point will become clear when I discuss deep structure.

implicit and explicit schemata Schemata can be implicit – the brain organises facts and concepts in a particular way but we’re not aware of what it is – or explicit – we actively organise facts and concepts in a particular way and we are aware of how they are organised.

the size of a schema Schemata can vary in size and complexity. The configuration of the three lines that make up the letter H is a schema, so is the way a doctor organises his or her knowledge about the human circulatory system. A schema doesn’t have to represent all the facts or concepts it links together. If it did, a schema involving thousands of facts would be so complex it wouldn’t be much help in showing how the facts were related. And in order to encompass all the different relationships between thousands of facts, a single schema for them would need to be very simple.

For example, a simple schema for chemistry would be that different chemicals are formed from different configurations of the sub-atomic ‘particles’ that make up atoms and configurations of atoms that form molecules. Thousands of facts can be fitted into that schema. In order to have a good understanding of chemistry, students would need to know about schemata other than just that simple one, and would need to know thousands of facts about chemistry before they would qualify as experts, but the simple schema plus a few examples would give them a basic understanding of what chemistry was about.
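A rough way of seeing the distinction in code: the schema below is just the shape of the knowledge – which slots exist and how they relate – and the facts are whatever gets fitted into those slots. The relation names and example facts are mine, chosen only to illustrate the point.

```python
from dataclasses import dataclass

# The schema is the shape of the knowledge - which slots exist and how they
# relate - not the facts themselves, much as a floor-plan shows where the
# walls are without being made of bricks.
@dataclass
class SubstanceSchema:
    substance: str
    elements: list          # what it's made of
    configuration: str      # how the atoms are arranged

# Thousands of individual facts can be fitted into that one simple schema;
# three will do here.
facts = [
    SubstanceSchema("water",          ["hydrogen", "oxygen"], "H2O molecules"),
    SubstanceSchema("table salt",     ["sodium", "chlorine"], "NaCl ionic lattice"),
    SubstanceSchema("carbon dioxide", ["carbon", "oxygen"],   "CO2 molecules"),
]

for fact in facts:
    print(f"{fact.substance}: {' + '.join(fact.elements)}, arranged as {fact.configuration}")
```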

experts’ schemata Research into expertise (e.g. Chi et al, 1981) shows that experts don’t usually have one single schema for all the facts they know, but instead use different schemata for different aspects of their body of knowledge. Sometimes those schemata are explicitly linked, but sometimes they’re not. Sometimes they can’t be linked because no one knows how the linkage works yet.

chess experts

Daisy refers to research showing that expert chess players memorise thousands of different configurations of chess pieces (p.78). This is classic chunking; although in different chess sets specific pieces vary in appearance, their core visual features and the moves they can make are highly consistent, so frequently-encountered configurations of pieces are eventually treated by the brain as single units – the brain chunks the positions of the chess pieces in essentially the same way as it chunks letters into words.

De Groot’s work showed that chess experts initially identified the configurations of pieces that were possible as a next move, and then went through a process of eliminating the possibilities. The particular configuration of pieces on the board would activate several associated schemata involving possible next and subsequent moves.

So each of the configurations of chess pieces that is encountered frequently enough to be chunked has an underlying (simple) schema. Expert chess players then access more complex schemata for next and subsequent possible moves. Even if they have an underlying schema for chess as a whole, it doesn’t follow that they treat chess as a single unit or that they recall all possible configurations at once. Most people can reliably recognise thousands of faces and thousands of words and have schemata for organising them, but when thinking about faces or words, they don’t recall all faces or all words simultaneously. That would rapidly overload working memory.

Compared to most knowledge domains, chess is pretty simple. Chess expertise consists of memorising a large but limited number of configurations and having schemata that predict the likely outcomes from a selection of them. Because of the rules of chess, although lots of moves are possible, the possibilities are clearly defined and limited. Expertise in medicine, say, or history, is considerably more complex and less certain. A doctor might have many schemata for human biology; one for each of the skeletal, nervous, circulatory, respiratory and digestive systems, for cell metabolism, biochemistry and genetics etc. Not only is human biology more complex than chess, there’s also more uncertainty involved. Some of those schemata we’re pretty sure about, some we’re not so sure about and some we know very little about. There’s even more uncertainty involved in history. Evaluating evidence about how the human body works might be difficult, but the evidence itself is readily available in the form of human bodies. Historical evidence is often absent and likely to stay that way, which makes establishing facts and developing schemata more challenging.

To illustrate her point about schemata Daisy claims that learning a couple of key facts about 150 historical events from 3000BC to the present, will form “the fundamental chronological schema that is the basis of all historical understanding” (p.20). Chronological sequencing could certainly form a simple schema for history, but you don’t need to know about many events in order to grasp that principle – two or three would suffice. Again, although this simple schema would give students a basic understanding of what history was about, in order to have a good understanding of history, students would need to know not only thousands of facts, but to develop many schemata about how those facts were linked before they would qualify as experts. This brings us on to the deep structure of knowledge, the subject of the next post.

references
Chi, MTH, Feltovich, PJ & Glaser, R (1981). Categorisation and Representation of Physics Problems by Experts and Novices, Cognitive Science, 5, 121-152
de Groot, AD (1978). Thought and Choice in Chess. Mouton.

Edited for clarity 8/1/17.

seven myths about education: a knowledge framework

In Seven Myths about Education Daisy Christodoulou refers to Bloom’s taxonomy of educational objectives as a metaphor that leads to two false conclusions; that skills are separate from knowledge and that knowledge is ‘somehow less worthy and important’ (p.21). Bloom’s taxonomy was developed in the 1950s as a way of systematising what students need to do with their knowledge. At the time, quite a lot was known about what people did with knowledge because they usually process it actively and explicitly. Quite a lot less was known about how people acquire knowledge, because much of that process is implicit; students usually ‘just learned’ – or they didn’t. Daisy’s book focuses on how students acquire knowledge, but her framework is an implicit one; she doesn’t link up the various stages of acquiring knowledge in an explicit formal model like Bloom’s. Although I think Daisy makes some valid points about the educational orthodoxy, some features of her model lead to conclusions that are open to question. In this post, I compare the model of cognition that Daisy describes with an established framework for analysing knowledge with origins outside the education sector.

a framework for knowledge

Researchers from a variety of disciplines have proposed frameworks involving levels of abstraction in relation to how knowledge is acquired and organised. The frameworks are remarkably similar. Although there are differences of opinion about terminology and how knowledge is organised at higher levels, there’s general agreement that knowledge is processed along the lines of the catchily named DIKW pyramid – DIKW stands for data, information, knowledge and wisdom. The Wikipedia entry gives you a feel for the areas of agreement and disagreement involved. In the pyramid, each level except the data level involves the extraction of information from the level below. I’ll start at the bottom.

Data

As far as the brain is concerned, data don’t actually tell us anything except whether something is there or not. For computers, data are a series of 0s and 1s; for the brain, data are largely in the form of sensory input – light, dark and colour, sounds, tactile sensations, etc.

Information
It’s only when we spot patterns within data that the data can tell us anything. Information consists of patterns that enable us to identify changes, identify connections and make predictions. For computers, information involves detecting patterns in all the 0s and 1s. For the brain it involves detecting patterns in sensory input.

Knowledge
Knowledge has proved more difficult to define, but involves the organisation of information.

Wisdom
Although several researchers have suggested that knowledge is also organised at a meta-level, this hasn’t been extensively explored.

The processes involved in the lower levels of the hierarchy – data and information – are well-established thanks to both computer modelling and brain research. We know a fair bit about the knowledge level largely due to work on how experts and novices think, but how people organise knowledge at a meta-level isn’t so clear.

The key concept in this framework is information. Used in this context, ‘information’ tells you whether something has changed or not, whether two things are the same or not, and what patterns are present. The DIKW hierarchy is sometimes summarised as: information is information about data, knowledge is information about information, and wisdom is information about knowledge.
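
To make the hierarchy a little more concrete, here’s a minimal sketch in Python – my own illustration, not anything from the DIKW literature – of data, information and knowledge as successive layers of pattern extraction. The particular signal and the ‘pattern’ chosen are assumptions made purely for the example.

```python
# A loose sketch (my own reading of the DIKW idea, not a formal model):
# each level extracts a pattern from the level below.
data = [0, 1, 0, 1, 0, 1, 0, 1]  # raw signal: just 0s and 1s, telling us nothing by itself

# information: a pattern spotted in the data
is_alternating = all(a != b for a, b in zip(data, data[1:]))
information = "alternating 0s and 1s" if is_alternating else "no obvious pattern"

# knowledge: information organised and related to other information,
# here in the form of a prediction derived from the pattern
knowledge = {
    "pattern": information,
    "prediction": "the next value will differ from the last one" if is_alternating else "unknown",
}

print(knowledge)
```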

a simple theory of complex cognition

Daisy begins her exploration of cognitive psychology with a quote by John Anderson, from his paper ACT: A simple theory of complex cognition (p.20). Anderson’s paper tackles the mystique often attached to human intelligence when compared to that of other species. He demonstrates that it isn’t as sophisticated or as complex as it appears, but is derived from a simple underlying principle. He goes on to explain how people extract information from data, deduce production rules and make predictions about commonly occurring patterns, which suggests that the more examples of particular data the brain perceives, the more quickly and accurately it learns. He demonstrates the principle using examples from visual recognition, mathematical problem solving and prediction of word endings.
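
As a rough illustration of that principle, here’s a toy sketch of my own in Python – not Anderson’s ACT model or code – of a frequency-based learner that induces simple ‘production rules’ for word endings. The class and method names are invented for the example; the point is just that its predictions sharpen as it sees more examples.

```python
# A toy sketch (my own illustration, not Anderson's ACT model): a frequency-based
# learner that induces simple 'production rules' for word endings from examples.
from collections import Counter, defaultdict

class EndingPredictor:
    def __init__(self):
        # each stem maps to a count of the endings seen with it
        self.rules = defaultdict(Counter)

    def learn(self, stem, ending):
        # every example strengthens the association between stem and ending
        self.rules[stem][ending] += 1

    def predict(self, stem):
        # the most frequently encountered ending wins; None if nothing has been learned yet
        if not self.rules[stem]:
            return None
        return self.rules[stem].most_common(1)[0][0]

predictor = EndingPredictor()
for stem, ending in [("walk", "ed"), ("walk", "ed"), ("jump", "ed"), ("go", "ne")]:
    predictor.learn(stem, ending)

print(predictor.predict("walk"))  # 'ed'; the prediction becomes more reliable as examples accumulate
```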

natural learning

What Anderson describes is how human beings learn naturally; the way brains automatically process any information that happens to come their way unless something interferes with that process. It’s the principle we use to recognise and categorise faces, places and things. It’s the one we use when we learn to talk, solve problems and associate cause with effect. Scattergrams provide a good example of how we extract information from data in this way.

Scatterplot of longitudinal measurements of total brain volume for males (N=475 scans, shown in dark blue) and females (N=354 scans, shown in red). From Lenroot et al (2007).

Although the image consists of a mass of dots and lines in two colours, we can see at a glance that the different coloured dots and lines form two clusters.
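
To make that extraction explicit, here’s a minimal sketch of my own in Python – not taken from Lenroot et al or from Anderson – in which a simple k-means loop recovers two clusters from unlabelled 2-D points, a crude, mechanical version of what the eye does at a glance. The synthetic data and parameter choices are assumptions made for illustration only.

```python
# A minimal sketch (my own illustration): a simple k-means loop extracts the two
# clusters a reader sees at a glance in a scattergram of unlabelled points.
import random

def kmeans(points, k=2, iterations=20):
    centres = random.sample(points, k)
    for _ in range(iterations):
        # assign each point to its nearest centre
        clusters = [[] for _ in range(k)]
        for x, y in points:
            nearest = min(range(k), key=lambda i: (x - centres[i][0]) ** 2 + (y - centres[i][1]) ** 2)
            clusters[nearest].append((x, y))
        # move each centre to the mean of its cluster
        centres = [(sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c)) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return centres, clusters

# two noisy groups standing in for the male and female brain-volume measurements
group_a = [(random.gauss(12, 2), random.gauss(1250, 40)) for _ in range(50)]
group_b = [(random.gauss(12, 2), random.gauss(1100, 40)) for _ in range(50)]

centres, clusters = kmeans(group_a + group_b)
print(centres)  # two centres, one near each underlying group
```

The algorithm itself isn’t the point; the point is that the ‘information’ – the two clusters – can be extracted mechanically from nothing but the raw data points.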

Note that I’m not making the same distinction that Daisy makes between ‘natural’ and ‘not natural’ learning (p.36). Anderson is describing the way the brain learns, by default, when it encounters data. Daisy, in contrast, claims that we learn things like spoken language without visible effort because language is ‘natural’, whereas inventions like the alphabet and numbers have to be taught ‘formally and explicitly’. That distinction, although frequently made, isn’t necessarily a valid one. It’s based on an assumption that the brain has evolved mechanisms to process some types of data – to recognise faces and understand speech, for example – but can’t have had time to evolve mechanisms to process recent inventions like writing and mathematics. This assumption about brain hardwiring is a contentious one, and the evidence about how brains learn (including the work that’s developed from Anderson’s theory) makes it look increasingly likely that it’s wrong. If formal and explicit instruction were necessary in order to learn man-made skills like writing and mathematics, it would be hard to explain how those skills were invented in the first place, and Anderson could not have used mathematical problem-solving and word prediction as his examples of the underlying mechanism of human learning. The theory that the brain is hardwired to process some types of information but not others, and the theory that the same mechanism processes all information, both explain why people appear to learn some things automatically and ‘naturally’. Which theory is right (or whether both are right) is still the subject of intense debate. I’ll return to the second theory later when I discuss schemata.

data, information and chunking

Chunking is a core concept in Daisy’s model of cognition. Chunking occurs when the brain links together several bits of data it encounters frequently and treats them as a single item – groups of letters that frequently co-occur are chunked into words. Anderson’s paper is about the information processing involved in chunking. One of his examples is how the brain chunks the three lines that make up an upper case H. Although Anderson doesn’t make an explicit distinction between data and information, in his examples the three lines would be categorised as data in the DIKW framework, as would be the curves and lines that make up numerals. When the brain figures out the production rule for the configuration of the lines in the letter H, it’s extracting information from the data – spotting a pattern. Because the pattern is highly consistent – H is almost always written using this configuration of lines – the brain can chunk the configuration of lines into the single unit we call the letter H. The letters A and Z also consist of three lines, but have different production rules for their configurations. Anderson shows that chunking can also occur at a slightly higher level; letters (already chunked) can be chunked again into words that are processed as single units, and numerals (already chunked) can be chunked into numbers to which production rules can be applied to solve problems. Again, chunking can take place because the patterns of letters in the words, and the patterns of numerals in Anderson’s mathematical problems are highly consistent. Anderson calls these chunked units and production rules ‘units of knowledge’. He doesn’t use the same nomenclature as the DIKW model, but it’s clear from his model that initial chunking occurs at the data level and further chunking can occur at the information level.
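
Here’s a hedged sketch of the general idea in Python – my own illustration, not Anderson’s model – in which pairs of symbols that co-occur often enough are merged into a single unit, in the same spirit as lines being chunked into letters and letters into words. The threshold and the toy word list are arbitrary choices made for the example.

```python
# A rough illustration (my own, not Anderson's code): chunking by frequency.
# Adjacent symbols that co-occur often enough are merged into a single unit.
from collections import Counter

def chunk_once(sequences, threshold=3):
    # count how often each adjacent pair of units occurs across all sequences
    pair_counts = Counter()
    for seq in sequences:
        pair_counts.update(zip(seq, seq[1:]))
    frequent = {pair for pair, n in pair_counts.items() if n >= threshold}
    # merge any frequent pair into a single chunk, working left to right
    chunked = []
    for seq in sequences:
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) in frequent:
                out.append(seq[i] + seq[i + 1])
                i += 2
            else:
                out.append(seq[i])
                i += 1
        chunked.append(out)
    return chunked

words = [list("the"), list("then"), list("there"), list("that"), list("this")]
print(chunk_once(words))  # 'th' occurs in every word, so it gets treated as a single unit
```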

The brain chunks data and low-level units of information automatically; evidence for this comes from research showing that babies begin to identify and categorise objects using visual features and categorise speech sounds using auditory features by about the age of 9 months (Younger, 2003). Chunking also occurs pre-consciously (e.g. Lamme 2003); we know that people are often aware of changes to a chunked unit like a face, a landscape or a piece of music, but don’t know what has changed – someone has shaved off their moustache, a tree has been felled, the song is a cover version with different instrumentation. In addition, research into visual and auditory processing shows that sensory information initially feeds forward in the brain; a lot of processing occurs before the information reaches the location of working memory in the frontal lobes. So at this level, what we are talking about is an automatic, usually pre-conscious process that we use by default.

knowledge – the organisation of information

Anderson’s paper was written in 1995 – twenty years ago – at about the time the DIKW framework was first proposed, which explains why he doesn’t use the same terminology. He calls the chunked units and production rules ‘units of knowledge’ rather than ‘units of information’ because they are the fundamental low-level units from which higher-level knowledge is formed.

Although Anderson’s model of information processing for low-level units still holds true, what has puzzled researchers in the intervening couple of decades is why that process doesn’t scale up. The way people process low-level ‘units of knowledge’ is logical and rational enough to be accurately modelled using computer software, but when handling large amounts of information, such as the concepts involved in day-to-day life, or trying to comprehend, apply, analyse, synthesise or evaluate it, the human brain goes a bit haywire. People (including experts) exhibit a number of errors and biases in their thinking. These aren’t just occasional idiosyncrasies – everybody shows the same errors and biases to varying extents. Since complex information isn’t inherently different to simple information – there’s just more of it – researchers suspected that the errors and biases were due to the wiring of the brain. Work on judgement and decision-making and on the biological mechanisms involved in processing information at higher levels has demonstrated that brains are indeed wired up differently to computers. The reason is that what has shaped the evolution of the human brain isn’t the need to produce logical, rational solutions to problems, but the need to survive, and overall quick-and-dirty information processing tends to result in higher survival rates than slow, precise processing.

What this means is that Anderson’s information processing principle can be applied directly to low-level units of information, but might not be directly applicable to the way people process information at a higher level – the way they process facts, for example. Facts are the subject of the next post.

references
Anderson, J. (1996). ACT: A simple theory of complex cognition. American Psychologist, 51, 355-365.
Lamme, V.A.F. (2003). Why visual attention and awareness are different. Trends in Cognitive Sciences, 7, 12-18.
Lenroot, R.K., Gogtay, N., Greenstein, D.K., Molloy, E., Wallace, G.L., Clasen, L.S., Blumenthal, J.D., Lerch, J., Zijdenbos, A.P., Evans, A.C., Thompson, P.M. & Giedd, J.N. (2007). Sexual dimorphism of brain developmental trajectories during childhood and adolescence. NeuroImage, 36, 1065–1073.
Younger, B. (2003). Parsing objects into categories: Infants’ perception and use of correlated attributes. In Rakison & Oakes (eds.), Early Category and Concept Development: Making Sense of the Blooming, Buzzing Confusion. Oxford University Press.