apprentice without a sorcerer

Cummings’ essay Some Thoughts on Education and Political Priorities highlights his admiration for experts, notably scientists, but this doesn’t prevent him making several classic novice errors. These errors, not surprisingly, lead Cummings to some conclusions contradicted by evidence he hasn’t considered. I’ve focused on four of them.

oversimplifying systems

Cummings knows that systems operate differently at different levels, and that although all systems, as part of the physical world, involve maths and physics, you can’t reduce all systems to maths and physics (p.18). But his preoccupation with maths and physics, and his lack of attention to the higher levels of systems, suggest he can’t resist doing just that. In his essay maths is mentioned 473 times (almost 2 mentions per page) and physics 179 times. Science gets 507 references and quantum 238. In contrast, the arts get 8 mentions and humanities 16. Ironically, given his emphasis on complex systems, Cummings seems determined to view complex knowledge domains like education, politics, the humanities and arts, only through the lenses of maths, physics and linear scales.
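Tallies like these are easy to check. Here is a minimal sketch of how such a count could be reproduced, assuming the essay is available as plain text. The function name and the sample string are hypothetical, and whole-word matching is an assumption: a substring search (which would also catch 'scientific' or 'mathematical') would give different totals.

```python
import re
from collections import Counter

def count_mentions(text, terms):
    """Count case-insensitive whole-word occurrences of each term in text."""
    # Split the lower-cased text into alphabetic words and tally them.
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    return {term: words[term.lower()] for term in terms}

# Hypothetical sample text, standing in for the essay itself.
sample = "Maths and physics matter. Physics underlies maths; the arts get less space."
counts = count_mentions(sample, ["maths", "physics", "arts", "humanities"])
print(counts)
```

Dividing a count by the number of pages gives the mentions-per-page figure.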

Cummings’ first degree is in history, but he knows a lot of scientific facts. How deep his understanding goes is another matter. He opens the section on a scientific approach to teaching practice with the famous ‘Cargo Cult’ speech in which Richard Feynman accused educational and psychological studies of mimicking the surface features of science but not applying the deep structure of the scientific method (p.70). Cummings’ criticism is well-founded; evidence has always influenced educational practice in the UK, but the level of rigour involved has varied considerably. Ironically, Cummings’ appeal to scientific evidence then itself sets off down the cargo-cult route.

misunderstanding key concepts: chunking vs schemata

Cummings claims “experts do better because they ‘chunk’ together lots of individual things in higher level concepts – networks of abstractions – which have a lot of compressed information and allow them to make sense of new information (experts can also use their networks to piece together things they have forgotten)” (p.71).

‘Chunking’ occurs when several distinct items of information are perceived and processed as one item. The research, e.g. Miller (1956), de Groot (1965) and Anderson (1996), shows it happens automatically after groups of low-level (simple) items with strongly similar features have been encountered very frequently, e.g. Morse code, words, faces, chess positions. I’ve not seen any research that shows the same phenomenon happening with information that’s associated but complex and dissimilar. And Cummings doesn’t cite any.

Information that’s complex and dissimilar but frequently encountered together (e.g. Periodic Table, biological taxonomy, battle of Hastings) forms strong associations cognitively that are configured into a schema. What Cummings describes isn’t chunking; it’s the formation of a high-level schema. Chunks are schemata, but not all schemata are chunks.

Cummings is right that experts abstract information to form high-level schemata, but the information isn’t compressed as he claims. The abstractions are key features of aspects of the schema, e.g. key features of transition metals, birds or invasions. I can just about hold all the key features of birds in my working memory at once, but not at the same time as exceptions (e.g. ostrich, penguin) or features of different bird species. The prototypical features make it easier to retrieve associated information, but it isn’t retrieved all at once. If I think about the key features of birds, many facts about birds and their features spring to mind, but they do so sequentially, not at the same time. The limitations of working memory still apply.

The distinction between chunking and schema formation is important because schemata play a big part in expertise, e.g. Schank & Abelson (1977) and Rumelhart (1980). Despite their importance, Cummings refers to schemata only once, when he’s describing how his essay is structured (p.7). The omission is a significant one with implications for Cummings’ model of how experts structure their knowledge.

experts vs novices

Experts in a particular field derive their expertise from a body of knowledge that’s been found to be valid and reliable. They construct that knowledge into schemata, or mental models. New knowledge can then be incorporated into the schemata, which might then need to be configured differently. Sometimes experts disagree strongly, not about the content of their schemata, but about how the content is configured.

The ensuing debates can go on for decades. A classic example is the debate between those who think correlations between intelligence test scores indicate that intelligence is a ‘something’ that ‘really exists’, and those who think the assumption that there’s a ‘something’ called intelligence shapes the choice of items in intelligence tests, so correlations should come as no surprise (see previous post). Another long-standing debate involves those who think universal patterns in the structure of language mean that language is hard-wired in the brain, versus others who think the patterns emerge from the way networks of neurons compute information.

Acquiring key information about an unfamiliar knowledge domain takes time and effort, and Cummings has obviously put in the hours. What’s more challenging is finding out how domain experts configure their knowledge – experts often take their schemata for granted and don’t make them explicit. Sometimes you need to ask directly (or be told) why knowledge is organized in a certain way, and if there are any crucial differences of opinion in the field.

Cummings doesn’t seem to have asked how experts structure their knowledge. Instead, he appears to have squeezed knowledge new to him (e.g. chunking) into his own pre-existing schema without checking whether his schema is right or wrong. Or, he’s adopted the first schema he’s agreed with (e.g. genes and IQ). He admits to basing his genes/IQ model largely on Robert Plomin’s Behavioural Genetics and talks by Stephen Hsu. He dismisses the controversies and takes Plomin and Hsu’s models for granted.

evaluating evidence

There are references to the scientific method in Cummings’ essay but they’re about data analysis, not the scientific method as such. A crucial step in the scientific method is evaluating evidence – analysing data for sure, but also testing hypotheses by weighing up the evidence for and against. This process isn’t about ‘balance’ – it’s about finding flaws in methods and reasoning in order to avoid confirmation bias.

But Cummings repeatedly accepts evidence in support of one thing or against another, without questioning it. I’d suggest he can’t question much of it because he doesn’t know enough about the field. Some that caught my eye are:

  • Assuming hunter-gatherers’ knowledge is “based on superstition (almost total ignorance of complex systems)” (p.1). Anthropology that might claim otherwise is, like other social sciences, summarily dismissed by Cummings.
  • Unsubstantiated claims such as “Aeronautics was confined to qualitative stories (like Icarus) until the 1880s when people started making careful observations and experiments about the principles of flight” (p.21). Da Vinci, Bacon, Montgolfiers, Cayley? No mention.
  • Attributing European economic development between the 14th and 19th centuries to ‘markets and science’ and omitting the role of the Reformation, French Revolution, or Enclosure Acts (p.108).
  • Uncritical acceptance of Smith’s and Hayek’s speculative claims about the benefits of markets (p.106).
  • Overlooking systems constraints on growth – in corn yields, computing power etc. (pp.46, 231-2). No mention of the ubiquitous sigmoid curve.
  • Overlooking the Club of Rome’s Limits to Growth when discussing shortage and innovation (p.112).
  • Emphasising the importance of complex systems with no mention of systems theory as such (e.g. Bertalanffy’s general systems theory).
  • Ignoring important debates about construct validity e.g. intelligence and personality (p.49).

not just wrong

People are often wrong about things and usually a few minor errors don’t matter. In Cummings’ case they matter a great deal, partly because he’s so influential, but also because even tiny errors can have huge consequences. I chose the example of chunking because Cummings’ interpretation of it has been disproportionately influential in recent English education policy.

Daisy Christodoulou in Seven Myths about Education (2014) takes the assumption about chunking a step further. She’s right that chunking low-level associations such as times tables allows us to ‘cheat’ the limitations of working memory, but wrong to assume (like Cummings) high-level schemata do the same. And flat-out wrong to claim “we can summon up the information from long-term memory to working memory without imposing a cognitive load.” (Christodoulou p.19, my emphasis). Her own example (23,322 x 42) contradicts her claim.
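To see why, it may help to lay the sum out. This is a rough sketch, assuming the standard long-multiplication method; the function name is mine, not Christodoulou’s. Each single-digit product is a times-table fact that can be retrieved from long-term memory, but the partial products are new information that has to be held somewhere while the sum is completed.

```python
def long_multiplication(a, b):
    """Return the partial products (one per digit of b) and their total."""
    partials = []
    # Work through the digits of b from right to left, as on paper.
    for place, digit in enumerate(reversed(str(b))):
        partials.append(a * int(digit) * 10 ** place)
    return partials, sum(partials)

partials, total = long_multiplication(23322, 42)
print(partials)  # [46644, 932880]
print(total)     # 979524
```

Even with every times-table fact secure, two multi-digit partial products and the carries between them have to be tracked before the answer appears, which is presumably why most of us reach for pencil and paper.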

Christodoulou’s claim is based on Kirschner, Sweller & Clark’s 2006 paper ‘Why minimal guidance during instruction does not work’. The authors say: “The limitations of working memory only apply to new, yet to be learned information that has not been stored in long-term memory. New information such as new combinations of numbers or letters can only be stored for brief periods with severe limitations on the amount of such information that can be dealt with. In contrast, when dealing with previously learned information stored in long-term memory, these limitations disappear.” (Kirschner et al p.77). The only evidence they cite is a 1995 review paper proposing an additional cognitive mechanism, “long-term working memory”.

I have yet to read a proponent of Kirschner, Sweller & Clark’s model discuss the well-known limitations of long-term memory, summarised here. Greg Ashman for example, following on from a useful summary of schemata, says:

“One way of thinking about the role of long-term memory in solving problems or dealing with new information is that entire schema can be brought readily into working memory and manipulated as a single element alongside any new elements that we need to process. The normal limits imposed on working memory fall away almost entirely when dealing with schemas retrieved from long-term memory – a key idea of cognitive load theory. This illustrates both the power of having robust schemas in long-term memory and the effortlessness of deploying them; an effortlessness that fools so many of us into neglecting the critical role long-term memory plays in learning”.

Many with expertise as varied as English, history, physics or politics, have enthusiastically embraced findings from cognitive science that could improve the effectiveness of teaching. Or more accurately, they’ve embraced Kirschner, Sweller and Clark’s model of memory and learning. Some of the ‘cog sci’ enthusiasts have gone further. They’ve taken a handful of facts out of context, squeezed them into their own pre-existing schemata, and drawn conclusions that are at odds with the research. They’ve also assumed that if an expert in ‘cog sci’ makes a plausible claim it must be true, but haven’t evaluated the evidence cited by the expert – because they don’t have the relevant expertise; cognitive science is a knowledge domain unfamiliar to them.

Nevertheless, objections to the Kirschner, Sweller and Clark model are often dismissed as originating either in ideology or ignorance. Ironic, as despite emphasising the importance of knowledge, evidence and expertise, many of the proponents of ‘cog sci’ are patently novices selecting evidence to support a model that doesn’t stand up to scrutiny. Murray Gell-Mann is right that we need people who can take a crude look at the whole of knowledge (p.5), but the crude look should be one informed by a good grasp of the domains in question.

In 1797, Goethe published a poem entitled Der Zauberlehrling (Sorcerer’s Apprentice). It was a popular work, and became even more popular in 1940 when animated as part of Disney’s Fantasia, with Mickey Mouse playing the part of the apprentice who started something he couldn’t stop. The moral of the story is that a little knowledge can be a dangerous thing. Cummings has been portrayed as a brilliant eccentric and/or an evil genius. I think he’s an apprentice without a sorcerer.

references

Anderson, J (1996). ACT: A simple theory of complex cognition, American Psychologist, 51, 355-365.

Christodoulou, D (2014). Seven Myths about Education. Routledge.

de Groot, A D (1965). Thought and Choice in Chess. Mouton.

Kirschner, PA, Sweller, J & Clark, RE (2006). Why Minimal Guidance During Instruction Does Not Work: An Analysis of the Failure of Constructivist, Discovery, Problem-Based, Experiential, and Inquiry-Based Teaching, Educational Psychologist, 41, 75-86.

Miller, G (1956). The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information, Psychological Review, 63, 81-97.

Rumelhart, DE (1980). Schemata: the building blocks of cognition. In R.J. Spiro et al. (eds) Theoretical Issues in Reading Comprehension.  Lawrence Erlbaum: Hillsdale, NJ.

Schank, RC & Abelson, RP (1977). Scripts, Plans, Goals and Understanding: an Inquiry into Human Knowledge Structures.  Lawrence Erlbaum: Hillsdale, NJ.

 

 

 
