play or direct instruction in early years?

One of the challenges levelled at advocates of the importance of play for learning in the Early Years Foundation Stage (EYFS) has been the absence of solid evidence for its importance. Has anyone ever tested this theory? Where are the randomised controlled trials?

The assumption that play is an essential vehicle for learning is widespread and has for many years dominated approaches to teaching young children. But is it anything more than an assumption?  I can understand why critics have doubts.  After all, EY teachers tend to say “Of course play is important. Why would you question that?” rather than “Of course play is important (Smith & Jones, 1943; Blenkinsop & Tompkinson, 1972).”  I think there are two main reasons why EY teachers tend not to cite the research.

why don’t EY teachers cite the research?

First, the research about play is mainly from the child development literature rather than the educational literature. There’s a vast amount of it and it’s pretty robust, showing how children use play to learn how the world works: What does a ball do? How does water behave? What happens if…?  If children did not learn through play, much of the research would have been impossible.

Secondly, you can observe children learning through play, in front of your very eyes. A kid who can’t post all the bricks in the right holes at the beginning of a play session can do so at the end. A child who doesn’t know how to draw a cat when they sit down with the crayons can do so a few minutes later.

Play is so obviously the primary vehicle for learning used by young children, that a randomised controlled trial of the importance of play in learning would be about as ethical as one investigating the importance of food for growth, or the need to hear talk to develop speech.

what about play at school?

But critics have another question: Children can play at home – why waste time playing in school when they could use that time to learn something useful, like reading, writing or arithmetic? Advocates for learning through play often argue that a child has to be developmentally ‘ready’ before they can successfully engage in such tasks, and play facilitates that developmental ‘readiness’. By developmentally ‘ready’, they’re not necessarily referring to some hypothetical, questionable Piagetian ‘stages’, but to whether the child has developed the capability to carry out the educational tasks. You wouldn’t expect a six-month-old to walk – their leg muscles and sense of balance wouldn’t be sufficiently well developed. Nor would you expect the average 18-month-old to read – they wouldn’t have the necessary language skills.

Critics might point out that a better use of time would be to teach the tasks directly. “These are the shapes you need to know about.” “This is how you draw a cat.” Why not ‘just tell them’ rather than spend all that time playing?

There are two main reasons why play is a good vehicle for learning at the Early Years stage. One is that young children are highly motivated to play. Play involves a great deal of trial-and-error, an essential mechanism for learning in many contexts. The variable reinforcement that happens during trial-and-error play is strongly motivating for mammals, and human beings are no exception.

The other reason is that during play, there is a great deal of incidental learning going on. When posting bricks, children develop manual dexterity and learn about colour, number, texture, materials, shapes and angles. Drawing involves learning about shape, colour, the 2-D representation of 3-D objects and, again, manual dexterity. Approached as play, both activities could also expand a child’s vocabulary and enable them to learn how to co-operate, collaborate or compete with others. Play offers a high learning return for a small investment of time and resources.

why not ‘just tell them’?

But isn’t ‘just telling them’ a more efficient use of time?   Sue Cowley, a keen advocate of the importance of play in Early Years, recently tweeted a link to an article in Psychology Today by Peter Gray, a researcher at Boston College. It’s entitled “Early Academic Training Produces Long-Term Harm”.

This is a pretty dramatic claim, and for me it raised a red flag – or at least an amber one. I’ve read through several longitudinal studies about children’s long-term development and they all have one thing in common: they show that the impact of early experiences (good and bad) is often moderated by later life events. ‘Delinquents’ settle down and become respectable married men with families; children from exemplary middle-class backgrounds get in with the wrong crowd in their teens and go off the rails; the improvements in academic achievement resulting from a language programme in kindergarten have all but disappeared by third grade. The findings set out in Gray’s review article didn’t square with the findings of other longitudinal studies. Also, review articles can sometimes skate over crucial points in the methods used in studies – points that call the conclusions into question.

what the data tell us

So I was somewhat sceptical about Dr Gray’s claims – until I read the references (at least, three of them – I couldn’t access the second). The studies he cites compared outcomes from three types of pre-school programme: High/Scope, direct instruction (including the DISTAR programme), and a traditional nursery pre-school curriculum. Some of the findings weren’t directly related to long-term outcomes but caught my attention:

  • In first, second and third grades, school districts used retention in grade rather than special education services for children experiencing learning difficulties (Marcon).
  • Transition (in this case grade 3 to 4) was followed by a dip in children’s academic performance (Marcon).
  • Because of the time that had elapsed since the original interventions, there had been ample opportunity for methodological criticisms to be addressed and resolved (Schweinhart & Weikart).
  • Mothers’ educational level was a significant factor (as in other studies) (Schweinhart & Weikart).
  • Small numbers of teachers were involved, so individual teachers could have had a disproportionate influence (Schweinhart & Weikart).
  • Evidence cited in support of the Common Core State Standards was lacking (Carlsson-Paige et al.).

Essentially, the studies cited by Dr Gray found that educational approaches featuring a significant element of child-initiated learning result in better long-term outcomes overall (including high school graduation rates) than those featuring direct instruction. The reasons aren’t entirely clear. Peter Gray and some of the researchers suggested that the home visits that were a feature of all the programmes might have played a significant role. If parents had bought in to a programme’s ethos (likely if there were regular home visits from teachers), children expected to focus on academic achievement at school and at home might have had fewer opportunities for early incidental learning about social interaction – learning that could shape their behaviour in adulthood.

The research findings provided an unexpected answer to a question I have repeatedly asked of proponents of Engelmann’s DISTAR programme (featured in one of the studies) but to which I’ve never managed to get a clear answer: what outcomes were there from the programme over the long term? Initially, children who had followed direct instruction programmes performed significantly better in academic tests than those who hadn’t, but the gains disappeared after a few years, and the long-term outcomes included more years in special education and, later, significantly more felony arrests and assaults with dangerous weapons.

This wasn’t what I was expecting. I was expecting the pattern that emerged from the Abecedarian study: academic gains after early intervention peter out after a few years, but marginal long-term benefits remain. Transient and marginal improvements are not to be sniffed at. ‘Falling behind’ early on at school can have a devastating impact on a child’s self-esteem, and only a couple of young people choosing college rather than teenage parenthood or petty crime can make a big difference to a neighbourhood.

The most likely reason for the tail-off in academic performance is that the programme was discontinued, but the overall worse outcomes for the direct instruction children than for those in the control group are counterintuitive.  Of course it doesn’t follow that direct instruction caused the worse outcomes. The results of the interventions are presented at the group level; it would be necessary to look at the pathways followed by individuals to identify the causes for them dropping out of high school or getting arrested.


There’s no doubt that early direct instruction improves children’s academic performance in the short-term. That’s a desirable outcome, particularly for children who would otherwise ‘fall behind’. However, from these studies, direct instruction doesn’t appear to have the long-term impact sometimes claimed for it; that it will address the problem of ‘failing’ schools; that it will significantly reduce functional illiteracy; or that early intervention will eradicate the social problems that cause so much misery and perplex governments.  In fact, these studies suggest that direct instruction results in worse outcomes.  Hopefully, further research will tell us whether that is a valid finding, and if so why it happened.

I’ve just found a post by Greg Ashman drawing attention to a critique of the High/Scope studies.  Worth reading.  [edit 21/4/17]


Carlsson-Paige, N., McLaughlin, G.B. and Almon, J.W. (2015). “Reading Instruction in Kindergarten: Little to Gain and Much to Lose”. Published online by the Alliance for Childhood. …

Gray, P. (2015). “Early Academic Training Produces Long-Term Harm”. Psychology Today.

Marcon, R.A. (2002). “Moving up the grades: Relationship between preschool model and later school success”. Early Childhood Research & Practice, 4 (1).

Schweinhart, L.J. and Weikart, D.P. (1997). “The High/Scope Preschool Curriculum Comparison Study through age 23”. Early Childhood Research Quarterly, 12, pp. 117-143.


behavioural optometry: pros and cons

MUSEC is Macquarie University’s Special Education Centre. Since 2005 it has been issuing one-page briefings on various topics relevant to special education; a brilliant idea and very useful for busy teachers. One of the drawbacks of a one-page briefing is that if the topic is a complex one, there might be space only for a simple explanation and a couple of references. The briefings get round that problem, in part, by putting relevant references on a central website.

Behavioural optometry is based on the assumption that some behavioural issues (in the broadest sense) are due to problems with the way the eyes function. This could include anything from poor convergence (eyes don’t focus together) to variations in processing visual information in different coloured lights. The theory is a plausible one; visual dysfunction can cause considerable discomfort and can affect balance and co-ordination, for example.

Behavioural optometrists are sometimes consulted if children have problems with reading, because reading requires fine-grained visual (and auditory) discrimination, and even small variations in the development of the visual system can cause problems for young children. One of the reasons systematic synthetic phonics programmes are so effective in helping young children learn to decode text is because they train children in making fine-grained distinctions between graphemes (and between phonemes). But phonics programmes cannot address all visual (or auditory) processing anomalies, which is the point where behavioural optometrists often come in.

The MUSEC briefing on behavioural optometry (Issue 33) draws on two references: a 2011 report by the American Academy of Pediatrics (AAP), and a 2009 review paper by Brendan Barrett, a professor of visual development at Bradford University. Aspects of the briefing perplexed me. I felt it didn’t accurately reflect the conclusions of the two references because it:

  • doesn’t discriminate between treatments
  • overlooks the expertise of behavioural optometrists
  • equates lack of evidence for efficacy with inefficacy
  • assumes that what is true for a large population must be true for individuals
  • gives misleading advice to readers.

Discrimination between treatments

In its second paragraph the briefing lists three types of treatment used by behavioural optometrists: lenses and prisms, coloured lenses or overlays, and vision therapy. But from paragraph four onwards, no distinction is made between treatments – they are all referred to as ‘behavioural optometry’ and evidence (for all behavioural optometry treatments, presumably) is said to be ‘singularly lacking’. Since lenses and prisms are used in what Barrett calls traditional optometry (p.5), this generalisation is self-evidently inaccurate. Nor does it reflect Barrett’s conclusions. Although he highlights the scarcity of evidence and lack of support for some treatments, he also refers to treatments developed by behavioural optometrists being adopted in mainstream practice, and to evidence that supports claims involving convergence insufficiency, yoked prisms, and vision rehabilitation after brain disease/injury.

Expertise of behavioural optometrists

The briefing also appears to overlook the fact that behavioural optometrists are actually optometrists – a protected title, in the UK at least. As such, they are qualified to make independent professional judgments about the treatment of their patients. As Barrett points out, some of the controversies over treatments involve complex theoretical and technical issues; behavioural optometry isn’t the equivalent of Brain Gym. But teachers are unlikely to know that if they only read the briefing and not the references.

Lack of evidence for efficacy

Both references cited by the MUSEC briefing are reviews commissioned by professional bodies. Clearly, the American Academy of Pediatrics, the College of Optometrists or MUSEC cannot endorse or advocate treatments for which there is little or no evidence of efficacy. But individual practitioners are not issuing policy statements, they are treating individual patients. If they are using treatments for which a robust evidence base is lacking, that’s unsatisfactory, but a weak evidence base doesn’t mean that there is no evidence for efficacy, nor that the treatments in question are ineffective. Setting up RCTs of treatments for complex issues like ‘learning difficulties’ is challenging, expensive and time-consuming. As a parent, I would far rather my child try treatments that had a weak evidence base but were recommended by experienced practitioners, than wait for the Cochrane reviewers to finish a task that could take decades.

Populations vs individuals

The briefing paper says that “there is clear consensus among reading scientists that visual perception difficulties are rarely critical in reading difficulties and that the problem is typically more to do with language, specifically phonological processing”.

Although this statement is right about the consensus and the role of phonological processing, one can’t assume that what’s true at a population level is true for every individual. Take, for example, convergence insufficiency (one of the areas where Barrett found evidence to support behavioural optometrists’ claims). According to the AAP report, the prevalence of convergence insufficiency is somewhere between 0.3% and 5% of the population (p.832). So the probability of any given child having convergence insufficiency is low, but in the UK it could still affect up to 500,000 children. Although the report found no evidence that convergence insufficiency causes problems with decoding, comprehension or school achievement, it points out that it ‘can interfere with the ability to concentrate on print for a prolonged period of time’. So even though in theory convergence insufficiency could be contributing to the difficulties of a quarter of the UK’s reluctant readers, it isn’t screened for in standard eye tests.
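
The arithmetic behind those figures is easy to make explicit. A quick back-of-envelope sketch, assuming (my round figure, not the AAP’s) roughly 10 million school-age children in the UK:

```python
# Back-of-envelope estimate of UK children potentially affected by
# convergence insufficiency, using the AAP's prevalence bounds (0.3%-5%).
# The 10 million school-age population is an assumed round figure.
uk_school_age_children = 10_000_000
prevalence_low, prevalence_high = 0.003, 0.05

affected_low = round(uk_school_age_children * prevalence_low)
affected_high = round(uk_school_age_children * prevalence_high)

print(f"{affected_low:,} to {affected_high:,} children")  # 30,000 to 500,000
```

The upper bound is where the ‘up to 500,000 children’ figure comes from; the true number could of course lie anywhere in that range.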

Advice to readers

The briefing recommends visual assessment for problems with acuity and refractive or ‘similar’ problems, but that’s not what the AAP recommends. It says:

“Children with suspected learning disabilities in whom a vision problem is suspected by the child, parents, physicians, or educators should be seen by an ophthalmologist who has experience with the assessment and treatment of children, because some of these children may also have a treatable visual problem that accompanies or contributes to their primary reading or learning dysfunction.” (p. 829)

In the UK, that would require considerable persistence on the part of the child, parent or educator, although physicians might have more success.

The briefing also suggests an alternative to behavioural optometry: ‘explicit instruction in the specific areas causing difficulty’. Quite how ‘explicit instruction’ would improve problems with eye tracking, visual processing speed, visual sequential memory, visual discrimination, visual motor integration, visual spatial skills or rapid naming is unclear – never mind attention, or dyspraxia, where the difficulty is often discovered precisely because the child is unable to carry out explicit instructions.


I’m not claiming that behavioural optometry ‘does help children with reading difficulties’ because I don’t know whether it does or not. But that appears to be the nub of the problem – in the absence of evidence nobody knows whether it does or not. Nor which treatments help, if any. As the AAP paper says “Although it is prudent to be skeptical, especially with regard to prematurely disseminated therapies, it is important to also remain openminded.” (p.836)

I also had problems with the MUSEC briefing’s reading of Barrett’s conclusions. Although I wouldn’t go so far as to say the briefing is wrong (except perhaps about the lenses, and I’m not sure what it means by ‘explicit instruction’), its take-home message, for me, was that behavioural optometrists lack competence, that visual problems are unlikely to play any part in developmental abnormalities, and that if there are visual problems they will be limited to acuity and refractive or ‘similar’ factors. That’s not the message I got from either of the papers cited by the briefing. Obviously, on one side of A4, the authors couldn’t have covered all the relevant issues, but I felt that what they included and omitted could give the wrong impression to anyone unfamiliar with the issues.


American Academy of Pediatrics (2011). Joint technical report – Learning disabilities, dyslexia, and vision. Pediatrics, 127, e818-e856.

Barrett, B.T. (2009). A critical evaluation of the evidence supporting the practice of behavioural vision therapy. Ophthalmic and Physiological Optics, 29, 4-25.

going round in circles

Central to the Tiger Teachers’ model of cognitive science is the concept of cognitive load. Cognitive load refers to the amount of material that working memory is handling at any one time. It’s a concept introduced by John Sweller, a researcher frequently cited by the Tiger Teachers. Cognitive load is an important concept for education because human working memory capacity is very limited – we can think about only a handful of items at the same time. If students’ cognitive load is too high, they won’t be able to solve problems or will fail to learn some material.

I’ve had concerns about the Tiger Teachers’ interpretations of concepts from cognitive science, and about how they apply those concepts to their own learning, but until recently I hadn’t paid much attention to the way their students were being taught. I had little information about it for a start, and if it ‘worked’ for a particular group of teachers and students, I saw no reason to question it.

increasing cognitive load

The Michaela Community School recently blogged about solving problems involving circle theorems. Vince Ulam, a mathematician and maths teacher*, took issue with the diagrammatic representations of the problems.

The diagrams of the circles and triangles are clearly not accurate; they don’t claim to be. In an ensuing Twitter discussion, opinion was divided over whether or not the accuracy of diagrams mattered. Some people thought it didn’t matter if the diagrams were intended only as a representation of an algebraic or arithmetic problem. One teacher thought inaccurate diagrams would ensure the students didn’t measure angles or guess them.

The problem with the diagrams is not that they are imprecise – few people would quibble over a sketch diagram representing an angle of 28° that was actually 32°. It’s that they are so inaccurate as to be misleading. For example, there’s an obtuse angle that clearly isn’t obtuse, an angle labelled 71° is drawn more acute than one labelled 28°, and a couple of isosceles triangles are scalene. As Vince points out, this makes it impossible for students to determine anything by inspection – an important feature of trigonometry. Diagrams with this level of inaccuracy also have implications for cognitive load, something that the Tiger Teachers are, rightly, keen to minimise.
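
Vince’s point about inspection can be made concrete. Given the coordinates a diagram is actually drawn with, the drawn angle at any vertex is straightforward to compute and compare against its label – a minimal sketch (the coordinates and the 71° label here are hypothetical, not taken from the Michaela diagrams):

```python
import math

def angle_at(vertex, p1, p2):
    """Angle in degrees at `vertex`, formed by the rays towards p1 and p2."""
    ax, ay = p1[0] - vertex[0], p1[1] - vertex[1]
    bx, by = p2[0] - vertex[0], p2[1] - vertex[1]
    cos_theta = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))

# A hypothetical diagram: the angle at A is labelled 71 degrees, but as
# drawn it measures about 30 degrees - the kind of mismatch described above.
A = (0.0, 0.0)
B = (4.0, 0.0)
C = (4.0 * math.cos(math.radians(30)), 4.0 * math.sin(math.radians(30)))

labelled, drawn = 71, angle_at(A, B, C)
if abs(drawn - labelled) > 5:  # generous tolerance for a rough sketch
    print(f"labelled {labelled} deg, drawn {drawn:.0f} deg")
```

A check like this is trivial for diagram-drawing software to run, which is why a sketch can be imprecise without being actively misleading.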

My introduction to trigonometry at school was what the Tiger Teachers would probably describe as ‘traditional’. A sketch diagram illustrating a trigonometry problem was acceptable, but was expected to present a reasonably accurate representation of the problem. A diagram of an isosceles triangle might not be to scale, but it should be an isosceles triangle. An obtuse angle should be an obtuse angle, and an angle of 28° should not be larger than one of 71°.

Personally, I found some of the diagrams so inaccurate as to be quite disconcerting. After all those years of trigonometry, the shapes of isosceles triangles and obtuse angles, and the relative sizes of angles of ~30° or ~70°, are burned into my brain, as the Tiger Teachers would no doubt expect them to be. So seeing a scalene triangle masquerading as an isosceles one, an acute angle claiming to be 99°, and angles of 28° and 71° trading places set up a somewhat unnerving Necker shift. In each case my brain started flipping between two contradictory representations: what the diagram was telling me and what the numbers were telling me.

It was the Stroop effect, but with lines and numbers rather than letters and colours; and the Stroop effect increases cognitive load. Even students accustomed to isosceles triangles not always looking like isosceles triangles would experience an increased cognitive load whilst looking at these diagrams, because they’d have to process two competing representations: what their brain is telling them about the diagram and what it’s telling them about the numbers. I had similar misgivings about the ‘CUDDLES’ approach used to teach French at Michaela.

CUDDLES and cognitive load

The ‘traditional’ approach to teaching foreign languages is to start with a bunch of simple nouns, adjectives and verbs, do a lot of rehearsal, and work up from there; that approach keeps cognitive load low from the get-go.   The Michaela approach seems to be to start with some complex language and break it down in a quasi-mathematical fashion involving underlining some letters, dotting others and telling stories about words.

Not only do students need to learn the words, what they represent and how French speakers use them, they have to learn a good deal of material extraneous to the language itself. I can see how the extraneous material acts as a belt-and-braces approach to ‘securing’ knowledge, but it must increase cognitive load because the students have to think about that as well as the language.

The Tiger Teachers’ approach to teaching is intriguing, but I still can’t figure out the underlying rationale; it certainly isn’t about reducing cognitive load. Why does the Tiger Teachers’ approach to teaching matter? Because now that Nick Gibb has signed up to it, it will probably become educational policy, regardless of the validity of the evidence.

Note:  I resisted the temptation to call this post ‘non angeli sed anguli’.

*Amended from ‘maths teacher’ –  Old Andrew correctly pointed out that this was an assumption on my part. Vince Ulam assures me my assumption was correct.  I guess he should know.

the debating society

One of my concerns about the model of knowledge promoted by the Tiger Teachers is that it hasn’t been subjected to sufficient scrutiny. A couple of days ago on Twitter I said as much. Jonathan Porter, a teacher at the Michaela Community School, thought my criticism unfair because the school has invited critique by publishing a book and hosting two debating days. Another teacher recommended watching the debate between Guy Claxton and Daisy Christodoulou, ‘Sir Ken is right: traditional education kills creativity’. She said it might not address my concerns about theory. She was right, it didn’t. But it did suggest a constructive way to extend the Tiger Teachers’ model of knowledge.

the debate

Guy, speaking for the motion and defending Sir Ken Robinson’s views, highlights the importance of schools developing students’ creativity, and answers the question ‘what is creativity?’ by referring to the findings of an OECD study; that creativity emerges from six factors – curiosity, determination, imagination, discipline, craftsmanship and collaboration. Daisy, opposing the motion, says that although she and Guy agree on the importance of creativity and its definition, they differ over the methods used in schools to develop it.

Daisy says Guy’s model involves students learning to be creative by practising being creative, which doesn’t make sense. It’s a valid point. Guy says knowledge is a necessary but not sufficient condition for developing creativity; other factors are involved. Another valid point. Both Daisy and Guy debate the motion but they approach it from very different perspectives, so they don’t actually rigorously test each other’s arguments.

Daisy’s model of creativity is a bottom-up one. Her starting point is how people form their knowledge and how that develops into creativity. Guy’s model, in contrast, is a top-down one; he points out that creativity isn’t a single thing, but emerges from several factors. In this post, I propose that Daisy and Guy are using the same model of creativity, but because Daisy’s focus is on one part and Guy’s on another, their arguments shoot straight past each other, and that in isolation, both perspectives are problematic.

Creativity is a complex construct, as Guy points out. A problem with his perspective is that the factors he found to be associated with creativity are themselves complex constructs. How does ‘curiosity’ manifest itself? Is it the same in everyone or does it vary from person to person? Are there multiple component factors associated with curiosity too? Can we ask the same questions about ‘imagination’? Daisy, in contrast, claims a central role for knowledge and deliberate practice. A problem with Daisy’s perspective is, as I’ve pointed out elsewhere, that her model of knowledge peters out when it comes to the complex cognition Guy refers to. With a bit more information, Daisy and Guy could have done some joined-up thinking. To me, the two models look like the representation below, the grey words and arrows indicating concepts and connections referred to but not explained in detail.


cognition and expertise

If I’ve understood it correctly, Daisy’s model of creativity is essentially this: If knowledge is firmly embedded in long-term memory (LTM) via lots of deliberate practice and organised into schemas, it results in expertise. Experts can retrieve their knowledge from LTM instantly and can apply it flexibly. In short, creativity is a feature of expertise.

Daisy makes frequent references to research; what scientists think, half a century of research, what all the research has shown. She names names; Herb Simon, Anders Ericsson, Robert Bjork. She reports research showing that expert chess players, football players or musicians don’t practise whole games or entire musical works – they practise short sequences repeatedly until they’ve overlearned them. That’s what enables experts to be creative.

Daisy’s model of expertise is firmly rooted in an understanding of cognition that emerged from artificial intelligence (AI) research in the 1950s and 1960s. At the time, researchers were aware that human cognition was highly complex and often seemed illogical.  Computer science offered an opportunity to find out more; by manipulating the data and rules fed into a computer, researchers could test different models of cognition that might explain how experts thought.

It was no good researchers starting with the most complex illogical thinking – because it was complex and illogical. It made more sense to begin with some simpler examples, which is why the AI researchers chose chess, sport and music as domains to explore. Expertise in these domains looks pretty complex, but the complexity has obvious limits because chess, sport and music have clear, explicit rules. There are thousands of ways you can configure chess pieces or football players and a ball during a game, but you can’t configure them any-old-how because chess and football have rules. Similarly, a musician can play a piece of music in many different ways, but they can’t play it any-old-how because then it wouldn’t be the same piece of music.

In chess, sport and music, experts have almost complete knowledge, clear explicit rules, and comparatively low levels of uncertainty. Expert geneticists, doctors, sociologists, politicians and historians, in contrast, often work with incomplete knowledge, many of the domain ‘rules’ are unknown, and uncertainty can be very high. In those circumstances, expertise involves more than simply overlearning a great many facts and applying them flexibly.

Daisy is right that expertise and creativity emerge from deliberate practice of short sequences – for those who play chess, sport or music. Chess, soccer and Beethoven’s piano concerto No. 5 haven’t changed much since the current rules were agreed and are unlikely to change much in future. But domains like medicine, economics and history still periodically undergo seismic shifts in the way whole areas of the domains are structured, as new knowledge comes to light.

This is the point at which Daisy’s and Guy’s models of creativity could be joined up.  I’m not suggesting some woolly compromise between the two. What I am suggesting is that research that followed the early AI work offers the missing link.

I think the missing link is the schema.   Daisy mentions schemata (or schemas if you prefer) but only in terms of arranging historical events chronologically. Joe Kirby in Battle Hymn of the Tiger Teachers also recognises that there can be an underlying schema in the way students are taught.  But the Tiger Teachers don’t explore the idea of the schema in any detail.

schemas, schemata

A schema is the way people mentally organise their knowledge. Some schemata are standardised and widely used – such as the periodic table or multiplication tables. Others are shared by many people, but are a bit variable – such as the Linnaean taxonomy of living organisms or the right/left political divide. But because schemata are constructed from the knowledge and experience of the individual, some are quite idiosyncratic. Many teachers will be familiar with students all taught the same material in the same way, but developing rather different understandings of it.

There’s been a fair amount of research into schemata. The schema was first proposed as a psychological concept by Jean Piaget*. Frederic Bartlett carried out a series of experiments in the 1930s demonstrating that people use schemata, and in the heyday of AI the concept was explored further by, for example, David Rumelhart, Marvin Minsky and Robert Axelrod. It later extended into script theory (Roger Schank and Robert Abelson), and how people form prototypes and categories (e.g. Eleanor Rosch, George Lakoff). The schema might be the missing link between Daisy’s and Guy’s models of creativity, but both models stop before they get there. Here’s how the cognitive science research allows them to be joined up.

Last week I finally got round to reading Jerry Fodor’s book The Modularity of Mind, published in 1983. By that time, cognitive scientists had built up a substantial body of evidence related to cognitive architecture. Although the evidence itself was generally robust, what it was saying about the architecture was ambiguous. It appeared to indicate that cognitive processes were modular, with specific modules processing specific types of information, e.g. visual or linguistic. It also indicated that some cognitive processes operated across the board, e.g. problem-solving or intelligence. The debate had tended to be rather polarised. What Fodor proposed was that cognition isn’t a case of either-or, but of both-and; that perceptual and linguistic processing is modular, but that higher-level, more complex cognition, which draws on modular information, is global. His prediction turned out to be pretty accurate, which is why Daisy’s and Guy’s models can be joined up.

Fodor was familiar enough with the evidence to know that he was very likely to be on the right track, but his model of cognition is a complex one, and he knew he could have been wrong about some bits of it. So he deliberately exposed his model to the criticism of cognitive scientists, philosophers and anyone else who cared to comment, because that’s how the scientific method works. A hypothesis is tested. People try to falsify it. If they can’t, then the hypothesis signposts a route worth exploring further. If they can, then researchers don’t need to waste any more time exploring a dead end.

joined-up thinking

Daisy’s model of creativity has emerged from a small sub-field of cognitive science – what AI researchers discovered about expertise in domains with clear, explicit rules. She doesn’t appear to see the need to explore schemata in detail because the schemata used in chess, sport and music are by definition highly codified and widely shared.  That’s why the AI researchers chose them.  The situation is different in the sciences, humanities and arts where schemata are of utmost importance, and differences between them can be the cause of significant conflict.  Guy’s model originates in a very different sub-field of cognitive science – the application of high-level cognitive processes to education. Schemata are a crucial component; although Guy doesn’t explore them in this debate, his previous work indicates he’s very familiar with the concept.

Since the 1950s, cognitive science has exploded into a vast research field, encompassing everything from the dyes used to stain brain tissue, through the statistical analysis of brain scans, to the errors and biases that affect judgement and decision-making by experts. Obviously it isn’t necessary to know everything about cognitive science before you can apply it to teaching, but if you’re proposing a particular model of cognition, having an overview of the field and inviting critique of the model would help avoid unnecessary errors and disagreements.  In this debate, I suggest schemata are noticeable by their absence.

*First use of schema as a psychological concept is widely attributed to Piaget, but I haven’t yet been able to find a reference.

The Tiger Teachers and cognitive science

Cognitive science is a key plank in the Tiger Teachers’ model of knowledge. If I’ve understood it properly the model looks something like this:

Cognitive science has discovered that working memory has limited capacity and duration, so pupils can’t process large amounts of novel information. If this information is secured in long-term memory via spaced, interleaved practice, students can recall it instantly whenever they need it, freeing up working memory for thinking.

What’s wrong with that? Nothing, as it stands. It’s what’s missing that’s the problem.

Subject knowledge

One of the Tiger Teachers’ beefs about the current education system is its emphasis on transferable skills. They point out that skills are not universally transferable, many are subject-specific, and in order to develop expertise in higher-level skills novices need a substantial amount of subject knowledge. Tiger Teachers’ pupils are expected to pay attention to experts (their teachers) and memorise a lot of facts before they can comprehend, apply, analyse, synthesise or evaluate. The model is broadly supported by cognitive science and the Tiger Teachers apply it rigorously to children. But not to themselves, it seems.

For most Tiger Teachers cognitive science will be an unfamiliar subject area. That makes them (like most of us) cognitive science novices. Obviously they don’t need to become experts in cognitive science to apply it to their educational practice, but they do need the key facts and concepts and a basic overview of the field. The overview is important because they need to know how the facts fit together and the limitations of how they can be applied.   But with a few honourable exceptions (Daisy Christodoulou, David Didau and Greg Ashman spring to mind – apologies if I’ve missed anyone out), many Tiger Teachers don’t appear to have even thought about acquiring expertise, key facts and concepts or an overview. As a consequence facts are misunderstood or overlooked, principles from other knowledge domains are applied inappropriately, and erroneous assumptions made about how science works. Here are some examples:

It’s a fact…

“Teachers’ brains work exactly the same way as pupils’” (p.177). No they don’t. Cognitive science (ironically) thinks that children’s brains begin by forming trillions of connections (synapses). Then through to early adulthood, synapses that aren’t used get pruned, which makes information processing more efficient. (There’s a good summary here.)  Pupils’ brains are as different to teachers’ brains as children’s bodies are different to adults’ bodies. Similarities don’t mean they’re identical.

Then there’s working memory. “As the cognitive scientist Daniel Willingham explains, we learn by transferring knowledge from the short-term memory to the long term memory” (p.177). Well, kind of – if you assume that what Willingham explicitly describes as “just about the simplest model of the mind possible” is an exhaustive model of memory. If you think that, you might conclude, wrongly, “the more knowledge we have in long-term memory, the more space we have in our working memory to process new information” (p.177). Or that “information cannot accumulate into long-term memory while working memory is being used” (p.36).

Long-term memory takes centre stage in the Tiger Teachers’ model of cognition. The only downside attributed to it is our tendency to forget things if we don’t revisit them (p.22). Other well-established characteristics of long-term memory – its unreliability, errors and biases – are simply overlooked, despite Daisy Christodoulou’s frequent citation of Daniel Kahneman whose work focused on those flaws.

With regard to transferable skills we’re told “cognitive scientist Herb Simon and his colleagues have cast doubt on the idea that there are any general or transferable cognitive skills” (p.17), when what they actually cast doubt on is the ideas that all skills are transferable or that none are.

The Michaela cognitive model is distinctly reductionist; “all there is to intelligence is the simple accrual and tuning of many small units of knowledge that in total produce complex cognition” (p.19). Then there’s “skills are simply just a composite of sequential knowledge – all skills can be broken down to irreducible pieces of knowledge” (p.161).

The statement about intelligence is a direct quote from John Anderson’s paper ‘A Simple Theory of Complex Cognition’ but Anderson isn’t credited, so you might not know he was talking about simple encodings of objects and transformations, and that by ‘intelligence’ he means how ants behave rather than IQ. I’ve looked at Daisy Christodoulou’s interpretation of Anderson’s model here.

The idea that intelligence and skills consist ‘simply just’ of units of knowledge ignores Anderson’s procedural rules and marginalises the role of the schema – the way people configure their knowledge. Joe Kirby mentions “procedural and substantive schemata” (p. 17), but seems to see them only in terms of how units of knowledge are configured for teaching purposes; “subject content knowledge is best organised into the most memorable schemata … chronological, cumulative schemata help pupils remember subject knowledge in the long term” (p.21). The concept of schemata as the way individuals, groups or entire academic disciplines configure their knowledge, that the same knowledge can be configured in different ways resulting in different meanings, or that configurations sometimes turn out to be profoundly wrong, doesn’t appear to feature in the Tiger Teachers’ model.

Skills: to transfer or not to transfer?

Tiger Teachers see higher-level skills as subject-specific. That hasn’t stopped them applying higher-level skills from one domain inappropriately to another. In her critique of Bloom’s taxonomy, Daisy Christodoulou describes it as a ‘metaphor’ for the relationship between knowledge and skills. She refers to two other metaphors; ED Hirsch’s scrambled egg and Joe Kirby’s double helix (Seven Myths p.21).  Daisy, Joe and ED teach English, and metaphors are an important feature in English literature. Scientists do use metaphors, but they use analogies more often, because in the natural world patterns often repeat themselves at different levels of abstraction. Daisy, Joe and ED are right to complain about Bloom’s taxonomy being used to justify divorcing skills from knowledge. And the taxonomy itself might be wrong or misleading.   But it is a taxonomy and it is based on an important scientific concept – levels of abstraction – so should be critiqued as such, not as if it were a device used by a novelist.

Not all evidence is equal

A major challenge for novices is what criteria they can use to decide whether or not factual information is valid. They can’t use their overview of a subject area if they don’t have one. They can’t weigh up one set of facts against another if they don’t know enough facts. So Tiger Teachers who are cognitive science novices have to fall back on the criteria ED Hirsch uses to evaluate psychology – the reputation of researchers and consensus. Those might be key criteria in evaluating English literature, but they’re secondary issues for scientific research, and for good reason.

Novices then have to figure out how to evaluate the reputation of researchers and consensus. The Tiger Teachers struggle with reputation. Daniel Willingham and Paul Kirschner are cited more frequently than Herb Simon, but with all due respect to Willingham and Kirschner, they’re not quite in the same league. Other key figures don’t get a mention.  When asked what was missing from the Tiger Teachers’ presentations at ResearchEd, I suggested, for starters, Baddeley and Hitch’s model of working memory. It’s been a dominant model for 40 years and has the rare distinction of being supported by later biological research. But it’s mentioned only in an endnote in Willingham’s Why Don’t Students Like School and in Daisy’s Seven Myths about Education. I recommended inviting Alan Baddeley to speak at ResearchEd – he’s a leading authority on memory after all.   One of the teachers said he’d never even heard of him. So why was that teacher doing a presentation on memory at a national education conference?

The Tiger Teachers also struggle with consensus. Joe Kirby emphasises the length of time an idea has been around and the number of studies that support it (pp.22-3), overlooking the fact that some ideas can dominate a field for decades, be supported by hundreds of studies and then turn out to be profoundly wrong; theories about how brains work are a case in point.   Scientific theory doesn’t rely on the quantity of supporting evidence; it relies on an evaluation of all relevant evidence – supporting and contradictory – and takes into account the quality of that evidence as well.  That’s why you need a substantial body of knowledge before you can evaluate it.

The big picture

For me, Battle Hymn painted a clearer picture of the Michaela Community School than I’d been able to put together from blog posts and visitors’ descriptions. It persuaded me that Michaela’s approach to behaviour management is about being explicit and consistent, rather than simply being ‘strict’. I think having a week’s induction for new students and staff (‘bootcamp’) is a great idea. A systematic, rigorous approach to knowledge is vital and learning by rote can be jolly useful. But for me, those positives were all undermined by the Tiger Teachers’ approach to their own knowledge.  Omitting key issues in discussions of Rousseau’s ideas, professional qualifications or the special circumstances of schools in coastal and rural areas, is one thing. Pontificating about cognitive science and then ignoring what it says is quite another.

I can understand why Tiger Teachers want to share concepts like the limited capacity of working memory and skills not being divorced from knowledge.  Those concepts make sense of problems and have transformed their teaching.  But for many Tiger Teachers, their knowledge of cognitive science appears to be based on a handful of poorly understood factoids acquired second or third hand from other teachers who don’t have a good grasp of the field either. Most teachers aren’t going to know much about cognitive science; but that’s why most teachers don’t do presentations about it at national conferences or go into print to share their flimsy knowledge about it.  Failing to acquire a substantial body of knowledge about cognitive science makes its comprehension, application, analysis, synthesis and evaluation impossible.  The Tiger Teachers’ disregard for principles they claim are crucial is inconsistent, disingenuous, likely to lead to significant problems, and sets a really bad example for pupils. The Tiger Teachers need to re-write some of the lyrics of their Battle Hymn.

The Tiger Teachers’ model of knowledge: what’s missing?

“If all else fails for Michaela at least we’re going to do a great line in radical evangelical street preachers.” Jonathan Porter, Head of Humanities at the Michaela Community School, was referring to an impassioned speech from Katharine Birbalsingh, the school’s head teacher, at the recent launch of their book, Battle Hymn of the Tiger Teachers: The Michaela Way.

Michaela Community School’s sometimes blistering critique of the English education system, coupled with its use of pedagogical methods abandoned by most schools decades ago, has drawn acclaim, criticism and condemnation. There’s a strong, shared narrative about the Michaela Way amongst the contributors to Battle Hymn. If I’ve understood it correctly, it goes like this:

There’s a crisis in the English education system due to progressive ideas that have dominated teacher training since the 1960s. Child-centred methods have undermined discipline. Poor behaviour and lack of respect makes it impossible for teachers to teach. Subject knowledge has been abandoned in favour of higher-level skills wrongly claimed to be transferable. The way to combat the decline is via strict discipline, teacher authority, a knowledge-based curriculum and didactic teaching.

Knowledge is power

“Knowledge is power” is the Michaela motto. Tiger Teachers are required to have extensive knowledge of their own subject area in order to teach their pupils. Pupils are considered to be novices and as such are expected to acquire a substantial amount of factual knowledge before they can develop higher-level subject-specific skills.

Given the emphasis on knowledge, you’d expect the Tiger Teachers to apply this model not only to their pupils, but to any subjects they are unfamiliar with.   But they don’t. It appears to apply only to what pupils are taught in school.

A couple of years ago at a ResearchEd conference, I queried some claims made about memory. I found myself being interrogated by three Tiger Teachers about what I thought was wrong with the model of memory presented. I said I didn’t think anything was wrong with it; the problem was what it missed out. There are other examples in Battle Hymn of missing key points. To illustrate, I’ve selected four. Here’s the first:


Rousseau is widely recognised as the originator of the progressive educational ideas so derided in the Michaela narrative.   If you were to rely on other Tiger Teachers for your information about Rousseau, you might picture him as a feckless Romantic philosopher who wandered the Alps fathering children whilst entertaining woolly, sentimental, unrealistic thoughts about their education.   You wouldn’t know that he argued in Émile, ou de L’Éducation not so much for the ‘inevitable goodness’ of children as Jonathan Porter claims (p.77), but that children (and adults) aren’t inherently bad – a view that flew in the face of the doctrine of original sin espoused by the Geneva Calvinism that Rousseau had rejected and the Catholicism he (temporarily) converted to soon after.

At the time, children were often expected to learn by rote factual information that was completely outside their experience and meaningless to them. Any resistance would have been seen as a sign of their fallen nature, rather than an understandable objection to a pointless exercise. Rousseau advocated that education work with nature, rather than against it. He claimed the natural world more accurately reflected the intentions of its Creator than the authoritarian, man-made religious institutions that exerted an extensive and often malign influence over people’s day-to-day lives. Not surprisingly, Émile was promptly banned in Geneva and Paris.

Although Jonathan Porter alludes to the ‘Enlightenment project’ (p.77), he doesn’t mention Rousseau’s considerable influence in other spheres. The section of Émile that caused most consternation was entitled ‘The Creed of a Savoyard Priest’. It was the only part Voltaire thought worth publishing. In it, Rousseau tackles head-on Descartes’ proposition ‘I think, therefore I am’. He sets out the questions about perception, cognition, reasoning, consciousness, truth, free will and the existence of religions, that perplexed the thinkers of his day and that cognitive science has only recently begun to find answers to. I’m not defending Rousseau’s educational ideas, I think Voltaire’s description “a hodgepodge of a silly wet nurse in four volumes” isn’t far off the mark, but to draw valid conclusions from Rousseau’s ideas about education, you need to know why he was proposing them.

Battle Hymn isn’t a textbook or an academic treatise, so it would be unreasonable to expect it to tackle at length all the points it alludes to. But it is possible to signpost readers to relevant issues in a few words. There’s nothing technically wrong with the comments about Rousseau in Battle Hymn, or Robert Peal’s Progressively Worse (a core text for Tiger Teachers) or Daisy Christodoulou’s Seven Myths about Education (another core text); but what’s missed out could result in conclusions being drawn that aren’t supported by the evidence.

Teacher qualifications

Another example is teacher qualifications. Michaela teachers don’t think much of their initial teacher training (ITT); they claim it didn’t prepare them for the reality of teaching (p.167), it indoctrinates teachers into a ‘single dogmatic orthodoxy’ (p.171), outcomes are poor (p.158), and CPD in schools is ‘more powerful’ (p.179). The conclusion is not that ITT needs a root-and-branch overhaul, but that it should be replaced with something else: in-school training or … no qualification at all. Sarah Clear says she’s “an unqualified teacher and proud” (p.166) and argues that although the PGCE might be a necessary precaution to prevent disaster, it doesn’t actually do that (p.179), so why bother with it?

Her view doesn’t quite square with Dani Quinn’s perspective on professional qualifications. Dani advocates competition in schools because there’s competition in the professional world. She says: “Like it or not, when it comes to performance, it is important to know who is the best” and cites surgeons and airline pilots as examples (p.133). But her comparison doesn’t quite hold water. Educational assessment tends to be norm-referenced (for reasons Daisy Christodoulou explores here) but assessments of professional performance are almost invariably criterion-referenced in order to safeguard minimum standards of technical knowledge and skill. But neither Dani nor Sarah mention norm-referenced and criterion-referenced assessment – which is odd, given Daisy Christodoulou’s involvement with Michaela. Again, there’s nothing technically wrong with what’s actually said about teacher qualifications; but the omission of relevant concepts increases the risk of reaching invalid conclusions.

Replicating Michaela

A third example is from the speech given by Katharine Birbalsingh at the book launch. It was triggered by this question: “How would you apply Michaela in primary? Could you replicate it in coastal areas or rural areas and how would that work?”

Katharine responds: “These are all systems and values that are universal. That could happen anywhere. Of course it could happen in a primary. I mean you just insist on higher standards with regard to the behaviour and you teach them didactically because everyone learns best when being taught didactically … You would do that with young children, you would do that with coastal children and you would do that with Yorkshire children. I don’t see why there would be a difference.” She then launches into her impassioned speech about teaching and its social consequences.

You could indeed apply Michaela’s systems, values, behavioural expectations and pedagogical approach anywhere. It doesn’t follow that you could apply them everywhere. Implicit in the question is whether the Michaela approach is scalable. It’s not clear whether Katharine misunderstood the question or answered the one she wanted to answer, but her response overlooks two important factors.

First, there’s parent/pupil choice. Brent might be one of the most deprived boroughs in the country, but it’s a deprived borough in a densely populated, prosperous city that has many schools and a good public transport system. If parents or pupils don’t like Michaela, they can go elsewhere. But in rural areas, for many pupils there’s only one accessible secondary school – there isn’t an elsewhere to go to.

Then there’s teacher recruitment. If you’re a bright young graduate, as most of the Michaela staff seem to be, the capital offers a vibrant social life and  a wide range of interesting career alternatives should you decide to quit teaching. In a rural area there wouldn’t be the same opportunities.  Where I live, in a small market town in a sparsely populated county, recruitment in public sector services has been an ongoing challenge for many years.

Coastal towns have unique problems because they are bounded on at least one side by the sea. This makes them liminal spaces, geographically, economically and socially. Many are characterised by low-skilled, low-paid, seasonal employment and social issues different to those of an inner city. For teachers, the ‘life’ bit of the work-life balance in London would be very different from what they could expect in out-of-season Hartlepool.

Of course there’s no reason in principle why a replica Michaela shouldn’t transform the educational and economic prospects of coastal or rural areas.   But in practice, parent/pupil choice and teacher recruitment would be challenges that by definition Michaela hasn’t had to face because it’s a classic ‘special case’.  And it’s not safe to generalise from special cases. Again, there’s nothing technically wrong with what Katharine said about replicating Michaela; it’s what she didn’t say that’s key.  The same is true for the Tiger Teachers’ model of cognitive science, the subject of the next post.


Birbalsingh, K (2016).  Battle Hymn of the Tiger Teachers: The Michaela Way.  John Catt Educational.

Christodoulou, D (2014).  Seven Myths about Education.  Routledge.

Peal, R (2014).  Progressively Worse: The Burden of Bad Ideas in British Schools.  Civitas.

Rousseau, J-J (1974/1762).  Émile.  JM Dent.

getting the PISA scores under control

The results of the OECD’s 2015 Programme for International Student Assessment (PISA) were published a couple of weeks ago. The PISA assessment has measured the performance of 15 year-olds in Reading, Maths and Science every three years since 2000. I got the impression that teachers and academics (at least those using social media) were interested mainly in various aspects of the analysis. The news media, in contrast, focussed on the rankings. So did the OECD and politicians, according to the BBC website. Andreas Schleicher of the OECD mentioned Singapore ‘getting further ahead’ and John King, the US Education Secretary, referred to the US ‘losing ground’.

What they are talking about are some single-digit changes in scores of almost 500 points. Although the PISA analysis might be informative, the rankings tell us very little. No one will get promoted or relegated as a consequence of their position in the PISA league table. Education is not football. What educational performance measures do have in common with all other performance measures – from football to manufacturing – is that performance is an outcome of causal factors. Change the causal factors and the performance will change.

common causes vs special causes

Many factors impact on performance. Some fluctuations are inevitable because of the variation inherent in raw materials, climatic conditions, equipment, human beings etc. Other changes in performance occur because a key causal factor has changed significantly. The challenge is in figuring out whether fluctuations are due to variation inherent in the process, or whether they are due to a change in the process itself – referred to as common causes and special causes, respectively.

The difference between common causes and special causes is important because there’s no point spending time and effort investigating common causes. Your steel output might have suffered because of a batch of inferior iron ore, your team might have been relegated because two key players sustained injuries, or your PISA score might have fallen a couple of points  due to a flu epidemic just before the PISA tests. It’s impossible to prevent such eventualities and even if you could, some other variation would crop up instead. However, if performance has improved or deteriorated following a change in supplier, strategy or structure you’d want to know whether or not that special cause has had a real impact.

spotting the difference

This was the challenge facing Walter A Shewhart, a physicist, engineer and statistician working for the Western Electric Company in the 1920s. Shewhart figured out a way of representing variations in performance so that quality controllers could see at a glance whether the variation was due to common causes or special causes. The representation is generally known as a control chart. I thought it might be interesting to plot some PISA results as a control chart, to see if changes in scores represented a real change or whether they were the fluctuations you’d expect to see due to variation inherent in the process.

If I’ve understood Shewhart’s reasoning correctly, it goes like this: Even if you don’t change your process, fluctuations in performance will occur due to the many different factors that impact on the effectiveness of your process. In the case of the UK’s PISA scores, each year similar students have learned and been assessed on very similar material, so the process remains unchanged; what the PISA scores measure is student performance.   But student performance can be affected by a huge number of factors; health, family circumstances, teacher recruitment, changes to the curriculum a decade earlier etc.

For statistical purposes, the variation caused by those multiple factors can be treated as random. (It isn’t truly random, but for most intents and purposes can be treated as if it is.) This means that over time, UK scores will form a normal distribution – most will be close to the mean, a few will be higher and a few will be lower. And we know quite a bit about the features of normal distributions.

Shewhart came up with a formula for calculating the upper and lower limits of the variation you’d expect to see as a result of common causes. If a score falls outside those limits, it’s worth investigating because it probably indicates a special cause. If it doesn’t, it isn’t worth investigating, because it’s likely to be due to common causes rather than a change to the process. Shewhart’s method is also useful for finding out whether or not an intervention has made a real difference to performance.  Donald Wheeler, in Understanding Variation: The key to managing chaos, cites the story of a manager spotting a change in performance outside the control limits and discovering it was due to trucks being loaded differently without the supervisor’s knowledge.

getting the PISA scores under control

I found it surprisingly difficult, given the high profile of the PISA results, to track down historical data, and I couldn’t access it via the PISA website – if anyone knows of an accessible source I’d be grateful. The same goes for anyone spotting errors in my calculations. I decided to use the UK’s overall scores for Mathematics as an example. In 2000 and 2003 the UK assessments didn’t meet the PISA criteria, so the 2000 score is open to question and the 2003 score was omitted from the tables.

I’ve followed the method set out in Donald Wheeler’s book, which is short, accessible and full of examples. At first glance the formulae might look a bit complicated, but the maths involved is very straightforward. Year 6s might enjoy applying it to previous years’ SATs results.

Step 1: Plot the scores and find the mean.

year             2000*  2003*  2006  2009  2012  2015  mean (Xbar§)
UK maths score    529     –     495   492   494   492     500.4

Table 1: UK maths scores 2000-2015

* In 2000 and 2003 the UK assessments didn’t meet the PISA criteria, so the 2000 score is open to question and the 2003 score was omitted from the results.

§  I was chuffed when I figured out how to type a bar over a letter (the symbol for mean) but it got lost in translation to the blog post.

Fig 1: UK Maths scores and mean score

Step 2: Find the moving range (mR) values and calculate their mean.

The moving range values, referred to as mR values, are the differences between consecutive scores.

year             2000  2003  2006  2009  2012  2015  mean (Rbar)
UK maths score    529    –    495   492   494   492
mR values           –    –     34     3     2     2     10.25

Table 2: moving range (mR values) 2000-2015

Fig 2: Differences between consecutive scores (mR values)

Step 3: Calculate the Upper Control Limit for the mR values (UCLR).

To do this we multiply the mean of the mR values (Rbar) by 3.27.

UCLR = 3.27 x Rbar = 3.27 x 10.25 = 33.52

Fig 3: Differences between scores (mR values) showing upper control limit (UCLR)

Step 4: Calculate the Upper Natural Process Limit (UNPL) for the individual scores using the formula UNPL = Xbar + (2.66 x Rbar).

UNPL = Xbar + (2.66 x Rbar) = 500.4 + (2.66 x 10.25) = 500.4 + 27.27 = 527.67

Step 5: Calculate the Lower Natural Process Limit (LNPL) for the individual scores using the formula LNPL = Xbar – (2.66 x Rbar).

LNPL = Xbar – (2.66 x Rbar) = 500.4 – (2.66 x 10.25) = 500.4 – 27.27 = 473.13
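Steps 4 and 5 complete the sketch – the natural process limits sit 2.66 mean moving ranges either side of the mean (the post’s hand-worked values are 527.67 and 473.13):

```python
xbar, rbar = 500.4, 10.25  # from Steps 1 and 2

# Steps 4 and 5: natural process limits for the individual scores
unpl = xbar + 2.66 * rbar
lnpl = xbar - 2.66 * rbar
print(f"UNPL = {unpl:.2f}, LNPL = {lnpl:.2f}")
```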

We can now plot the UK’s Maths scores showing the upper and lower natural process limits – the limits of the variation you’d expect to see as a result of common causes.

Fig 4: UK Maths scores showing upper and lower natural process limits

What Fig 4 shows is that the UK’s 2000 Maths score falls just outside the upper natural process limit, so even if the OECD hadn’t told us it was an anomalous result, we’d know that something different happened to the process in that year. You might think this is pretty obvious because there’s such a big difference between the 2000 score and all the others. But what if the score had been just a bit lower?  I put in some other numbers:

year 2000 score   Xbar    Rbar   UCLR    UNPL     LNPL
529 (actual)      500.4   10.25  33.52   527.67   473.13
520               498.6    8     26.16   519.88   477.32
510               496.6    5.5   17.99   511.23   481.97
500               494.6    3      9.81   502.58   486.62

Table 3: outcomes of alternative scores for year 2000

Table 3 shows that if the score had been 520, it would still have been outside the natural process limits, but a score of 510 would have been within them.
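Table 3 can be reproduced by wrapping the five steps in a function and rerunning it with alternative year-2000 scores – a minimal Python sketch:

```python
def xmr_limits(scores):
    """Return (xbar, rbar, uclr, unpl, lnpl) for a list of scores."""
    xbar = sum(scores) / len(scores)                          # Step 1
    mr = [abs(b - a) for a, b in zip(scores, scores[1:])]     # Step 2
    rbar = sum(mr) / len(mr)
    return (xbar, rbar,
            3.27 * rbar,              # Step 3: UCLR
            xbar + 2.66 * rbar,       # Step 4: UNPL
            xbar - 2.66 * rbar)       # Step 5: LNPL

later = [495, 492, 494, 492]  # the 2006-2015 scores
for first in (529, 520, 510, 500):  # actual and alternative year-2000 scores
    xbar, rbar, uclr, unpl, lnpl = xmr_limits([first] + later)
    outside = first > unpl or first < lnpl
    print(f"{first}: UNPL={unpl:.2f}  LNPL={lnpl:.2f}  outside limits: {outside}")
```

Running this flags 529 and 520 as outside the limits, and 510 and 500 as within them, matching Table 3.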

Fig 5: UK Maths scores showing upper and lower natural process limits for a year 2000 score of 510

ups, downs and targets

The ups and downs of test results are often viewed as more important than they really are; up two points good, down two points bad – even though a two-point fluctuation might be due to random variation.

The process control model has significant implications for target-setting too. Want to improve your score?  Then you need to work harder or smarter. Never mind that students and teachers can work their socks off only to find their performance undermined by a crisis in recruiting maths teachers or a whole swathe of schools converting to academies. Setting targets on the assumption that harder or smarter work will shift the score, while ignoring natural variation, supports what’s been called Ackoff’s proposition – that “almost every problem confronting our society is a result of the fact that our public policy makers are doing the wrong things and are trying to do them righter”.

To get tough on PISA scores we need to get tough on the causes of PISA scores.


Wheeler, D.J. (1993). Understanding Variation: The Key to Managing Chaos. SPC Press Inc, Knoxville, Tennessee.