There’s nothing more satisfying than winning a staff-room or pub argument about what works and what doesn’t by quoting effect sizes. There they are, in black and white: stats that show you are right. Homework is rubbish, class sizes don’t matter, feedback is king. However, my concern is that effect sizes in the hands of the uninitiated are like matches in the hands of toddlers. There are a few things to take into account before you swallow the evidence…
What are effect sizes and why are they useful?
Effect sizes are a way of quantifying how big a difference an intervention makes (which is not the same thing as its statistical significance). Put simply, they compare the outcomes of a test group (which has been given a specific intervention) with those of a control group (which has had no change to their teaching), and express the gap on a common scale. That is why they are useful: they show the size of the difference between two groups in a way that can be compared across studies.
For example, imagine an experiment where two groups of 30 pupils had been taught using different strategies and showed a gap in test results of 10%. It seems logical to say that the intervention has been worthwhile. But what if all 30 of the pupils who received the intervention scored above every pupil in the control group? That would make the result far more impressive, because it worked for everyone. By contrast, if the 10% difference came from only 5 pupils who did incredibly well, while the other 25 were pretty much the same as the 30 in the control group, it wouldn’t seem so earth-shattering.
This is where effect sizes come in. If you want chapter and verse on how they’re calculated, read Rob Coe’s ‘It’s the Effect Size, Stupid!’ What the effect size does is take into account the spread of results (the standard deviation): the gap between the two groups’ average scores is divided by that spread, so the same gap counts for more when results are tightly clustered than when they are widely scattered. This puts the headline number in the context of the results behind it.
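To make that concrete, here is a minimal sketch of the standardised mean difference (Cohen’s d, the most common effect size measure) applied to the two scenarios above. The pupil scores are invented purely for illustration; both intervention groups beat the control group by exactly 10 marks on average, but the spread of results tells two very different stories.

```python
import statistics

def cohens_d(treatment, control):
    """Standardised mean difference: the gap between group means
    divided by the pooled standard deviation of the two groups."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = statistics.stdev(treatment), statistics.stdev(control)
    pooled = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(treatment) - statistics.mean(control)) / pooled

# Invented scores for 30 control pupils, spread between 35 and 64.
control = list(range(35, 65))

# Scenario 1: every pupil in the intervention group gains 10 marks.
everyone_up = [score + 10 for score in control]

# Scenario 2: 25 pupils score much like the control group, but 5 do
# spectacularly well -- the mean gap is still exactly 10 marks.
five_stars = control[:25] + [score + 60 for score in control[25:]]

print(round(cohens_d(everyone_up, control), 2))  # large effect (about 1.1)
print(round(cohens_d(five_stars, control), 2))   # much smaller (about 0.46)
```

The same 10-mark gap in averages yields a very different effect size once the spread is accounted for, which is exactly the point of the statistic.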
Unpacking effect sizes
However, there are a lot of teachers (and school leaders) who bang on about effect sizes as if they are the only thing that matters. According to John Hattie, anything with an effect size of 0.4 or above is meaningful, and above 0.6 has high impact. This has led to lots of stats being parroted in teacher discussions to show which strategies and interventions are the most effective and should therefore be followed without hesitation. Looking at lengthy tables like this would seem to spell out with great clarity what works and what doesn’t:
Sadly, it isn’t that simple. Within these stats lies substantial variation. The classic example is homework – quoted here at a modest 0.29. However, as various blogs and authors have pointed out (most effectively Tom Sherrington in The Learning Rainforest), the picture changes when we split primary homework (a paltry 0.15) from secondary (a game-changing 0.64). The older the pupil, the more valuable homework appears to be. Hattie has also pointed out that the research can only summarise what has been done up to now. It may be that primary homework set in future has greater efficacy, which would push the numbers up.
A shift in thinking
What interested me this week was listening to Ollie Lovell’s (@ollie_lovell) thoughts on Craig Barton’s podcast episode ‘A Slice of Advice’. He subsequently wrote this blog post in which he uses his interviews with Adrian Simpson and John Hattie to explain his thinking. What emerges is the problem that an effect size can vary with how the experiment is designed, rather than with the impact of the intervention itself. On this basis, Ollie rejects the use of effect sizes and calls them a ‘category error’.
I’m intrigued by this and it represents a significant shift in thinking if we’re going to abandon effect sizes (and the ranking of them) as a meaningful way of evaluating strategies and interventions. I suspect this is a debate that is about to explode, and in all honesty I don’t yet know where I stand on it. I did think the most persuasive point Ollie makes is this:
“Has an effect size ever made me a better teacher? I honestly couldn’t think of an example that would enable me to answer ‘yes’ to this question.”
With that, I have to agree. I also give Ollie a lot of respect for tackling this head on. He’s questioned something that few would ever have thought to challenge.
Baby and bathwater?
We all know that there are lies, damned lies and statistics. However, Hattie makes a good point: what else are we going to use? I’m not a researcher but a teacher (and not a Maths teacher at that), so what I say next should be taken in that context. At present I kick against the trend of effect sizes being quoted haphazardly by teachers in debates, as I usually find this reductive. However, until there is a better method of expressing the value of an intervention, I think we need to train teachers to reach the story behind the numbers (as Hattie says) or to understand the mechanisms (as Simpson says). If the numbers are less reliable than we thought, imagine how much distortion is caused by their blind application in classrooms. While the debate rages, effect sizes will continue to be used, so let’s use them with more care than before. Remember, the effect size is the headline, not the article. You need to read on…