Follow the data!

Last week I was down in Wellington for the VUW NZCER NZAMT16 Mathematics & Statistics Education Research Symposium, as well as for the NZAMT16 teacher conference. It was a huge privilege to be one of the keynote speakers and my keynote focused on teaching data science at the school level. I used the example of following music data from the New Zealand Top 40 charts to explore what new ways of thinking about data our students would need to learn (I use “new” here to mean “not currently taught/emphasised”).

It was awesome to be back in Wellington, as not only did I complete a BMus/BSc double degree at Victoria University, I actually taught music at Hutt Valley High School (the venue for the conference) while I was training to become a high school teacher (in maths/stats and music). I didn’t talk much in my keynote about the relationship between music and data analysis, but I did describe my thoughts a few years ago (see below):

All music has some sort of structure sitting behind it, but the beauty of music is in the variation. When you learn music, you learn about key ideas and structures, but then you get to hear how these same key ideas and structures can be used to produce so many different-sounding works of art. This is how I think we need to help students learn statistics – minimal structure, optimal transfer, maximal experience. Imagine how boring it would be if students learning music only ever listened to Bach.

Due to some unforeseen factors, I ended up ZOOMing my slides from one laptop at the front of the hall to another laptop in the back room which was connected to the data projector. Since I was using ZOOM, I decided to record my talk. However, the recording is not super awesome due to not really thinking about the audio side of things (ironically). If you want to try watching the video, I’ve embedded it below:

You can also view the slides here: I’m not sure they make a whole lot of sense by themselves, so here’s a quick summary of some of what I talked about:

  • Currently, we pretty much choose data to match the type of analysis we want to teach, and then “back fit” the investigative problem to this analysis. This is not totally a bad thing: we do it in the hope that when students are out there in the real world, they think about all the analytical methods they’ve learned and choose the one that makes sense for the thing they don’t know and the data they have to learn from. But there’s a whole lot of data out there, coming from the computational world our students live in, that we don’t currently teach students how to learn from. If we “follow the data” that students are interacting with, what “new” ways of thinking will our students need to make sense of this data?
  • Album covers are a form of data, but how do we take something we can see visually and turn this into “data”? For the album covers I used from one week of 1975 and one week of 2019, we can see that the album covers from 1975 are not as bright and vibrant as those from 2019. Similarly, we can see that people’s faces feature more in the 1975 album covers. We could use the image data for each album cover, extract some overall measure of colour and use this to compare 1975 and 2019. But what measure should we use? What is luminosity, saturation, hue, etc.? How could we overfit a model to predict the year of an album cover by creating lots of super specific rules? What pre-trained models can we use for detecting faces? How are they developed? How well do they work? What’s this thing called a “confusion matrix”?
  • An intended theme across my talk was to compare what humans can do (and to start with this), with what we could try to get computers to do, and also to emphasise how important human thinking is. I showed a video of Joy Buolamwini talking about her Gender Shades project and algorithmic bias, and tried to emphasise that we can’t teach about fun things we can do with machine learning etc. without talking about bias, data ethics, data ownership, data privacy and data responsibility. In her video, Joy uses faces of members of parliament – did she need permission to use these people’s faces for her research project since they were already public on websites? What if our students start using photos of our faces for their data projects?
  • I played the song that was number one the week I was born (tragedy!) as a way to highlight the calendar feature of the nztop40 website – as long as you were born after 1975, you can look up your song too. Getting students to notice the URL and how it changes as you navigate a web page is a useful skill – in this case, if you navigate to different chart weeks, you can notice that the “chart id” number changes. We could “hack” the URL to get the chart data for different weeks of the years available. If the website terms and conditions allow us, we could also use “web scraping” to automate the collection of chart data from across a number of weeks. We could also set up a “scheduler” to copy the chart data as it appears each week. But then we need to think about what each row in our super data set represents and what visualisations might make sense to communicate trends, features, patterns etc. I gave an example of a visualisation of all the singles that reached number one during 2018, and we discussed things I had decided to do (e.g. reversing the y axis scale) and how the visualisation could be improved [data visualisation could be a whole talk in itself!!!]
  • There are common ways we analyse music – things like key signature, time signature, tempo (speed), genre/style, instrumentation etc. – but I used one that I thought would not be too hard to teach during the talk: whether a song is in the major or minor key. However, listening to music first was really just a fun “gateway” to learn more about how the Spotify API provides “audio features” about songs in its database, and in particular about supervised machine learning. According to Spotify, the Ed Sheeran song Beautiful People is in the minor key, but me and guitar chords published online think that it’s in the major key. What’s the lesson here? We can’t just take data that comes from a model as being the truth.
  • I also wanted to talk more about how songs make us feel, to extend thinking about the modality of the song (major = happy, minor = sad), to the lyrics used in the song as well. How can we take a set of lyrics for a song and analyse these in terms of overall sentiment – positive or negative? There’s lots of approaches, but a common one is to treat each word independently (“bag of words”) and to use a pre-existing lexicon. The slides show the different ways I introduce this type of analysis, but the important point is how common it is to transfer a model trained within one data context (for the bing lexicon, customer reviews online) and use it for a different data context (in this case, music lyrics). There might just be some issues with doing this though!
  • Overall, what I tried to do in this talk was not to showcase computer programming (coding) and mathematics, since often we make these things the “star attraction” in talks about data science education. The talk I gave was totally “powered by code” but do we need to start with code in our teaching? When I teach statistics, I don’t start with pulling out my calculator! We start with the data context. I wanted to give real examples of ways that I have engaged and supported all students to participate in learning data science: by focusing on what humans think, feel and see in the modern world first, then bringing in (new) ways of thinking statistically and computationally, and then teaching the new skills/knowledge needed to support this thinking.
  • We have an opportunity to introduce data science in a real and meaningful way at the school level, and we HAVE to do this in a way that allows ALL students to participate – not just those in enrichment/extension classes, coding clubs, and schools with access to flash technology and gadgets. While my focus is the senior levels (Years 11 to 13), the modern world of data gives so many opportunities for integrating statistical and computational thinking to learn from data across all levels. We need teachers who are confident with exploring and learning from modern data, and we need new pedagogical approaches that build on the effective ones crafted for statistics education. We need to introduce computational thinking and computer programming/coding (which are not the same things!) in ways that support and enrich statistical thinking.
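
To make the album cover question "what measure should we use?" a bit more concrete, here is a minimal sketch of one possible measure, using only Python's standard library. It assumes each cover has already been reduced to a list of (R, G, B) pixels with channels from 0 to 255; the function name `colour_summary` and the pixel values are made up for illustration and do not come from the talk or from any real cover. It summarises a cover by its mean saturation and mean lightness, which is only one of many choices.

```python
import colorsys
from statistics import mean


def colour_summary(pixels):
    """Summarise a list of (R, G, B) pixels (0-255 channels) by mean
    saturation and mean lightness, each on a 0-1 scale."""
    saturations = []
    lightnesses = []
    for r, g, b in pixels:
        # colorsys expects channels scaled to 0-1 and returns (hue, lightness, saturation)
        _hue, lightness, saturation = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
        saturations.append(saturation)
        lightnesses.append(lightness)
    return {"saturation": mean(saturations), "lightness": mean(lightnesses)}


# Hypothetical pixels, invented for illustration only (not from real album covers)
muted_cover = [(120, 100, 90), (90, 80, 70), (110, 105, 100)]
vivid_cover = [(255, 40, 40), (30, 200, 255), (255, 220, 0)]

print("Muted cover:", colour_summary(muted_cover))
print("Vivid cover:", colour_summary(vivid_cover))
```

A single average like this throws away a lot (where the colour is, how varied it is), which is exactly the kind of measurement decision students could argue about.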
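
The “bag of words” idea from the lyrics bullet above can also be sketched in a few lines. This is a rough illustration under loud assumptions: the lexicon below is a tiny made-up stand-in (it is NOT the real bing lexicon), the lyric line is invented, and the function name `bag_of_words_sentiment` is hypothetical. Each word is looked up independently and the positive and negative matches are counted.

```python
import re

# Tiny made-up lexicon for illustration only; this is NOT the real bing lexicon
TOY_LEXICON = {
    "love": "positive",
    "beautiful": "positive",
    "happy": "positive",
    "cry": "negative",
    "lonely": "negative",
    "sick": "negative",
}


def bag_of_words_sentiment(lyrics, lexicon=TOY_LEXICON):
    """Count positive and negative words, treating every word independently."""
    words = re.findall(r"[a-z']+", lyrics.lower())
    positive = sum(1 for word in words if lexicon.get(word) == "positive")
    negative = sum(1 for word in words if lexicon.get(word) == "negative")
    return {"positive": positive, "negative": negative, "net": positive - negative}


# Invented lyric line, not taken from any real song
print(bag_of_words_sentiment("That beat is sick, I love it"))
```

In this invented line “sick” is praise, but a lexicon built from a different context would count it as negative, so the net score comes out at zero. That is a small version of the problem with transferring a model from one data context to another.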

If you are a NZ-based teacher, and you are interested in learning more about teaching data science, then please use the “sign-up” form at (the “password” is datascience4everyone). I’ll be sending out some emails soon, probably starting with learning more about APIs (for an API in action, check out ).

Spot the errors – final draft

In NZ, it's report comment writing time for teachers, which means for many statistics teachers not just the fun of writing reports but also the not-so-fun job of checking other teachers' comments for errors (I'm looking at you, English department!). One of the things we used to do every year as part of "report comment writing PD" was to look at different examples of report comments and identify as many errors as possible.

So in line with this kind of activity, for this post I've put together some examples of tasks and/or student responses that demonstrate some common misunderstandings for statistics, each followed by discussion partially informed by comments other teachers made on the earlier version of this post. Use the tabs at the left hand side of this post to move through each part.

For each of the examples:

  • Have a read of the task/student response
  • Identify the different misunderstandings demonstrated in the task/student response.
  • Try to prioritise the misunderstandings to decide on the ONE that is the most serious and needs addressing first.


The Coach of a soccer team ran a new training programme over the season. At the start of the season and at the end of the season the players in the soccer team had to complete different tests for their ball handling skills. One test was for how many times in a row each player could bounce a soccer ball on their head. You have been given the bounce data in the table below. Write a short report for the Coach of the soccer team about the effectiveness of their training programme.


Student report

Sorry Coach, but the new training programme did not improve each player’s ability to bounce balls on their head, as you can see in my graphs below. The median number of times in a row a ball is bounced on a head was 9 at the start of the season and 9 at the end of the season, so the players did not improve with this skill. The box for the end of the season is not shifted far enough to the right as the median of the end of season is not outside the box of the start of season. So you can't make a call that the numbers of times in a row a ball is bounced tends to be higher at the end of the season compared to the start of the season. There is no difference in how they performed in this ball handling skills test between the start of the season and the end of the season.


As part of a presentation last year, I tried to summarise what I think is important to consider when faced with anything requiring statistical thinking: What are our awesome messages?


I'm going to use these three principles for some of the discussion sections of this post:

  • It matters how much data you have and how you got that data
  • It matters what you are measuring and how you are measuring it
  • It matters that you are uncertain and that there is variation

When I initially published this post, I asked for teachers to submit anonymously what they thought was the biggest misunderstanding demonstrated by the example (the task or the student response). After a few days, I shared these comments unedited on this site so that teachers would be able to view and compare what had been written. This was motivated by a desire to demonstrate that we don't all see the same things in student writing or tasks and also to show the range of possible issues with the task or student response.

So what did teachers identify as the biggest misunderstandings?

  • That the nature of the data lends itself to a paired comparison, not a comparison of two independent groups
  • That the nature of the study was about an experiment and suggestive causality, not a sampling situation
  • That the student incorrectly applied sampling-to-population inference methods associated with box plots
  • That the design of the experiment was flawed as there was no random allocation or control groups used
  • That the words "no difference" were used rather than "I can't make a call" or "I can't tell"
  • That the student has made statements based on point estimates such as the medians without looking at the shape and variation of the data

These are all good points, and it is difficult to identify which one is the most serious or has the highest priority to address with a student. That there are a number of issues with what was written by the student highlights why it is so important that we always consider how we are building on and building up the key ideas that underpin statistical thinking. For the remainder of this discussion, I have given examples of the kinds of questions I would want to ask students and the kind of thinking I would like students to demonstrate when considering how to write their response to this task.

It matters how much data you have and how you got this data

Why does it matter that this data was collected from a single soccer team over one season? Why does it matter that the data was not obtained through a random sampling method? Why does it matter that the design is experimental in nature (there was an intervention) but there was no control group for comparison and no controlling of related variables?

Desired student thinking: I can explore this data to uncover what it might suggest about the effectiveness of the training programme, but any suggestive inferences would be limited to this team only and would be weakened by the fact that the players may get better/worse over the season for other reasons. That is, I cannot say that the training programme was the only reason that the players improved/worsened with their ball handling skills. I also could not say that the training programme would work for other players in other soccer teams.

It matters what you are measuring and how you are measuring it

Why does it matter that only one of the tests for ball handling skills was used in the analysis? Why does it matter that each player was measured twice - once at the beginning of the season and again at the end of the season? Why does it matter that the response variable is a numerical variable?

Desired student thinking: I need to consider whether bouncing a ball on your head is a good/best measure of ball handling skills, and should really explore the data from the other tests to assess the performance of the players. I need to measure the change in performance for each player by taking the difference of their two test results. This is because the players would have different starting skills and what I want to know is if the training programme improved their performance from this starting point. Since the test data is numerical, I can use a dot plot and box plot to display the differences. [I could also use a link graph, two dot plots which show clearly how the test results are linked for each player between the before and after.]
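
To show why taking the difference for each player matters, here is a minimal sketch. The bounce counts are hypothetical, invented for illustration only (they are not the data from the task), and were chosen so that both medians are 9, as in the student report.

```python
from statistics import median

# Hypothetical (start, end) bounce counts per player, invented for illustration only
bounces = {
    "Player 1": (2, 5),
    "Player 2": (4, 7),
    "Player 3": (6, 9),
    "Player 4": (8, 9),
    "Player 5": (9, 8),
    "Player 6": (12, 14),
    "Player 7": (15, 17),
    "Player 8": (18, 20),
    "Player 9": (21, 22),
}

start_scores = [start for start, end in bounces.values()]
end_scores = [end for start, end in bounces.values()]

# Paired comparison: one difference (end minus start) per player
differences = [end - start for start, end in bounces.values()]
improved = sum(1 for d in differences if d > 0)

print("Median at start of season:", median(start_scores))
print("Median at end of season:", median(end_scores))
print("Median difference per player:", median(differences))
print("Players who improved:", improved, "of", len(differences))
```

With these made-up numbers the two medians are identical, yet eight of the nine players bounced the ball more times at the end of the season. Comparing the two groups as if they were independent hides this; looking at the differences does not.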

It matters that you are uncertain and there is variation

Why does it matter that each player has a different ability? Why does it matter that you are using a summary measure like the mean or median? Why does it matter that you refer to statistics calculated from experimental data as estimates?

Desired student thinking: I need to think about the different sources of variation and how they could affect the data I am using. There is natural variation because each player is different and has a different ability for the task, and when I use a summary statistic like a mean I am trying to capture an overall measure of ability, based on the average, for all the players. But a summary measure like the mean won't capture how different each player is from each other in terms of ability. I need to clearly communicate that I am uncertain and don't know the true value, and that is why I will use the word estimate.

Student report

My investigative question was "I wonder if boys who ran the Auckland kids marathon in 2015 are faster than the girls who ran the same event?" My graphs of the times for the boys and girls who ran the Auckland kids marathon in 2015 are shown below:


The shapes of the distribution of times to run the event are pretty similar, in that they are both positively skewed. The boxes are pretty similar in size as well. But anyway, the main thing is I can't make a call that the boys ran the Auckland kids marathon faster than the girls in 2015, because the boxes for both groups overlap, and the median time for the boys to run the event is inside the box (the middle 50% of times) for the girls to run the event. So the times are too similar - there's no difference between how fast the boys and girls were. Plus the three slowest times were the boys, including one who took ages to run the event!

So what did teachers identify as the biggest misunderstandings?

  • That you have been given population data, so you can identify who ran faster on average (boys), you don't need to make an inference from a sample to a population
  • That even if you considered this sample data, the sample size is huge and so you can make a call with a smaller shift between the two samples
  • That even if the two samples are similar, you can't say there is no difference, even if it is too close to make a call
  • That the student does not understand their investigative question, or that their investigative question is incorrect

It is important that we have a clear idea/understanding of whether we are working with sample data or population data, or in other words, whether we are wanting students to engage with inferential reasoning (going beyond the data in front of them) or exploratory data analysis (specifically describing the data in front of them).

At higher levels we tend to be a bit more flexible and loose with the idea of a sample but with younger students it is important to keep things simple and clear. If this set of data did represent a sample from a population, what would that population be? At secondary school level, this should be easy to define, not tricky or messy.

Reviewing student responses is a common feature of good PLD. The goal is to understand more about what misunderstandings students have, so we can be aware of this when teaching and check for these misunderstandings when using formative assessment. Coupled with this should be a review of what statistics education research related to the misunderstanding has uncovered.

When I first posted these examples, I asked for teachers to anonymously submit their thoughts on what they thought was the BIGGEST misunderstanding. I was interested to see how similar the responses would be - would we all see the same big issue?

I think one of the cool things about doing an activity like this is that it challenges the assumption that everyone sees the same thing we do when reading an assessment task or student work. Sometimes in group discussion you don’t get to hear these different perspectives because someone else says the “right” thing first.

So one way to use these activities as part of any PLD might be to use a similar approach. Ask teachers to choose what they think is the biggest problem and to submit this anonymously through an app like Socrative. Then share all the comments, compare them and discuss the similarities and differences in perspective.

How do we deal with outliers?

This post provides the notes for an Ignite presentation I ran at the Christchurch Mathematical Association (CMA) 2015 Statistics Day on dealing with outliers. If you are not familiar with an Ignite presentation, it is 20 slides that auto-advance every 15 seconds for a total presentation of five minutes!

I’ve finally decided to post my slides and notes from this presentation (nearly four years later) as it gave me an opportunity to try out a text-to-voice tool for creating videos. Play the recreation of my ignite talk below! It is actually less than five minutes as “robo” me does not waffle and does not need 15 seconds per slide 🙂