Which one doesn’t belong …. for stats?

If you haven’t heard of the activity Which one doesn’t belong? (WODB), it involves showing students four “things” and asking them to describe/argue which one doesn’t belong. There are heaps of examples of Which one doesn’t belong? in action for math(s) on the web, Twitter, and even in a book. From what I’ve seen, for math(s) I think the activity is pretty cool. In terms of whether WODB works for stats, however, I’m not so sure. Perhaps for definitions, facts, static pieces of knowledge it could work (?), but in terms of making comparisons involving data and its various representations (including graphs/displays), I need more convincing. There’s something different between comparing properties of shapes (for example), which remain fixed, and comparing data about something/someone, which could vary.

For example, What cat doesn’t belong? for the four “stats cats” data cards shown below.

To make comparisons between the four cats means to reason with data, but if I am considering only the data provided in these four data cards then these comparisons are made without uncertainty. For example, I can say definitively, for these four cats, that:

  • Elliot is the only cat with a name that has three syllables,
  • Molly is the only female cat,
  • Joey is the only cat that is both an inside and outside cat,
  • Classic is the only cat that uses a cat door.

I could argue many different cases for which cat (or photo) does not belong. This is all cool, but doesn’t feel like statistics to me. Statistics is all about using data to make decisions in the face of uncertainty, by appreciating different sources of variation and considering how to deal with these. In particular, inferential reasoning involves going beyond the data at hand, thinking about generalisability, considering the quality and quantity of data available, and appreciating/communicating the possibility of being wrong no matter how “right” the methodology.

So while I appreciate that WODB allows for “not just one correct answer” and the development of argumentation skills, I’d be happier if this kind of activity within statistics teaching led to the posing of statistical investigative questions (SIQ): WODB->SIQ. Why? We need more data and more of an idea of where the data came from to really answer the really interesting questions that comparing these four cats might provoke us to consider. We need students to feel the uncertainty that comes from thinking and reasoning statistically and to help students find ways to deal with this uncertainty. We also need students to care about the questions being asked of the data – my worry here is that otherwise the question students might ask when using WODB is Who cares which one doesn’t belong? 🙂

Questions that interest me when looking at these stats cats data cards are: I wonder …. How many syllables do cats’ names have? Do most cats have two-syllable names? Is Elliot (my cat!) an unusual name for this reason? Do I spend too much on cat food (NZ$30 per week)? Or maybe black cats are more expensive to feed? I won’t be able to get definitive answers to these questions, but by collecting more data and investigating these questions using statistical methods I can get a better understanding of what could be plausible answers.

PS Want some of these data cards? Head here –> It’s raining cats and dogs (hopefully)

New tool for statistics students – the experiment lab!


This post is about a new tool that I’ve developed that allows students to play and create different versions of an online game.

The context – online game design (or…… How can we design an app to make us millions?)

I’ve embedded the online game within this post so you can have a go but you would send students to www.mathstatic.co.nz/experiment-lab or www.mathstatic.co.nz/experiment-lab/alphabet-challenge to play the game etc.

The basic idea behind developing these games is that you need to award points somehow, you need different levels of difficulty for the game and you need a way for the game to end ……. oh and you need to make it addictive! For this particular game, there is no points system and no way to move players up through levels. This (hopefully) gives a rich context to explore lots of statistics ideas 🙂

When thinking about how to use the game and the experiment lab in teaching statistics, a really important teacher understanding is that the data created from playing the game exists initially within an “experimenting world”, not a “sampling world”. So, unless you are going to use actual random samples from defined populations (e.g. a random sample of students from your school), then this game/data should not be used directly for teaching sample-to-population inference or for exploring sampling variation.

But you can definitely explore other sources of variation with this game, as well as important ideas around study design e.g. defining and measuring variables, carrying out an experiment etc. To get the “ball rolling” I’ll start with a few ideas, which are in no particular order and written as independent lessons/activities …………

How do you measure performance in the game?

The original version of the game only records the time to complete the game, so you could base your teaching around getting students to define or create other measures of performance. In the experiment lab, they can choose to measure other aspects of performance. Other measures could also be developed and measured by students e.g. how “stressed” a player felt during the game, whether they “rage quit” or not. If you do come up with other measures that you would like to add to the game, let me know and I’ll try to include them.

Here’s how I came up with the performance measures in the experiment lab…….

Originally I started with the time in seconds to “connect the dots” as the performance measure. I realised quickly that a second is actually quite a long time and that this was not accurate enough. Playing the game initially with the time measured to the nearest second could be used to do some teaching around measurement error e.g. if two players both get 4 seconds as the time recorded for completing the game, does this mean they both finished at exactly the same time? In the experiment lab, students have the option to round the time measurements to 0, 1, 2, or 3 decimal places.
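The measurement error point can be sketched in a few lines of Python. This is only an illustration: the two raw times are made-up values, not data recorded from the game, and `round_time` is a hypothetical helper rather than the game's own code.

```python
def round_time(seconds, decimal_places):
    """Round a time in seconds to the given number of decimal places."""
    return round(seconds, decimal_places)

# Two made-up raw completion times (hypothetical, not recorded from the game)
player_a = 3.641
player_b = 4.382

# Compare the two players at each rounding option offered in the experiment lab
for places in (0, 1, 2, 3):
    a = round_time(player_a, places)
    b = round_time(player_b, places)
    verdict = "same recorded time" if a == b else "different recorded times"
    print(places, a, b, verdict)
```

With these made-up values both players are recorded as 4 seconds when rounding to 0 decimal places, even though their raw times differ by about three quarters of a second.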

As I played with different words I noticed that it took me longer to start the game when there were more letters to search through to find the letter I needed, so I added a measure for how long it takes for the first click. I noticed as I was playing the game that often I missed the dot when trying to click it, so added the number of clicks as a measure. I also wanted to compare the number of clicks to the number of letters to see how many times I missed, so I added the number of letters as a measure.

I was also interested in whether some games were easier/harder to play because the dots started/stayed closer together (the game randomly decides where to place the dots at the beginning of the game, more about this later….), so I added a measure for the distance between each click when a dot was hit (measured in pixels). I then noticed that in games when I was really struggling I was moving the mouse around a lot as I tried to “track” the dots, so I added a measure for the distance the mouse is moved during the game (measured in pixels).
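One way to think about both distance measures is as a sum of straight-line (Euclidean) distances between successive points on the screen. The Python sketch below assumes that approach; the function name and the coordinates are hypothetical, and the game's own code may calculate these measures differently.

```python
import math

def path_length(points):
    """Total straight-line distance, in pixels, along a sequence of (x, y) points."""
    # Pair each point with the next one and add up the distances between them
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

# Hypothetical pixel positions of three successful clicks
hits = [(0, 0), (30, 40), (30, 100)]
print(path_length(hits))  # 50 + 60 = 110.0 pixels
```

Applying the same function to a (much longer) list of recorded mouse positions would give a total distance for the mouse movement during a game.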

In terms of the accuracy of the times recorded, there will be a very slight delay between the actual click and the time recorded, while the code behind the game does its calculations. I can’t notice the delay when I play, but it could be that for different devices the delay is longer (or shorter). The game plays on the “client side” so there is no need to consider internet speed or server load.

Here’s another version of the game that shows you all the current performance measures and with the time rounded to two decimal places – have a go!

Can you get better at the game?

The original version of the game will give each student a starting word for the game, randomly selected from a list of approved words. Get the students to play the game once and to write down the time. Then get them to play the game again with the same word and compare this time to their first time. Depending on the settings you have used, you might need to change the rounding of the time measurement for students to be able to see a difference (or incorporate this as part of the teaching – the need to measure time more accurately than seconds).

Get the students to keep playing the game with the same word so they can notice different kinds of variation: variation in their own times, potential decrease of times with practice, variation in how the game positions the letters of the word and how the letters move during the game. Then get the students to try out another word (click new word), again repeating the game with this new word multiple times. For example, here are my times for repeating the same word four times for four different words, represented on a line graph:


One reason the “practice effect” is not very strong when playing this game is that, even with the same word, each game is different because the starting positions and angles for each letter differ (assigned using a random process). I’ve played the game many times, and generally my time decreases between the first and second attempts of the same word for shorter words, but then the time goes back up again for subsequent attempts. I haven’t noticed much of a difference for the longer words for the times to complete the game. The weak/non-existent practice effect for longer words is helpful for exploring paired comparison experimental designs.

Most games involve randomness in the game design, so the next question could be whether the game is fair if each game is different? Does the scoring system (if it existed) need to take into account how the letters were placed at the start of the game? Or does it all just balance out?

How can you make different levels of the game?

In the experiment lab, there are lots of ways to modify the game. If you think about a simple “comparison of two independent groups” design, then students can learn as much from two versions of the game that give “different” results as from two versions that give “similar” results. I’ve played the game many times during its development and there are just so many comparisons to be explored, but here are a few ideas. Of course, everything else for the game would be kept the same (same word, same dot size, speed, same instructions etc.) except for the treatment described below:

  • Word displayed in order vs reverse order
  • No movement vs slow movement
  • Bright red coloured background vs light grey coloured background
  • One word vs another word

Some of these comparisons are also related to what you are measuring for the performance e.g. I wouldn’t expect to see much of a difference between the time to first click for words displayed in order vs reverse order. This is a good thing, as students should be justifying their selection of variables 🙂 Not all words will necessarily suit the type of experiment they want to conduct, which they should learn from trying out the experiment on themselves first.

It is also possible to create very different versions of the games, but the player needs to be taken into account. For example, using small dots with the highest speed will lead to rage quit! [well, if there was a quit button – the stop button is all that is on offer!] We want the game to be addictive – challenging enough for players to keep interested but not so difficult that they just give up.

How can you award points for playing the game?

You could think about awarding points for the game in terms of rewarding good playing (skill) and punishing bad playing (no skill). This links to the earlier question about how to measure performance. To demonstrate bad playing, you could set up a version of the game so that the dots are not visible (using the same colour of the dots as the background colour and letter colour), so students have to guess where to click to select the correct dot. Give it a go below:

Did you give up? It took me 84 clicks 🙂 Clearly skill is involved when you play the game and “random” clicking is probably not going to help you click the correct dot faster. So it could be that the points are initially awarded based on the time to complete the game, but points are then taken off for extra clicks. To award points based on the time to complete the game needs a way to “baseline” the game to take into account different words. So the students will need to explore relationships between the performance measures e.g. the number of letters of the word vs the time to complete the activity.
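To make the scoring idea above concrete, here is a purely hypothetical Python sketch. The game itself has no points system, so the rule and every weight here (100 points per letter, 10 points lost per second, 5 points lost per missed click) are assumptions made for illustration. Students would choose their own values after exploring the relationships in their data.

```python
def score(time_seconds, clicks, letters,
          points_per_letter=100, points_lost_per_second=10, points_lost_per_miss=5):
    """Hypothetical scoring rule, not part of the game.

    Longer words start with more points (a simple baseline for word length),
    then points are taken off for time taken and for missed clicks.
    """
    misses = max(clicks - letters, 0)  # clicks that did not hit a dot
    points = (points_per_letter * letters
              - points_lost_per_second * time_seconds
              - points_lost_per_miss * misses)
    return max(points, 0)  # never go below zero

# A made-up game: 6-letter word, 12 seconds, 9 clicks (so 3 misses)
print(score(12.0, 9, 6))  # 600 - 120 - 15 = 465.0
```

A linear baseline of points per letter is only one option. If the relationship between the number of letters and the time to complete the game turns out not to be linear, a different baseline would be needed.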

Also, if two performance measures appear to have a relationship, then it might not make sense to use both variables to determine points (e.g. time to first click and time to complete the game). Each player brings their own level of skill and practice with the game, but can we use two of the performance measures to find out more about how people play the game? I think there are a lot of good reasons to explore the relationships between the performance measures…..

More about the game…..

This is the first stage of an online game I’m developing, built off code from this website with my own additions and improvements. There are lots of similar games out there involving balls – here are just a couple:

If you (or your students) are interested in the design of the game itself, just look at the page source of this page to see the code. There is a fair bit of logic and mathematics involved (e.g. trigonometry, co-ordinate geometry, measurement). I’ve set the words used for the game to a fixed “approved” list so students can’t create games with dodgy words, even if they try to hack the link used to access a different version of the game 🙂


Experiment lab for students on mathstatic.co.nz

Alphabet challenge on mathstatic.co.nz

Link to just the game with default settings

Probability teaching ideas using simulation


This post provides some teaching examples for using an online probability simulation tool. It’s a supplement to the workshop I offered for the NZAMT 2015 conference.

Probability simulation tool

I recently developed a very basic online probability simulation tool. I wanted a simulation tool that would run online without using applets or flash (tablet compatible). I also wanted to be able to animate repeated simulations in a loop – in the past to get this effect, I had to either make animated GIFs or set up slides in Powerpoint to transition automatically. I did a quick search for online simulation tools and couldn’t find what I wanted so I adapted some code I had written previously to get what I wanted.

An example of an animated looped simulation from the probability simulation tool

It’s very much designed “fit for a specific purpose” (more about that in part 2) so I know it has lots of limitations 🙂 But what I like about the feature being demonstrated above is that it will keep running automatically, freeing me up to ask the students questions about what they are seeing and why they are seeing this.

Small samples – lots of variation

One of the activities I presented in the workshop involved teachers trying to work out who my siblings were based on photos. I presented five sets of four photos. Within each set, one photo was of one of my siblings, the rest were photos of other non-related people. In the workshop there were around 30 teachers present. The basic idea (with lots of assumptions) is that the distribution for the number of correct selections IF teachers were guessing can be modelled by a binomial distribution with n = 5 and p = 0.25.
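For reference, the probabilities under this guessing model (binomial with n = 5 and p = 0.25) can be calculated directly. This is a minimal Python sketch; the function name `binomial_pmf` is my own and not part of the simulation tool.

```python
from math import comb

def binomial_pmf(k, n=5, p=0.25):
    """Probability of exactly k correct selections out of n when guessing."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of each possible number of correct selections, 0 to 5
for k in range(6):
    print(k, round(binomial_pmf(k), 4))
```

Under this model the expected number of correct selections is n × p = 1.25, and getting all five correct by guessing has a probability of 1 in 1024.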

After “marking” the teachers’ selections of my siblings, I created a dot plot of the 30 individual results. One of the questions put to the teachers at the workshop was “Do these results look like what we’d see if each of you was guessing which person was my sibling?”


To build up a simulated distribution based on guessing, each teacher then used five different hands-on simulations to make new sibling selections for each set of photos (see the resources link at the end of this post). I then created another dot plot from these simulated selections and asked teachers to compare the features of the two plots e.g. centre, spread, shape, unusual.


For this workshop, the two distributions actually came out to look pretty similar. But this won’t necessarily happen. To demonstrate the amount of variation between repeated simulations (of 30 students guessing across five sets of possible siblings), I set up the probability simulation tool with the options shown in the screen grab below:


So that the axis does not resize for each simulation, I fixed the axis between 0 and 5. To stop the dots from automatically resizing, I fixed the dot size to the smallest option. I then pressed “Start animation” and let the simulations run over and over again. This gives the following animation:

This animation could then be used to ask questions like:

  • “What would be an unlikely number of correct siblings if someone was guessing?”
  • “How many correct siblings would you expect to see if someone was guessing – between where and where?”
  • “What looks similar for each animation?”- “What looks different?”
  • “What variation are we seeing?” – “Why are we seeing it?”
  • “What does one dot on the graph represent?”
  • “How is the simulated data being generated?”
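The repeated simulations described above (30 people each guessing across five sets of four photos) can also be sketched in Python with text dot plots. This is not the code behind the online tool, just an illustration under the same guessing assumptions; the function name and the seed are arbitrary choices.

```python
import random
from collections import Counter

def simulate_class(n_teachers=30, n_sets=5, p_correct=0.25, rng=random):
    """Simulate every teacher guessing on every photo set.

    Returns a Counter mapping number of correct selections -> number of teachers.
    """
    results = []
    for _ in range(n_teachers):
        # Each guess is correct with probability p_correct (1 in 4 photos)
        correct = sum(rng.random() < p_correct for _ in range(n_sets))
        results.append(correct)
    return Counter(results)

rng = random.Random(2015)  # seeded only so that a run can be repeated
for run in range(3):
    counts = simulate_class(rng=rng)
    print(f"Simulation {run + 1}")
    for k in range(6):  # fixed axis from 0 to 5, as in the tool settings
        print(f"  {k} | {'.' * counts[k]}")
```

Each dot represents one simulated teacher, and running the loop for more repetitions shows how much the shape changes from one set of 30 guessers to the next.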

Want to read/see more?

Wild, C. Animations of sampling variation

Wild, C. VIT – Visual inference tools

NZ Senior secondary guide – Lateness: Choice or chance


Workshop materials – stimulating simulations NZAMT 2015

Online probability simulation tool

Statistics lesson starter: Is this really surprising?


A supermarket is running a promotion. For every $20 you spend, you will receive one domino. There are 50 dominoes to collect. I received 10 dominoes for my last shop and was surprised to find that all 10 dominoes were different. Should I have been surprised? Explain 🙂
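One way to check an answer is to calculate the chance directly. The Python sketch below assumes that each of the 50 dominoes is equally likely every time and that dominoes are handed out independently; the promotion does not tell us either of these things, and the function name is my own.

```python
from fractions import Fraction

def prob_all_different(n_types=50, n_collected=10):
    """Chance that all collected dominoes are different, assuming each type is equally likely."""
    prob = Fraction(1)
    for i in range(n_collected):
        # The next domino must be one of the (n_types - i) types not yet collected
        prob *= Fraction(n_types - i, n_types)
    return prob

print(float(prob_all_different()))  # roughly 0.38
```

Under these assumptions, getting 10 different dominoes happens a bit more than one time in three, so it is not especially surprising.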


Update after some more shopping…