Go big or go home!

On Tuesday, my good friend Dr Michelle Dalrymple won this year’s Prime Minister’s Science Teacher award. It was so great to be able to fly down to Christchurch with Maxine Pfannkuch to watch the live streaming of the award ceremony with Michelle, her family and her colleagues at Cashmere High School. Michelle was the first mathematics and statistics teachers to win the prize, and it couldn’t have gone to a more deserving teacher!

You can read more about the awesomeness of Michelle in the links below:

In her acceptance speech, Michelle thanked me for being her “statistics hero”. Well, turns out she’s also mine and here’s just one example of why!

After a year or so after I moved from teaching high school statistics to teaching a very large introductory statistics course, I had conversation with Michelle where I complained about how much I missed doing the kinds of hands-on interactive activities that are so important for teaching statistics. I told her what I was being told by others at the university level: that you just can’t do those kinds of things with large lectures, there’s too many students, it won’t work, things could go wrong, not all the students will want to do this, etc.

Michelle listened to me first and then suggested that I try doing something small initially. She told me about one of her activities – comparing how long it takes to eat M&Ms using a plastic fork vs chopsticks – and suggested doing this with just 10 of my 500 students. She explained that I could ask for volunteers, bring them down to the front of the lecture theatre, record the data live, and then use this within the same lecture. I tried this activity out and it worked brilliantly – just imagine a whole lecture theatre of students cheering on students eating M&M’s!

In her pragmatic way, Michelle helped me remember that there’s always a way to do what you know is best for teaching and learning. Her encouragement and attitude to “make it happen” inspired the first of many interactive activities I have since developed to use in my teaching of intro stats. It’s natural to focus on the limitations that a teaching environment or system presents, especially for very large introductory statistics classes of over 300 students. But what Michelle helped me re-affirm in terms of my teaching approach for “large scale teaching” is that it can be more helpful and rewarding to think of the opportunities that working with such a large group of students offers.

Which is one of the reasons why we (Rhys Jones, Emma Lehrke and I) have set up a new sub blog that focuses specifically on teaching large introductory statistics courses. It’s called “Go big or go home!“. In this blog we will share our experiences with trying to build more interactivity and engagement within our very large lecture-based classes. I know that many people reading this blog are statistics teachers based at the school level, so I haven’t assumed you will want to receive emails about new posts for this sub blog. Check out the Go big or go home! blog if you’re interested in reading more and subscribing to this new blog.

Different strokes?

Example of sorted data cards

Recently I’ve been developing and trialling learning tasks where the learner is working with a provided data set but has to do something “human” that motivates using a random sample as part of the strategy to learn something from the data.

Since I already had a tool that creates data cards from the Quick, Draw! data set, I’ve created a prototype for the kind of tool that would support this approach using the same data set.

I’ve written about the Quick, Draw! data set already:

For this new tool, called different strokes, users sort drawings into two or more groups based on something visible in the drawing itself. Since you have the drag the drawings around to manually “classify” them, the larger the sample you take, the longer it will take you.

There’s also the novelty and creativity of being able to create your own rules for classifying drawings. I’ll use cats for the example below, but from a teaching and assessment perspective there are SO many drawings of so many things and so many variables with so many opportunities to compare and contrast what can be learned about how people draw in the Quick, Draw!

Here’s a precis of the kinds of questions I might ask myself to explore the general question What can we learn from the data about how people draw cats in the Quick, Draw! game?

  • Are drawings of cats more likely to be heads only or the whole body? [I can take a sample of cat drawings, and then sort the drawings into heads vs bodies. From here, I could bootstrap a confidence interval for the population proportion].
  • Is how someone draws a cat linked to the game time? [I can use the same data as above, but compare game times by the two groups I’ve created – head vs bodies. I could bootstrap a confidence interval for the difference of two population means/medians]
  • Is there a relationship between the number of strokes and the pause time for cat drawings? [And what do these two variables actually measure – I’ll need some contextual knowledge!]
  • Do people draw dogs similarly to cats in the Quick, Draw! game? [I could grab new samples of cat and dog drawings, sort all drawings into “heads” or “bodies”, and then bootstrap a confidence interval for the difference of two population proportions]

Check out the tool and explore for yourself here: http://learning.statistics-is-awesome.org/different_strokes/

A little demo of the tool in action!

Upcoming workshop: Using R to explore and exploit features of images

If you’ve been keeping track of my various talks & workshops over the last year or so, you will have noticed that I’ve become a little obsessed with analysing images (see power of pixels and/or read more here).  As part of my PhD research, I’ve been using images to broaden students’ awareness of what is data, and data science, and it’s been so much fun!

If you’re in the Auckland area next week, you could come along to a workshop I’m running for R-Ladies and have some fun yourself using the statistical programming language R to explore images. The details for the workshop and how to sign up are here: https://www.meetup.com/rladies-auckland/events/255112995/

The power of pixels: Using R to explore and exploit features of images

Thursday, Oct 18, 2018, 6:00 PM

G15, Science Building 303, University of Auckland
38 Princes Street Auckland, NZ

30 Members Attending

Kia ora koutou Anna Fergusson, one of our R-ladies Auckland co-organisers, will be the speaker at this meetup. We’ll explore a range of techniques and R packages for working with images, all at an introductory level. Time: 6:00 arrival for a 6:30pm start. What to bring: Laptops with R installed, arrive early if you are a beginner and would like hel…

Check out this Meetup →

This is not a teaching-focused workshop, it’s more about learning fun and cool things you can do with images, like making GIFs like the one below….

Of course it’s a cat!

…. and other cool things, like classifying photos as cats or dogs, or finding the most similar drawing of a duck!

Duck duck stats!

It will be at an introductory level,  and you don’t need to be a “lady” to come along, just supportive of gender diversity in the R community (or more broadly, data science)! If you’ve never used R before, don’t worry – just bring yourself along with a laptop and we’ll look after you 🙂

You say data, I say data cards …

This long weekend (in Auckland anyway!), I spent some time updating the Quick! Draw! sampling tool (read more about it here Cat and whisker plots: sampling from the Quick, Draw! dataset). You may need to clear your browser cache/data to see the most recent version of the sampling tool.

One of the motivations for doing so was a visit to my favourite kind of store – a stationery store – where I saw (and bought!) this lovely gadget:

It’s a circle punch with a 2″/5 cm diameter. When I saw it, my first thought was “oh cool I can make dot-shaped data cards”, like a normal person right?

Using data cards to make physical plots is not a new idea – see censusatschool.org.nz/resource/growing-scatterplots/ by Pip Arnold for one example:

But I haven’t seen dot-shaped ones yet, so this led me to re-develop the Quick! Draw! sampling tool to be able to create some 🙂

I was also motivated to work some more on the tool after the fantastic Wendy Gibbs asked me at the NZAMT (New Zealand Association of Mathematics Teachers) writing camp if I could include variables related to the times involved with each drawing. I suspect she has read this super cool post by Jim Vallandingham (while you’re at his site, check out some of his other cool posts and visualisations) which came out after I first released the sampling tool and compares strokes and drawing/pause times for different words/concepts – including cats and dogs!

So, with Quick! Draw! sampling tool you can now get the following variables for each drawing in the sample:

The drawing and pause times are in seconds. The drawing time captures the time taken for each stroke from beginning to end and the pause time captures all the time between strokes. If you add these two times together, you will get the total time the person spent drawing the word/concept before either the 20 seconds was up, or Google tried to identify the word/concept. Below the word/concept drawn is whether the drawing was correctly recognised (true) or not (false).

I also added three ways to use the data cards once they have been generated using the sampling tool (scroll down to below the data cards). You can now:

  1. download a PDF version of the data cards, with circles the same size as the circle punch shown above (2″/5cm)
  2. download the CSV file for the sample data
  3. show the sample data as a HTML table (which makes it easy to copy and paste into a Google sheet for example)

In terms of options (2) and (3) above, I had resisted making the data this accessible in the previous version of the sampling tool. One of the reasons for this is because I wanted the drawings themselves to be considered as data, and as human would be involved in developed this variable, there was a need to work with just a sample of all the millions of drawings. I still feel this way, so I encourage you to get students to develop at least one new variable for their sample data that is based on a feature of the drawing 🙂 For example, whether the drawing of a cat is the face only, or includes the body too.

There are other cool things possible to expand the variables provided. Students could create a new variable by adding drawing_time and pause_time together. They could also create a variable which compares the number_strokes to the drawing_time e.g. average time per stroke. Students could also use the day_sketched variable to classify sketches as weekday or weekend drawings. Students should soon find the hemisphere is not that useful for comparisons, so could explore another country-related classification like continent. More advanced manipulations could involve working with the time stamps, which are given for all drawings using UTC time. This has consequences for the variable day_sketched as many countries (and places within countries) will be behind or ahead of the UTC time.

If you’ve made it this far in the post…. why not play with a little R 🙂

I wonder which common household pet Quick! drawers tend to use the most strokes to draw? Cats, dogs, or fish?

Have a go at modifying the R code below, using the iNZightPlots package by Tom Elliott and my [very-much-in-its-initial-stages-of-development] iNZightR package, to see what we can learn from the data 🙂 If you’re feeling extra adventurous, why not try modifying the code to explore the relationship between number of strokes and drawing time!

Visualising bootstrap confidence intervals and randomisation tests with VIT Online

Simulation-based inference is taught as part of the New Zealand curriculum for Statistics at school level, specifically the randomisation test and bootstrap confidence intervals. Some of the reasons for promoting and using simulation-based inference for testing and for constructing confidence intervals are that:

  • students are working with data (rather than abstracting to theoretical sampling distributions)
  • students can see the re-randomisation/re-sampling process as it happens
  • the “numbers” that are used (e.g. tail proportion or limits for confidence interval) are linked to this process.

If we work with the output only, for example the final histogram/dot plot of re-sampled/bootstrap differences, in my opinion, we might as well just use a graphics calculator to get the values for the confidence interval 🙂

In our intro stats course, we use the suite of VIT (Visual Inference Tools) designed and developed by Chris Wild to construct bootstrap confidence intervals and perform randomisation tests. Below is an example of the randomisation test “in action” using VIT:

Last year, VIT was made available as a web-based app thanks to ongoing work by Ben Halsted! So, in this short post I’ll show how to use VIT Online with Google sheets – my two favourite tools for teaching simulation-based inference 🙂

1. Create a rectangular data set using a Google sheet. If you’re stuck for data, you can make a copy of this Google sheet which contains giraffe height estimates (see this Facebook post for context – read the comments!)

2. Under File –> Publish to web, choose the following settings (this will temporarily make your Google sheet “public” – just “unpublish” once you have the data in VIT Online)

Be careful to select “Sheet1” or whatever the sheet you have your data in, not “Entire document”. Then, select “Comma-separated values (.csv)” for the type of file. Directly below is the link to your published data which you need to copy for step 3.

3. Head to VIT online –>  https://www.stat.auckland.ac.nz/~wild/VITonline/index.html. Choose “Randomisation test” and copy the link from step 2 into the first text box. Then press the “Data from URL” button.

4. At this point, your data is in VIT online, so you can go back and unpublish your Google sheet by going back to File –> Publish to web, and pressing the button that says “Stop publishing”.

The same steps work to get data from a Google spreadsheet into VIT online for the other modules (bootstrapping etc.).

[Actually, the steps are pretty similar for getting data from a Google spreadsheet into iNZight lite. Copy the published sheet link from step 2 in the appropriately named “paste/enter URL” text box under the File –> Import dataset menu option.]

In terms of how to use VIT online to conduct the randomisation test, I’ll leave you with some videos by Chris Wild to take a look at (scroll down). Before I do, just a couple of differences between the VIT Chris uses and VIT Online and a couple of hints for using VIT Online with students.

You will need to hold down ctrl to select more than one variable before pressing the “Analyse” button e.g. to select both the “Prompt” and “Height estimate in metres” variables in the giraffe data set.

Also, to define the statistic to be tested, in VIT Online you need to press the button that says “Precalculate Display” rather than “Record my choices” as shown in the videos.

Lastly, a really cool thing about VIT Online is that once you have copied over the URL for your published Google sheet, as long as you keep your Google sheet published, you can grab the URL from VIT Online to share with students e.g. https://www.stat.auckland.ac.nz/~wild/VITonline/randomisationTest/RVar.html?file=https://docs.google.com/spreadsheets/d/e/2PACX-1vTcaGSrAbGSntbrUoifNv8g048KJwEnBI–Rmmxqu1N0rb0VRUHoUkIeT-8xo3O9eqTUqZIML_EH523/pub?gid=0&single=true&output=csv&var=%20Prompt,c&var=Height%20estimate%20in%20metres,n. Sure, it’s not the nicest looking URL in the world, so use a URL shortener like bit.ly, goo.gl, tiny.cc etc. if sharing with students to type into their devices.

Note: VIT Online is not optimised to work on small screen devices, due to the nature of the visualisations. For example, it’s important that students can see all three panels at the same time during the process, and can see what is happening!

Now, here are those videos I promised 🙂

Secret statistical snowflakes

Want to make some awesome gift tags/labels for Christmas or holiday-related presents? Here’s a fun little statistical art project. Write whatever words you want in the app below, create some secret snowflakes (the secret part being no one else will know what words you used unless of course you choose to display them), play around with colours if you want (uncheck the option to use random colours), freeze the snowflakes when you get something you like, download your masterpiece and use in some way.

Oh yeah, the snowflakes are made by rotating each letter in the words in a magical statistical way (i.e. randomness).

To make our gift labels, I made the first colour white (the background #ffffff), made the other two colours black (#000000), and then printed on to adhesive sticker paper I had left over from our wedding.

Enjoy and have a great holiday break!

Secret snowflakes app should be shown below (otherwise here is the link) – works best using a Chrome browser 🙂

What’s going on, what’s going on?

For many high school teachers here in New Zealand, the teaching year is over and it’s now a six-week summer break before school starts again next year. Despite the well-deserved break, some teachers are already thinking about ideas for next year. I’ve been amazed (and inspired) by the teachers who have signed up to spend a day with Liza and I on Friday 15th December to learn more about working with modern data (more details here). We are both really looking forward to the full-day workshop 🙂 One of the tools we’ll be working with at the workshop is the platform IFTTT (If This Then That). It’s basically a way to connect devices and online accounts using APIs (application programming interfaces) without using code.

I used IFTTT recently to collect data on New York Times articles. One of the reasons why I started collecting data on New York Times articles was because of their free, online feature “What’s Going On in This Graph?”. On Tuesday, December 12 and every second Tuesday of the month through the US school year, The New York Times Learning Network, in partnership with the American Statistical Association, hosts a live online discussion about a timely graph like the one shown below.

One of the super interesting graphs featured

Students from around the world “read” the graph by posting comments about what they notice and wonder in an online forum.  Teachers live-moderates by responding to the comments in real time and encouraging students to go deeper.  All releases are archived so that teachers can use previous graphs anytime (read this introductory post to learn more). I used “What’s Going On in This Graph?” when I was teaching our Lies, Damned lies and Statistics course, and it is such an awesome resource for helping build statistical literacy and thinking.

So, inspired by the New York Times graphs, about two months ago I created an “applet” on IFTTT that creates a new row in a Google spreadsheet every time a new article is posted to the New York Times website. It stopped working for some reason at the end of November – check out the “raw” data here: https://docs.google.com/spreadsheets/d/1PXGh0xBrJbmrfWq3nRylH5GBqzVd4SYWWiXQj3v9tdQ/edit?usp=sharing 

So what’s going on with the data I collected? Your first thought on viewing the data might be – huh? You call this data? The only variable that is “graph ready” is which section each of the nearly 6000 articles were published in. But there are so many variables in data sets just like this one waiting to be defined and explored. After our workshop on Friday, I’ll post an “after” version of this same data set 🙂

Helping students to estimate mean and standard deviation

Estimating the mean and standard deviation of a discrete random variable is something we expect NZ students to be able to do by the time they finish Year 13 (Grade 12). The idea is that students estimate these properties of a distribution using visual features of a display (e.g. a dot plot) and, ideally, these measures are visually and conceptually attached to a real data distribution with a context and not treated entirely as mathematical concepts.

At the start of this year I went looking for an interactive dot plot to use when reviewing mean and standard deviation with my intro-level statistics students. Initially, I wanted something where I could drag dots around on a dot plot and show what happens to the mean, standard deviation etc. as I do this. Then I wanted something where you could drag dots on and off the dot plot, rather than having an initial starting dot plot, so students could build dot plots based on various situations. I came across a few examples of interactive-ish dot plots out there in Google-land but none quite did what I wanted (or kept the focus on what I wanted), so I decided to write my own. [Note: CODAP would have been my choice if I had just wanted to drag dots around. Extra note: CODAP is pretty awesome for many many reasons].

In my head as I developed the app was an activity I’ve used in the past to introduce standard deviation as a measure – Exploring statistical measures by estimating the ages of famous people – as well as a workshop by the awesome Christine Franklin. For NZ-based teachers (or teachers who want to come to beautiful New Zealand for our national mathematics teachers conference), Chris is one of the keynote speakers at the NZAMT 2017 conference and is running a workshop at this conference called Conceptualizing Variation from the Mean: Evolving from ‘Number of Steps’ to the ‘SAD’ to the ‘MAD’ to the ‘Standard Deviation’  which you should get along to if you can. Also in my head was the idea of the mean of a distribution being like the “balancing point”, and other activities I have used in the past based on this analogy and also see-saws! My teaching colleague Liza Bolton was also super helpful at listening to my ideas, suggesting awesome ones of her own, and testing the app throughout its various versions.

dots – an interactive dot plot

You can access dots at this address: learning.statistics-is-awesome.org/dots/ but you might want to keep reading to find out a little more about how it works 🙂 Below is a screenshot of the app, with some brief descriptions of how things are supposed to work. Current limitations for dots are that no more than 35 dots will be displayed, the axis is fixed between 0 and 34, and that dots can only be placed on whole numbers. I had played around with making these aspects of the app more flexible, but then decided not to pursue this as I’m not trying to re-create graphing/statistical software with this interactive.

Since I’ve got the It’s raining cats and dogs (hopefully) project running, I thought I’d use some of the data collected so far to show a few examples of how to use dots. [Note: The data collection phase of the cats and dogs data cards project is still running, so you can get your students involved]. Here are 15 randomly selected cats from the data cards created so far, with the age of each cat removed.

Once you get past how cute these cats are, what do you think the mean age of these cats is (in years)? Can you tell which cat is the oldest? How much variation do you think there is between the ages of these cats?

Dragging dots onto the dot plot

A dot plot can be created by dragging dots on to the plot (don’t forget to add a label for the axis like I did!)


Sending data to the dot plot

You can also add the data and the label to the URL so that the plot is ready to go. Use the structure shown below to do this, and then click on the link to see the ages of these cats on the interactive dot plot.


Turns out China is the oldest cat in this sample.

Exploring the balance point

You can click below the dots on the axis to indicate your estimate for the mean. You could do a couple of things after this. You could click the Mean button to show the mean, and check how this compares to your estimated mean. Or you could click the Balance test button to turn in on (green), and see how well the dots balance on the point you have estimated as the mean (or both like I did).


Estimating standard deviation

Estimating standard deviation is hard. I try not to use “rules” that only work with Normally distributed-ish data (like take the range and divide by six) and aren’t based on what the standard deviation is a measure of. Visualising standard deviation is also a tricky thing. In the video below I’ve gone with two approaches: one uses a Chrome extension Web Paint to draw on the plot where I think is the average distance each dot is from the mean and one uses the absolute deviations.


Using “random distribution”

This is the option I have used the most when working with students individually. Yes, there is no context when using this option, but in my conversations with students when talking about the mean and standard deviation I’m not sure the lack of context makes it non-conceptual-building activity. The short video below shows using the median as a starting point for the estimate of the mean, and the adjusting from here depending on other features of the distribution (e.g. shape). The video ends by dragging a dot around to see what happens to the different measures, since that was the starting point for developing dots 🙂


Other ideas for using dots?

Share them below the related Facebook post, on Twitter, or wherever – I’d be super keen to hear whether you find this interactive dot plot useful for teaching students how to estimate mean and standard deviation 🙂

PS no cats were harmed in the making of this GIF

It’s raining cats and dogs (hopefully)

In April 2017, I presented an ASA K-12 statistics education webinar: Statistical reasoning with data cards (webinar). Towards the end of the webinar, I encouraged teachers to get students to make their own data cards about their cats. A few days later, I then thought that this could be something to get NZ teachers and students involved with. Imagine a huge collection of real data cards about dogs and cats? Real data that comes from NZ teachers and students? Like Census At School but for pets 🙂 I persuaded a few of my teacher friends to create data cards for their pets (dogs or cats) and to get their students involved, to see whether this project could work. Below is a small selection of the data cards that were initially created (beware of potential cuteness overload!)

The project then expanded to include more teachers and students across NZ, and even the US, and I’ve now decided to keep the data card generator (and collection) page open so that the set of data cards can grow over time. Please use the steps below to get students creating and sharing data cards about their pets.

Creating and sharing data cards about dogs and cats

Inevitably, there will be submissions made that are “fake”, silly or offensive (see below).

Data cards submitted to the project won’t automatically be added to any public sets of data cards, and will be checked first. Just like with any surveying process that is based on self-selection, is internet based and relies on humans to give honest and accurate answers, there is the potential for non-sampling errors. To help reduce the quantify of “fake” data cards, if you are keen to have your students involved with this project it would be great if you could do the following:

1. Talk to your students about the project and explain that the data cards will be shared with other students. They will be sharing information about their pet and need to be OK with this (and don’t have to!). The data will be displayed with a picture of their pet, so participation is not strictly anonymous. All of this is important to discuss with students as we need to educate students about data privacy 🙂

2. When students submit their data, they are given the finished data card which they can save. Set up a system where students need to share the data card they have created with you e.g. by saving into a shared Google drive or Dropbox, or by emailing the data card to you. The advantage for you of setting up this system is that you get your class/school set of data cards to use however you want. The advantage for me is that this level of “watching” might discourage silly data cards being created.

3. Share this link with your students http://learning.statistics-is-awesome.org/dogsvscats/ and let the rain of cats and dogs begin!

Pet data cards

The data collection period for this set of data cards was 1 May 17 to 19 May 17.

The diagram below shows the data included on each data card:

Additional data that could be used from each data card includes:

  • Whether the pet photo was taken inside or outside
  • Whether the pet photo is rotated (and the angle of rotation)
  • The number of letters in the pet name
  • The number of syllables in the pet name

PDF of all data cards: click to download


A stats cat in a square?

On Twitter a couple of days ago, I saw a tweet suggesting that if you mark out a square on your floor, your cat will sit in it.


Since I happen to have a floor, a cat, and tape I thought I’d give it a go. You can see the result at the top of this post 🙂 Amazing right?

Well, no, not really. I marked out the square two days ago, and our cat Elliot only sat in the square today.

Given that:

  • our cat often sits on the floor
  • our cat often sits on different parts of said floor
  • that we have a limited amount of floor
  • I marked out the square in an area that he likes to sit
  • that we were paying attention to where on the floor our cat sat

… and a whole lot of other conditions, it actually isn’t as amazing as Twitter thinks. Also, my hunch is that people who do witness their cat sitting the square post this on Twitter more often than those who give up waiting for the cat to sit in the square.

Below is a little simulation based on our floor size and the square size we used, taking into account our cat’s disposition for lying down in places. It’s just a bit of fun, but the point is that with random moving and stopping within a fixed area, if you watch long enough the cat will sit in the square 🙂

PS The cat image is by Lucie Parker. And yes, the cat only has to partially in the square when it stops but I figured that was close enough 🙂