This post provides the notes for a workshop I ran at the Otago Mathematics Association (OMA) Conference about using data challenges to encourage statistical thinking.

Until last week, I had never re-presented or adapted a workshop that I had developed in a previous year. So it was really interesting to take this workshop on data challenges, which I had presented at the AMA and CMA stats days last year, and work through it again with a new bunch of awesome teachers in Dunedin. I wrote notes about this workshop last year – Using data challenges to encourage statistical thinking – so this post will just share a few things I tweaked the second time around, including an activity we tried in Stickland 🙂

To show an example of a predictive model in action, we used one of a few online tools which attempt to predict your age using your name (based on US data) e.g. rhiever.github.io/name-age-calculator/index.html. I also demonstrated another online tool that attempts to predict your gender based on writing (hackerfactor.com/GenderGuesser.php) by using my abstract for this workshop (it did correctly predict, based on the writing being formal, that it was written by a female). For the actual data challenge itself using the celebrity data, I purposefully removed Dr Dre from the training data set to make it easier to explore the data without worrying about how to handle his extremely high earnings for 2014 (new link here).

Testing Stickland

Another thing I changed about the workshop this time around was that rather than use physical data cards (these Census at school stick people data cards), we tried out my new digital data cards in the virtual world of Stickland. I’ve already shared a little bit about the ideas behind Stickland – see the Welcome to stickland! post – so what follows is an example of how we used Stickland in the workshop. (Just a quick reminder that the data cards are real students from the NZ Census At School 2015 data, the names being the only variable that is not real).

The activity starts with the idea of wanting to predict whether a stick person chosen at random from Stickland uses Facebook or not. If you head to learning.statistics-is-awesome.org/stickland, the first thing you could do is select a sample of stick people and see what proportion of them use Facebook. I got the teachers in this workshop to select 20 stick people and then let them play with moving the data cards around in the grey screen below (click or touch the card to drag the card to somewhere else on the screen e.g. to sort the cards into Facebook users and non-Facebook users).

For the sample shown above, there are as many Facebook users as non-Facebook users, but of course this will vary from sample to sample. I then told the teachers that the stick person we are trying to predict for is a Snapchat user, and asked them if this changes their prediction of whether that person is a Facebook user or not. One way to explore this is to create a two-way table with the cards (see below) and then reason with this.
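For anyone who wants to see the same reasoning outside of the cards, here is a minimal sketch of building the two-way table and comparing proportions. The sample of 20 is made up purely for illustration (it is not taken from Stickland), and the names `sample`, `table` and `facebook_proportion` are just my own labels.

```python
from collections import Counter

# HYPOTHETICAL sample of 20 data cards, stored as (uses_snapchat, uses_facebook).
# These values are invented for illustration only; they are not Stickland data.
sample = (
    [(True, True)] * 8      # Snapchat users who use Facebook
    + [(True, False)] * 2   # Snapchat users who do not use Facebook
    + [(False, True)] * 2   # non-Snapchat users who use Facebook
    + [(False, False)] * 8  # non-Snapchat users who do not use Facebook
)

# Count how many cards fall into each cell of the two-way table.
table = Counter(sample)


def facebook_proportion(table, snapchat):
    """Proportion of Facebook users among cards with the given Snapchat status."""
    uses_facebook = table[(snapchat, True)]
    total = uses_facebook + table[(snapchat, False)]
    return uses_facebook / total


# Print the two-way table, one row per Snapchat status.
print(f"{'':14}{'Facebook':>10}{'No Facebook':>13}")
for snapchat, label in [(True, "Snapchat"), (False, "No Snapchat")]:
    print(f"{label:14}{table[(snapchat, True)]:>10}{table[(snapchat, False)]:>13}")

# Compare the overall proportion with the proportion among Snapchat users.
overall = sum(1 for _, facebook in sample if facebook) / len(sample)
print("Overall proportion using Facebook:", overall)
print("Among Snapchat users:", facebook_proportion(table, True))
print("Among non-Snapchat users:", facebook_proportion(table, False))
```

If the proportion among Snapchat users is quite different from the overall proportion, knowing that the stick person uses Snapchat should change the prediction.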

Most of the different samples showed a similar story to the sample above: Of the Snapchat users, most were Facebook users and of the non-Snapchat users, most were non-Facebook users. I then suggested (if we had time) we could also explore whether knowing the gender and age of the stick person would help us build a better model for predicting Facebook usage. At this stage (considering multiple variables/factors) I would want the students to move into software that allows them to explore the data more deeply (more about how that is possible is discussed in the Welcome to stickland! post). We didn’t do this in the workshop and the teachers had to leave Stickland perhaps before they wanted to 🙂

Where to next?

Stickland is just in “proof of concept” form at the moment and will no doubt have lots of bugs and weird features. In the Welcome to stickland! post, I discuss the influence of others in developing these digital data cards, in particular Pip Arnold and her work with statistical investigations and data cards that stretches back to at least 2005 (if not earlier!). Feel free to have a play and to let me know what you think about the concept, but this is definitely a possible project for 2017 and not intended to be a fully featured product yet.

## How many of my emails will get rolled up this week?

At the start of the year I started using a service called unroll me with my Gmail account. It lets you wrap up regular or subscription emails into one daily email digest. It takes a number of months to set up the service to capture all your regular or subscription emails, but I have found it helpful in reducing the clutter in my email, so it has been worth the minimal effort.

I noticed – as you do when you’re a stats teacher – that the number of emails that are rolled up per day varies. I wondered if there was anything going on (any patterns, trends etc.), so I went back over the last couple of months and recorded how many emails were wrapped up per day.

So here’s a little challenge for your students 🙂

Using the data on the number of my emails wrapped up per day for the last few months, can they predict how many of my emails will be wrapped up on each of the next four days (Tuesday, Wednesday, Thursday and Friday)?

Here’s the data:

Jump with the data into iNZight lite

Raw data as ordered counts (first count is a Monday)

14,11,25,24,24,36,21,12,13,23,28,19,27,8,15,14,19,24,26,24,7,21,19,32,26,25,25,12,14,21,16,27,25,23,12,13,24,22,19,21,25,10,19,16,18,32,24,23,10,14,22,30,24,25,24,15,15,21,27,22,32,26,11,18,23,28,32,18,32,13,18,26,26,35,23,22,13,14,18,22,30,26,26,9,21,16,27,21,25,20,10,17,22,31,15,27,25,10,16,20,17,27,24,22,15,22

Not sure how to get the students started?

Here are some ideas you could give to students:

• Graph the data in Excel or another spreadsheet and use “your eyes” and/or a sketch to make the prediction
• Import the data into iNZight (or equivalent) and try to use a time series model to make the predictions
• Find the mean number of emails rolled up for each day of the week and use these to make the predictions
• Use a probability distribution to model the number of emails rolled up each day and generate four random outcomes from this model to make the predictions
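As a rough sketch of the last two ideas, the code below groups the raw counts by day of the week, uses each day's mean as a prediction, and then draws one random outcome per day. It assumes the counts are consecutive days with none missing (so every seventh count falls on the same weekday), and for the "probability distribution" it simply resamples from the past counts for that weekday, which is only one of many models students could choose. The seed value and the names `weekday_groups`, `targets` and `simulated` are my own choices for illustration.

```python
import random
from statistics import mean

# Counts copied from the raw data above; the first count is a Monday.
counts = [
    14, 11, 25, 24, 24, 36, 21,
    12, 13, 23, 28, 19, 27, 8,
    15, 14, 19, 24, 26, 24, 7,
    21, 19, 32, 26, 25, 25, 12,
    14, 21, 16, 27, 25, 23, 12,
    13, 24, 22, 19, 21, 25, 10,
    19, 16, 18, 32, 24, 23, 10,
    14, 22, 30, 24, 25, 24, 15,
    15, 21, 27, 22, 32, 26, 11,
    18, 23, 28, 32, 18, 32, 13,
    18, 26, 26, 35, 23, 22, 13,
    14, 18, 22, 30, 26, 26, 9,
    21, 16, 27, 21, 25, 20, 10,
    17, 22, 31, 15, 27, 25, 10,
    16, 20, 17, 27, 24, 22, 15,
    22,
]

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def weekday_groups(values):
    """Group counts by weekday, assuming values[0] is a Monday and no days are missing."""
    groups = {day: [] for day in DAYS}
    for i, value in enumerate(values):
        groups[DAYS[i % 7]].append(value)
    return groups


groups = weekday_groups(counts)
weekday_means = {day: mean(values) for day, values in groups.items()}

# The four days we want to predict.
targets = ["Tuesday", "Wednesday", "Thursday", "Friday"]

# Idea: use the mean for each day of the week as the prediction.
for day in targets:
    print(f"{day}: mean of {len(groups[day])} past counts = {weekday_means[day]:.1f}")

# Idea: treat the past counts for each weekday as an empirical distribution
# and draw one random outcome per day (the seed is arbitrary, for repeatability).
rng = random.Random(2016)
simulated = {day: rng.choice(groups[day]) for day in targets}
print("One simulated set of outcomes:", simulated)
```

Students could rerun the simulation with different seeds to see how much the predictions vary, and compare that with the single prediction from the means.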

So how many emails did I get?

Move your mouse over the grey box below to see 🙂

Tuesday: 22

Wednesday: 29

Thursday: 30

Friday: 33

## Magical powers? Nah just statistics!

This post is about a prediction model I’m working on that aims to predict the gender of a teacher based on a few questions.

On World Statistics Day, I posted about a survey for high school teachers. Thanks to all the teachers who completed the survey! There is not quite enough data to share yet but I have used the data to make a start on a prediction model.

The plan is that, after asking you some questions, I can use your answers to predict your gender. Share with your teacher friends and let them be impressed with statistics too!

Once I get more data, I will “share the secrets” including the data so you can use this context for some exploratory data analysis of your own with your students.

Head here to try out the teacher gender predictor!

## Predicting gender from dating profiles

This post features a quiz asking you to predict the gender of someone based on what they write about themselves on a dating website.

How does the quiz work?

You will be given 20 different “About me” descriptions taken from public profiles displayed on a dating website. For each description, you will need to select whether you think this description is from a male or female profile. Note: Your choices will be recorded and this data may be used for a future post. No personal information about you will be recorded.

Get started!
