Mind the stats?

mind-the-stats

Have you noticed how Google sometimes gives the top page in your search results a little summary box? For example, if you Google “how to plan a honeymoon”, you get this:

trains6

Since I didn’t do number two on this list, my job for tonight was to check out trains for the UK leg of our honeymoon. After my first Google search, I got a little distracted and consequently typed up this short post 🙂 I realised partway through that “mind the gap” is more of a London Underground thing than a UK train travel thing, but it’s late, so hopefully the reference still makes sense.

My first (and only) search tonight was for a train from London to Cambridge. Before even clicking through to the website listed, I got to read this little “statistical report” 🙂

trains1

The first two sentences got me questioning what “fastest journey time” means, since how can the “average journey time” be lower than the shortest journey time? The third sentence made me shake my head at the misuse of our special stats word “average”, and I automatically re-worded that sentence in my head to “on weekdays there are, on average, 96 trains per day…”

So not only because I actually needed to find out about trains from London to Cambridge, but also because I was curious to find out what “fastest journey time” means, I clicked through to https://www.thetrainline.com/train-times/london-kings-cross-to-cambridge-station

When you scroll down to the bottom you get this nice table:

trains2

This gives some immediate answers to my confusion about the Google search summary – I think. “Slowest route” actually means the minimum time, and “Fastest route” means the maximum time. At least now the average journey time of one hour sits between these two numbers, but did you notice when you scrolled down the page that there were some routes listed with times greater than 63 minutes, the supposed “fastest route”?

Me too, so I went through all routes for the next 24 hours (starting from 8:44am London time) and listed their times:

trains3

There are bound to be a few mistakes in there from when I was converting from hours to minutes 🙂 But to finish this short critique, let’s look at the data:

trains4

For this particular 24-hour period (from 8:44am on Monday 21st November) there were 76 trains from London to Cambridge, with a mean journey time of around 64 minutes (based on the advertised times). If I wanted to check out the claims about the average number of trains per weekday and the average journey time, I’d need a better sampling method and more “weekdays” of data. But this sample does offer evidence to contradict the claims about “shortest” and “fastest” journey times.

Unless those terms still don’t mean what I think they mean, even when I reverse them 🙂

How many of my emails will get rolled up this week?

all_rolled_up
At the start of the year I started using a service called Unroll.Me with my Gmail account. It allows you to wrap up regular or subscription emails into one daily email digest. It takes a number of months to set up the service to capture all your regular or subscription emails, but I have found it helpful in reducing the clutter in my email, so it has been worth the minimal effort.

I noticed – as you do when you’re a stats teacher – that the number of emails that are rolled up per day varies. I wondered if there was anything going on – any patterns, trends etc. – so I went back over the last couple of months and recorded how many emails were wrapped up per day.

So here’s a little challenge for your students 🙂

Using the data on the number of my emails wrapped up per day for the last few months, can they predict how many of my emails will be wrapped up over the next four days (Tuesday, Wednesday, Thursday and Friday)?

Here’s the data…….

Jump with the data into iNZight lite

Download the data as a CSV

Link for data: http://statistics-is-awesome.org/rolled_up_emails.csv

Raw data as ordered counts (first count is a Monday)

14,11,25,24,24,36,21,12,13,23,28,19,27,8,15,14,19,24,26,24,7,21,19,32,26,25,25,12,14,21,16,27,25,23,12,13,24,22,19,21,25,10,19,16,18,32,24,23,10,14,22,30,24,25,24,15,15,21,27,22,32,26,11,18,23,28,32,18,32,13,18,26,26,35,23,22,13,14,18,22,30,26,26,9,21,16,27,21,25,20,10,17,22,31,15,27,25,10,16,20,17,27,24,22,15,22

Not sure how to get the students started?

Here are some ideas you could give to students:

  • Graph the data in Excel or another spreadsheet and use “your eyes” and/or a sketch to make the prediction
  • Import the data into iNZight (or equivalent) and try to use a time series model to make the predictions
  • Find the mean number of emails rolled up for each day of the week and use these to make the predictions
  • Use a probability distribution to model the number of emails rolled up each day and generate four random outcomes from this model to make the predictions
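To make the third idea concrete, here is a minimal Python sketch of the "mean for each day of the week" approach. It assumes only what the post states: the counts are in order and the first count is a Monday. The names `COUNTS`, `DAYS`, `weekday_means` and `predictions` are made up for this illustration, and rounding each mean to a whole email is just one reasonable choice, not the "right" answer to the challenge.

```python
from statistics import mean

# Raw counts copied from the post; the first count is a Monday.
COUNTS = [
    14, 11, 25, 24, 24, 36, 21, 12, 13, 23, 28, 19, 27, 8,
    15, 14, 19, 24, 26, 24, 7, 21, 19, 32, 26, 25, 25, 12,
    14, 21, 16, 27, 25, 23, 12, 13, 24, 22, 19, 21, 25, 10,
    19, 16, 18, 32, 24, 23, 10, 14, 22, 30, 24, 25, 24, 15,
    15, 21, 27, 22, 32, 26, 11, 18, 23, 28, 32, 18, 32, 13,
    18, 26, 26, 35, 23, 22, 13, 14, 18, 22, 30, 26, 26, 9,
    21, 16, 27, 21, 25, 20, 10, 17, 22, 31, 15, 27, 25, 10,
    16, 20, 17, 27, 24, 22, 15, 22,
]

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def weekday_means(counts):
    """Return the mean count for each day of the week.

    Assumes counts[0] is a Monday and there are no missing days.
    """
    by_day = {day: [] for day in DAYS}
    for i, count in enumerate(counts):
        by_day[DAYS[i % 7]].append(count)
    return {day: mean(values) for day, values in by_day.items()}


means = weekday_means(COUNTS)

# The last recorded count falls on a Monday, so the next four days
# are Tuesday to Friday. Use the rounded weekday mean as the prediction.
predictions = {
    day: round(means[day])
    for day in ["Tuesday", "Wednesday", "Thursday", "Friday"]
}

for day, predicted in predictions.items():
    print(f"{day}: mean {means[day]:.1f}, prediction {predicted}")
```

A point prediction like this ignores any trend over time and says nothing about day-to-day variation, which is a nice prompt for students to compare it with the time series and probability model ideas.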

So how many emails did I get?

Tuesday: 22

Wednesday: 29

Thursday: 30

Friday: 33

Using statistics to plan a wedding

stats_wedding

Sorry there have been no posts for a while. I have a whole stash of draft posts nearly ready to be published, but work, study, wedding planning and life in general have got in the way 🙂
 
One of the few posts I have made this year was about statistical modelling, so I thought I’d quickly share something related to this – an article about how an Australian couple used statistical modelling to predict how many guests would turn up to their wedding.
 

I love this article, not just because I am planning a wedding and I love statistics, but also because of how it discusses some of the key components of statistical modelling, for example:

  • the need for a model (including the risks of getting the model wrong which we don’t always talk about)
  • building the model (what factors were taken into account and why)
  • assumptions (including which assumptions turned out to not be so good)
  • acknowledging uncertainty (factors out of their control and other unknown information)
  • using the model (getting predictions, using a prediction interval)
  • evaluating and refining the model (considering how well the model performed, and how could it be improved for future applications)
…… and probably other aspects I’ve missed in this brief summary. I’m not sure how interesting this context would be for students but for me it was super interesting and inspiring even.
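
For anyone who wants to try the "using the model" step with students, here is a rough Python sketch of a simulation that produces a prediction interval for the number of guests. To be clear, this is not the couple's model from the article: the guest groups, the attendance probabilities, and the assumption that every guest decides independently are all invented for illustration (real guests tend to come in couples and families, which is exactly the kind of assumption worth questioning).

```python
import random

# Hypothetical guest groups: (number invited, assumed chance each one attends).
# These figures are invented for illustration; they are not from the article.
GUEST_GROUPS = {
    "local family": (30, 0.95),
    "local friends": (40, 0.85),
    "out-of-town guests": (30, 0.60),
}


def simulate_attendance(groups, rng):
    """Simulate one wedding, with each guest attending independently."""
    return sum(
        sum(rng.random() < p for _ in range(n))
        for n, p in groups.values()
    )


def prediction_interval(groups, n_sims=10_000, level=0.95, seed=1):
    """Return (lower, median, upper) guest totals from repeated simulation."""
    rng = random.Random(seed)
    totals = sorted(simulate_attendance(groups, rng) for _ in range(n_sims))
    lower_index = int((1 - level) / 2 * n_sims)
    upper_index = int((1 + level) / 2 * n_sims) - 1
    return totals[lower_index], totals[n_sims // 2], totals[upper_index]


lower, median, upper = prediction_interval(GUEST_GROUPS)
total_invited = sum(n for n, _ in GUEST_GROUPS.values())
print(f"Invited: {total_invited}")
print(f"Middle estimate: {median} guests")
print(f"95% prediction interval: {lower} to {upper} guests")
```

Changing the made-up probabilities, or simulating households instead of individual guests, is a simple way to explore how much the assumptions drive the interval.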
 
And the answer to the question you may be asking is ……… yes I did create my own statistical model for our wedding 🙂 And this post may or may not be related to our RSVP date being very soon …. 

Update: Seems like something we missed from our model was some invitations going missing in the post!