Welcome to stickland!


I’ve been working on a little side project for the last year or so. I thought this might be a good time to share it with you, particularly since I probably (with a very high probability) won’t be making any more posts for the rest of the year due to a few little things called a dissertation and a wedding 🙂

The idea was to create a digital learning environment for working with data cards, in an attempt to make stronger connections between data cards, data structures and data displays, and to make effective use of tablets/devices (particularly in large lecture groups like my current teaching situation). This first digital environment is based on the C@S stick people data cards I created last year, but could involve any population/data etc, since everything is created dynamically. The idea to use stick people (figures) for the data cards was based on material Rob Gould presented at the NZAMT conference in 2015 regarding the Introduction to Data Science (IDS) course the Mobilize team created for high school students.

In stickland, the members of its population (the C@S stick people) ride by on skateboards. The number displayed on each stick person is their unique three-digit ID number. The environment is set up so that the stick people arrive at this stretch of road in stickland in a random order and at random times. Students could check this out by watching the stick people skate on by and recording their ID numbers. They should see no pattern to the numbers and be convinced that they cannot predict what ID number the next stick person will have (well, I guess if you watched for long enough you would be able to predict the last ID number…)
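For anyone curious how "random order, random times" could work, here is a minimal sketch in Python. It is not the code behind stickland: the function name, the ID range and the use of exponentially distributed gaps between arrivals are all assumptions made for illustration.

```python
import random

def arrival_schedule(ids, mean_gap_seconds=2.0, seed=None):
    """Return (arrival_time, stick_id) pairs: random order, random gaps.

    Hypothetical helper; mean_gap_seconds and the exponential gaps
    are assumptions, not details from stickland itself.
    """
    rng = random.Random(seed)
    order = list(ids)
    rng.shuffle(order)  # random order, each ID appears exactly once
    schedule = []
    clock = 0.0
    for stick_id in order:
        # wait a random amount of time before the next stick person appears
        clock += rng.expovariate(1.0 / mean_gap_seconds)
        schedule.append((clock, stick_id))
    return schedule

if __name__ == "__main__":
    for time, stick_id in arrival_schedule(range(100, 110), seed=1):
        print(f"{time:6.1f}s  ID {stick_id}")
```

Because each ID appears exactly once, the only arrival a student could ever predict is the very last one.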

To select stick people to find out more about them, students click on the stick person as they skate past. Some of the stick people are faster than others (more about that next year!) so it’s not always easy to catch them. This means that it will take different times for students to collect the same number of data cards. As the stick people are selected, a stack of data cards starts to build at the top right-hand side of the data card screen.

At this point we’re in a similar position to where we would be if we had given students a set of data cards each, or if we had asked them to select a random sample of data cards from a population bag. One of the really awesome things about data cards is the physical nature of them – students can move them around, sort them, line them up, etc. So in this digital environment, students can drag the stick people data cards around by tapping their heads and dragging with a finger.

I love getting students to sort the data cards by a categorical variable (e.g. Facebook user) and then by another categorical variable (e.g. Snapchat user) to build ideas of two-way tables and conditioning.
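Sorting the cards into piles like this is really just counting, so the same idea can be sketched in a few lines of Python. The cards below are made up for illustration (they are not real C@S data), and the variable names "facebook" and "snapchat" are assumptions.

```python
from collections import Counter

def two_way_counts(cards, row_var, col_var):
    """Count the cards in each (row value, column value) cell of a two-way table."""
    return Counter((card[row_var], card[col_var]) for card in cards)

def conditional_proportion(cards, given_var, given_value, target_var, target_value):
    """Proportion with target_value among only the cards that have given_value."""
    pile = [card for card in cards if card[given_var] == given_value]
    if not pile:
        raise ValueError("no cards in that pile")
    return sum(card[target_var] == target_value for card in pile) / len(pile)

# Made-up data cards, for illustration only.
cards = [
    {"facebook": "yes", "snapchat": "yes"},
    {"facebook": "yes", "snapchat": "no"},
    {"facebook": "no", "snapchat": "yes"},
    {"facebook": "yes", "snapchat": "yes"},
    {"facebook": "no", "snapchat": "no"},
]

if __name__ == "__main__":
    print(two_way_counts(cards, "facebook", "snapchat"))
    # Of the Facebook users only, what proportion also use Snapchat?
    print(conditional_proportion(cards, "facebook", "yes", "snapchat", "yes"))
```

The second function is the "conditioning" step: first pick out one pile (Facebook users), then look at the split within that pile only.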


You can also get students to make graphs out of the data cards (see one of Pip Arnold’s excellent resources along these lines here on Census At School NZ). In this digital environment, students can make the cards bigger or smaller, and can move into “dot” mode as they move into graphical representations by encoding the data.

To help students build understanding of which features of their graphs are essential, there is a drawing tool so they can add in additional information like axes, labels, numbers etc. I can see a whole lot of potential here, particularly with students exploring different ways to organise and display data.

To help build understanding of the relationship between units, variables and data structures (specifically rectangular data sets), an interactive spreadsheet builds below the data card screen as the cards are collected. When a student selects a data card, this stick person’s row of data is highlighted in the spreadsheet, and vice versa. To check each student can match the data shown on the data card to the spreadsheet, some data is deliberately missing from the spreadsheet (shown by grey boxes).

Students will need to find the relevant stick person, read the card for the appropriate variable, and enter this data to make a complete data set. At the moment, I’ve set this feature so that there is missing data for 10 different stick people (one for each variable on the data card) and so that the data cannot be visualised using software (iNZight lite) until the missing data has been found.
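The "one missing value per variable, each from a different stick person" rule can be sketched as below. This is an illustration under stated assumptions rather than the actual stickland code: the function names are hypothetical and the rows in the usage example are made up.

```python
import random

def hide_one_value_per_variable(rows, variables, seed=None):
    """Blank out one value for each variable, each in a different row.

    Returns (masked_rows, hidden), where hidden maps (row_index, variable)
    to the value that was removed. Hypothetical helper for illustration.
    """
    if len(rows) < len(variables):
        raise ValueError("need at least as many rows as variables")
    rng = random.Random(seed)
    chosen_rows = rng.sample(range(len(rows)), len(variables))  # all different
    masked = [dict(row) for row in rows]  # copy so the originals are untouched
    hidden = {}
    for row_index, variable in zip(chosen_rows, variables):
        hidden[(row_index, variable)] = masked[row_index][variable]
        masked[row_index][variable] = None  # the "grey box"
    return masked, hidden

def is_complete(rows):
    """True once every grey box has been filled in."""
    return all(value is not None for row in rows for value in row.values())

if __name__ == "__main__":
    # Made-up rows, for illustration only.
    rows = [{"id": 100 + i, "height": 150 + i, "age": 12 + i} for i in range(5)]
    masked, hidden = hide_one_value_per_variable(rows, ["height", "age"], seed=0)
    print(masked)
    print("ready for graphing?", is_complete(masked))
```

A check like `is_complete` is one way the graphing button could stay locked until every missing value has been entered.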

The final link is to explore the data using software like iNZight lite, which has been designed by Chris Wild to help students “get into data deeper and faster” (PS I’m not sure if that is an exact quote!). The data cards are not automatically linked to the data in iNZight lite, so if more data cards are collected, the iNZight button will need to be pressed again to update. I’m excited about getting students to explore relationships and build informal predictive models (after trying this out with the data cards earlier), and then checking these models out by easily selecting more stick people (see more about this kind of activity in my post about data challenges).

So what do you think? 🙂

Hey! You’ve got to hide that population away …


Back in 2012, I first set up an online tool for taking a random sample from a hidden population. I didn’t share or promote this tool at the time because it was always meant to be a short term solution to a short term problem for my department. 2012 in NZ was the first year of AS91264 Use statistical methods to make an inference and we had hundreds of Year 12 students and far fewer computers. We wanted a quick way for students to use the computer to get their random sample, graph it, print/save it and then move back to a desk to write up their report by hand. We also didn’t want them to see all the data that was in the population data set, as we thought that would be distracting.

Note: The title of this post is based on a song by The Beatles. You can read more about my thoughts on stuff related to sampling in this post: Using awesome real data

So I wrote some code which was completely based on the data viewer tool on Census At School NZ, where you can get a random sample from the Census At School database of your choice and then get the graphs and summary statistics displayed for that sample. The idea was that we could put whatever population data we wanted “behind the scenes” and students would choose what to sample using an interface. While initially it was intended for Year 12 only (since AS91264 has the requirement to sample), I extended this tool to include bootstrapping analysis for AS91582 (under type of analysis – Year 13) and the randomisation test for AS91583 (for this, students would just paste in their data directly to the webpage).

Below are some screen shots of this old tool from 2012:

Sampling interface
Year 12 output
Year 13 output

This online inference tool had limitations, as I am sure you will have identified 🙂 Unlike iNZight, which has an interface designed to allow students to get into data faster and deeper, this tool was completely focused on getting the output for the inference, and the sample data generated by the tool could not be explored. The graphics are also not that great, and I needed to set up a page for each data set we wanted to use. Additionally, for the bootstrapping confidence interval, there was no animation to show how the interval was constructed (unlike the awesome iNZight VIT), which is such an important and essential part of using this method.
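To give a feel for what that construction involves, here is a minimal sketch of a percentile bootstrap interval in Python. It is not the code from the 2012 tool or from iNZight VIT; the function name, the choice of the median, the 1000 resamples and the percentile method are all assumptions, and the sample in the usage example is made up.

```python
import random
import statistics

def bootstrap_interval(sample, statistic=statistics.median,
                       resamples=1000, level=0.95, seed=None):
    """Percentile bootstrap interval for a statistic of one sample."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample WITH replacement, same size as the original sample,
    # and recalculate the statistic each time.
    estimates = sorted(
        statistic(rng.choices(sample, k=n)) for _ in range(resamples)
    )
    # Keep the middle `level` of the re-calculated statistics.
    tail = (1 - level) / 2
    lower_index = int(tail * resamples)
    upper_index = min(resamples - 1, int((1 - tail) * resamples))
    return estimates[lower_index], estimates[upper_index]

if __name__ == "__main__":
    # Made-up sample, for illustration only.
    sample = [12, 15, 9, 20, 14, 11, 18, 16, 13, 17]
    print(bootstrap_interval(sample, seed=42))
```

The three steps in the comments (resample, recalculate, keep the middle chunk) are the ones an animation makes visible and a static output hides.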

Fortunately, in the years that followed, our Principal gave us more and more desktop computers, and so students were able to complete their entire assessment on computers at a much slower pace using awesome tools such as Google docs (with great add-ons like Doctopus for us to manage their work). Later, we were also able to trial iNZight lite (we used it for AS91580 Investigate time series data), which is the online version of iNZight.

Time for a sampling tool update?

One of the awesome teachers I worked with emailed me recently wanting to set up something like the Census At School NZ random sampler tool. The Census At School random sampler tool gives you access to Census At School data sets since 2005, and also other data sets such as Kiwi Kapers, NZ incomes, Census at School data from other countries and Statistics NZ SURFs (income and births). One of the benefits of the tool is that the complete population data set is hidden behind the interface.

In terms of setting up something similar, there were a couple of options:

(1) not develop anything but instead put more population data sets up on the Census At School NZ site, since they have a great sampling interface set up. This is a valid option and if you have any great population data sets to share, just get in touch with the friendly people at Census At School NZ.

(2) set up something similar to my 2012 tool but without the graphs, where teachers send me data sets and I make them available for sampling on my website. This is essentially the same as option (1) except that I would have responsibility for setting up and maintaining the data sets, and the teachers sharing them would lose control of them. However, we often use data collected from our own population of students, which wouldn’t be that interesting or appropriate for students from other schools.

(3) set up a sampling interface where teachers can use whatever data set they want, whenever they want, and keep ownership of the data set. I’m calling this BYOP – Bring Your Own Population 🙂

After revisiting the code I used in 2012 and the code I used recently to set up the random redirect tool, I realised it wouldn’t take too much time to create a sampling tool for option 3. All you need for this new sampling tool is a csv file which is hosted publicly somewhere on the web, and where the first row consists of the variable names and the second row consists of a full set of values for each variable (no missing values for any variable).
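If you want to check a csv file against those two requirements before hosting it, something like the sketch below would do. It is only an illustration and not part of the BYOP tool: the function name and the wording of the messages are made up, and it checks text you have already loaded rather than fetching anything from the web.

```python
import csv
import io

def check_population_csv(text):
    """Return a list of problems found; an empty list means the file looks OK.

    Checks only the two requirements described above: variable names in
    the first row, and a value for every variable in the second row.
    """
    rows = [row for row in csv.reader(io.StringIO(text)) if row]  # skip blank lines
    if len(rows) < 2:
        return ["need a row of variable names and at least one row of data"]
    header, first_data_row = rows[0], rows[1]
    problems = []
    if any(not name.strip() for name in header):
        problems.append("blank variable name in the first row")
    if len(first_data_row) != len(header) or any(
        not value.strip() for value in first_data_row
    ):
        problems.append("second row does not have a value for every variable")
    return problems

if __name__ == "__main__":
    print(check_population_csv("name,age\nAna,34\nBen,\n"))  # second row is complete
    print(check_population_csv("name,age\nAna,\nBen,41\n"))  # second row has a gap
```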

You can see it in action here https://statistics-is-awesome.org/BYOP/UFNXFXDF (for this example I used the Auckland Marathon 2015 data; this link has information about the data).

You can enter in the sample size you want, and if you want, you can choose to only sample from certain groups within the population e.g. age division (up to 34 vs 35 – 39). You can then copy and paste the sample generated to wherever you like, export the sample as a csv file, or jump straight into iNZight lite, VIT bootstrapping or CODAP with the data. I’ve made the page deliberately plain, so it will be up to you to provide the information about the data being used and how to use the tool.
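As a rough picture of what happens behind the interface, here is a minimal Python sketch of sampling with an optional group filter. It is not the BYOP code: the function and parameter names are hypothetical, sampling without replacement is an assumption, and the little data set is made up (it is not the Auckland Marathon data).

```python
import csv
import io
import random

def sample_rows(csv_text, sample_size, group_var=None, groups=None, seed=None):
    """Random sample of rows, optionally only from certain groups."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if group_var is not None:
        # keep only the rows whose group is one of the chosen groups
        rows = [row for row in rows if row[group_var] in groups]
    if sample_size > len(rows):
        raise ValueError("sample size is larger than the available population")
    return random.Random(seed).sample(rows, sample_size)  # without replacement

# Made-up population, for illustration only.
csv_text = (
    "name,division,minutes\n"
    "A,up to 34,250\n"
    "B,35 - 39,270\n"
    "C,up to 34,230\n"
    "D,35 - 39,300\n"
    "E,up to 34,260\n"
)

if __name__ == "__main__":
    for row in sample_rows(csv_text, 2, group_var="division",
                           groups={"up to 34"}, seed=3):
        print(row)
```

Students only ever see the rows that come back from the sampler, so the full population stays hidden.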

To read more about this new sampling tool and how to set up your own sampling URL, head here: BYOP sampling tool

The URL for this tool has been updated. Links created using the old URL will still work, or you can change the old ‘sampler-new’ part to ‘BYOP’, and the domain from ‘mathstatic.co.nz’ to ‘statistics-is-awesome.org’.


censusatschool.org.nz + iNZight lite = awesome


This is a short post about exploring data from Census at school NZ using the online version of iNZight.

Have you used the random sampler lately on Census at School? Did you know that it now links your random sample through to iNZight lite?

Head to http://new.censusatschool.org.nz/explore and click on the button that says “get a random sample”. Follow the instructions to get your random sample (I selected the CensusAtSchool NZ 2015 Database, no Subpopulation, and Total sample size of 200) and you’ll get a link to iNZight lite that will include the data you just got from your random sample.

Click on that, follow the steps to load up iNZight lite and start exploring that data 🙂


The iNZight lite tool is still under active development so if you come across anything weird or want to suggest improvements, just send feedback through to the iNZight team via the iNZight website.

If you are comfortable with playing around with URLs, you can also set up links to iNZight lite which include the csv file already loaded, as demonstrated below (using the Australian Institute of Sport athletes data set). The part to change is in bold, which you can replace with any web-hosted csv file.