Hey! You’ve got to hide that population away …
I first set up an online tool for taking a random sample from a hidden population back in 2012. I didn’t share or promote this tool at the time because it was always meant to be a short-term solution to a short-term problem for my department. 2012 in NZ was the first year of AS91264 Use statistical methods to make an inference, and we had hundreds of Year 12 students and far fewer computers. We wanted a quick way for students to use the computer to get their random sample, graph it, print/save it and then move back to a desk to write up their report by hand. We also didn’t want them to see all the data that was in the population data set, as we thought that would be distracting.
Note: The title of this post is based on a song by The Beatles. You can read more about my thoughts on stuff related to sampling in this post: Using awesome real data
So I wrote some code which was completely based on the data viewer tool on Census At School NZ, where you can get a random sample from the Census At School database of your choice and then get the graphs and summary statistics displayed for that sample. The idea was that we could put whatever population data we wanted “behind the scenes” and students would choose what to sample using an interface. While initially it was intended for Year 12 only (since AS91264 has the requirement to sample), I extended this tool to include bootstrapping analysis for AS91582 (under type of analysis – Year 13) and the randomisation test for AS91583 (for this, students would just paste in their data directly to the webpage).
Below are some screen shots of this old tool from 2012:
This online inference tool had limitations, as I am sure you will have identified. Unlike iNZight, which has an interface designed to allow students to get into data faster and deeper, this tool was completely focused on getting the output for the inference, and the sample data generated by the tool could not be explored. The graphics are also not that great, and I needed to set up a page for each data set we wanted to use. Additionally, for the bootstrapping confidence interval, there was no animation to show how the interval was constructed (unlike the awesome iNZight VIT), which is such an essential part of using this method.
Fortunately, in the years that followed, our Principal gave us more and more desktop computers, and so students were able to complete their entire assessment on computers at a much slower pace using awesome tools such as Google docs (with great add-ons like Doctopus for us to manage their work). Later, we were also able to trial iNZight lite (we used it for AS91580 Investigate time series data), which is the online version of iNZight.
Time for a sampling tool update?
One of the awesome teachers I worked with emailed me recently wanting to set up something like the Census At School NZ random sampler tool. The Census At School random sampler tool gives you access to Census At School data sets since 2005, and also other data sets such as Kiwi Kapers, NZ incomes, Census at School data from other countries and Statistics NZ SURFs (income and births). One of the benefits of the tool is that the complete population data set is hidden behind the interface.
In terms of setting up something similar, there were a couple of options:
(1) not develop anything but instead put more population data sets up on the Census At School NZ site, since they have a great sampling interface set up. This is a valid option and if you have any great population data sets to contribute, just get in touch with the friendly people at Census At School NZ.
(2) set up something similar to my 2012 tool but without the graphs, where teachers send me data sets and I make them available for sampling on my website. This is essentially the same as option (1) except that I would have responsibility for setting up and maintaining the data sets, and the teachers sharing them would lose control of them. However, we often use data collected from our own population of students, which wouldn’t be that interesting or appropriate for students from other schools.
(3) set up a sampling interface where teachers can use whatever data set they want, whenever they want, and keep ownership of the data set. I’m calling this BYOP – Bring Your Own Population.
After revisiting the code I used in 2012 and the code I used recently to set up the random redirect tool, I realised it wouldn’t take too much time to create a sampling tool for option 3. All you need for this new sampling tool is a csv file which is hosted publicly somewhere on the web, and where the first row consists of the variable names and the second row consists of a full set of values for each variable (no missing values for any variable).
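To make those two file requirements concrete, here is a minimal Python sketch of the kind of check you could run on your own csv file before hosting it. This is not the code behind the BYOP tool: the function name check_population_csv and the made-up example rows are assumptions for illustration only.

```python
import csv
import io


def check_population_csv(text):
    """Return the variable names if the csv text meets the two stated rules.

    Rule 1: the first row holds the variable names.
    Rule 2: the second row has a value for every variable (nothing missing).
    """
    rows = list(csv.reader(io.StringIO(text)))
    if len(rows) < 2:
        raise ValueError("need a row of variable names and at least one data row")

    header, first_data_row = rows[0], rows[1]
    if any(name.strip() == "" for name in header):
        raise ValueError("every column needs a variable name")
    if len(first_data_row) != len(header):
        raise ValueError("the second row must have one value per variable")
    if any(value.strip() == "" for value in first_data_row):
        raise ValueError("the second row must not have missing values")
    return header


# Made-up rows, not real marathon data. The gap in the third row is allowed
# by the rules above, because only the second row has to be complete.
example = "division,time_min\nup to 34,215.5\n35 - 39,\n"
print(check_population_csv(example))
```

If the check passes you get the list of variable names back; otherwise the error message says which of the two rules the file breaks.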
You can see it in action here: https://statistics-is-awesome.org/BYOP/UFNXFXDF (for this example I used the Auckland Marathon 2015 data; this link has information about the data).
You can enter in the sample size you want, and if you want, you can choose to only sample from certain groups within the population e.g. age division (up to 34 vs 35 – 39). You can then copy and paste the sample generated to wherever you like, export the sample as a csv file, or jump straight into iNZight lite, VIT bootstrapping or CODAP with the data. I’ve made the page deliberately plain, so it will be up to you to provide the information about the data being used and how to use the tool.
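If you are curious what "sample size plus optional groups" amounts to, the steps above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the tool's actual code: the names take_sample and to_csv are hypothetical, the population is made up, and I have assumed sampling without replacement.

```python
import csv
import io
import random


def take_sample(rows, n, keep=None, seed=None):
    """Draw n rows at random, optionally only from chosen groups.

    rows: list of dicts, one per member of the population.
    keep: optional dict mapping a variable name to the set of values to allow,
          e.g. {"division": {"up to 34"}}.
    Assumption: rows are drawn without replacement.
    """
    pool = [
        row for row in rows
        if keep is None or all(row[var] in allowed for var, allowed in keep.items())
    ]
    if n > len(pool):
        raise ValueError("sample size is larger than the number of rows available")
    return random.Random(seed).sample(pool, n)


def to_csv(rows, fieldnames):
    """Write the sampled rows as csv text, ready to save or paste elsewhere."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# A made-up population of 20 rows (not real marathon results).
population = [
    {"division": "up to 34" if i % 2 else "35 - 39", "time_min": str(200 + i)}
    for i in range(20)
]

sample = take_sample(population, 5, keep={"division": {"up to 34"}})
print(to_csv(sample, ["division", "time_min"]))
```

Passing a seed makes the draw repeatable, which is handy for checking the code but not something you would want when every student should get a different sample.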
To read more about this new sampling tool and how to set up your own sampling URL, head here: BYOP sampling tool